Text extraction analysis

Updated on July 5, 2022

Text extraction analysis is the process of extracting named entities from unstructured text such as press articles, Facebook posts, or tweets, and categorizing them. Typically, a named entity is a proper noun that falls into a commonly understood category such as a person, organization, or location. An entity can also be a Social Security number, email address, postal code, and so on.

Auto tags

You can configure a Text Analyzer to automatically detect and mark the most important concepts that are expressed in a document. This option is useful when you want to tag a document with the most relevant words or phrases, create word clouds, or perform faceted search according to semantic categories.
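To make the idea of auto tagging concrete, the following is a minimal sketch that tags a document with its most frequent non-stop-word terms. It is not how the Text Analyzer selects tags. The function name auto_tags, the stop-word list, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a production analyzer would use a far larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "for", "on", "with", "was", "are", "from"}

def auto_tags(text, top_n=3):
    """Return the top_n most frequent non-stop-word terms as candidate tags."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # most_common orders by count; ties keep first-seen order.
    return [word for word, _ in counts.most_common(top_n)]

sample = ("The billing team fixed the billing error. "
          "The refund for the billing error was issued.")
tags = auto_tags(sample, top_n=2)
print(tags)
```

Tags produced this way could feed a word cloud or a search facet, but simple frequency counting ignores semantic categories, which the product feature is described as taking into account.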

Summarization

You can generate an extractive summary from a large body of text, such as a business report or an email. By using summaries, you can make important business decisions without reading complete documents. Instead, you can examine the summary together with the context of the text, which is presented as extracted topics, entities, intents, and sentiment.
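An extractive summary reuses sentences from the source text instead of generating new ones. The sketch below illustrates one simple way to do that: score each sentence by the average frequency of its words and keep the highest-scoring sentences in their original order. The scoring rule, the function name summarize, and the stop-word list are assumptions for illustration and do not describe the product's summarization algorithm.

```python
import re
from collections import Counter

# Hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "for", "on", "with", "from"}

def _tokens(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=1):
    """Pick the max_sentences highest-scoring sentences, kept in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence):
        tokens = _tokens(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

report = ("Revenue grew in the third quarter. "
          "The office moved to a new building. "
          "Revenue growth came from cloud revenue.")
print(summarize(report, max_sentences=1))
```

Because every selected sentence is copied verbatim, the summary cannot introduce statements that are absent from the source document.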

Text extraction

You can extract keywords and phrases from unstructured text through entity types. An entity type is a keyword or phrase that denotes a person name, organization, location, and so on. You can group similar or related entity types into models.

For each entity type, you can combine the following detection methods to locate and classify entities more reliably than any single method allows:

Keyword-based text extraction: You can specify a list of key terms and their synonyms that belong to a particular domain. For example, you can create a list of keywords to track social media messages that pertain to the latest release of a competitor's product or group of products.
Pattern extraction: Use pattern extraction models to extract entities whose structures match a specific pattern, for example, postal codes, case numbers, email addresses, and so on. You can select one of the default pattern extraction samples or create custom patterns through the Rule-based Text Annotation (Ruta) script language.
Machine learning models: Use machine learning to identify and classify named entities in text. You can select one of the default entity extraction models or create custom models in Prediction Studio by using the conditional random fields (CRF) algorithm.
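The first two detection methods can be sketched in a few lines to show how their results combine into one list of typed entities. This is an illustration only. It uses Python regular expressions as a stand-in for Ruta scripts, and the entity type names, keyword list, patterns, and the function extract_entities are all hypothetical. The machine learning method is not shown.

```python
import re

# Hypothetical keyword list: entity type -> key terms and synonyms.
KEYWORDS = {
    "product_release": ["release", "launch", "rollout"],
}

# Hypothetical patterns: entity type -> regular expression.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

def extract_entities(text):
    """Return (entity_type, matched_text, offset) tuples sorted by offset."""
    found = []
    # Keyword-based detection: whole-word, case-insensitive matches.
    for entity_type, terms in KEYWORDS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                found.append((entity_type, m.group(0), m.start()))
    # Pattern-based detection: structural matches such as email addresses.
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((entity_type, m.group(0), m.start()))
    return sorted(found, key=lambda entity: entity[2])

message = ("The launch went well. Contact sales@example.com "
           "or write to Boston, MA 02110.")
entities = extract_entities(message)
for entity_type, value, offset in entities:
    print(entity_type, value, offset)
```

Keeping each method's results under a shared entity type is what allows several methods to contribute to the same classification.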

Configuring text extraction analysis
