Text analytics features in Pega Platform
Pega Platform provides a collection of features that you can use to process and structure text data from social media platforms:
- Sentiment analysis – Detects and analyzes the feelings (attitudes, emotions, opinions) that characterize a unit of text, for example, to find out whether a product review was positive or negative.
- Classification analysis – Assigns one or more classes or categories to a text sample to make it easier to manage and sort.
- Intent analysis – Determines whether the analyzed content expresses an underlying intention, for example, whether a person is likely to buy your product or wants to complain.
- Entity extraction analysis – Extracts named entities from text data and assigns the detected entities to predefined categories such as names of organizations, locations, people, quantities, or values.
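As an illustration of the entity extraction idea described above, the following minimal Python sketch assigns detected entities to predefined categories. It is not Pega Platform code: the `ORGANIZATIONS` dictionary, the `QUANTITY_PATTERN` expression, and the `extract_entities` function are hypothetical names chosen only for this example.

```python
import re

# Hypothetical dictionary of organization names (an assumption for this sketch).
ORGANIZATIONS = {"Pega", "Acme Corp"}

# Hypothetical pattern for quantities: a number followed by a unit word.
QUANTITY_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?(?:kg|km|units|dollars)\b")


def extract_entities(text):
    """Return (category, matched text) pairs found in the text."""
    entities = []
    # Dictionary lookup: report each known organization name that occurs.
    for name in sorted(ORGANIZATIONS):
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            entities.append(("organization", name))
    # Pattern match: report every quantity-like span.
    for match in QUANTITY_PATTERN.finditer(text):
        entities.append(("quantity", match.group(0)))
    return entities


if __name__ == "__main__":
    print(extract_entities("Acme Corp shipped 25 units to Pega."))
```

The sketch covers only dictionary terms and a fixed pattern. Names that follow no pattern and appear in no dictionary would need a trained model instead.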
The Pega Platform features that help you to obtain accurate analysis results depend on the language of the analyzed content.
To view the full list of supported languages and the corresponding supported features, see Text analysis features per language.
This spreadsheet contains the following columns for each language in the list:
- Sentiment model – Indicates whether you can configure and train a machine learning model for sentiment analysis.
- Stemmer – Indicates whether the process of reducing inflected or derived words to the word stem is part of the text analysis process. For example, for the words fishing, fished, and fisherman, the stem is the word fish.
- Lemmatizer – Indicates whether the process of grouping the inflected forms of a word so that they can be analyzed as a single item (lemma) is part of the text analysis process. For example, the word bad is the lemma of worse and worst.
A stemmer misses the mapping of worse to bad because that mapping requires contextual knowledge about the language, which the lemmatizer provides. The lemmatizer makes the analysis more accurate, but it requires more real-time processing than the stemmer.
- Word-based – Indicates whether you can perform rule-based classification analysis by uploading a CSV file that contains a taxonomy as part of a Text Analyzer rule.
- Model-based – Indicates whether you can create and use a classification model for text analysis. You create a model by uploading a predefined set of text samples with assigned categories, selecting the training and testing samples, and training that model. You can upload your model as part of a Text Analyzer rule.
- Spelling checker – Indicates whether a Decision Data rule that contains a spelling checker is available.
- Entity extraction
  - Rule/script/custom – Indicates whether rule-based entity extraction is available. By using this type of entity extraction, you can extract entities whose names follow a certain pattern or belong to a set of dictionary terms (for example, names of Apple products, company ID numbers, and so on).
  - Model-based – Indicates whether model-based entity extraction is available. By using this type of entity extraction, you can extract entities whose names are not limited by any patterns or dictionaries (for example, names of organizations, people, and so on).
- Intent analysis – Indicates whether intent analysis is available.
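The difference between the Stemmer and Lemmatizer columns described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the algorithm that Pega Platform uses: `toy_stem`, `toy_lemmatize`, the `SUFFIXES` tuple, and the `LEMMAS` dictionary are hypothetical, and the stemmer strips only a handful of suffixes.

```python
# Hypothetical suffix list for a toy stemmer (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lemma dictionary standing in for the lemmatizer's language knowledge.
LEMMAS = {"worse": "bad", "worst": "bad", "fishing": "fish", "fished": "fish"}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the lemma dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for w in ("fishing", "fished", "worse", "worst"):
        print(w, toy_stem(w), toy_lemmatize(w))
```

In this sketch, suffix stripping reduces fishing and fished to fish but leaves worse and worst unchanged, while the dictionary lookup maps both to bad. The lookup table stands in for the extra language knowledge, and the extra processing, that a lemmatizer needs.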
Published February 9, 2017 — Updated November 20, 2018