Text analytics

You can use Pega Platform to analyze unstructured text that comes in through different channels: emails, social networks, chat channels, and so on. You can structure and classify the analyzed data to derive business information that helps you retain and grow your customer base.

Model types

Pega Platform provides the following types of models:

  • Categorization models that assign text to a predefined category. The following types of categorization models are available:
    • Topic detection models – Detect the underlying topic of the document. These models support machine learning and rule-based classification that is based on taxonomy keywords. For example, topic detection can determine that the sentence "My uPlusTelco laptop is not working, need help!" belongs to the Customer Service > User Support category.
    • Sentiment analysis models – Detect positive, neutral, or negative sentiment. These models support machine learning.
    • Intent detection models – Assign intents to text input. For example, intent detection can determine whether the analyzed text is a complaint or an inquiry.
  • Text extraction models that extract named entities and assign them to predefined categories such as names of organizations, locations, people, and so on. The following types of text extraction models are available:
    • Rule-based text extraction models – Extract entities that follow a specific text pattern, for example, dates, account numbers, email addresses, and so on. Use Apache RUTA scripts to build rules for text extraction.
    • Machine-learning text extraction models – Build a supervised Conditional Random Fields machine-learning model.
    • Keywords-based text extraction models – Identify a set of keywords that pertain to your use case, for example, different names of the same product.
    • Auto-tags – Dynamically identify the phrases that capture the essence of the text.
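
As a rough illustration of the keyword-based and pattern-based approaches listed above, the following Python sketch scores a text against taxonomy keywords and pulls out entities with regular expressions. It is not Pega Platform code and does not use Apache RUTA; the taxonomy contents, the patterns, and the function names detect_topic and extract_entities are all assumptions made for this example.

```python
import re

# Hypothetical taxonomy: category path -> keywords (illustrative only).
TAXONOMY = {
    "Customer Service > User Support": ["not working", "help", "broken"],
    "Sales > Pricing": ["price", "discount", "quote"],
}

# Hypothetical patterns for entities that follow a fixed text shape.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def detect_topic(text):
    """Return the category whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(keyword) for keyword in keywords)
        for category, keywords in TAXONOMY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_entities(text):
    """Return (entity_type, matched_text) pairs for every pattern match."""
    return [
        (name, match.group(0))
        for name, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


if __name__ == "__main__":
    print(detect_topic("My uPlusTelco laptop is not working, need help!"))
    print(extract_entities("Contact jane.doe@example.com before 2024-03-15."))
```

Counting keyword hits is the simplest possible scoring rule. A machine-learning topic model would instead learn the mapping from labeled training data.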

Model deployment

To deploy the models that you built, use Text Analyzer rules. A text analyzer parses text, automatically recognizes the language, and processes the models. A Text Analyzer rule can reference one or more of the models or methods that are listed above.

Considerations for training machine-learning models

To train a machine-learning text analytics model, you must upload training data that contains sample texts and their associated outcomes. For example, for sentiment analysis, each sample record must be associated with a positive, neutral, or negative outcome. For text categorization, the outcome must be one of the categories in the taxonomy, and so on. In the process of creating a model, the data is split into a training sample and a test sample. The training sample is used to train the model. The test sample is the hold-out sample that is used to validate the model. After the model is built, you can validate its performance against the test sample.
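
The hold-out idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Pega Platform performs the split; the sample records, the 20% test fraction, the fixed seed, and the function names split_holdout and accuracy are assumptions chosen for the example.

```python
import random

# Hypothetical labeled records: (sample text, associated outcome).
RECORDS = [
    ("Great service, thank you", "positive"),
    ("I love the new plan", "positive"),
    ("Very happy with support", "positive"),
    ("Fast and friendly", "positive"),
    ("The bill arrived today", "neutral"),
    ("I changed my address", "neutral"),
    ("My laptop is not working", "negative"),
    ("Terrible connection again", "negative"),
    ("Still waiting for a refund", "negative"),
    ("Worst experience ever", "negative"),
]


def split_holdout(records, test_fraction=0.2, seed=42):
    """Shuffle labeled records and split them into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


def accuracy(predict, test_sample):
    """Fraction of test records whose predicted outcome matches the label."""
    correct = sum(1 for text, outcome in test_sample if predict(text) == outcome)
    return correct / len(test_sample)


if __name__ == "__main__":
    training_sample, test_sample = split_holdout(RECORDS)
    print(len(training_sample), "training records,", len(test_sample), "test records")
    # Stand-in predictor that always answers "negative"; a real model
    # would be trained on training_sample only.
    print("accuracy:", accuracy(lambda text: "negative", test_sample))
```

Because the test records are never used for training, the accuracy measured on them estimates how the model behaves on text it has not seen before.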