Analyzing natural language with text analytics

Updated on May 6, 2022
Note: This chapter applies to Pega Platform™ versions 7.4-8.3. For information on text analytics in later versions of Pega Platform, see Analyzing natural language.

Humans can effortlessly interpret a single tweet, but cannot do so efficiently over a large volume of them. Businesses are exploring ways to use machine learning to extract meaningful information from large volumes of text messages. These insights help improve business performance and customer experience.

Pega Platform provides the following techniques that you can use to process and structure text data from the Facebook, Twitter, and YouTube social media platforms:

  • Categorization models – Models that assign incoming text to a predefined category. You can build the following types of categorization models in Pega Platform:
    • Sentiment analysis – Detect and analyze the feelings (attitudes, emotions, opinions) that characterize a unit of text.
    • Topic detection – Assign one or more classes or categories to a text sample to make that text easier to manage and sort. You can build machine-learning models, or you can create rule-based models by creating taxonomies.
    • Intent analysis – Determine a user's intent in social media posts, comments, messages, emails, and so on, to find out whether that user is likely to subscribe to your services or buy your products.
  • Text extraction models – Extract named entities from text data and assign them to predefined categories, such as names of organizations, locations, people, quantities, or values. You can perform the following types of text extraction:    
    • Rule-based text extraction – Extract entities that follow a specific text pattern, for example, dates, account numbers, or email addresses. Use Apache Ruta scripts to build rules for text extraction (see the sketch after this list).
    • Model-based text extraction – Build a supervised Conditional Random Fields machine-learning model. 
    • Keywords-based text extraction – Identify a set of keywords that pertain to your use case, for example, different names of the same product.
    • Auto-tags – Dynamically identify the phrases that capture the essence of the text.
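
The following Python sketch only illustrates the idea behind rule-based and keywords-based text extraction; it is not Pega's implementation (rule-based extraction in Pega Platform uses Apache Ruta scripts). The sample text, patterns, and keyword list are invented for illustration.

    import re

    TEXT = "Please close account 4521-9987 and email me at jane.doe@example.com by 05/31/2022."

    # Rule-based extraction: entities that follow a fixed textual pattern.
    PATTERNS = {
        "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "date": r"\b\d{2}/\d{2}/\d{4}\b",
        "account_number": r"\b\d{4}-\d{4}\b",
    }

    # Keywords-based extraction: a fixed list of terms that matter to the use case,
    # for example different names of the same product.
    KEYWORDS = {"account", "email"}

    def extract_entities(text):
        entities = []
        for entity_type, pattern in PATTERNS.items():
            for match in re.finditer(pattern, text):
                entities.append((entity_type, match.group()))
        for word in re.findall(r"[A-Za-z]+", text):
            if word.lower() in KEYWORDS:
                entities.append(("keyword", word))
        return entities

    print(extract_entities(TEXT))
    # [('email', 'jane.doe@example.com'), ('date', '05/31/2022'),
    #  ('account_number', '4521-9987'), ('keyword', 'account'), ('keyword', 'email')]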

To enable text analysis in your application, configure and customize the underlying infrastructure in the form of data sets, text analyzers, and data flows.

For hands-on experience with text analytics in Pega Platform, see the following module on Pega Academy: Text analytics for email routing.

Try text analytics with NLP Sample

Use the NLP (Natural Language Processing) Sample application to explore the natural language processing capabilities of Pega Platform. The application is available as an archive that you can download and install. The application includes the following components:

  • The NLP Sample portal.
  • A set of rules that constitute the text analytics infrastructure in your application, including a sample text analyzer that supports sentiment, classification, entity extraction, and intent analysis. 
  • A collection of taxonomies that demonstrate various use cases for classification analysis, for example, telecom, banking, customer service, and automobile.

You can use the provided rules and text samples for text analysis or you can configure your own rules and models to explore text classification, sentiment analysis, topic detection, and entity extraction.

Figure: Extracting entities in NLP Sample

Learning natural language processing with NLP Sample

Configure text analyzer or text prediction

Configure text analyzer rules or, starting in release 8.6, text predictions to process content that is extracted from social media (Twitter, Facebook, and YouTube), emails, chatbot messages, databases, REST APIs, customer support tickets, and so on. Use a variety of tools, such as lexicons, taxonomies, and machine learning models, to customize the sentiment, classification, entity extraction, and intent analysis that you want to apply to the text content that interests you.
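
A text analyzer in Pega Platform is a rule that you configure, not code that you write. The following Python sketch only mimics what such a rule is configured to do when it applies taxonomy-based topic detection and lexicon-based sentiment analysis to one incoming message; the taxonomy and sentiment lexicon below are invented examples.

    # Stand-in for a configured text analyzer: topic detection from a taxonomy
    # plus a naive lexicon-based sentiment score.
    TAXONOMY = {
        "billing": {"invoice", "charge", "refund", "billing"},
        "network": {"outage", "signal", "coverage", "connection"},
    }

    SENTIMENT_LEXICON = {"great": 1, "love": 1, "slow": -1, "awful": -1, "refund": -1}

    def analyze_text(text):
        words = set(text.lower().split())
        topics = [topic for topic, terms in TAXONOMY.items() if words & terms]
        score = sum(SENTIMENT_LEXICON.get(word, 0) for word in words)
        sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        return {"text": text, "topics": topics, "sentiment": sentiment}

    print(analyze_text("The connection is awful and I still want a refund"))
    # {'text': ..., 'topics': ['billing', 'network'], 'sentiment': 'negative'}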

Build machine learning models

Become a data scientist and use Pega Platform to employ machine learning in your application. Use the Decision Analytics work area to generate custom models for sentiment and classification analysis. Using a specialized wizard for model creation, you can perform the following actions:

  • Upload the resources that are required to generate the models (for example, a corpus of documents, such as emails or tweets, that has already been classified as having positive, neutral, or negative sentiment, for use as a training sample for sentiment model generation).
  • Define the model details and the algorithm that the application uses to train the model: maximum entropy (MaxEnt), Naïve Bayes, or support vector machine (SVM). Depending on your approach and your business goal, you might want to use a specific algorithm type; for example, you can use the SVM algorithm for large sets of training data (see the sketch after this list).
  • Review the model configuration. Use a variety of measures, such as F-score, precision, and recall, to determine the accuracy of the model. You can also test the generated model against any number of test samples.
  • Export the generated model or upload it to your application for use in text analyzers.
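
As a rough analogy to that wizard-driven workflow, the following scikit-learn sketch trains a small sentiment model and reports precision, recall, and F-score. It is not how Pega Platform builds its models internally, and the tiny labelled corpus is invented for illustration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import classification_report

    # Invented training sample: texts already labelled with their sentiment.
    train_texts = [
        "I love this service, great support",
        "Awful experience, the app keeps crashing",
        "Delivery was fine, nothing special",
        "Fantastic product, highly recommend",
        "Terrible billing, I want a refund",
        "It works as expected",
    ]
    train_labels = ["positive", "negative", "neutral",
                    "positive", "negative", "neutral"]

    # Pick an algorithm much as the wizard does: Naive Bayes for small samples,
    # or a support vector machine (LinearSVC) for larger training sets.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    # model = make_pipeline(TfidfVectorizer(), LinearSVC())

    model.fit(train_texts, train_labels)

    # Evaluate against a held-out test sample.
    test_texts = ["great support, love it", "the app is awful"]
    test_labels = ["positive", "negative"]
    predictions = model.predict(test_texts)

    # Precision, recall, and F-score per sentiment class.
    print(classification_report(test_labels, predictions, zero_division=0))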

By uploading machine learning models as part of text analyzer rules, you can enhance the accuracy with which those rules detect sentiment or classify text.

Follow best practices to create accurate machine learning models in Pega Platform.

Create and configure data sets for text content

Retrieve the text content that interests you from a variety of sources. You can retrieve text content from Facebook, Twitter, and YouTube social media platforms, emails, or databases, and analyze it in your application. Additionally, by using stream data sets, you can extract instant messages from the WhatsApp service and access blog posts through the Webhose.io platform.

Customize metadata retrieval

Optionally, if you are analyzing text content from Facebook, you can customize your social media data set to retrieve additional metadata, such as user verification information, profile pictures, icons, or other information that is relevant to achieving your business goal. You can configure the metadata retrieval criteria on the Social Media Metadata landing page that is available in applications that have access to the Pega-NLP ruleset.

Combine and process

Arrange the text analyzer and other rules that you created into a processing pattern of a data flow or a process flow.

Figure: Referencing a Text Analyzer rule in a data flow

Data flows offer a flexible solution for combining all your data points into a processing pattern that has a source from which the input data is taken and a destination to which the results are saved. Between the source and destination, you can apply various processing instructions in the form of different shapes. In a data flow that is designed for text analytics of Facebook, Twitter, or YouTube, you can reference a social media data set as the source, apply processing instructions in the form of text analyzers, and save the results into a database, activity, or a JSON file. You can also enrich your data flow with additional shapes, such as filters, to process only negative content or the content from the most influential users, and so on.
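
The following Python sketch is only an analogy for that pattern, not a Pega data flow: a source of posts, a text-analyzer step, a filter that keeps only negative content, and a JSON file as the destination. The sample posts and the naive scoring function are invented stand-ins for a real social media data set and text analyzer rule.

    import json

    def source():
        # Stand-in for a Facebook, Twitter, or YouTube data set.
        yield {"author": "user1", "text": "Love the new plan, works great"}
        yield {"author": "user2", "text": "Worst support ever, still no refund"}

    def analyze(record):
        # Stand-in for a text analyzer: a naive keyword-based sentiment score.
        negative_words = {"worst", "awful", "refund", "broken"}
        words = set(record["text"].lower().split())
        record["sentiment"] = "negative" if words & negative_words else "positive"
        return record

    def run_flow(destination_path):
        # Source -> text analyzer -> filter (negative only) -> JSON destination.
        results = [r for r in map(analyze, source()) if r["sentiment"] == "negative"]
        with open(destination_path, "w") as out:
            json.dump(results, out, indent=2)

    run_flow("negative_posts.json")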

You can also reference a text analyzer rule in a process flow by using the Utility shape to analyze the text content of emails or customer support tickets.
