Analyzing natural language with text analytics
Humans can effortlessly interpret a single tweet, but cannot do so efficiently across a large volume of messages. Businesses are exploring ways to use machine learning to extract meaningful information from large volumes of text. These insights help improve business performance and customer experience.
Pega Platform™ provides the following techniques that you can use to process and structure text data from the Facebook, Twitter, and YouTube social media platforms:
- Categorization models – The models that assign incoming text to a predefined category. You can build the following types of categorization models in Pega Platform:
  - Sentiment Analysis – Detect and analyze the feelings (attitudes, emotions, opinions) that characterize a unit of text.
  - Topic Detection – Assign one or more classes or categories to a text sample to make that text easier to manage and sort. You can build machine-learning models or you can create rule-based models by creating taxonomies.
  - Intent analysis – Determine a user's intent in social media posts, comments, messages, emails, and so on, to find out whether that user is likely to subscribe to your services, or buy your products.
- Text extraction models – Extract named entities from text data and assign them to predefined categories, such as names of organizations, locations, people, quantities, or values. You can perform the following types of text extraction:
  - Rule-based text extraction – Extract entities that follow a specific text pattern, for example, dates, account numbers, emails, and so on. Use Apache RUTA scripts to build rules for text extraction.
  - Model-based text extraction – Build a supervised Conditional Random Fields machine-learning model.
  - Keywords-based text extraction – Identify a set of keywords that pertain to your use case, for example, different names of the same product.
  - Auto-tags – Dynamically identify the phrases that capture the essence of the text.
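To make the difference between rule-based and keywords-based extraction concrete, the following is a minimal Python sketch of both ideas. It is not Pega Platform code and does not use Apache RUTA; the patterns, the account number format (`ACC-` followed by six digits), and the product names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns; real date, email, and account formats vary by business.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "account_number": re.compile(r"\bACC-\d{6}\b"),
}

# Hypothetical keyword map: different names of the same product -> canonical name.
KEYWORDS = {
    "acme phone": "Acme Phone",
    "acme fone": "Acme Phone",
    "acmephone": "Acme Phone",
}


def extract_by_rules(text):
    """Return (entity_type, matched_text) pairs for every pattern match."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group()))
    return found


def extract_by_keywords(text):
    """Return the canonical names of all keywords that occur in the text."""
    lowered = text.lower()
    return sorted({name for variant, name in KEYWORDS.items() if variant in lowered})


sample = ("My Acme Fone stopped working on 2023-04-01. "
          "Account ACC-123456, reach me at jo@example.com.")
print(extract_by_rules(sample))
print(extract_by_keywords(sample))
```

Rule-based extraction suits entities with a predictable shape, while the keyword map suits entities that are a closed set of known names with spelling variants.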
To enable text analysis in your application, configure and customize the underlying infrastructure in the form of data sets, text analyzers, and data flows.
Try text analytics with NLP Sample
Use the NLP (Natural Language Processing) Sample application to explore the natural language processing capabilities of Pega Platform. The application is available as an archive that you can download and install. The application includes the following components:
- The NLP Sample portal.
- A set of rules that constitute the text analytics infrastructure in your application, including a sample text analyzer that supports sentiment, classification, entity extraction, and intent analysis.
- A collection of taxonomies that demonstrate various use cases for classification analysis, for example, telecom, banking, customer service, and automobile.
You can use the provided rules and text samples for text analysis or you can configure your own rules and models to explore text classification, sentiment analysis, topic detection, and entity extraction.
Extracting entities in NLP Sample
Configure a text analyzer
Configure Text Analyzer rules to process content that is extracted from social media (Twitter, Facebook, and YouTube), emails, chatbot messages, databases, REST APIs, customer support tickets, and so on. Use a variety of tools, such as lexicons, taxonomies, and machine learning models, to customize the sentiment, classification, entity extraction, and intent analysis that you want to apply to the text content that interests you.
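As a rough illustration of how a lexicon and a taxonomy can drive sentiment and classification analysis, consider the following Python sketch. It is a simplified stand-in, not the Text Analyzer rule itself; the lexicon entries, the taxonomy categories, and the function names are all hypothetical.

```python
import re

# Hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"great": 1, "love": 1, "fast": 1, "slow": -1, "broken": -1, "terrible": -1}

# Hypothetical taxonomy: category -> keywords that indicate the category.
TAXONOMY = {
    "billing": {"invoice", "charge", "refund"},
    "network": {"signal", "coverage", "outage"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Sum the lexicon scores of all tokens and map the total to a label."""
    score = sum(LEXICON.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def topics(text):
    """Return every taxonomy category with at least one keyword in the text."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, keywords in TAXONOMY.items() if tokens & keywords)


def analyze(text):
    """Combine both analyses into a single result, as one analyzer pass would."""
    return {"sentiment": sentiment(text), "topics": topics(text)}


print(analyze("Terrible signal and a wrong charge on my invoice"))
```

A lexicon and a taxonomy are easy to inspect and adjust by hand, which is why they are often combined with, rather than replaced by, trained models.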
Build machine learning models
Use Pega Platform to apply machine learning in your application. In the Decision Analytics work area, you can generate custom models for sentiment and classification analysis. The model creation wizard guides you through the following actions:
- Upload the resources that are required to generate the models. For example, to generate a sentiment model, upload a training corpus of documents, such as emails or tweets, that are already classified as having positive, neutral, or negative sentiment.
- Define the model details and the algorithm that the application uses to train the model: maximum entropy (MaxEnt), Naïve Bayes, or support vector machine (SVM). Depending on your approach and your business goal, you might want to use a specific algorithm type. For example, you can use the SVM algorithm for large sets of training data.
- Review the model configuration. Use a variety of measures, such as F-score, precision, and recall, to evaluate the accuracy of the model. You can also test the generated model against any number of test samples.
- Export the generated model or upload it to your application for use in text analyzers.
By uploading machine learning models as part of text analyzer rules, you can enhance the accuracy with which text analyzer rules detect sentiment or classify text.
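The train-and-evaluate cycle described above can be sketched outside of Pega Platform in a few lines of Python. The example below is a minimal multinomial Naive Bayes classifier with Laplace smoothing, followed by a precision, recall, and F-score calculation. It is an illustration of the general technique under simplifying assumptions, not the wizard's implementation: the training sentences are made up, only two sentiment labels are used, and a real corpus would be far larger.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, pre-classified training corpus.
TRAIN = [
    ("love this phone great battery", "positive"),
    ("great service very happy", "positive"),
    ("terrible service very slow", "negative"),
    ("hate this phone broken screen", "negative"),
]


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and words per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token] + 1  # Laplace smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(gold, predicted, target):
    """Compute precision, recall, and F1 for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


model = train_naive_bayes(TRAIN)
print(predict(model, "great phone love it"))
print(predict(model, "slow and broken"))
```

Precision and recall are computed per label, so a three-label sentiment model would report one set of measures for each of the positive, neutral, and negative classes.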
Follow best practices to create accurate machine-learning models in Pega Platform. For more information, see:
- Machine-learning models for text analytics
- Best practices for creating categorization models
- Best practices for creating extraction models
Create and configure data sets for text content
Retrieve the text content that interests you from a variety of sources. You can retrieve text content from Facebook, Twitter, and YouTube social media platforms, emails, or databases, and analyze it in your application. Additionally, by using stream data sets, you can extract instant messages from the WhatsApp service and access blog posts through the Webhose.io platform.
- About Data Set rules
- About Email Listener data instances
- Creating a Facebook data set
- Creating a YouTube data set
- Tutorial: Analyzing WhatsApp content in NLP Sample in real time
- Tutorial: Analyzing content from Webhose.io in NLP Sample in real time
Customize metadata retrieval
Optionally, if you are analyzing text content from Facebook, you can customize your social media data set to retrieve additional metadata, such as user verification information, profile pictures, icons, or other information that is relevant to achieving your business goal. You can configure the metadata retrieval criteria on the Social Media Metadata landing page that is available in applications that have access to the Pega-NLP ruleset.
Combine and process
Combine the text analyzer and the other rules that you created into a data flow or a process flow.
Referencing a Text Analyzer rule in a data flow
Data flows offer a flexible solution for combining all your data points into a processing pattern that has a source from which the input data is taken and a destination to which the results are saved. Between the source and destination, you can apply various processing instructions in the form of different shapes. In a data flow that is designed for text analytics of Facebook, Twitter, or YouTube, you can reference a social media data set as the source, apply processing instructions in the form of text analyzers, and save the results into a database, an activity, or a JSON file. You can also enrich your data flow with additional shapes, such as filters, for example, to process only negative content or only the content from the most influential users.
You can also reference a text analyzer rule in a process flow by using the Utility shape to analyze the text content of emails or customer support tickets.
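The source, shapes, and destination pattern of a data flow can be pictured as a chain of small processing steps. The following Python sketch models that pattern with generators. It is a conceptual analogy only, not how data flows are configured in Pega Platform; the record fields, the negative-word list, and the `min_followers` threshold are assumptions made for the example.

```python
import json

# Hypothetical lexicon used by the stand-in analyzer below.
NEGATIVE_WORDS = {"broken", "slow", "terrible"}


def analyze(records):
    """Stand-in for a text analyzer shape: tag each record with a sentiment."""
    for record in records:
        words = set(record["text"].lower().split())
        sentiment = "negative" if words & NEGATIVE_WORDS else "neutral"
        yield {**record, "sentiment": sentiment}


def keep_negative(records):
    """Filter shape: pass through only negative content."""
    return (r for r in records if r["sentiment"] == "negative")


def keep_influential(records, min_followers):
    """Filter shape: pass through only content from influential users."""
    return (r for r in records if r["followers"] >= min_followers)


def run_data_flow(source, destination, min_followers=1000):
    """Pull records from the source, apply each shape, write JSON to the destination."""
    stream = keep_influential(keep_negative(analyze(source)), min_followers)
    for record in stream:
        destination.append(json.dumps(record))
    return destination


posts = [
    {"author": "ana", "followers": 5000, "text": "The app is terrible today"},
    {"author": "ben", "followers": 10, "text": "So slow again"},
    {"author": "cy", "followers": 9000, "text": "Works fine for me"},
]
results = run_data_flow(posts, destination=[])
print(results)
```

Because each step consumes and produces a stream of records, steps can be added, removed, or reordered without changing the source or the destination, which mirrors the flexibility that shapes give a data flow.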