Building machine-learning text extraction models
Use Pega Platform machine-learning capabilities to create text extraction models for named entity recognition.
- Define an entity model to contain the entities that are trained through machine learning. For more information, see Creating entity models.
- Ensure that the system locale language settings are set to UTF-8.
- Specify a repository for text analytics models. For more information, see Specifying a database for Prediction Studio records.
By using models that are based on the Conditional Random Fields (CRF) algorithm, you can extract information from unstructured data and label it as belonging to a particular group. For example, if the document that you want to analyze mentions Galaxy S8, the text extraction model classifies that as Phone.
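To make the labeling idea concrete, the following minimal Python sketch shows how per-token labels from a sequence tagger can be merged into extracted entities. It does not use any Pega Platform API. The B-/I-/O tagging scheme, the `group_entities` function, and the sample sentence are assumptions chosen for illustration; the documentation does not state how Pega Platform represents labels internally.

```python
from typing import List, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity text, entity type) pairs.

    Hypothetical helper: "B-X" starts an entity of type X, "I-X" continues it,
    and "O" marks a token that is outside any entity.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # The token continues the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent tag: close any open entity and skip the token.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Made-up sentence and tags, following the Galaxy S8 example above.
tokens = ["I", "bought", "a", "Galaxy", "S8", "yesterday"]
tags = ["O", "O", "O", "B-Phone", "I-Phone", "O"]
print(group_entities(tokens, tags))  # [('Galaxy S8', 'Phone')]
```

In this sketch, the tagger's job (the part that a CRF model would perform) is represented only by the hand-written `tags` list; the function shows how such labels translate into the Phone entity.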
- Preparing data for text extraction
In the Source selection step of the text extraction model creation wizard, select the extraction type and provide the data for training and testing of your text extraction model.
- Defining the training set and training the text extraction model
In the Sample construction step of the text extraction model creation wizard, select the data to use to train the model and the data to use to test the model's accuracy. In the Model creation step, build the model.
- Accessing text extraction model evaluation reports
After you build the model, you can evaluate it by using accuracy measures such as F-score, precision, and recall. You can view the model evaluation report in the application or download the report to your directory. You can also view the test results for each record.
- Saving the text extraction model
After you create the model, you can export the binary file that contains the model to your directory and store it for future use. You can also create a specialized rule that contains the model, and then use that rule in text analyzers in Pega Platform.
- Analyzing natural language
Use text analytics to analyze large volumes of text and extract meaningful information from them. Based on your findings, you can improve business performance and customer experience.
- Text analytics accuracy measures
Models predict an outcome, which might or might not match the actual outcome. Use accuracy measures to examine the performance of text analytics models. When you create a sentiment or classification model, you can analyze the results by using these performance measures.
- Testing text analytics models
You can perform ad hoc testing of the text analytics models that you created and analyze their performance in real time, on real-life data.
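As an illustration of the accuracy measures mentioned above, the following Python sketch computes precision, recall, and F-score by comparing predicted entities with the expected ones. The function name `precision_recall_f1`, the exact-match comparison of (text, type) pairs, and the sample data are assumptions made for this example; the evaluation reports in Pega Platform might calculate these measures differently.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (entity text, entity type)


def precision_recall_f1(predicted: Set[Entity], actual: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for exact entity matches.

    Hypothetical helper: an entity counts as correct only when both its
    text and its type match an expected entity.
    """
    true_positives = len(predicted & actual)

    # Precision: the share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: the share of expected entities that the model found.
    recall = true_positives / len(actual) if actual else 0.0

    # F-score: the harmonic mean of precision and recall.
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up test data: three expected entities, three predictions, two matches.
actual = {("Galaxy S8", "Phone"), ("iPhone 7", "Phone"), ("Pixel", "Phone")}
predicted = {("Galaxy S8", "Phone"), ("Pixel", "Phone"), ("yesterday", "Phone")}

precision, recall, f_score = precision_recall_f1(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f_score={f_score:.2f}")
```

Here, two of the three predictions are correct and two of the three expected entities are found, so precision and recall are both two thirds, and so is their harmonic mean.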