Creating pattern extraction models

Pattern recognition and extraction in Pega Platform help you detect all entities whose structure matches a pattern that you define. For example, you can detect strings that contain the at sign (@) and .com, and mark them as email addresses.
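Pega pattern extraction models are written in Apache Ruta, not Python. The following Python sketch only illustrates the email example above. It uses a regular expression and returns each match with its character offsets, which is roughly how an extraction model marks an entity in text. The pattern and the function name extract_emails are illustrative assumptions, not part of Pega Platform or Ruta.

```python
import re

# Illustrative pattern (an assumption, not a Pega or Ruta definition):
# a local part, the at sign (@), one or more domain labels, and a ".com" ending.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.com\b")


def extract_emails(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for every .com email address in text."""
    return [(m.start(), m.end(), m.group()) for m in EMAIL_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Mail jane.doe@example.com, not bob@test.org"
    for start, end, value in extract_emails(sample):
        # Each match is reported like an annotation: type, offsets, covered text.
        print(f"EmailAddress [{start}:{end}] {value}")
```

The trailing \b keeps the pattern from matching inside longer words such as .community. A real model usually needs a broader definition of an email address than this one.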

Before you begin, make sure that the system locale is set to use UTF-8 encoding.
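If Python 3 is installed on the host, you can use it for a rough check of the locale encoding. This reflects only the environment of the Python process. That environment may differ from the settings of the JVM that runs Pega Platform. Also, if Python runs in UTF-8 mode, it reports UTF-8 regardless of the locale. The helper name is_utf8 is illustrative.

```python
import codecs
import locale


def is_utf8(encoding: str) -> bool:
    """True if the encoding name (for example 'UTF8' or 'utf-8') resolves to UTF-8."""
    # codecs.lookup normalizes aliases, so 'UTF8', 'utf_8', and 'UTF-8' all match.
    return codecs.lookup(encoding).name == "utf-8"


if __name__ == "__main__":
    # Encoding preferred by the current locale, queried without changing the locale.
    preferred = locale.getpreferredencoding(False)
    status = "OK" if is_utf8(preferred) else "not UTF-8"
    print(f"Locale encoding: {preferred} ({status})")
```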

You write pattern extraction models in the Apache Ruta script language. For more information, see the official Apache UIMA Ruta Guide and Reference. For an example use case, see Creating entity extraction rules for text analytics on the Pega Community.

1. In the navigation panel of Prediction Studio, click Predictions.
2. In the header of the Predictions work area, click New > Text extraction.
3. In the Create Text Extraction Model window, enter a name for the text extraction model.
4. In the Creation section, select Rule.
5. In the Language section, select a language for the model from the drop-down list.
6. In the Template field, select a template from the drop-down list. Each template contains Apache Ruta script that you can modify.
   Note: Use the provided templates only as a starting point for creating your own pattern extraction models.
7. In the Rule script field, modify the provided Apache Ruta script to create a custom pattern extraction model.
8. In the Save model section, finalize the creation of the pattern extraction model by providing its application context:
   - To use the default rule context for decision data rules that contain sentiment analysis models, select Use default context.
   - To specify the Applies to class, ruleset, and ruleset version of the new rule, select Specify context.
9. Click Create.

Test your model on real-life data.
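You might also want to score the model outside Prediction Studio. One option uses exact-match precision and recall. This sketch assumes that you have exported the detected entities and a hand-labeled sample of the same texts as (start, end, type) spans. The span format and the function name precision_recall are hypothetical.

```python
# Hypothetical span format: (start offset, end offset, entity type).
Span = tuple[int, int, str]


def precision_recall(predicted: set[Span], expected: set[Span]) -> tuple[float, float]:
    """Exact-match precision and recall of model spans against hand-labeled spans."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    found = {(5, 25, "EmailAddress"), (40, 52, "EmailAddress")}
    labeled = {(5, 25, "EmailAddress"), (60, 75, "EmailAddress")}
    p, r = precision_recall(found, labeled)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is strict: a span that is off by one character counts as both a miss and a false detection. Compare overlapping spans instead if partial matches are acceptable for your use case.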