This content has been archived and is no longer being maintained.

Improved Free Text Model rule type

In Pega 7.2.1, the Free Text Model rule type includes several enhancements that extend its functionality and improve its performance.

These enhancements include:

  • The conditional random field (CRF) method for named-entity extraction, which increases the precision of entity recognition and reduces the number of false positives.
  • Support for adding custom entities as Rule-based Text Annotation (Ruta) scripts.
  • A playback option for retrieving a specified number of tweets from a particular time period.
  • Time-based retrieval of posts in Facebook data sets.

Integration of CRF in entity extraction

The Pega 7 Platform now uses the conditional random field (CRF) method for named-entity recognition instead of the OpenNLP method. The CRF method uses a sequence classifier that can perform entity extraction with greater accuracy. It also reduces the number of false positives in entity extraction analysis compared to OpenNLP. The default entity extraction models that support the CRF are pyLocation, pyOrganization, and pyPerson.
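To illustrate what a sequence classifier does, the following minimal Python sketch finds the best tag sequence for a sentence by scoring each token together with its neighboring tags. It is not the Pega 7 Platform's implementation. The tag set, the word lists, the hand-set scores, and the function names are all assumptions made for illustration; a trained CRF learns its weights from annotated text.

```python
# Illustrative only: toy linear-chain decoding with hand-set scores.
TAGS = ["O", "PERSON", "LOCATION"]


def emission(token, tag):
    """Hypothetical per-token score; a trained CRF learns these weights."""
    if tag == "O":
        return 0.0 if token[0].isupper() else 1.0
    if tag == "PERSON" and token in {"Alice", "Smith"}:
        return 2.0
    if tag == "LOCATION" and token in {"Paris", "Boston"}:
        return 2.0
    return 0.0


def transition(prev_tag, tag):
    """Hypothetical score for a pair of adjacent tags."""
    if prev_tag == tag and tag != "O":
        return 0.5   # entity names often span several tokens
    if prev_tag != "O" and tag != "O":
        return -1.0  # switching entity type without a gap is unlikely
    return 0.0


def viterbi(tokens):
    """Return the highest-scoring tag sequence for the given tokens."""
    if not tokens:
        return []
    # best[i][tag] = (score of best path ending in tag at position i, previous tag)
    best = [{t: (emission(tokens[0], t), None) for t in TAGS}]
    for i in range(1, len(tokens)):
        column = {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p][0] + transition(p, t))
            score = best[i - 1][prev][0] + transition(prev, t) + emission(tokens[i], t)
            column[t] = (score, prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


tags = viterbi("Alice Smith flew to Paris".split())
```

Because the score of each tag depends on the tag before it, the decoder labels "Alice Smith" as one two-token person name rather than classifying each word in isolation.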

Support for custom entities

You can now add custom entities to a Free Text Model rule by creating and importing entity extraction rules. These rules detect entities that are part of a specific dictionary (for example, a certain product offering or bundle) or that match a certain pattern (for example, a specific type of identity number). Adding custom entities eliminates the need to train entity extraction models for such entities, which can be a time-consuming and complex process. You create each custom entity extraction rule as a Rule-based Text Annotation (Ruta) script and import it into the Pega 7 Platform as part of decision data. The following entity extraction rules are available by default:

  • pyDate
  • pyEmail
  • pySalutation
  • pySSN
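The difference between dictionary-based and pattern-based entities can be sketched in a few lines of Python. This is a conceptual illustration, not a Ruta script and not the logic behind pyEmail or pySSN. The product names, the simplified regular expressions, and the `extract_entities` function are assumptions made for this example.

```python
import re

# Hypothetical dictionary of product names (assumed for illustration).
PRODUCT_TERMS = ["Gold Bundle", "Family Plan"]

# Simplified patterns (assumed); production rules are usually stricter.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples sorted by offset."""
    found = []
    # Pattern-based entities: anything that matches the expression.
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(), match.start()))
    # Dictionary-based entities: only the listed terms, ignoring case.
    for term in PRODUCT_TERMS:
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            found.append(("PRODUCT", match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])


entities = extract_entities(
    "Contact jane.doe@example.com about the Gold Bundle; SSN 123-45-6789."
)
```

Neither kind of rule needs training data, which is why such entities can be added without building a model.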

Figure: Entity extraction rules section in the Free Text Model rule form

Figure: Detection of custom entities by a Free Text Model rule when testing entity extraction rules

The playback option for Twitter data

You can now use Twitter data sets to retrieve tweets by using the new playback option. When you select the playback option, you define the time period for which you want to retrieve historical Twitter data. You can also specify the maximum number of tweets to retrieve. Use the playback option when a streaming data set fails, for example, because the API disconnects, a server fails, or tweets cannot be retrieved in real time for any other reason.

You cannot run a data flow in real time if the data set that is configured as the source of that data flow has the playback option selected.
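The two playback settings, a time period and a maximum number of tweets, amount to a bounded window query. The following Python sketch shows that idea on in-memory records. It does not call the Twitter API or any Pega API. The record layout, the `select_for_playback` name, and the choice to return the oldest tweets first are assumptions.

```python
from datetime import datetime, timezone


def select_for_playback(tweets, start, end, max_tweets):
    """Return at most max_tweets records created within [start, end], oldest first."""
    if start > end:
        raise ValueError("start must not be later than end")
    if max_tweets < 0:
        raise ValueError("max_tweets must not be negative")
    in_window = [t for t in tweets if start <= t["created_at"] <= end]
    in_window.sort(key=lambda t: t["created_at"])
    return in_window[:max_tweets]


utc = timezone.utc
# Hypothetical records standing in for stored tweets.
sample = [
    {"id": 1, "created_at": datetime(2016, 5, 17, 9, 0, tzinfo=utc)},
    {"id": 2, "created_at": datetime(2016, 5, 17, 10, 30, tzinfo=utc)},
    {"id": 3, "created_at": datetime(2016, 5, 17, 10, 45, tzinfo=utc)},
    {"id": 4, "created_at": datetime(2016, 5, 17, 12, 0, tzinfo=utc)},
]
window_start = datetime(2016, 5, 17, 10, 0, tzinfo=utc)
window_end = datetime(2016, 5, 17, 11, 0, tzinfo=utc)
selected = select_for_playback(sample, window_start, window_end, max_tweets=1)
```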

Figure: Data recovery options in a Twitter data set

Time-based retrieval of posts from a Facebook data set

You can limit the retrieval of posts by a Facebook data set by using the new search functionality. Instead of retrieving all posts that have been submitted since the Facebook page was created, you can retrieve only the posts that were submitted within a specific period of time. This gives you more control over the amount of data that you retrieve from a Facebook data set, including for failure recovery purposes. For example, if a system outage lasted for one hour, you can configure the data set to retrieve only the posts that were submitted within the last hour.
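The one-hour recovery example comes down to simple date arithmetic. The sketch below computes the time period to enter after an outage of a known length. The `recovery_window` function is hypothetical and is not part of the Facebook data set configuration.

```python
from datetime import datetime, timedelta, timezone


def recovery_window(outage, now=None):
    """Return (start, end) covering the outage that has just ended."""
    if outage <= timedelta(0):
        raise ValueError("outage duration must be positive")
    end = now if now is not None else datetime.now(timezone.utc)
    return end - outage, end


# Fixed reference time so that the example is reproducible.
reference = datetime(2016, 5, 17, 12, 0, tzinfo=timezone.utc)
start, end = recovery_window(timedelta(hours=1), now=reference)
```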

Figure: Search functionality for time-based retrieval of posts from a Facebook data set

You can test the data extraction functionality from the Actions menu by clicking Run.

Published May 17, 2016 — Updated August 23, 2018

