Defining the training set and training the entity extraction model



In the Sample construction step of the entity extraction model creation wizard, select the data to use to train the model and the data to use to test the model's accuracy. In the Model creation step, build the model.

During training of an entity extraction model, the Conditional Random Fields (CRF) algorithm is applied to the training data, and the model learns to predict labels. The data that you designate for testing is not used to train the model. Instead, Pega Platform uses this data to check whether the labels that you defined (for example, Person or Location) match the labels that the model predicted.
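The comparison between defined and predicted labels can be pictured with a small sketch. This is an illustration only, not Pega Platform's implementation: the function name `label_match_report` and the sample labels are hypothetical, and the sketch assumes one label per token and reports the share of matching predictions for each defined label.

```python
from collections import Counter


def label_match_report(expected, predicted):
    """Return, for each defined label, the fraction of test items
    for which the predicted label matches the defined label.

    Hypothetical helper for illustration; not a Pega Platform API.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    total = Counter()    # how often each defined label occurs
    correct = Counter()  # how often the prediction matched it

    for defined, guess in zip(expected, predicted):
        total[defined] += 1
        if defined == guess:
            correct[defined] += 1

    return {label: correct[label] / total[label] for label in total}


# Labels defined by the user in the test data (assumed sample values).
expected = ["Person", "Location", "Person", "Location"]
# Labels that a trained model might predict for the same items.
predicted = ["Person", "Location", "Location", "Location"]

report = label_match_report(expected, predicted)
for label, share in report.items():
    print(f"{label}: {share:.0%} of test items matched")
```

Because the test records play no part in training, a report like this reflects how the model behaves on data it has not seen.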

  1. If you want to keep the split between the training and testing data as defined in the file that you uploaded, in the Construct training and test sets using field, select User-defined sampling based on "Type" column.

  2. If you want to ignore the split that is defined in the file and customize that split according to your business needs, perform the following actions:

    1. Select Uniform sampling.

    2. In the Training set field, specify the percentage of records to randomly assign to the training sample. The remaining records form the test sample.

  3. Click Next.

  4. In the Model creation step, make sure that the Conditional Random Fields check box is selected.

  5. Click Next.

    The model training and testing process starts.
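The two sampling options in steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the way Pega Platform performs the split: the function names, the `"Train"` and `"Test"` values in the "Type" column, and the record layout are all hypothetical, and the uniform split here shuffles the records and cuts them at the requested percentage.

```python
import random


def user_defined_split(records, type_field="Type", train_value="Train"):
    """Keep the split that the uploaded file already defines.

    Assumes each record is a dict whose "Type" value marks it as
    training data; the value "Train" is an assumption.
    """
    train = [r for r in records if r[type_field] == train_value]
    test = [r for r in records if r[type_field] != train_value]
    return train, test


def uniform_split(records, training_percent, seed=None):
    """Ignore the file's split and randomly assign a percentage of
    records to the training sample; the rest go to the test sample."""
    if not 0 <= training_percent <= 100:
        raise ValueError("training_percent must be between 0 and 100")

    rng = random.Random(seed)  # a seed makes the split repeatable
    shuffled = list(records)   # copy so the caller's list is unchanged
    rng.shuffle(shuffled)

    cut = round(len(shuffled) * training_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical records; the file marks the first six for training.
records = [
    {"id": i, "Type": "Train" if i < 6 else "Test"} for i in range(10)
]

file_train, file_test = user_defined_split(records)
rand_train, rand_test = uniform_split(records, training_percent=70, seed=1)

print(len(file_train), len(file_test))  # split as defined in the file
print(len(rand_train), len(rand_test))  # split by percentage
```

With ten records, the file-defined split yields six training and four test records, while a 70 percent uniform split yields seven and three regardless of the "Type" column.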