In the Sample construction step of the text extraction model creation wizard, select
the data to use to train the model and the data to use to test the model's accuracy. In the
Model creation step, build the model.
During the training process of a text extraction model, the Conditional Random Fields
(CRF) algorithm is applied to the training data and the model learns to predict labels. The
data that you designate for testing is not used to train the model. Instead, Pega Platform
uses this data to check whether the labels that you defined (for example, Person and
Location) match the labels that the model predicted.
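The comparison between defined and predicted labels can be illustrated with a minimal Python sketch. It is not Pega Platform code: the `label_accuracy` function and the sample label lists are hypothetical, and the sketch assumes a simple per-label accuracy, which may differ from the metrics that the wizard reports.

```python
def label_accuracy(expected, predicted):
    """Return the fraction of positions where the predicted label
    equals the label that was defined in the test data."""
    if len(expected) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for e, p in zip(expected, predicted) if e == p)
    return matches / len(expected)


# Hypothetical test record: labels defined by the user vs. labels
# predicted by the model ("O" marks a token with no entity label).
expected = ["Person", "O", "Location", "O"]
predicted = ["Person", "O", "O", "O"]
print(label_accuracy(expected, predicted))  # 0.75
```

In this example, three of the four predicted labels match the defined labels, so the sketch reports an accuracy of 0.75.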
-
If you want to keep the split between the training and testing data as defined
in the file that you uploaded, in the Construct training and test sets using field,
select User-defined sampling based on "Type" column.
-
If you want to ignore the split that is defined in the file and customize that
split according to your business needs, perform the following actions:
-
Select Uniform sampling.
-
In the Training set field, specify the percentage of records that is randomly
assigned to the training sample.
-
Click Next.
-
In the Model creation step, make sure that the Conditional Random
Fields check box is selected.
-
Click Next.
The model training and testing process starts.
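
The two sampling options in the Sample construction step can be sketched in Python as follows. This is an illustration under stated assumptions, not the Pega Platform implementation: the function names, the `seed` parameter, and the "Training" and "Test" values of the "Type" column are hypothetical, and the rounding of the training percentage is a choice made for this sketch.

```python
import random


def user_defined_split(records, type_key="Type",
                       training_value="Training", test_value="Test"):
    """Keep the split that the uploaded file defines in its "Type" column.
    The column values used here are assumptions for illustration."""
    training = [r for r in records if r[type_key] == training_value]
    testing = [r for r in records if r[type_key] == test_value]
    return training, testing


def uniform_split(records, training_percent, seed=None):
    """Ignore the "Type" column and randomly assign the given percentage
    of records to the training sample; the rest become the test sample."""
    if not 0 <= training_percent <= 100:
        raise ValueError("training_percent must be between 0 and 100")
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: 6 marked for training and 4 marked for testing.
records = [{"id": i, "Type": "Training" if i < 6 else "Test"}
           for i in range(10)]

train, test = user_defined_split(records)
print(len(train), len(test))  # 6 4

train, test = uniform_split(records, training_percent=70, seed=1)
print(len(train), len(test))  # 7 3
```

With uniform sampling, every record lands in exactly one of the two samples, so records used for testing are never used to train the model.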