Defining text analysis sampling

This is the second step of the Create text analysis models wizard. In this step, you upload a file that contains the data that the system uses to train the text analysis model and to measure its accuracy. The training data file consists of records. Each record is a text unit (for example, a tweet or a Facebook comment) with associated data, such as the expected result and the type of the record. The system classifies the records as training samples or test samples. Training samples are used to create and train the text analysis model. Test samples are used to validate the accuracy of the model that the system created.
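
The division into training samples and test samples can be pictured with a short Python sketch. It is an illustration only: how the system actually assigns records is not described here, and the random split, the `split_records` function, the `test_fraction` parameter, and the `text` and `result` keys are all assumptions made for this example.

```python
import random


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle records and return (training_samples, test_samples).

    Hypothetical helper: the wizard's real sampling logic may differ.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one record for testing.
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    # Ten made-up records, each a text unit with an expected result.
    records = [{"text": f"sample {i}", "result": "positive"} for i in range(10)]
    training, test = split_records(records)
    print(len(training), "training samples,", len(test), "test samples")
```

The model is built only from the training samples, so the held-out test samples give an independent check of its accuracy.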

  1. Click Choose File and select the file that contains the training data. The file must have a .CSV, .XLS, or .XLSX extension.
    Note: The training file must contain at least ten records. To ensure the best accuracy of the text analysis, the file should contain the following columns (each column represents a distinct type of information about each record):

    Note: You can download the data source template (an .XLSX file) and enter the training data for the analysis in it.

  2. For classification analysis, select Use taxonomy data for training models to include the taxonomy data in the training sample.
  3. Define the training sample details:
  4. Click Generate preview.
  5. In the PREVIEW SAMPLING section, review the Training sample and the Test sample tabs generated for the model. The preview shows up to ten results of the analysis.
  6. Click Next.
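
Before uploading, it can help to check a file against the two requirements stated in step 1: an accepted extension and at least ten records. The following Python sketch does this for CSV content. It is not part of the product, and it assumes that the first row of the file is a header row; the function names and the `text` and `result` column names in the usage example are hypothetical.

```python
import csv
from pathlib import Path

ALLOWED_EXTENSIONS = {".csv", ".xls", ".xlsx"}
MIN_RECORDS = 10


def has_allowed_extension(filename):
    """Return True if the file name ends in .csv, .xls, or .xlsx (any case)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def count_csv_records(lines):
    """Count data rows in CSV lines, assuming the first row is a header."""
    return sum(1 for _ in csv.DictReader(lines))


def find_problems(filename, lines):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not has_allowed_extension(filename):
        problems.append(f"{filename}: extension must be .CSV, .XLS, or .XLSX")
    elif Path(filename).suffix.lower() == ".csv":
        # Only CSV content is counted here; Excel files need another reader.
        found = count_csv_records(lines)
        if found < MIN_RECORDS:
            problems.append(f"{filename}: {found} records, need at least {MIN_RECORDS}")
    return problems


if __name__ == "__main__":
    # Made-up content: a header row followed by nine records.
    sample = ["text,result\n"] + [f"sample {i},positive\n" for i in range(9)]
    for problem in find_problems("training.csv", sample):
        print(problem)
```

For a file on disk, pass an open file object as `lines`, for example `open(path, newline="", encoding="utf-8")`.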
