Reviewing the taxonomy for machine learning topic detection
Verify the correctness of the taxonomy of topics that Prediction Studio generated from the training data. If you updated an older version of a model, the taxonomy might include topics from that version. Clean up your model by deleting topics that have no training data, and improve the model's predictions by adding keywords.
Keywords influence the behavior of a machine learning model, but they are not exact rules. The "Should," "Must," and "And" words act as positive features for matching a text to a topic, while the "Not" words act as negative features. The training and testing data have the greatest impact on your machine learning model, while keywords have a smaller impact.
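The relationship between keywords and the trained model can be pictured with a small sketch. This is an illustration only, not how Prediction Studio is implemented: the `TopicKeywords` class, the `adjusted_score` function, and the `keyword_weight` value are all hypothetical names and numbers chosen for the example. It assumes that the trained model already produced a confidence score for a topic, and shows keywords nudging that score up or down rather than deciding the outcome on their own.

```python
from dataclasses import dataclass, field


@dataclass
class TopicKeywords:
    """Hypothetical container for the four keyword categories of one topic."""
    should: set = field(default_factory=set)
    must: set = field(default_factory=set)
    and_words: set = field(default_factory=set)
    not_words: set = field(default_factory=set)


def keyword_features(text, kw):
    """Count positive and negative keyword hits in the text (single words only)."""
    tokens = set(text.lower().split())
    # "Should", "Must", and "And" words all count as positive features here.
    positive = len(tokens & (kw.should | kw.must | kw.and_words))
    # "Not" words count as negative features.
    negative = len(tokens & kw.not_words)
    return positive, negative


def adjusted_score(model_score, text, kw, keyword_weight=0.05):
    """Nudge a model score with keyword hits; the weight is an assumed small value."""
    positive, negative = keyword_features(text, kw)
    score = model_score + keyword_weight * (positive - negative)
    # Keep the result in the [0, 1] range of a confidence score.
    return max(0.0, min(1.0, score))


phone_keywords = TopicKeywords(
    should={"phone", "telephone", "mobile"},
    not_words={"landline"},
)
boosted = adjusted_score(0.6, "My mobile phone is broken", phone_keywords)
lowered = adjusted_score(0.6, "The landline is broken", phone_keywords)
```

In this sketch, a text with no keyword hits keeps the score that the model assigned, which mirrors the point that the training and testing data carry most of the weight.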
You cannot add topics in this step. If you want to add topics, go back to the Source selection step. For more information, see Uploading data for training and testing of the topic detection model.
In the Taxonomy review wizard step, review the taxonomy details, and then expand the taxonomy to view the topics. The hierarchy of the taxonomy is used to group topics. Do not add training data or keywords to grouping topics.
Review the summary of training and test data for individual topics by selecting the topics in the list.
To add positive or negative features for matching a text to a topic, add keywords to the topic:
Select the topic, and then click the Manage keywords tab.
In the Keywords section, enter keywords to influence the model's predictions. Keywords can be words or phrases. You can enter several keywords in each category.
- Should words: phone telephone mobile
- And words
To delete topics that do not contain any training data, select a topic, and then click Delete. Topics without any training data might appear in the taxonomy when you start with a keyword-based model, and then update it to a machine learning model. If the training data for the new model covers fewer topics than the original keyword-based model, only the topics that appear in the training data are trained, and the remaining topics have no training data.
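If you keep a copy of the topic list and the training data outside Prediction Studio, you can work out in advance which topics are candidates for deletion. The following is a minimal sketch under stated assumptions: the function name `topics_without_training_data` is hypothetical, the training data is assumed to be a list of (text, topic) pairs, and the topic list is assumed to contain only leaf topics, because grouping topics are not expected to have training data.

```python
from collections import Counter


def topics_without_training_data(leaf_topics, training_records):
    """Return the leaf topics that have no training records.

    leaf_topics: topic names from the taxonomy, excluding grouping topics.
    training_records: iterable of (text, topic) pairs (an assumed layout).
    """
    counts = Counter(topic for _text, topic in training_records)
    # Preserve the taxonomy order so the result is easy to compare on screen.
    return [topic for topic in leaf_topics if counts[topic] == 0]


# Example: an older keyword-based model had three topics,
# but the new training data covers only two of them.
leaf_topics = ["Billing", "Phone", "Internet"]
training_records = [
    ("My bill is wrong", "Billing"),
    ("My mobile keeps dropping calls", "Phone"),
]
untrained = topics_without_training_data(leaf_topics, training_records)
```

The topics returned by the sketch correspond to the ones that the wizard would show with no training data.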