Tutorial: Configuring a topic detection model for discovering keywords
Create a model that matches a piece of text to a predefined set of topics, based on a taxonomy of topics and keywords of different types. Use topic detection to classify text into semantic categories that relate to various domains, for example, customer support or complaint routing.
The uPlusTelco company releases a new product called uPlusPhone10. The company wants to track social media responses to the release of this product and determine which uPlusPhone10 features are most popular.
Creating a topic detection model rule
Use Prediction Studio to configure a uPlusPhone10 topic detection model.
- In Prediction Studio, click New and select Text Categorization.
- In the Create Text Categorization Model window, set the model parameters:
  - Name: uPlusPhone10
  - Detection type: Topic
  - Creation: Use category keywords
  - Language: English
- Select the context of the model by specifying the applicable class, ruleset, and ruleset version.
- Click Create.
Creating a topic detection model in Prediction Studio
Defining a taxonomy
After you create a rule that contains a topic detection model, define a taxonomy of topics and associated keywords. Each topic represents a category into which you can classify text. uPlusTelco wants to classify information based on the phone's features, performance, and specifications.
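As a rough picture of the target structure, the taxonomy that this section builds can be sketched as a nested mapping. This is only an illustration of the topic hierarchy named in this tutorial; the `taxonomy` variable and the `topic_paths` helper are hypothetical names and are not part of Prediction Studio.

```python
# Hypothetical sketch of the uPlusPhone10 topic hierarchy as nested dicts.
# Keys are topic names; values are the child topics of that topic.
taxonomy = {
    "Features": {
        "Applications": {},
        "Camera": {},
        "Connectivity": {},
        "Games": {},
    },
    "Performance": {},
    "Specifications": {},
}


def topic_paths(tree, prefix=()):
    """Return every topic as a 'Parent > Child' path, parents first."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(" > ".join(path))
        paths.extend(topic_paths(children, path))
    return paths


for line in topic_paths(taxonomy):
    print(line)
```

Paths such as Features > Games use the same notation that this tutorial uses when it refers to a child topic.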
- Add a parent topic by clicking Add top-most.
- Enter Features and click Create.
- Repeat the previous two steps to create the Performance and Specifications parent topics.
- Add child topics that correspond to the features of uPlusPhone10, such as applications, camera, connectivity, and games, by performing the following actions:
  - Select the Features topic.
  - Click Manage > Add child.
  - Enter the name of the new topic and click Add.
- Repeat the previous step to create additional child topics, as shown in the following example:
- For each topic, add keywords that are specific to that topic by performing the following actions:
  - Select a topic.
  - Optional: To make child topics use the keywords that are specific to the parent topic, select Allow sub topics to inherit keywords. For example, if you select this option for the Features topic, the Applications, Camera, Connectivity, and Games topics cannot be detected unless the phrase uPlusPhone10 is detected first, as shown in the following example:
    If a parent topic has an empty list of keywords, the topic detection model automatically looks for a match among the child topics.
  - Specify Should words, Must words, And words, and Not words that apply to the Camera, Connectivity, and Games topics. For more information about keyword categories, see Defining a taxonomy.
  - Click Save.
- Test your taxonomy by performing the following actions:
  - Click Actions > Test.
  - In the Test window, paste a piece of text in the Sample text box, for example: uPlusPhone10 handset and its applications are quite amazing!
  - Click Test and view the results.
Always test your taxonomy to ensure that text analysis produces the expected results and, if needed, improve the taxonomy by adding more categories or modifying keywords. For example, can you improve the uPlusPhone10 taxonomy so that the sentence "I wish uPlusPhone10 came with more games" is classified under the Features > Games topic instead of only the Features topic?
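To reason about why a sentence might match only the parent topic, it can help to model keyword matching outside of Prediction Studio. The following is a minimal sketch under stated assumptions, not the actual Pega implementation: it assumes that at least one Should word must appear, all Must words must appear, at least one And word must appear, and no Not word may appear; it matches whole lowercase words only; and every keyword list except uPlusPhone10 is a hypothetical example. The `Topic`, `detect`, and `classify` names are invented for this illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    should: list = field(default_factory=list)     # assumed: at least one must appear
    must: list = field(default_factory=list)       # assumed: all must appear
    and_words: list = field(default_factory=list)  # assumed: at least one must appear
    not_words: list = field(default_factory=list)  # assumed: none may appear
    inherit: bool = False  # models "Allow sub topics to inherit keywords"
    children: list = field(default_factory=list)

    def has_keywords(self):
        return bool(self.should or self.must or self.and_words)

    def matches(self, words):
        if any(w in words for w in self.not_words):
            return False
        if not all(w in words for w in self.must):
            return False
        if self.should and not any(w in words for w in self.should):
            return False
        if self.and_words and not any(w in words for w in self.and_words):
            return False
        return True


def detect(topic, words, path=()):
    """Return the most specific matching topic paths under `topic`."""
    path = path + (topic.name,)
    own_match = topic.has_keywords() and topic.matches(words)
    # With inheritance on, children cannot match unless the parent matches.
    if topic.has_keywords() and topic.inherit and not own_match:
        return []
    # A parent without keywords simply delegates to its children.
    child_hits = [hit for child in topic.children
                  for hit in detect(child, words, path)]
    if child_hits:
        return child_hits
    return [" > ".join(path)] if own_match else []


def classify(root, text):
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return detect(root, words)


# Hypothetical keyword lists; only "uplusphone10" comes from the tutorial.
games = Topic("Games", should=["game", "gaming"])
features = Topic("Features", must=["uplusphone10"], inherit=True, children=[
    Topic("Applications", should=["applications", "apps"]),
    Topic("Camera", should=["camera", "photo"]),
    Topic("Connectivity", should=["wifi", "bluetooth"]),
    games,
])

print(classify(features, "uPlusPhone10 handset and its applications are quite amazing!"))
# ['Features > Applications']
print(classify(features, "I wish uPlusPhone10 came with more games"))
# ['Features']  (the plural "games" is not in the hypothetical keyword list)

games.should.append("games")  # one possible improvement to the taxonomy
print(classify(features, "I wish uPlusPhone10 came with more games"))
# ['Features > Games']
```

In this simplified model, the sentence falls back to the parent topic because no child keyword matches exactly. Adding the missing word form to the child topic is one way to fix it, and the real model may behave differently, so confirm any change with Actions > Test.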
See the following example of creating a taxonomy for topic detection:
You created a topic detection model in Prediction Studio and defined a taxonomy of topics with the associated words and phrases that are specific to each topic. You tested the model by applying it to real-life samples and identified areas of improvement to boost the accuracy of your model.
Build a machine-learning (ML) topic detection model. In Pega Platform™, you can use a keyword-based topic detection model in association with an ML-based model to maximize the accuracy and reliability of topic detection. For more information, see Creating machine learning-based topic detection models and Best practices for creating categorization models.
Published June 15, 2018 — Updated October 16, 2018