
Modifying Apache Ruta scripts to extract custom structured entities



After you create a Decision Data rule for entity extraction, customize its Apache Ruta script to meet your business needs.

  1. On the Data tab of the Decision Data rule, click Open rule to access the Apache Ruta script.

  2. Modify the existing script based on the following example, which detects an account number that consists of four groups of four digits:

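    ```
    PACKAGE uima.ruta.example;

    // One annotation type for each of the four digit groups
    DECLARE VarA;
    DECLARE VarB;
    DECLARE VarC;
    DECLARE VarD;

    // One rule: four NUM tokens of four digits, each pair optionally
    // separated by a single token of any kind (ANY?)
    NUM{REGEXP("(^[0-9]{4})") -> MARK(VarA)}
    ANY?
    NUM{REGEXP("(^[0-9]{4})") -> MARK(VarB)}
    ANY?
    NUM{REGEXP("(^[0-9]{4})") -> MARK(VarC)}
    ANY?
    // Merge rule elements 1 through 7 into one EntityType annotation,
    // then remove the intermediate annotations
    NUM{REGEXP("([0-9]{4})") -> MARK(VarD), MARK(EntityType,1,7),
        UNMARK(VarA), UNMARK(VarB), UNMARK(VarC), UNMARK(VarD)};
    ```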

    Key points from the preceding code example:

    • DECLARE VarA; declares an annotation type, VarA, for one part of the entity. In this use case, the entity consists of four strings of numbers that are separated by a delimiter character, so the script includes four DECLARE statements.
    • NUM{REGEXP("(^[0-9]{4})") -> MARK(VarB)} checks whether a number token (NUM) matches the regular expression, which describes a digit from 0 through 9 repeated four times. If the token matches, it is annotated as VarB. The caret (^) in the first three regular expressions requires the match to start at the beginning of the token's text.
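    • ANY? optionally matches one token of any type between two groups of digits, for example, a hyphen (-) or a semicolon (;). Because of the question mark, the rule also matches when no such token appears between the groups.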
    • MARK(EntityType,1,7) creates one EntityType annotation that spans rule elements 1 through 7 (VarA, ANY?, VarB, ANY?, VarC, ANY?, VarD), merging them into a single entity. An entity is detected only if all four NUM elements match their regular expressions.
    • The UNMARK actions, such as UNMARK(VarD), remove the intermediate annotations (VarA through VarD) so that they do not overlap with the entity that results from the merged annotations. For more information about regular expressions and the basic token hierarchy in Apache Ruta scripts, see the Apache UIMA Ruta Guide and Reference.

  3. Click Save.

  4. Test whether the script that you entered produces the expected results:

    1. On the Data tab of the Decision Data rule, click Test.

    2. In the Test window, enter or paste your sample text, and then click Test. If your custom script is correct, the detected entity is displayed in the Entity extraction section at the bottom of the Test window.

    Figure: Testing entity extraction
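When you prepare sample text for the Test window, it can help to predict which strings the rule should detect. The following Python sketch is a rough, hypothetical model of the example rule, not Apache Ruta itself. The tokenizer, the function names (`tokenize`, `match_at`, `find_entities`), the dropping of whitespace tokens, and the assumption that `REGEXP` must match a token's whole text are illustrative choices, not Ruta behavior confirmed by this tutorial. The sketch looks for four number tokens of four digits, allows at most one token of any kind between neighboring groups (the role of `ANY?`), and returns each match as one merged span, similar to `MARK(EntityType,1,7)`.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical stand-in for Ruta's basic tokens: a run of digits (like NUM),
# a run of ASCII letters (like a word token), a run of whitespace, or any
# other single character (like a punctuation or special-character token).
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z]+|\s+|.")

# A token is (covered text, start offset, end offset).
Token = Tuple[str, int, int]


def tokenize(text: str) -> List[Token]:
    """Split text into tokens, dropping whitespace (assumed to be filtered)."""
    return [
        (m.group(), m.start(), m.end())
        for m in TOKEN_RE.finditer(text)
        if not m.group().isspace()
    ]


def is_four_digit_num(token: Token) -> bool:
    """Model of NUM{REGEXP("[0-9]{4}")}, assuming the whole token must match."""
    return re.fullmatch(r"[0-9]{4}", token[0]) is not None


def match_at(tokens: List[Token], i: int, groups_left: int) -> Optional[int]:
    """Try to match groups_left four-digit groups starting at tokens[i].

    Returns the index just past the last matched number token, or None.
    """
    if i >= len(tokens) or not is_four_digit_num(tokens[i]):
        return None
    if groups_left == 1:
        return i + 1
    # Model of ANY?: first try no delimiter token, then exactly one token.
    for skip in (0, 1):
        end = match_at(tokens, i + 1 + skip, groups_left - 1)
        if end is not None:
            return end
    return None


def find_entities(text: str) -> List[str]:
    """Return each merged match of four groups as one span of the input text."""
    tokens = tokenize(text)
    found: List[str] = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i, 4)
        if end is None:
            i += 1
            continue
        start_offset, end_offset = tokens[i][1], tokens[end - 1][2]
        found.append(text[start_offset:end_offset])
        i = end  # continue after the match (non-overlapping, a simplification)
    return found


if __name__ == "__main__":
    for sample in ["Account: 1234-5678-9012-3456.", "1234-5678-9012"]:
        print(repr(sample), "->", find_entities(sample))
```

For the two samples in the `__main__` block, the sketch reports one entity for the first string and none for the second, which contains only three groups. Where a delimiter is optional, this model tries the match without a delimiter first; Ruta's own matcher may resolve such cases differently, so treat the predictions as a starting point and confirm them in the Test window.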
In this tutorial, you created a Decision Data rule from an existing one. You also edited the associated Apache Ruta script to extract entities of a specific type, which meets your business need of finding account numbers in the analyzed text.
