This content has been archived.

Installing the Pega OCR component

The Optical Character Recognition (OCR) component allows the system to analyze text contained in image-based email attachments. You use this capability in a Pega Email Bot™ to improve the text analysis of emails from users. The Pega OCR component obtains content from image PDF, JPG, PNG, and TIFF files and converts it into electronic text format. This text is then analyzed as though it were contained in the body of the email. The Pega OCR component also provides PDF file entity highlighting of analyzed documents in an Email Bot.

Installation procedure

You must install the Pega OCR component on premises on a Linux server running an instance of Pega Platform™. You first obtain the ABBYY FineReader Installation file and use it to install the ABBYY FineReader 12 application used for optical character recognition on a Linux server. You must also import the Pega OCR component to a Pega Platform instance running on the same Linux server. You obtain both the ABBYY FineReader installation file and Pega OCR component .zip file from the Pega Marketplace.

The ABBYY FineReader Installation (abbyy_installation_pegaXX_vYYYYmmDD.zip) file available from Pega Marketplace consists of the following files and folders:

  • abbyy_installation.sh - The installation script used to install ABBYY FineReader 12.
  • license/*.locallicense - An open license for the ABBYY FineReader 12 application provided by Pegasystems.

The Pega OCR component .zip file available from Pega Marketplace consists of the following files and folders:

  • lib/pega.ocr.component.jar - The JAR file that you must import to Pega Platform. It contains a Pega library used to control the multi-threaded usage of ABBYY FineReader Engine and an ABBYY FineReader 12 library called com.abbyy.FREngine.jar.
  • component - The component that contains the integration bits to allow OCR capability in email file attachments.

Prerequisites

You can use the abbyy_installation.sh script to install the ABBYY FineReader 12 application only on the following Linux versions:

  • Ubuntu/Debian 18.04
  • CentOS 6.9 and 7.0

Although the script is designed to make system changes safely, back up the system before you run the abbyy_installation.sh script.

Performing the steps described in this tutorial requires basic knowledge of the Linux distribution platform. Consult your Linux administrator for additional help.

Installing ABBYY FineReader 12

You must install ABBYY FineReader 12 on the Linux server of each Pega Platform instance by using an installation script. You can run this script with the following optional parameters:

  • -u - Preconfigures the Linux dynamic linker configuration (ld.so.conf, referred to in this guide as LD SO config), which holds the paths to directories that contain dynamic libraries. ABBYY FineReader 12 requires this change so that the application can find all its native libraries in the <path>/ABBYY/Bin folder.
  • -i - Installs the Microsoft TrueType fonts and required libraries.
To access the help for the ABBYY FineReader 12 installation program, run the following command: ./abbyy_installation.sh -h
  1. Log in to a Linux server that is running a Pega Platform instance using the Secure Shell (SSH) protocol.
  2. Extract the files in the abbyy_installation_pega81_vYYYYmmDD.zip file obtained from Pega Marketplace to a directory.
  3. Run the following script as root in one of the following two ways. You must set the installation path so that the web applications of the app server have read access to this path.
    • With optional parameters: ./abbyy_installation.sh -c install -u -i -d <abbyy_installation_path>
    • Without parameters: ./abbyy_installation.sh -c install -d <abbyy_installation_path>
    When you run the abbyy_installation.sh script with the optional -u or -i parameters, the script modifies the system configuration. If you run the script without the -u and -i parameters, you must install the Microsoft TrueType fonts and update the LD SO config manually. For more information, see Updating LD SO config path manually and Installing Microsoft TrueType fonts manually.
  4. Check whether the installation was successful:
    1. If there are dependency problems, trace what package needs the required dependency and install it.
    2. After you fix all the dependency problems, run the health check of the installation:
      ./abbyy_installation.sh -c check -d <abbyy_installation_path>
  5. Create ABBYY FineReader 12 data and temp folders:
    1. To create the default data directory for ABBYY FineReader 12, run the following command:
      mkdir -p "/var/lib/ABBYY/SDK/12/FineReader Engine"; chown -R <user>:<group> "/var/lib/ABBYY"
      Replace <user> and <group> with the user and group that own the Java process.
    2. To create the default temp directory for ABBYY FineReader 12, run the following command:
      mkdir -p "/tmp/ABBYY FineReader Engine 12"; chown <user>:<group> "/tmp/ABBYY FineReader Engine 12"
      Replace <user> and <group> with the user and group that own the Java process.
  6. Restart your Linux system so that Tomcat server configuration is refreshed and the LD SO config changes are applied.
    You can postpone the restart until after you import the library, as described in the next section of this guide.
  7. Repeat steps 1 through 6 for each Linux server that contains a Pega Platform instance.
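
The folder creation in step 5 can be wrapped in a small helper. The following is a minimal sketch and not part of the Pega or ABBYY tooling: the function name create_abbyy_dirs and the ABBYY_ROOT_PREFIX variable are hypothetical and exist only so that the commands can be tried in a scratch directory first. The folder paths are the defaults listed in step 5.

```shell
# Sketch: create the ABBYY FineReader 12 data and temp folders from step 5.
# ABBYY_ROOT_PREFIX is a hypothetical variable for dry runs; leave it unset
# to create the real folders under /var/lib and /tmp (requires root).
create_abbyy_dirs() {
    owner="$1"    # "<user>:<group>" of the Java process, or "" to skip chown
    prefix="${ABBYY_ROOT_PREFIX:-}"
    data_dir="$prefix/var/lib/ABBYY/SDK/12/FineReader Engine"
    temp_dir="$prefix/tmp/ABBYY FineReader Engine 12"

    mkdir -p "$data_dir" "$temp_dir" || return 1

    if [ -n "$owner" ]; then
        # Same ownership changes as the documented commands.
        chown -R "$owner" "$prefix/var/lib/ABBYY" || return 1
        chown "$owner" "$temp_dir" || return 1
    fi

    # Print the two folders so that the caller can record them.
    printf '%s\n%s\n' "$data_dir" "$temp_dir"
}
```

For a real installation, run the function as root with the owner of the Java process, for example: create_abbyy_dirs "<user>:<group>".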

    Installing the Pega OCR component

    Before using the OCR capability in Pega Platform and an Email Bot, install the Pega OCR component for a Pega Platform instance. If the Pega OCR component is not available in Pega Platform, import the component from Pega Marketplace to Pega Platform first.

    1. Log in to Pega Platform.
    2. In Dev Studio, click the name of your application, and click Definition.
    3. In the Enabled components section, make sure that the Pega OCR component is displayed in the list.
      Enabled components section - Application rule form
    4. If the Pega OCR component is not listed, perform the following steps to install it.
      1. Click Manage components.
      2. In the Available components section, select the Enabled check box for the Pega OCR component.
        If the Pega OCR component is not displayed in the section:
        1. Obtain a .zip file for the component from Pega Marketplace, for example: Pega OCR Component.zip.
        2. Extract the .zip file contents to get access to the /component folder.
        3. Click Install new to install the file on Pega Platform. If the installation is successful, the component is displayed in the Available components section.
        4. Select the Enabled check box for the Pega OCR component.
      3. Click OK. The Pega OCR component is displayed in the Enabled components section.
      4. Click Save.
    5. Import the pega.ocr.component.jar file that is part of the .zip file that you obtained from Pega Marketplace:
      1. In Dev Studio, click Configure > Application > Distribution > Import.
      2. Click Local file and then click Browse and select the pega.ocr.component.jar file from a directory.
        Application Import wizard
      3. Click Next and follow the Import wizard instructions to import the JAR file to the Customer 06-01-01 codeset rule.
    6. Restart the Pega Platform instance to make sure that the imported JAR file is visible in the classpath.
      Make sure that you perform this last step; otherwise, the Pega OCR component is not fully installed.

    Verifying Pega OCR component installation

    To verify the configuration of the Pega OCR component installation files in Pega Platform:

    1. Log in to Pega Platform.
    2. From the App explorer, search for the Data-AbbyyFineReader rule, and in the Data Model > Data Transform section, open the configureAbbyyFREngine rule.
    3. Verify the following parameters:
      • Param.abbyySdkPath - Specifies the path to the /Bin folder of the ABBYY FineReader 12 installation, for example: <abbyy_installation_path>/FREngine12/Bin.
      • Param.abbyyLicensePath - Specifies the path to the license that is provided by Pegasystems, for example: <abbyy_installation_path>/licenses/pega.locallicense. The license file is automatically installed when you run the script during ABBYY FineReader 12 installation.
      • Param.abbyyDataFolder - Specifies the full path to the ABBYY FineReader 12 data folder, which is created and managed by ABBYY FineReader. The default path is: /var/lib/ABBYY/SDK/12/FineReader Engine. Make sure that the Java process has read and write access rights to this folder.
      • Param.abbyyTempPath - Specifies the full path to the ABBYY FineReader 12 temporary folder. The default path is: /tmp/ABBYY FineReader Engine 12. Update the value if you want to use another directory. Make sure that the Java process has read and write access rights to this folder.
      configureAbbyyFREngine Data Transform rule
    4. If you modified the configureAbbyyFREngine rule, save the ruleset to your application and check in the changes.
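
Before testing with an Email Bot, it can help to confirm on the server that the four paths in the configureAbbyyFREngine rule exist with the expected access rights. The following is a minimal sketch under stated assumptions: the function name check_abbyy_paths is hypothetical, and the arguments are the values that you read from the rule form, in the order shown above.

```shell
# Sketch: check the paths configured in the configureAbbyyFREngine rule.
# Returns 0 when every path passes, 1 otherwise; problems go to stderr.
check_abbyy_paths() {
    sdk_path="$1"       # value of Param.abbyySdkPath
    license_path="$2"   # value of Param.abbyyLicensePath
    data_folder="$3"    # value of Param.abbyyDataFolder
    temp_path="$4"      # value of Param.abbyyTempPath
    status=0

    if [ ! -d "$sdk_path" ] || [ ! -r "$sdk_path" ]; then
        echo "abbyySdkPath is not a readable directory: $sdk_path" >&2
        status=1
    fi
    if [ ! -f "$license_path" ] || [ ! -r "$license_path" ]; then
        echo "abbyyLicensePath is not a readable file: $license_path" >&2
        status=1
    fi
    # The data and temp folders need both read and write access.
    for dir in "$data_folder" "$temp_path"; do
        if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
            echo "Folder is missing or not readable and writable: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run the function as the user that owns the Java process so that the permission checks reflect what Pega Platform sees at run time.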

    To verify that the Pega OCR component installation was successful, check whether you can use the OCR capability with an Email Bot.

    1. Log in to Pega Platform.
    2. Create an Email channel to test the Pega OCR component. For more information, see Configuring the Email channel.
    3. In the Additional settings section of the Configuration tab, in the Analyze email attachments list, click Always.
    4. Click Save.
    5. Send a test email that also contains a PDF file attachment with OCR content to the operator email account defined for the Email channel and verify that entities were extracted from the PDF file.

    Troubleshooting Pega OCR component installation

    Failed to compile generated Java in ABBY error

    If you see an error in the logs or tracer that states that the system failed to compile the generated Java in ABBY, make sure that you performed step 6 in the Installing the Pega OCR component procedure and that Pega Platform was restarted.

    Other errors

    If the test email that you sent was not analyzed correctly or the verification steps described above fail, examine the Pega Platform logs to obtain more information about the problem. Search the Pega Platform log files for the AbbyyFineReader entries.
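
The log search can be done from the command line. The following is a minimal sketch: the function name find_abbyy_log_entries is hypothetical, and the log directory is an assumption that depends on how your application server is configured, so pass the directory that holds your Pega Platform log files.

```shell
# Sketch: list every log line that mentions AbbyyFineReader.
# Prints matches as file:line:text; returns non-zero when nothing matches.
find_abbyy_log_entries() {
    log_dir="$1"    # directory that holds the Pega Platform log files
    grep -rin "AbbyyFineReader" "$log_dir"
}
```

For example, find_abbyy_log_entries <pega_log_directory> prints each matching entry with its file name and line number.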

    Updating LD SO config path manually

    To configure the ld.so.conf path manually, run the following command as root or using sudo:

     echo "<local_path>/ABBYY/Bin" > /etc/ld.so.conf.d/abbyy.conf && ldconfig

    Make sure to replace the <local_path> string above with the path to the directory that contains the /ABBYY/Bin folder.

    Perform this step only if you run the abbyy_installation.sh script without the -u and -i parameters.
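
The manual update can also be written as a function that checks the folder before it changes the configuration. The following is a minimal sketch: the function name write_abbyy_ld_conf and its optional second argument are hypothetical, and the second argument exists only so that the function can be tried against a scratch directory.

```shell
# Sketch: write the ABBYY library path to a drop-in ld.so configuration file.
# The second argument is a hypothetical override for dry runs; by default the
# file is written to /etc/ld.so.conf.d, as in the command above.
write_abbyy_ld_conf() {
    bin_dir="$1"    # for example <local_path>/ABBYY/Bin
    conf_dir="${2:-/etc/ld.so.conf.d}"

    if [ ! -d "$bin_dir" ]; then
        echo "Not a directory: $bin_dir" >&2
        return 1
    fi

    printf '%s\n' "$bin_dir" > "$conf_dir/abbyy.conf" || return 1

    # Rebuild the linker cache only for the real location and only as root.
    if [ "$conf_dir" = "/etc/ld.so.conf.d" ] && [ "$(id -u)" -eq 0 ]; then
        ldconfig
    fi
}
```

Checking that the folder exists first avoids registering a mistyped path with the dynamic linker.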

    Installing Microsoft TrueType fonts manually

    To install Microsoft TrueType fonts manually on an Ubuntu/Debian server, run the following commands as root or using sudo:

    1. apt-get update
    2. apt-get install ttf-mscorefonts-installer -y --force-yes

    On an Ubuntu/Debian server, if you see error messages during installation, for example: 'library.so -> dependency libgomp1 not found', you must also install libgomp1. Run the following command as root:
    apt-get install libgomp1

    To install Microsoft TrueType fonts manually on a CentOS server, run the following commands as root or using sudo:

    1. yum install epel-release -y
    2. yum install curl cabextract fontconfig -y
    3. yum install https://downloads.sourceforge.net/project/mscorefonts2/rpms/msttcore-fonts-installer-2.6-1.noarch.rpm -y
    Perform these steps only if you run the abbyy_installation.sh script without the -u and -i parameters.
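
After a manual font installation, you can check whether the fonts are visible to the system. The following is a minimal sketch: the function name has_core_fonts is hypothetical, and the three families that it looks for (Arial, Times New Roman, and Courier New) are an assumption chosen because they ship with the Microsoft core fonts. This guide does not state which fonts ABBYY FineReader 12 requires. The function reads a font list on standard input, such as the output of the fc-list command from the fontconfig package.

```shell
# Sketch: succeed when the font list on stdin mentions each expected family.
# The family names are an assumption; adjust them to your requirements.
has_core_fonts() {
    font_list="$(cat)"
    for family in "Arial" "Times New Roman" "Courier New"; do
        if ! printf '%s\n' "$font_list" | grep -qi "$family"; then
            echo "Missing font: $family" >&2
            return 1
        fi
    done
    return 0
}
```

To check the installed fonts, pipe the font list into the function: fc-list | has_core_fonts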