This content has been archived.

Troubleshooting agents

Summary

 Agents are internal processes on the server. They run activities in the background according to a schedule (rather than being called by user processing). Several issues can occur with agents:

  • The agent may not run at the appropriate time.
  • The activity called by the agent may not run correctly.
  • The agent and/or activity may run, but an error may occur.

You may see problems with an agent such as the following:

  • The service level rules aren’t working
  • History instances are not being created for work objects

Suggested Approach

Background

Agents are implemented through two classes:

Agent Queue rules (instances of Rule-Agent-Queue):  These rules specify the activities the agent runs and the interval, in seconds, at which it runs them.  NOTE:  There can be only  one Agent Queue rule in a RuleSet.

For most rules shipped in the Pega-*** RuleSets, if a developer wants to change the rule behavior, the recommended process is to perform a “Save As” on the rule, save it into a custom RuleSet, and make the changes there.  For agents, however, this can result in duplicate agent activities being run.  The recommended procedure for agents is to disable the agent that you want to change, and then create an entirely new agent with similar functionality in your own RuleSet.

Agent Schedule data objects (instances of Data-Agent-Queue):  For each Agent Queue, Process Commander generates one Agent Schedule for each node in the system.  These records are provided so that developers may customize the Agent Queue records in locked RuleSets.

NOTE:  These will be generated by the system; you should not copy or create these records. 

Three standard agents are shipped with the system:

RuleSet | Agent Queue | Activities
Pega-ProCom | Correspondence and SLA | Email_CheckIncoming, ProcessServiceLevelEvents, SendCorr, GetConvertedPDFsFromPDM*
Pega-IntSvcs | Report errors from COS DATABASE and delete old records | checkPrintErrors, checkFaxErrors, purgeRequestsTable
Pega-RULES | Core Engine Processing Agent | SystemCleaner, SystemPulse, SystemIndexer*

*Available in Version 5.1 only.



Access Groups -- Important

As with many other rules, it is possible to override some Agent Activities and create new versions with custom processing for an application.  However, just as with users, the developer must then give the agent access to the RuleSet where the custom activity was created; if access isn’t given to the agent for the custom activity, the agent will use the (shipped) activity it does have access to, and you will not observe the new processing. 

To give agents access to activities in other RuleSets, it is necessary to add an access group to the Agent Schedule record, including the RuleSet with the agent activity, or change the existing access group to include that RuleSet.  A standard agent access group, PegaRULES:Agents, is shipped with the system, and may be modified with additional Production RuleSets:

[Screenshot: PegaRULES:Agents access group form]

Agent access groups may be specified in:

  • the BATCH Requestor Type record
  • the Agent Schedule instance

As a best practice, enter the agent access group into the BATCH Requestor Type instance:

[Screenshot: BATCH Requestor Type form]

While it is possible to include access group information in the Agent Schedule, that is a much more manual system, and prone to user error.  In a standard enterprise-sized installation, separate nodes may be added and removed regularly.  The system will detect these new nodes and automatically create a new Agent Schedule record (among other objects), as a copy of the Agent Queue record.   If there is an access group specified in the Agent Queue record (instance of Rule-Agent-Queue), it will be automatically copied into the Agent Schedule (instance of Data-Agent-Queue).  However, if no access group exists in the Agent Queue, the system cannot fill in the access group for the Agent Schedule. 

Any processing run on this node before the developer changes the Agent Schedule record to include an access group may not have correct access.  Putting the access group on the overall BATCH instance guarantees the appropriate access group will always be applied.

NOTE:  Although the agents will use the access group specified in the BATCH requestor type instance, that access group will not automatically be filled in for each Agent Schedule record.

If you want to have special access granted to just one agent, you can use the Agent Schedule instance for that agent to override the standard access group specified in the BATCH instance.  As the Agent Schedule record is read after the BATCH Requestor Type instance (which is one of the first records read at startup), the access group specified in the Agent Schedule replaces the access group specified in the Requestor Type instance.


As stated earlier, however, this becomes a labor-intensive process in an enterprise-sized system; the developer must continually be alert to nodes being added into the system.
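The precedence described above can be illustrated with a minimal sketch.  This is a simplified model for reasoning about configuration, not Process Commander code: the function name and the "MyCo:Agents" access group are hypothetical, and a blank Agent Schedule value is assumed to fall back to the BATCH value.

```python
from typing import Optional


def effective_access_group(batch_access_group: Optional[str],
                           schedule_access_group: Optional[str]) -> Optional[str]:
    """Return the access group an agent would use in this simplified model."""
    # The Agent Schedule is read after the BATCH Requestor Type, so a value
    # specified there replaces the BATCH value.
    if schedule_access_group:
        return schedule_access_group
    # Assumption: with no access group on the Agent Schedule, the agent
    # uses the one from the BATCH Requestor Type instance.
    return batch_access_group


# A node whose Agent Schedule has no access group uses the BATCH setting.
print(effective_access_group("PegaRULES:Agents", None))
# A single agent given special access through its Agent Schedule.
print(effective_access_group("PegaRULES:Agents", "MyCo:Agents"))
```

The model makes the maintenance cost visible: the override has to be supplied for every node's Agent Schedule, while the BATCH value applies everywhere.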

NOTE:  Beginning in Version 4.2 SmartBuild Release (SP2), it is possible to set up a Process Commander application as either RuleSet-based or Application-based (depending upon whether a Rule-Application instance was defined).  The Requestor Type form above shows an Application-Based Access Group; this instance could also be defined using a List of RuleSets and Rules:

[Screenshot: Requestor Type form defined with a list of RuleSets]

In this case, instead of adding RuleSets the agents need access to in the access Group, the developer would add them directly to the Starting RuleSets list.  Alternatively, the developer could add them to the agent Access Group and add that Access Group to the Agent Schedule instances, but as explained above, that is manually labor-intensive.

Important Note:

At some sites, having created a successful Process Commander proof-of-concept or user-testing (UAT) application (generally on a single node), the development team then copies the test application over in its entirety to the production system. 

For agents especially, this can cause problems.  Since the production system is a “new” system to the application, a new Agent Schedule record is created for the new node, and any customizations made in the original Agent Schedule are lost.  The production system should be treated as a “new” system, and the access group defined on the BATCH requestor record (as described above). 

In addition, the original Agent Schedule instance from the test system, like all Agent Schedule instances, should not be copied.  As stated above, Agent Schedule instances should be created by the system from information in the Agent Queue rules.  If Agent Schedule instances are copied, unpredictable results can occur.



Problem Investigation - Version 4.2

Checking the Status of an Agent

In Version 4.2, you can use the System Console (Monitor Servlet) to display the status of agents.


If there is a green check to the left of the agent entry, then that agent is active.  (It may be running, or it may be waiting to run, but it is “alive.”)  The red “X” to the left of the agent entry indicates that that agent is not active in the system.  Generally, when the red “X” displays for an agent, there will also be an error message in the Exception info column.



Tracing an Agent

NOTE:  Tracer can only be used to trace agents which are running on the same node where the tracing is occurring.  For multi-node systems, where errors may happen on different nodes, it is necessary to access the monitoring system for the node where the error occurred.

Tracing an agent is different from tracing Process Commander user work, as agents can run quickly and be gone before Tracer can catch them.  Therefore, to trace the running of an agent, the following steps are recommended:

  1. Check the scheduling of the agent in the Agent Schedule instance.  (If the agent runs once a week, and has just run, it won’t run again for a while.  If the agent's interval is long, temporarily shorten it for troubleshooting.)
  2. Before the agent runs, go to the Agent Status screen and click the “Delay next execution of this queue for Tracer startup” function for the agent in question.  For Version 4.2, this is an icon to the left of the agent entry.
  3. Go to the Requestor Status page.  Continue to refresh this page until a new requestor is displayed which shows a “Waiting” message in the Last Input column.  (The requestor should show after 60 seconds.)  NOTE:  For agents, the hash name of the requestor should begin with “B” (“batch”).
  4. Once that requestor is visible, trace the activity execution by clicking the last icon in the row for that activity (“Trace activity execution”).
  5. Tracer will launch a new window, showing the trace for this process.
  6. Review the trace for errors.



Tracing an Agent Activity with Tracer

If you have tracked the problem down to one agent activity, you can just trace that activity, using the following procedure.

IMPORTANT NOTES:   

  • The process described here will run the agent activity so that it may be traced; however, the agent must run in order to trace it, so whatever processing it is designed to do will also occur.  For this reason, the developer should take care that nothing is run that will cause problems for the application later (sending an odd test email to a customer, for example).  In addition, this process may give unpredictable results if used to trace things like the system pulse; tracing system procedures is not recommended.
  • Agents are designed to run in the background, using their own access groups.  This tracing process will be run by a user in the foreground, using the user’s access group, which may or may not have access to the necessary objects for the activity.  Use caution.

  1. Begin by stopping the agent (so it doesn’t run when you’re trying to troubleshoot it).  From the Agent Status screen, choose the agent entry in question, and stop it by clicking on the middle (trash can) icon for the agent event to Terminate the Queue.
  2. Open the agent activity rule.  (For example, the CheckSLA activity.)
  3. Start the Tracer.  Make sure the RuleSet where the activity is defined is included in the trace.
  4. Click the Run toolbar button.
  5. Execute the activity by providing the parameter “Assign-WorkList” and clicking Execute.
  6. The Tracer will display the execution of the activity steps.
  7. Re-enable the agent that was stopped in Step 1.


Tracing an Agent Activity in the Log File

In addition to live tracing in the Tracer, data can be sent to the PegaRULES log file. 

To enable logging for any object, the prlogging.xml file must be changed. This file can be found in the PegaRULES Application directory under contextroot/WEB-INF/classes (or in APP-INF for .ear deployments).

Example:  prweb/WEB-INF/classes

To enable logging for an activity, add a category node to the file.  Note that underscores are used in place of hyphens in this file:

<category name="Rule_Obj_Activity.activityname">
  <priority value="debug"/>
</category>

Example:

<category name="Rule_Obj_Activity.ProcessServiceLevelEvents.Assign_">
  <priority value="debug"/>
</category>

This causes a great deal of information about this activity to be logged to the PegaRULES log file, including full details on any error. 
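As an illustration of the naming convention above, the following sketch builds a category node from an activity name and the class it applies to.  The name layout (activity name first, then the class with hyphens replaced by underscores) is inferred from the single example shown here and should be treated as an assumption; the function name is hypothetical.

```python
from xml.sax.saxutils import quoteattr


def logging_category(activity_name: str, applies_to_class: str,
                     level: str = "debug") -> str:
    """Build a prlogging.xml category node for one activity (sketch)."""
    # Hyphens are not used in this file: "Assign-" becomes "Assign_".
    class_part = applies_to_class.replace("-", "_")
    name = f"Rule_Obj_Activity.{activity_name}.{class_part}"
    # quoteattr adds the surrounding straight quotes and escapes the value.
    return (
        f"<category name={quoteattr(name)}>\n"
        f"  <priority value={quoteattr(level)}/>\n"
        f"</category>"
    )


print(logging_category("ProcessServiceLevelEvents", "Assign-"))
```

Generating the node this way also avoids typographic (curly) quotation marks, which an XML parser does not accept as attribute delimiters.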

If tracing the agent activity does not help, try setting the Log Levels to “debug” for the agents themselves:

com.pega.pegarules.engine.context.Agent
com.pega.pegarules.engine.context.BatchRequestorTask
com.pega.pegarules.engine.context.agent

Important:  Once debugging is complete, delete this section from the prlogging.xml file to prevent the PegaRULES log file from becoming too large or affecting performance. 


Using Log-Agent for Debugging

NOTES: 

  • This feature is only useful for debugging SLA agents, as the Service Level functionality is the only agent set up to use this log. 
  • Unlike using Tracer, this feature will allow tracing of errors on any node in a multi-node system.

Agents started after the change described below create persistent instances of the Log-Agent class, which can be viewed using a List report.  This report should show all the steps that each agent executes:

“hit deadline time for assignment”
“moved assignment to manager”

etc.

Any errors the agent encounters will also be displayed.

Example messages:

  • Deadline time processing for work item PTC-2
  • The work object PROCOMTEST PTC-2 could not be opened:  Unable to open an instance using the given inputs:  PROCOMTEST PTC-2

  1. Begin by stopping the agent (so it doesn’t run when you’re trying to troubleshoot it).  From the Agent Status screen, choose the agent entry in question, and stop it by clicking the middle (trash can) icon for the agent event to Terminate the Queue.
  2. Open the Library rule (Rule-Utility-Library instance) named Default in the Pega-RULES RuleSet.
  3. On the Static Variables tab, change the Value field for the DebugAgents constant to true.  Save the form.
  4. On the Packages tab, click the Generate Library button.*
  5. Restart the agent (or restart the system, if preferred).
  6. After running the system (and the agents) for some interval, open the List rule Log-Agent.FullList.  Click the Play icon (at the top of the window) to run the report.
  7. Review the report to track the progress of the agent.
  8. When debugging is complete, undo the change in Step 3 to avoid unnecessary processing (which can affect agent performance).  Access the Default Library, change the Value for DebugAgents to false, save the Library form, and then click the Generate Library button.*

*A note about generating libraries:  Clicking this button generates libraries for the current node only.  In a multi-node system, to generate all the libraries, either access each node and click the Generate Library button, or stop each node, delete the PegaRULES_Extract_Marker.txt file, and bring the node back up.


Problem Investigation - Version 5.1

Checking the Status of an Agent

For Version 5.1, the agent status is displayed in the System Management Application.  Instead of icons at the side of each agent entry, there are buttons at the top controlling the functionality.


If there is a green check to the left of the agent entry, then that agent is active.  (It may be running, or it may be waiting to run, but it is “alive.”)  The red “X” to the left of the agent entry indicates that that agent is not active in the system.  Generally, when the red “X” displays for an agent, there will also be an error message in the Exception info column.

For full details on managing agents through this facility, reference the System Management Application Reference Guide.


Tracing an Agent

NOTE:  Tracer can only be used to trace agents which are running on the same node where the tracing is occurring.  For multi-node systems, where errors may happen on different nodes, it is necessary to access the monitoring system for the node where the error occurred.

Tracing an agent is a bit different from tracing Process Commander user sessions, as agents can run quickly and be gone before Tracer can catch them.  Therefore, to trace the running of an agent, the following steps are recommended:

1.  Check the scheduling of the agent in the Agent Schedule instance.  (If the agent runs once a week, and has just run, it won’t run again for a while.  If the periodicity of the agent is long, temporarily shorten it for troubleshooting.)

2.  Before the agent runs, go to the Agent Status screen and delay the execution of the Agent Queue.  For Version 5.1, click the radio button next to the agent entry, and then click the Delay button at the top of the screen in the Single activity in queue section.


3.  Go to the Requestor Management page.  Continue to refresh this page until a new requestor is displayed which shows a “Waiting” message in the Last Input column.  (The requestor should show after 60 seconds.)  NOTE:  For agents, the hash name of the requestor should begin with “B” (“batch”). 

4.  Once that requestor is visible, trace the activity execution.  Click the radio button for that activity, and then click the Tracer button at the top of the screen.


5.  Tracer will open in a new window, showing the trace for this process.

6.  Review the trace for errors.


Tracing an Agent Activity with Tracer

If the problem has been tracked to one agent activity, it is also possible to just trace that activity, using the following procedure.

IMPORTANT:   

  • The process described here will run the agent activity so that it may be traced; however, the agent must run in order to trace it, so whatever processing it is designed to do will also occur.  For this reason, you should take care that nothing is run that will cause problems for the application later (sending an odd test email to a customer, for example).  In addition, this process may give unpredictable results if used to trace things like the system pulse; tracing system procedures is not recommended.
  • Agents are designed to run in the background, using their own access groups.  This tracing process will be run by a user in the foreground, using the user’s access group, which may or may not have access to the necessary objects for the activity.  Use caution.

1.  Begin by stopping the agent (so it doesn’t run when you’re trying to troubleshoot it).  From the Agent Status screen, click the radio button to choose the agent event, and then click the Stop button at the top of the screen in the Single activity in queue section.


2.  Open the agent activity rule.  (For example, the CheckSLA activity.)

3.  Start the Tracer.  Make sure the RuleSet where the activity is defined is included in the trace.

4.  Click the Run toolbar button.

5.  Execute the activity by providing the parameter “Assign-WorkList” and clicking Execute.


6.  The Tracer will display the execution of the activity steps.


7.  Re-enable the agent that was stopped in Step 1.


Tracing an Agent Activity in the Log File

In addition to live tracing in the Tracer, data can be sent to the PegaRULES log file. 

To enable logging for any object, the prlogging.xml file must be changed. This file can be found in the PegaRULES Application directory under contextroot/WEB-INF/classes (or in APP-INF for .ear deployments).

Example:  prweb/WEB-INF/classes

To enable logging for an activity, add a category node to the file.  Note that underscores are used in place of hyphens in this file:

<category name="Rule_Obj_Activity.activityname">
  <priority value="debug"/>
</category>

Example:

<category name="Rule_Obj_Activity.ProcessServiceLevelEvents.Assign_">
  <priority value="debug"/>
</category>

In Version 5.1, it is also possible to access this functionality through the Tools menu, using Logging Level Settings.

Enter the class name to be traced in the Logger Name field, enter DEBUG in the Set Level field, and click the Set Level button.

This will cause a great deal of information about this activity to be logged to the PegaRULES log file, including full details on any error. 

If tracing the agent activity does not help, try setting the Log Levels to “debug” for the agents themselves:

com.pega.pegarules.engine.context.Agent
com.pega.pegarules.engine.context.BatchRequestorTask
com.pega.pegarules.engine.context.agent

Important:  Once debugging is complete, delete this section from the prlogging.xml file to prevent the PegaRULES log file from becoming too large or affecting performance. 


Using Log-Agent for Debugging

NOTES: 

  • This feature is only useful for debugging SLA agents, as the Service Level functionality is the only agent set up to use this log. 
  • Unlike using Tracer, this feature will allow tracing of errors on any node in a multi-node system.

Agents started after the change described below create persistent instances of the Log-Agent class, which can be viewed using a List report.  This report should show all the steps that each agent executes:

“hit deadline time for assignment”

“moved assignment to manager”

etc.

Any errors the agent encounters will also be displayed.

Example messages:

  • Deadline time processing for work item PTC-2
  • The work object PROCOMTEST PTC-2 could not be opened:  Unable to open an instance using the given inputs:  PROCOMTEST PTC-2

1.  Begin by stopping the agent (so it doesn’t run when you’re trying to troubleshoot it).  From the Agent Status window, choose the agent entry in question, and stop it.  In Version 5.1, click the radio button to choose the agent event, and then click the Stop button at the top of the window in the Single activity in queue section.


2.  Open the Library (Rule-Utility-Library instance) named Default in the Pega-RULES RuleSet.

3.  On the Static Variables tab, change the Value field for the DebugAgents  constant to true. Save the form.


4.  On the Packages tab, click the Generate Library button.*

5.  Restart the agent (or restart the system, if preferred). 

6.  After running the system (and the agents) for some interval, open the List rule Log-Agent.FullList.  Click the Play icon (at the top of the screen) to run the report.

7.  Review the report to track the progress of the agent.

8.  When debugging is complete, undo the change in Step 3 to avoid unnecessary processing (which can affect agent performance).  Access the Default library, change the Value for DebugAgents to false, save the Library form, and then click the Generate Library button.*

*A note about generating libraries:  Clicking this button generates libraries for the current node only.  In a multi-node system, to generate all the libraries, it is currently necessary to either access each node and click the Generate Library button, or bring the entire system down, delete the PegaRULES_Extract_Marker.txt file, and bring the system back up again.


Troubleshooting Strategy

When working with agents, there are three main points to verify immediately: 

  • Whether all agents are enabled on the system
  • Whether the specific agent is enabled
  • Whether it has access to the appropriate RuleSets

Verify all agents are enabled on the system

Version 4.2

There is an agent enablement setting in the pegarules.xml file.  The enable setting in the agent section of this file should be set to true:

<!--
  This section provides basic configuration settings for all system
  agents. The agents themselves are maintained in Rule-Agent-Queue
  and Data-Agent-Queue instances.
-->
<node name="agent">
  <map>
    <!-- Globally Enable/Disable Agents -->
    <entry key="enable" value="true"/>
  </map>
</node>

Version 5.1

There is an agent enablement setting in the prconfig.xml file.  The enable setting in the agent section of this file should be set to true:

<env name="agent/enable" value="true" />
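When several nodes need checking, the two settings above can be read programmatically.  The following is a minimal sketch using the fragments shown in this section.  The function names are hypothetical, the wrapper elements around the fragments are not modeled, and treating a missing setting as "not enabled" is an assumption of the sketch, not documented product behavior.

```python
import xml.etree.ElementTree as ET


def agents_enabled_prconfig(xml_text: str) -> bool:
    """Version 5.1 style: look for <env name="agent/enable" value="..."/>."""
    root = ET.fromstring(xml_text)
    for env in root.iter("env"):
        if env.get("name") == "agent/enable":
            return env.get("value", "").strip().lower() == "true"
    # Assumption: an absent setting is reported as not enabled.
    return False


def agents_enabled_pegarules(xml_text: str) -> bool:
    """Version 4.2 style: look for the enable entry in the agent node."""
    root = ET.fromstring(xml_text)
    for node in root.iter("node"):
        if node.get("name") != "agent":
            continue
        for entry in node.iter("entry"):
            if entry.get("key") == "enable":
                return entry.get("value", "").strip().lower() == "true"
    # Assumption: an absent setting is reported as not enabled.
    return False


# Usage with a real file (path is an example only):
# with open("prweb/WEB-INF/classes/prconfig.xml", encoding="utf-8") as f:
#     print(agents_enabled_prconfig(f.read()))
```

Both functions search the whole document, so they work whether the fragment is the root element or nested inside a larger configuration file.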


Verify that the specific agent is enabled

To confirm whether an agent activity can even execute, check the Agent Schedule (Data-Agent-Queue) instance.

[Screenshot: Agent Schedule form]

In the above example, the only agent activity which is enabled is the SendCorrespondence activity.  The SLAs for this system (ProcessServiceLevelEvents) would not be firing.

There are also agent enablement settings in the pegarules.xml or prconfig.xml file.  The enable setting in the agent section of these files should be set to true.



Verify that the agent has access to all RuleSets it requires

As described in the Access Groups section above, the agent must have access to the agent activity to run it.  If a developer has overridden the SendCorr activity (for example), and saved it into their MyCo RuleSet, then the Pega-ProCom agent must have the MyCo RuleSet in its Access Group (either through the BATCH Requestor Type or through the Agent Schedule instance for the node).

This is the single most common agent problem. 


Possible agent error messages

There are a number of possible error messages that may display when running agents, including:

  • SLA agent – error on assignment
  • java.util.ConcurrentModificationException
  • InvalidParameterException:  Class not defined in dictionary:  History-classname
  • PRRuntimeException:  Failed to update the PegaRULES index – Lock obtain timed out
  • PRAppRuntimeException:  A commit cannot be performed

These errors are displayed in the PegaRULES log.  They are also displayed in the agent monitoring screen (see the Checking the Status of an Agent section).
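For a large log file, a quick first pass is to count how often each of the exception messages listed above occurs.  The following sketch does this with regular expressions built from the message text in this article.  The labels and function name are hypothetical, and the patterns are simple substring matches rather than a full log parser.

```python
import re
from collections import Counter

# Patterns taken from the error messages listed in this section.
SIGNATURES = {
    "ConcurrentModificationException":
        re.compile(r"java\.util\.ConcurrentModificationException"),
    "History class not defined":
        re.compile(r"Class not defined in dictionary"),
    "Lucene lock timeout":
        re.compile(r"Lock obtain timed out"),
    "Commit cannot be performed":
        re.compile(r"A commit cannot be performed"),
}


def count_agent_errors(lines):
    """Count log lines matching each known agent error signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1  # a line counts once per signature
    return counts


# Usage with a real log (file name is an example only):
# with open("PegaRULES.log", encoding="utf-8", errors="replace") as f:
#     print(count_agent_errors(f))
```

A count of zero for every signature does not mean the agent is healthy; it only means none of these specific messages was logged.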

If your system has not displayed any errors, but an agent still does not appear to be running, continue with the Additional steps if no agent error displays section.

SLA agent - error on assignment

If the SLA agent is unable to work on an assignment for some reason, an error will appear on that assignment in the user’s worklist:

[Screenshot: worklist showing an assignment with an agent error]

There is a series of these errors:

Error | Description
Error:  Agent Work Object (Version 4.2); Error:  Agent Work Open (Version 5.1) | Agent is unable to open the work item.
Error:  Agent Work Security | Agent is unable to open the work item due to the security settings.
Error:  Agent Work Modify Security | Agent is unable to obtain a lock on the work item because it lacks security access to the modify setting.
Error:  Agent Cover Object (Version 4.2); Error:  Agent Cover Open (Version 5.1) | Agent is unable to open the cover work item.
Error:  Agent Cover Security | Agent is unable to open the cover work item due to security settings.
Error:  Agent Cover Modify Security | Agent is unable to obtain a lock on the cover of the work item because it lacks security access.
Error:  Agent Exception | Agent has encountered an error which does not fall into one of the specific categories.
Error:  Agent Rule-Obj-Flow Not Found | The Rule-Obj-Flow the agent is trying to execute is not found.
Error:  Agent Assign Task Deleted | The assignment task is not found in the Rule-Obj-Flow that the agent is trying to execute.

Important:  If the system is running BRE processing, and there is no worklist, then these errors will not appear. 

In this case, developers must be extremely careful to make sure that agents – especially the SLA agent – have the appropriate access to their activities.  If, for example, the SLA agent is supposed to run an activity in the MyCo RuleSet for each work object that reaches its goal time, but does not have access to the MyCo RuleSet, then the activity will not run for each work object that reaches the goal, and no error will be displayed. 

Even if the agent is later given access to the MyCo RuleSet, all the work objects which have passed their goal time prior to the fix will never have the SLA Activity run.  The developer would have to adjust the access for the SLA agent, and then manually run the activity for each work object that was missed during the problem period.

java.util.ConcurrentModificationException

Update:  This issue is fixed in Version 4.2 SP6 and in Version 5.1.

When a work item was being processed, and for some reason the transaction had to be rolled back, the agent wouldn’t handle that rollback, but would give a ConcurrentModificationException error:

16:11:14,849 [1129T051114.755 GMT)] (          engine.context.Agent) WARN   - runEnd():
Assign-Corr.SendCorr; problem discovered.
16:11:14,866 [1129T051114.755 GMT)] (          engine.context.Agent) ERROR  - Agent
"Pega-ProCom" activity "Assign-Corr.SendCorr" disabled due to execution errors
16:11:17,184 [    WebContainer : 3] (          engine.context.Agent) INFO   -
Pega-ProCom #1: Restarting queue.
16:11:19,204 [1129T051117.359 GMT)] (y.ExecuteSLA.Assign_WorkBasket) ERROR  - **
The cover open count is incorrect; it is -1
16:11:19,206 [1129T051117.359 GMT)] (          engine.context.Agent) ERROR  -
Batch activity "Assign-.ProcessServiceLevelEvents" threw:
(BE32F9082BEB0F12B6E6CA78B34D42A93)
com.pega.pegarules.pub.PRRuntimeError: PRRuntimeError
        at com.pega.pegarules.engine.context.PRThreadImpl.runActivitiesAlt(PRThreadImpl.java(Compiled Code))
        at com.pega.pegarules.engine.context.PRThreadImpl.runActivities(PRThreadImpl.java(Inlined Compiled Code))
        at com.pega.pegarules.engine.context.Agent$BatchRequestorTask.run(Agent.java(Compiled Code))
        at com.pega.oswego.concurrent.PooledExecutor$Worker.run(PooledExecutor.java(Compiled Code))
        at java.lang.Thread.run(Thread.java(Compiled Code))
Caused by: java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(HashMap.java(Compiled Code))
        at java.util.HashMap$KeyIterator.next(HashMap.java(Compiled Code))
        at com.pega.pegarules.engine.database.DatabaseImpl.rollback(DatabaseImpl.java:2075)

RESOLUTION:  If this error is encountered, upgrade to Version 4.2 SP6.  If that is not possible, a hotfix is available.
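The log excerpt above also shows that the agent activity was disabled after the error.  As a sketch, the following extracts the agent and activity names from such "disabled due to execution errors" messages so that the affected queues can be restarted.  The function name is hypothetical, and the pattern is based only on the message format shown in this excerpt.

```python
import re

# Matches: Agent "Pega-ProCom" activity "Assign-Corr.SendCorr" disabled due
# to execution errors.  \s+ also matches a line break, because the message
# may be wrapped across lines in the log.
DISABLED = re.compile(
    r'Agent\s+"(?P<agent>[^"]+)"\s+activity\s+"(?P<activity>[^"]+)"\s+'
    r'disabled\s+due\s+to\s+execution\s+errors'
)


def find_disabled_activities(log_text: str):
    """Return (agent, activity) pairs reported as disabled in the log text."""
    return [(m.group("agent"), m.group("activity"))
            for m in DISABLED.finditer(log_text)]


excerpt = (
    '16:11:14,866 [1129T051114.755 GMT)] (          engine.context.Agent) '
    'ERROR  - Agent\n'
    '"Pega-ProCom" activity "Assign-Corr.SendCorr" disabled due to '
    'execution errors\n'
)
print(find_disabled_activities(excerpt))
```

The search runs over the whole log text rather than line by line, which is what allows it to match a message that wraps.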

InvalidParameterException:  Class not defined in dictionary:  History-classname

Agents can be set up to write History records when work items are processed.  If the History- class is not set up correctly, then the following error may occur when correspondence is sent or other agents are running:

Exception at 20060320T043646.485 GMT: (unknown)
com.pega.pegarules.pub.clipboard.InvalidParameterException: Class not defined in dictionary:
History-QA-Work.  Details: Invalid value for aClassName passed to PRThread.createPage
 at com.pega.pegarules.engine.context.PRThreadImpl.createPage(PRThreadImpl.java(Compiled Code))

RESOLUTION: Create the appropriate History- classes in the system.

PRRuntimeException:  Failed to update the PegaRULES index - Lock obtain timed out  

This is an error related to the Lucene Full-Text Search capabilities, which are updated by the System Pulse agent.  The Lucene indexing creates lock files in the process of updating its indexes.  These lock files are text files written to disk:

  • in Version 4.2:  stored in the Java temporary directory
  • beginning in Version 4.2 Service Pack 5:   stored in the PegaRULES temp directory (usually designated with the explicitTempDir setting in the pegarules.xml file)

The files are named as follows:

       lucene-hashcode-write.lock

Example:

        lucene-23yu9d87132b783g425c1b2cf5643w901-write.lock

Lucene releases this lock after the index has been updated, by deleting the .lock file. 

Occasionally this .lock file is not deleted.  (This may occur if, for example, Process Commander is terminated while Lucene is in the middle of writing to an index, or if some other system error occurs.)  When the system is restored, indexing operations fail: the next time indexing is attempted, Lucene tries to take out a lock on the index and cannot, because the .lock file already exists on the disk.  Lucene waits some amount of time for the lock to be released (deleted).  Because the file was left behind by mistake, that never happens, and the operation eventually times out with the error.  To get around this error, delete the specified xxx-write.lock file from the specified folder on disk.

If the System Pulse agent runs when this error exists, then that agent will also error out:

Exception at 20060817T191621.504 GMT: (unknown)
com.pega.pegarules.pub.PRRuntimeException: Failed to update the PegaRULES index – Lock obtain timed out: Lock@C:\Programs\jakarta-tomcat-4.1.27\work\Standalone\localhost\pr3web\lucene-559cf06b06ba234ec8a590c2b97e2bb9-write.lock
 at com.pega.pegarules.engine.search.RuleIndexer.indexPages(RuleIndexer.java:1132)
 at com.pega.pegarules.engine.search.RuleIndexer.updateIndex(RuleIndexer.java:996)
 at com.pega.pegarules.engine.runtime.Executable.updateIndex(Executable.java:5189)
 . . . 8 more
 at java.lang.Thread.run(Thread.java:534)
Caused by: java.io.IOException: Lock obtain timed out: Lock@C:\Programs\jakarta-tomcat-4.1.27\work\Standalone\localhost\pr3web\lucene-559cf06b06ba234ec8a590c2b97e2bb9-write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:58)
 at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:408)
 ... 15 more

The System Pulse agent will then be disabled until you manually enable it again.

RESOLUTION:  Delete the Lucene .lock file, and then manually restart the System Pulse agent.
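
Before deleting anything, it can help to confirm which lock files are actually present. The following is a minimal sketch, not part of Process Commander: the class name LuceneLockFinder and the directory argument are hypothetical, and the file-name pattern is inferred only from the two examples above. It lists candidate files and their last-modified times and deliberately does not delete them.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists files that look like Lucene write locks.
class LuceneLockFinder {
    // Assumed pattern, based on the naming shown above: lucene-<hashcode>-write.lock
    static final Pattern WRITE_LOCK =
            Pattern.compile("lucene-[0-9A-Za-z]+-write\\.lock");

    static boolean isWriteLock(String fileName) {
        return WRITE_LOCK.matcher(fileName).matches();
    }

    // Returns the matching files directly inside dir (no recursion).
    static List<Path> findWriteLocks(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "lucene-*-write.lock")) {
            for (Path p : stream) {
                if (isWriteLock(p.getFileName().toString())) {
                    found.add(p);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Pass the folder named in the error message; "." is only a placeholder.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path lock : findWriteLocks(dir)) {
            System.out.println(lock + "  last modified " + Files.getLastModifiedTime(lock));
        }
    }
}
```

A lock file whose last-modified time is well before the most recent restart is a likely leftover. Make sure no indexing is in progress before deleting it.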

PRAppRuntimeException:  A commit cannot be performed

This error occurs when there is a corrupt work object present in the system.  The agent cannot complete its processing (either an SLA, or sending correspondence) on this work object because of the corruption, and runtime errors such as those shown below are written to the log.

RESOLUTION:  The corrupt work object must be isolated and deleted, so the agent will not be stopped by trying to process it.

Example of SLA Agent Error:

15:51:49,890 [0701T225148.031 GMT)] (          engine.context.Agent) ERROR  - Batch
activity "Assign-.ProcessServiceLevelEvents" threw:
(B28A3ECFDF21AEC1F3DFC85407BAB6229)
com.pega.pegarules.pub.PRRuntimeError: PRRuntimeError
 at com.pega.pegarules.engine.context.PRThreadImpl.runActivitiesAlt(PRThreadImpl.java:1088)
 at com.pega.pegarules.engine.context.PRThreadImpl.runActivities(PRThreadImpl.java:950)
 at com.pega.pegarules.engine.context.Agent$BatchRequestorTask.run(Agent.java:2577)
 at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:727)
 at java.lang.Thread.run(Thread.java:534)
Caused by: com.pega.pegarules.pub.PRRuntimeException: Encountered database exception when
 committing operation <update instance  OPS-CUSTSERV-PA SPA-195 not only if new>
 at com.pega.pegarules.engine.database.DatabaseImpl.performOps(DatabaseImpl.java:1870)

Example of SendCorr Agent Error:

 Exception at 20060609T171430.903 GMT:
(unknown)
com.pega.pegarules.pub.PRAppRuntimeException: A commit cannot be performed
because a deferred save of instance CUST-WORK-CPM-INTERACTION I-16660
failed: code: SQLState: Message:
at
com.pegarules.generated.activity.ra_activity_assign_corr_executesendactivity_
c62724bff3064defd1a17a14f5d80b8e.step6_circum0(ra_activity_assign_corr_executesendactivity_
c62724bff3064defd1a17a14f5d80b8e.java:519)
at
com.pegarules.generated.activity.ra_activity_assign_corr_executesendactivity_
c62724bff3064defd1a17a14f5d80b8e.perform(ra_activity_assign_corr_executesendactivity_
c62724bff3064defd1a17a14f5d80b8e.java:152)
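
Both examples name the class and ID of the instance that could not be saved, which is the starting point for isolating the corrupt work object. As a rough sketch, assuming the log text follows the two message shapes shown above ("update instance ..." and "deferred save of instance ..."), the pair can be pulled out with a regular expression. The class name FailedInstanceParser is hypothetical, and other message formats are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the class and ID of the failing instance
// from log text shaped like the two examples above.
class FailedInstanceParser {
    // Assumed message shapes, taken only from the sample errors:
    //   <update instance  OPS-CUSTSERV-PA SPA-195 not only if new>
    //   a deferred save of instance CUST-WORK-CPM-INTERACTION I-16660 failed
    static final Pattern INSTANCE = Pattern.compile(
            "(?:update instance|deferred save of instance)\\s+(\\S+)\\s+(\\S+)");

    // Returns {className, instanceId} for the first match, or empty if none.
    static Optional<String[]> parse(String logText) {
        Matcher m = INSTANCE.matcher(logText);
        if (m.find()) {
            return Optional.of(new String[] { m.group(1), m.group(2) });
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "because a deferred save of instance "
                + "CUST-WORK-CPM-INTERACTION I-16660\nfailed: code: SQLState: Message:";
        parse(sample).ifPresent(r ->
                System.out.println("class=" + r[0] + " id=" + r[1]));
    }
}
```

The pattern uses \s+ between tokens so that it still matches when the log wraps the message across lines.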

Additional steps if no agent error displays  

If the agent is not running correctly, but no specific error displays, follow these additional troubleshooting steps:

  • Determine whether custom agents are present
  • Determine whether the agents are running
  • Determine whether the particular agent executed
  • Determine whether the activity executed

Determine whether custom agents are present

As with most rules, the Agent Queue rules may be saved into a custom RuleSet and modified.  As stated in the Background, this is not a recommended process for agents, as it can result in duplicate agents running.  If this occurs, first determine what behavior is expected.

  • Are you expecting the agent to follow the standard (shipped) Agent Queue settings, but getting the custom settings instead?
  • Are you expecting the agent to follow the new customized Agent Queue settings, but getting the base Agent Queue behavior instead?

If you expect standard processing, and you determine that custom agents are present, you should disable the custom agents, or delete them from the system if they are no longer used.

If you expect customized processing, and are getting the base agent behavior, then check the agent access group settings to make sure that the agents have access to the customized rules.  (See the Access Groups section of this Play for details.)

When you upgrade a system from one Process Commander version to another, there may be issues with custom agents or duplicated agents.

Determine whether the agents are running

Check to make sure all the agents are running. 

  • If an agent stopped due to an error, the agent must be restarted manually, even after the error has been fixed.  (Presently, the only way to start an agent automatically is to stop and start the entire system.)  Agents may be started manually by using the System Console (Version 4.2) or the System Management Application (Version 5.1).
  • If the database was stopped and then restarted, all the agents will need to be restarted.  (They would all have been disabled by connection-related exceptions, as they could not reach the database while it was down.)

Determine whether the particular agent executed

Tracing the agent execution can show whether the agent executed or not.  (See the Tracing an Agent section of this Play for details.) 

Other places to check include the monitoring tools, to see whether an error is displayed for that agent.

Determine whether the activity executed

For many agent activities which act on work objects, a history entry is put into the work object.  For work objects which should have had an SLA fired or other agent processing, check the history listing of that work object to see if there are messages there.

If the messages are present in the history listing, then the activity is firing, and some other issue is occurring. 

If the messages are not present, then the activity itself should be investigated, to see whether there is an issue there.  Run the activity (see the Tracing an Agent Activity section) to see if errors occur.

Another way to test whether an activity is firing is to add a first step to the activity that will write a line to the PegaRULES log file stating that Activity XXX ran with a date/timestamp of XXX.   This can be done directly, or by using the oLog.infoForced() method in a Java step:

oLog.infoForced("*** DEBUG MESSAGE ***");

This writes the message to the PegaRULES log file, using the logging configuration in the prlogging.xml file, and automatically includes the activity name and timestamp in the log entry.

SLA Agent is not firing when expected

When enabling the Pega-ProCom agent, you may want to have custom scheduling:  “only on business days,” or “only run at night.”  During the setup of this agent, you can choose a  Recurring pattern, and then define a custom recurrence.

Important points to check when troubleshooting an SLA agent include:

Recurrence start time

Obvious though it may seem, the Start date or time cannot be in the past.  If the recurrence start time has already occurred, this agent will never fire.
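
The check itself is a plain comparison. The sketch below is only an illustration of that rule: the class RecurrenceStartCheck and the sample start value are hypothetical, and it does not read the agent's actual settings.

```java
import java.time.LocalDateTime;

// Hypothetical illustration of the rule above: a recurrence start time
// that has already passed means the agent will never fire.
class RecurrenceStartCheck {
    // True only when the configured start is strictly later than "now".
    static boolean startIsInFuture(LocalDateTime start, LocalDateTime now) {
        return start.isAfter(now);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        // Placeholder value; substitute the Start date and time from the agent setup.
        LocalDateTime configuredStart = LocalDateTime.of(2006, 8, 17, 19, 0);
        if (!startIsInFuture(configuredStart, now)) {
            System.out.println("Start " + configuredStart
                    + " is not in the future; the agent will never fire.");
        }
    }
}
```

Compare against the clock of the server that runs the agent, not the clock of the workstation used to configure it.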

Business Days vs regular days

The days for the Service Level agent may be set in the agent description.  They may also be set when defining the Service Level itself.

The concept of “business days” is different from standard days, and the system will calculate them differently.  When “business days” are checked, the system will use information from a calendar of workdays and holidays, which is defined by a Calendar data object (Data-Admin-Calendar instance). 

The Calendars must be defined for more than one year; otherwise, when the year turns, the definitions are missing and errors will result.
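
To see why a missing year causes trouble, consider how a business-day deadline might be computed. The following is a simplified sketch and not Process Commander's implementation: the class BusinessDayCalc, the holiday set and the set of "defined years" are all assumptions standing in for the Calendar data object. It skips weekends and holidays, and fails as soon as the calculation crosses into a year with no calendar.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a business-day calculation backed by a calendar
// that is only defined for certain years.
class BusinessDayCalc {
    // Adds the given number of business days to start, skipping weekends and holidays.
    // Throws if a date falls in a year for which no calendar is defined.
    static LocalDate addBusinessDays(LocalDate start, int days,
                                     Set<LocalDate> holidays, Set<Integer> definedYears) {
        LocalDate d = start;
        int remaining = days;
        while (remaining > 0) {
            d = d.plusDays(1);
            if (!definedYears.contains(d.getYear())) {
                throw new IllegalStateException("No calendar defined for year " + d.getYear());
            }
            DayOfWeek dow = d.getDayOfWeek();
            boolean weekend = dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY;
            if (!weekend && !holidays.contains(d)) {
                remaining--;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        Set<LocalDate> holidays = new HashSet<>(Arrays.asList(LocalDate.of(2007, 1, 1)));
        Set<Integer> years = new HashSet<>(Arrays.asList(2006, 2007));
        // A two-business-day goal starting just before the year turns.
        System.out.println(addBusinessDays(LocalDate.of(2006, 12, 28), 2, holidays, years));
    }
}
```

With only 2006 defined, the same call fails when it reaches 1 January 2007, which is the kind of year-end error the note above warns about.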

Server clock time

There are several different systems involved in a Process Commander application setup:  the application server, the database server, and possible external servers.  Each of these may have a different “clock time” set as the internal time.  The servers may be in different time zones, or their internal time may just be set in error.  When troubleshooting an agent problem, check for differences in system times.
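
One way to compare the clocks is to express each server's current time as a UTC instant and measure the difference. This is a minimal sketch under stated assumptions: the class ClockSkewCheck and the one-minute tolerance are hypothetical, and how each server's time is obtained (for example, from a log entry or a database query) is left out.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: measures the difference between two server clocks
// once both readings are expressed as UTC instants.
class ClockSkewCheck {
    // Absolute difference between the two readings.
    static Duration skew(Instant a, Instant b) {
        return Duration.between(a, b).abs();
    }

    // True when the readings differ by more than the given tolerance.
    static boolean exceeds(Instant a, Instant b, Duration tolerance) {
        return skew(a, b).compareTo(tolerance) > 0;
    }

    public static void main(String[] args) {
        // Placeholder readings taken at the same moment on two servers.
        Instant appServer = Instant.parse("2006-08-17T19:16:21Z");
        Instant dbServer = Instant.parse("2006-08-17T19:21:21Z");
        Duration tolerance = Duration.ofMinutes(1); // assumed threshold
        if (exceeds(appServer, dbServer, tolerance)) {
            System.out.println("Clocks differ by " + skew(appServer, dbServer));
        }
    }
}
```

Converting to UTC first separates a genuine clock error from a time zone difference: two servers in different zones with correct clocks show no skew here.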

Additional Resources

See the following documentation:

  • Administration and Security Guide for Version 5.1, chapter 7
  • System Management Application Reference Guide

Need Further Help?

If you have followed this Support Play, but still require additional help, contact Global Customer Support by logging a Service Request.
