Support Article
Campaign enters an inconsistent state after node failure
SA-66411
Summary
When performing a Failover test for Pega Marketing, the following issues occur:
- The data flow is Paused.
- The Run schedule (PegaMKT-Work-ProgramRun instance) remains in the Running state (visible in the Pega Marketing portal under Campaigns > Run schedule).
- The corresponding System-Queue-ProgramRun instance remains in the Processing state (it does not change to Success or Broken).
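The three symptoms above can be read as a single signature of the stuck state. The following is a minimal sketch of that check in Python, not Pega code: the `RunSnapshot` class, its field names, and the `is_stuck_after_node_failure` function are hypothetical and introduced only for illustration. It assumes the three status values have already been read from the system by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSnapshot:
    """Hypothetical snapshot of the three statuses listed in the Summary."""
    data_flow_status: str    # status of the data flow run
    program_run_status: str  # status of the PegaMKT-Work-ProgramRun instance
    queue_item_status: str   # status of the System-Queue-ProgramRun instance


def is_stuck_after_node_failure(snapshot: RunSnapshot) -> bool:
    """Return True only when all three symptoms from the Summary hold at once."""
    return (
        snapshot.data_flow_status == "Paused"
        and snapshot.program_run_status == "Running"
        and snapshot.queue_item_status == "Processing"
    )


if __name__ == "__main__":
    stuck = RunSnapshot("Paused", "Running", "Processing")
    print(is_stuck_after_node_failure(stuck))
```

A snapshot in which the queue item has already moved to Success or Broken does not match this signature, so the check returns False for it.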
Error Messages
Not Applicable
Steps to Reproduce
- Set up a server with two data nodes.
- Start a scheduled Campaign run.
- Wait until the data flow starts.
- Stop the Java process that executes the Pega application on the second node (kill -9).
- Restart the web server.
Root Cause
The ProcessProgramRun agent executes Campaign runs. When the node on which the agent is running is killed, control over the Campaign run process is lost and the run cannot update its own status. Therefore, user intervention is required.
Resolution
Here’s the explanation for the reported behavior:
When a node crash involves Pega Marketing (PM) agents, the Campaign run remains in the Running state.
The PR-xxx data flow run is paused.
After the server is restarted, or as soon as the Pega Marketing portal is available, the user must manually click the Stop button on the program run affected by the system crash.
When the Stop action is submitted:
- The program run item is set to Stopped.
- The corresponding PR-xxx data flow run also moves from the Paused state to the Stopped state.
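The effect of the Stop action described above can be sketched as a small state model. This is an illustration in Python only, not the Pega Marketing implementation: the `ProgramRun` class, its fields, and `submit_stop` are hypothetical names. The sketch covers just the two transitions this article states; it deliberately says nothing about the System-Queue-ProgramRun instance, because the article does not describe what happens to it.

```python
from dataclasses import dataclass


@dataclass
class ProgramRun:
    """Hypothetical model of a program run and its PR-xxx data flow run."""
    status: str            # status of the program run item
    data_flow_status: str  # status of the corresponding data flow run


def submit_stop(run: ProgramRun) -> ProgramRun:
    """Apply the two transitions the article describes for the Stop action."""
    # The program run item is set to Stopped.
    run.status = "Stopped"
    # The data flow run moves from Paused to Stopped; other data flow
    # states are left untouched because the article does not cover them.
    if run.data_flow_status == "Paused":
        run.data_flow_status = "Stopped"
    return run


if __name__ == "__main__":
    crashed = ProgramRun(status="Running", data_flow_status="Paused")
    print(submit_stop(crashed))
```

Applied to a run left behind by the crash (program run Running, data flow Paused), the model ends with both values set to Stopped, which matches the outcome listed above.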
Published August 19, 2019 - Updated December 2, 2021