Support Article

VBD status is "JOINING_FAILED" and real time data flows are down

SA-64885

Summary

The Visual Business Director (VBD) node status is JOINING_FAILED, and real-time data flows fail when they are reactivated.

Error Messages

The Pega and Cassandra logs display repeated messages such as the following:

[XXXX]] [STANDARD] [ ] [ ] (tor$QueueBasedDataFlowExecutor) ERROR - Unexpected error occurred during event processing: null
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
[...]

INFO [PERIODIC-COMMIT-LOG-SYNCER] Server.java:225 - Stop listening for CQL clients

ERROR [PERIODIC-COMMIT-LOG-SYNCER] CommitLog.java:398 - Failed to persist commits to disk. Commit disk failure policy is stop; terminating thread

org.apache.cassandra.io.FSWriteError: java.io.IOException: No space left on device

at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(CommitLogSegment.java:329) ~[apache-cassandra-2.1.14.jar:2.1.14]

at org.apache.cassandra.db.commitlog.CommitLog.sync(CommitLog.java:195) ~[apache-cassandra-2.1.14.jar:2.1.14]

at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:81) ~[apache-cassandra-2.1.14.jar:2.1.14]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_73]

Caused by: java.io.IOException: No space left on device

at java.nio.MappedByteBuffer.force0(Native Method) ~[na:1.8.0_73]

at java.nio.MappedByteBuffer.force(MappedByteBuffer.java:203) ~[na:1.8.0_73]

at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(CommitLogSegment.java:315) ~[apache-cassandra-2.1.14.jar:2.1.14]

... 3 common frames omitted
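
The messages above can be counted in a log file to confirm that the disk-full condition and the resulting connection failures occur together. The following is a minimal sketch, not a Pega or Cassandra utility: the function name and the signature labels are hypothetical, and the patterns are copied from the excerpts in this article.

```python
import re
from collections import Counter

# Failure signatures taken from the log excerpts in this article.
# The dictionary keys are illustrative labels, not product terminology.
SIGNATURES = {
    "no_host_available": re.compile(r"NoHostAvailableException"),
    "disk_full": re.compile(r"No space left on device"),
    "commit_log_stopped": re.compile(r"Failed to persist commits to disk"),
}


def count_signatures(lines):
    """Count how many log lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open log file (for example, `count_signatures(open(path))`) scans it line by line. A nonzero `disk_full` count alongside many `no_host_available` entries is consistent with the root cause described below.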

Steps to Reproduce

Add the VBD node. The node enters the JOINING_FAILED status, and real-time data flows enter the FAILED status when they are reactivated.

Root Cause

A defect or configuration issue in the operating environment:

The issue occurred while Cassandra was compacting data. When the commit log could not be written to disk, the commit disk failure policy (stop) caused Cassandra to shut down as a safety mechanism to protect data integrity.

Approximately 50 GB of disk space was in use on the '/appvol/pega' filesystem, where the Cassandra data was stored. The size of the filesystem was 100 GB, so approximately 50% of it was used.

The 'wasadm' user (which runs Pega or Cassandra) did not have a 'ulimit' imposed, so no quota was enforced. Cassandra compaction increased disk space usage because the existing data was copied during compaction.

Based on the error message:

Cassandra had insufficient disk space to compact the data.
Cassandra shut down because it could not compact the data.
The additional 'spike' in Pega logging occurred because Cassandra had shut down, so the error messages were repeated in the Pega log files.
The filesystem that Cassandra used to store data was approximately 50% full (about 50 GB of the 100 GB total).
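
The disk space arithmetic behind these findings can be sketched as follows. This is an illustration only: the function name is hypothetical, and the `overhead_factor` of 1.0 is an assumption that compaction may temporarily need about as much extra space as the data it rewrites, in line with the statement above that the existing data was copied. Actual requirements depend on the compaction strategy and the data.

```python
def compaction_headroom_gb(total_gb, used_gb, data_to_compact_gb, overhead_factor=1.0):
    """Return the free space (GB) left over if compaction temporarily needs
    overhead_factor * data_to_compact_gb of extra disk space.

    A result of zero or less means compaction is at risk of filling the disk.
    overhead_factor=1.0 is an assumed worst case, not a measured value.
    """
    free_gb = total_gb - used_gb
    return free_gb - overhead_factor * data_to_compact_gb


# Figures from this article: 100 GB filesystem, ~50 GB used.
print(compaction_headroom_gb(100, 50, 50))  # 0.0: no margin left
# After the filesystem is increased to 200 GB:
print(compaction_headroom_gb(200, 50, 50))  # 100.0
```

Under this assumption, a filesystem that is 50% full leaves no margin once commit logs and other writes are added, which is consistent with the "No space left on device" error.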

Resolution

Perform the following local change:

Increase the disk space on the filesystem from 100 GB (of which ~50 GB is used) to 200 GB.
Restart the application.
Compact the data.
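
Before compacting, it can help to confirm that the resized filesystem actually has enough free space. The following is a hedged sketch rather than a supported procedure: the function names and the 0.5 threshold are assumptions, it presumes that `nodetool` is on the PATH of the user running it, and '/appvol/pega' is the data location from this article.

```python
import shutil
import subprocess


def free_fraction(path):
    """Return the fraction of the filesystem containing path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def compaction_command(keyspace=None):
    """Build the nodetool command for a manual compaction.

    With no keyspace, nodetool compacts all keyspaces.
    """
    cmd = ["nodetool", "compact"]
    if keyspace:
        cmd.append(keyspace)
    return cmd


def compact_if_room(path, min_free_fraction=0.5, keyspace=None, run=subprocess.run):
    """Run compaction only if at least min_free_fraction of the disk is free.

    min_free_fraction=0.5 is an assumed safety margin, not a Pega recommendation.
    Returns True if compaction was started, False if it was skipped.
    """
    if free_fraction(path) < min_free_fraction:
        return False
    run(compaction_command(keyspace), check=True)
    return True
```

For example, `compact_if_room("/appvol/pega")` would skip compaction and return False if less than half of the filesystem is free. The `run` parameter exists so that the command can be replaced with a stub when testing.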

Tags:

Pega Platform 7.2.2

Pega Platform

Platform

Published October 20, 2018 - Updated October 8, 2020
