
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

VBD status is "JOINING_FAILED" and real-time data flows are down

SA-64885

Summary



Visual Business Director (VBD) status is JOINING_FAILED, and real-time data flows fail when they are reactivated.


Error Messages



The Pega logs repeatedly display messages such as the following:

[XXXX]] [STANDARD] [ ] [ ] (tor$QueueBasedDataFlowExecutor) ERROR  - Unexpected error occurred during event processing: null
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)
[...]

INFO  [PERIODIC-COMMIT-LOG-SYNCER] Server.java:225 - Stop listening for CQL clients
ERROR [PERIODIC-COMMIT-LOG-SYNCER] CommitLog.java:398 - Failed to persist commits to disk. Commit disk failure policy is stop; terminating thread
org.apache.cassandra.io.FSWriteError: java.io.IOException: No space left on device
      at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(CommitLogSegment.java:329) ~[apache-cassandra-2.1.14.jar:2.1.14]
      at org.apache.cassandra.db.commitlog.CommitLog.sync(CommitLog.java:195) ~[apache-cassandra-2.1.14.jar:2.1.14]
      at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:81) ~[apache-cassandra-2.1.14.jar:2.1.14]
      at java.lang.Thread.run(Thread.java:745) [na:1.8.0_73]
Caused by: java.io.IOException: No space left on device
      at java.nio.MappedByteBuffer.force0(Native Method) ~[na:1.8.0_73]
      at java.nio.MappedByteBuffer.force(MappedByteBuffer.java:203) ~[na:1.8.0_73]
      at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(CommitLogSegment.java:315) ~[apache-cassandra-2.1.14.jar:2.1.14]
      ... 3 common frames omitted
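
To check whether a log shows this failure pattern, the following minimal Python sketch counts lines that contain the disk-full messages and the driver's NoHostAvailableException from the excerpts in this article. The marker strings are copied from those excerpts; other Pega or Cassandra versions may word them differently. The function names are hypothetical.

```python
from typing import Dict, Iterable

# Substrings copied from the log excerpts in this article. Other Pega or
# Cassandra versions may word these messages differently.
DISK_FULL_MARKERS = (
    "No space left on device",
    "Failed to persist commits to disk",
)
NO_HOST_MARKER = "NoHostAvailableException"


def classify_log(lines: Iterable[str]) -> Dict[str, int]:
    """Count lines that match the disk-full and driver-failure markers."""
    counts = {"disk_full": 0, "no_host": 0}
    for line in lines:
        if any(marker in line for marker in DISK_FULL_MARKERS):
            counts["disk_full"] += 1
        if NO_HOST_MARKER in line:
            counts["no_host"] += 1
    return counts


def looks_like_disk_full_outage(lines: Iterable[str]) -> bool:
    """True if the log shows both disk-full errors and driver failures."""
    counts = classify_log(lines)
    return counts["disk_full"] > 0 and counts["no_host"] > 0
```

A file object returned by `open(log_path)` can be passed directly, because iterating it yields lines; `log_path` is a placeholder for whichever log you are checking.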


Steps to Reproduce



Add the VBD node. The node enters the JOINING_FAILED status, and real-time data flows enter the FAILED status when they are reactivated.


Root Cause



A defect or configuration issue in the operating environment:

The issue occurred while Cassandra was compacting data.
The commit disk failure policy (set to "stop", as shown in the log) caused Cassandra to shut down. This is a safety mechanism that protects data integrity.

About 50 GB was used on the filesystem '/appvol/pega', where the Cassandra data was stored. The filesystem size was 100 GB, so it was about 50% full.

The 'wasadm' user (which runs Pega or Cassandra) did not have a 'ulimit' imposed, so no quota was enforced. As a result, Cassandra compaction increased disk usage, because compaction copies the existing data into new files before the old files are removed.
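
The disk-space arithmetic can be sketched as follows. Because compaction writes its merged output before deleting the input files, peak usage is roughly the space already used plus the size of the output. This minimal Python sketch assumes the worst case, an output as large as its input (nothing purged), and it ignores commit logs, snapshots, and other files on the volume. `Filesystem` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Filesystem:
    total_gb: float
    used_gb: float  # everything already on the volume, including Cassandra data


def peak_usage_during_compaction(
    fs: Filesystem, compacted_gb: float, output_ratio: float = 1.0
) -> float:
    """Estimate peak disk usage while a compaction runs.

    The compacted output is written before the input files are deleted, so
    both exist at the peak. output_ratio is an assumption: 1.0 means the
    output is as large as the input (no deleted or expired data purged).
    """
    return fs.used_gb + compacted_gb * output_ratio


def compaction_fits(
    fs: Filesystem, compacted_gb: float, output_ratio: float = 1.0
) -> bool:
    """True if the estimated peak stays strictly below the filesystem size."""
    return peak_usage_during_compaction(fs, compacted_gb, output_ratio) < fs.total_gb
```

With the figures in this article, compacting all ~50 GB on the 100 GB filesystem peaks at about 100 GB in the worst case, which is a full disk. On a 200 GB filesystem the same compaction peaks at about 50%.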

Based on the error messages:
  1. Cassandra had insufficient disk space to compact the data.
  2. Cassandra shut down because it could not compact the data.
  3. The spike in Pega logging occurred because Cassandra had shut down, so the same error messages were repeated in the Pega log files.
  4. The filesystem that Cassandra used to store data was ~50% full (~50 GB of 100 GB).

Resolution

Perform the following local change:

  1. Increase the disk space from 100 GB (of which ~50 GB is used) to 200 GB.
  2. Restart the application.
  3. Compact the data.
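
Before compacting the data, a free-space check along these lines can help confirm there is enough headroom. This minimal Python sketch compares the free space that `shutil.disk_usage` reports for the filesystem with the total size of the Cassandra data directory, plus a safety margin. The 1.1 margin is an arbitrary assumption. The data-directory location is installation-specific and is not given in this article. The function names are hypothetical.

```python
import os
import shutil

GIB = 1024 ** 3


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path; symlinks are skipped.

    os.walk yields nothing for a missing path, so this returns 0 in that case.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def has_compaction_headroom(free_bytes: int, data_bytes: int, margin: float = 1.1) -> bool:
    """True if free space exceeds the data size times a safety margin.

    Assumption: the worst case needs roughly as much free space as the data
    being compacted; margin=1.1 is an arbitrary illustrative buffer.
    """
    return free_bytes > data_bytes * margin


def preflight(mount_point: str, data_dir: str) -> bool:
    """Check the filesystem holding data_dir before starting a compaction."""
    free_bytes = shutil.disk_usage(mount_point).free
    return has_compaction_headroom(free_bytes, directory_size_bytes(data_dir))
```

For example, `preflight("/appvol/pega", data_dir)` runs the check, where `data_dir` is your Cassandra data directory. With the original figures (~50 GB free, ~50 GB of data), the check fails. The compaction itself (for example, `nodetool compact` on installations where nodetool is available) is outside this sketch.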

Published October 20, 2018 - Updated October 8, 2020
