
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

DNodes Failing to Connect

SA-16675

Summary



On a Decisioning platform that hosts a Next Best Offer (NBO) service:

The DNode cluster has four nodes. On the DNode Cluster Management landing page, two nodes show as joined and two show as joining. The two joining nodes have been in that state for the last 3 days. Both are on the same physical box, which suggests a connectivity issue between that box and the other one:
• NODE 1 (xx.xxx.xx.xxx), Online-Joining
• NODE 2 (xx.xxx.xx.xxx), Online-Joining
• NODE 3 (xx.xxx.xx.xxx), Online-Normal
• NODE 4 (xx.xxx.xx.xxx), Online-Normal

 

Error Messages



2015-10-02 18:21:22,399 [ WRITE-/xx.xxx.xx.xxx] [ STANDARD] [ ] [ ] (sandra.service.CassandraDaemon) ERROR - Exception in thread Thread[WRITE-/1xx.xxx.xx.xxx,5,main] 
java.lang.NoClassDefFoundError: org.xerial.snappy.Snappy (initialization failure) 
at java.lang.J9VMInternals.initialize(J9VMInternals.java:175) 
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79) 
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66) 
at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:383) 
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:147) 
Caused by: 
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux and os.arch=ppc64 
at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:460) 
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318) 
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229) 
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48) 
at java.lang.J9VMInternals.initializeImpl(Native Method) 
at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) 
at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45) 
at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55) 
at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37) 
at java.lang.J9VMInternals.initializeImpl(Native Method) 
at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) 
at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:82) 
at java.lang.J9VMInternals.initializeImpl(Native Method) 
at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) 
at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81) 
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:496) 
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:132) 
at java.lang.J9VMInternals.initializeImpl(Native Method) 
at java.lang.J9VMInternals.initialize(J9VMInternals.java:235) 
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:216) 
at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381) 
at com.pega.dsm.dnode.impl.cassandra.Cassandra.startCassandra(Cassandra.java:108) 
at com.pega.dsm.dnode.impl.cassandra.Cassandra.bootstrap(Cassandra.java:85) 
at com.pega.dsm.dnode.api.DNodeBootstrap.bootstrap(DNodeBootstrap.java:34) 
at com.pega.dsm.dnode.api.DNodeServiceListener$1.run(DNodeServiceListener.java:54) 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:483) 
at java.util.concurrent.FutureTask.run(FutureTask.java:274) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627) 
at java.lang.Thread.run(Thread.java:798)
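
When several nodes are affected, it can help to scan each node's log for this specific failure rather than reading the traces by hand. The following is a minimal sketch, assuming the log lines are already available as strings; the function name `find_snappy_failures` and the sample list are illustrative and not part of the product.

```python
import re

# Matches the snappy-java failure message shown in the trace above and
# captures the operating system name and architecture it reports.
SNAPPY_FAILURE = re.compile(
    r"\[FAILED_TO_LOAD_NATIVE_LIBRARY\].*?"
    r"os\.name=(?P<os_name>\S+) and os\.arch=(?P<os_arch>\S+)"
)


def find_snappy_failures(log_lines):
    """Return one (os_name, os_arch) tuple per Snappy native-library failure."""
    hits = []
    for line in log_lines:
        match = SNAPPY_FAILURE.search(line)
        if match:
            hits.append((match.group("os_name"), match.group("os_arch")))
    return hits


# Illustrative input: two lines copied from the error message above.
sample = [
    "java.lang.NoClassDefFoundError: org.xerial.snappy.Snappy (initialization failure)",
    "org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] "
    "no native library is found for os.name=Linux and os.arch=ppc64 ",
]
print(find_snappy_failures(sample))
```

A non-empty result on a node that is stuck in the joining state points to the cause described in this article rather than to a network problem.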


Steps to Reproduce

  1. After starting all nodes, go to the DNode Cluster Management landing page on one node.
  2. Add other nodes to the DNode infrastructure by clicking "Add node" and selecting the node.
  3. Refresh the landing page.

Initially the status of an added node is ONLINE-JOINING. Once the node becomes part of the DNode cluster, the status should change to ONLINE-NORMAL. The problem nodes never reach ONLINE-NORMAL, which is not expected.


Root Cause



A defect or configuration issue in the operating environment: the Snappy compression library that ships with Cassandra has no native library for Linux on the ppc64 architecture. As the stack trace shows, the failure occurs when an outbound inter-node connection tries to create a Snappy-compressed stream, which prevented successful clustering.

Resolution



Make the following local change: add the env setting below to prconfig.xml on all nodes to turn off compression of traffic between nodes.

<env name="dnode/yaml/internode_compression" value="none" />

This is a cassandra.yaml configuration setting that can be modified through prconfig.xml. Full details for the setting are in the Cassandra documentation.

internode_compression
(Default: all) Controls whether traffic between nodes is compressed. The valid values are:
• all: All traffic is compressed.
• dc: Traffic between data centers is compressed.
• none: No compression.
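
Because the setting must be present on every node, a quick check of each prconfig.xml can catch a node that was missed. The sketch below is a hedged example: it assumes the file content has been read into a string, and the `<pegarules>` root element in the sample is an assumption about the file layout. The helper name `read_internode_compression` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Setting name taken from the env entry in this article.
SETTING = "dnode/yaml/internode_compression"
# Valid values as listed in the Cassandra description above.
VALID_VALUES = {"all", "dc", "none"}


def read_internode_compression(prconfig_xml):
    """Return the configured value, or None if the env entry is absent.

    Raises ValueError if the entry exists but holds an unsupported value.
    """
    root = ET.fromstring(prconfig_xml)
    # Search all env elements regardless of nesting, so the check does not
    # depend on the exact layout of the file.
    for env in root.iter("env"):
        if env.get("name") == SETTING:
            value = env.get("value")
            if value not in VALID_VALUES:
                raise ValueError(f"unsupported internode_compression: {value!r}")
            return value
    return None


# Illustrative file content; the <pegarules> root element is an assumption.
sample_prconfig = """
<pegarules>
  <env name="dnode/yaml/internode_compression" value="none" />
</pegarules>
"""
print(read_internode_compression(sample_prconfig))
```

For the workaround in this article, the expected result on every node is `none`; a result of `None` means the entry is missing and the Cassandra default applies.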


After adding the env setting to prconfig.xml, perform the following actions:
  1. Shut down each node.
  2. Delete the "DSS" and "prpc" folders from the Pega temp directory on all nodes.
  3. Restart each node.
  4. Connect to one node and, using the DNode Cluster Management landing page, add each of the other nodes. Verify that the status ONLINE-NORMAL is displayed after a short time and that ownership is evenly distributed across all ONLINE-NORMAL nodes.
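
The folder cleanup in step 2 can be scripted so that it is applied the same way on every node. This is only a sketch under stated assumptions: the location of the Pega temp directory differs per installation and must be supplied by the caller, the function name `clear_pega_temp` is hypothetical, and the script should be run only while the node is shut down.

```python
import shutil
import tempfile
from pathlib import Path

# Folder names taken from step 2 above.
FOLDERS_TO_DELETE = ("DSS", "prpc")


def clear_pega_temp(temp_dir):
    """Delete the DSS and prpc folders under temp_dir; return the names removed.

    temp_dir is the node's Pega temp directory (an installation-specific
    path that this sketch does not try to discover).
    """
    removed = []
    for name in FOLDERS_TO_DELETE:
        target = Path(temp_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demonstration against a throwaway directory, not a real installation.
    with tempfile.TemporaryDirectory() as demo:
        (Path(demo) / "DSS" / "cache").mkdir(parents=True)
        (Path(demo) / "prpc").mkdir()
        (Path(demo) / "other").mkdir()
        print(clear_pega_temp(demo))
```

Folders other than the two named in step 2 are left untouched, and a folder that is already absent is skipped rather than treated as an error.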

 


Published January 31, 2016 - Updated October 8, 2020
