Support Article

One server cannot connect to DNode

SA-26994

Summary



In a multinode PRPC environment with six nodes, adding a server as a DNode succeeds for all but one server. As a result, the DNode in this cluster remains stuck in the Joining state.
 

Error Messages



java.lang.RuntimeException: java.io.IOException: Cannot proceed on repair because a neighbor (-ip address) is dead: session failed
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
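
To check whether a node's log contains this failure, a small scan along the following lines may help. This is only a sketch: the function name dead_neighbors is hypothetical, and the pattern assumes the message wording shown above, with the neighbor address appearing in parentheses.

```python
import re
from typing import List

# Matches the repair-failure message quoted above and captures whatever
# appears between the parentheses (assumed to be the neighbor's address).
REPAIR_FAILURE = re.compile(
    r"Cannot proceed on repair because a neighbor \((?P<neighbor>[^)]*)\) is dead"
)


def dead_neighbors(log_text: str) -> List[str]:
    """Return the neighbors named in repair-failure messages, in log order."""
    return [m.group("neighbor") for m in REPAIR_FAILURE.finditer(log_text)]
```

An empty result means the message was not found in the text that was scanned; it does not by itself prove the cluster is healthy.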


Steps to Reproduce



On a multinode system, add each server as a DNode server.


Root Cause



A defect or configuration issue in the operating environment. Hotfix HFix-28585 was recently released for DSM. This hotfix performs a repair whenever the cluster size changes. After the hotfix is installed, a special restart procedure must be followed: restarting all nodes at once produces this error.

Resolution



For a fresh setup, follow these steps:
  1. Decommission all DNodes from the DNode Cluster management page (skip this step if the decommission action is unavailable).
  2. Delete dynamic system settings such as "dnode/<NodeID>/enableAtStartup".
  3. Shut down all servers.
  4. Delete the PegaTempDir of each server. (This also deletes the 'prpc' directory that holds the Cassandra/DNode data.)
  5. Start the servers one at a time and add each to the DNode cluster. Start the next server only after the previous one has been added as a DNode.
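
The ordering rule in step 5 can be sketched as a loop that does not move on to the next server until the current one has finished joining. This is an illustration under stated assumptions, not a Pega-provided tool: start_server, add_as_dnode, and dnode_status are hypothetical callbacks that stand in for whatever you use to start a server, add it on the DNode Cluster management page, and read its state, and the "NORMAL" ready status is an assumed value.

```python
import time
from typing import Callable, Iterable, List


def rolling_dnode_start(
    servers: Iterable[str],
    start_server: Callable[[str], None],   # hypothetical: starts one server
    add_as_dnode: Callable[[str], None],   # hypothetical: adds it to the DNode cluster
    dnode_status: Callable[[str], str],    # hypothetical: returns its current state
    ready_status: str = "NORMAL",          # assumed name of the joined state
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start servers strictly one at a time; return them in the order they joined."""
    joined: List[str] = []
    for server in servers:
        start_server(server)
        add_as_dnode(server)
        waited = 0.0
        # Block here until this server has joined, so that two servers are
        # never starting or joining at the same time.
        while dnode_status(server) != ready_status:
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"{server} did not reach {ready_status}; joined so far: {joined}"
                )
            sleep(poll_seconds)
            waited += poll_seconds
        joined.append(server)
    return joined
```

Stopping with an error on timeout, rather than continuing to the next server, keeps the procedure from drifting back into the situation described under Root Cause, where several nodes are restarting together.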

 

Published August 19, 2016
