Support Article
One server cannot connect to DNode
SA-26994
Summary
In a multinode PRPC environment with six nodes, adding each server as a DNode succeeds for all but one server. As a result, the DNode for this cluster remains stuck in the Joining state.
Error Messages
java.lang.RuntimeException: java.io.IOException: Cannot proceed on repair because a neighbor (-ip address) is dead: session failed
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
Steps to Reproduce
On a multinode system, add each server as a DNode.
Root Cause
A defect or configuration issue in the operating environment. Hotfix HFix-28585 was recently released for DSM. This hotfix adds logic that performs a repair whenever the cluster size changes. After the hotfix is installed, a specific restart procedure must be followed: restarting all nodes at once produces this error.
Resolution
For a fresh setup, follow these steps:
- Decommission all DNodes from the DNode Cluster Management page (skip this step if the Decommission action is unavailable).
- Delete dynamic system settings such as "dnode/<NodeID>/enableAtStartup".
- Shut down all servers.
- Delete the PegaTempDir of each server. (This also deletes the 'prpc' directory that holds the Cassandra/DNode data.)
- Start the servers one at a time and add each one to the DNode cluster. Start the next server only after the previous server has been added as a DNode.
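
The one-at-a-time ordering in the last step can be sketched as a small loop. This is an illustrative outline only, not a Pega-supplied tool: the function name and the three callables (start_server, add_as_dnode, get_dnode_state) are hypothetical placeholders for whatever mechanism your site uses to start a server, add it on the DNode Cluster Management page, and read its status. The "NORMAL" ready state is likewise an assumption; substitute the status your landing page shows for a fully joined node.

```python
import time
from typing import Callable, Iterable, List


def start_nodes_sequentially(
    servers: Iterable[str],
    start_server: Callable[[str], None],      # hypothetical: starts one app server
    add_as_dnode: Callable[[str], None],      # hypothetical: adds that server as a DNode
    get_dnode_state: Callable[[str], str],    # hypothetical: returns the DNode status text
    ready_state: str = "NORMAL",              # assumed name of the fully joined status
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start servers one at a time, waiting for each DNode to finish joining."""
    joined: List[str] = []
    for server in servers:
        start_server(server)
        add_as_dnode(server)

        # Do not touch the next server until this one has left the Joining state.
        waited = 0.0
        while get_dnode_state(server) != ready_state:
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"{server} did not reach {ready_state} within {timeout_seconds}s"
                )
            sleep(poll_seconds)
            waited += poll_seconds

        joined.append(server)
    return joined
```

The point of the sketch is the blocking wait inside the loop: no server is started until the previous one has finished joining, which avoids the all-at-once restart described under Root Cause.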
Published August 20, 2016 - Updated October 8, 2020