Support Article
Server down: cannot start D-Node server in production
SA-31790
Summary
The Marketing server does not start up. It stops responding while attempting to start the D-Node service.
Error Messages
No error messages are logged, but the JBoss application server remains in a waiting state.
Steps to Reproduce
Attempt to restart the D-Node server.
Root Cause
A defect or configuration issue in the operating environment:
During Cassandra initialization, the last messages written to the PegaRULES or application server log are:
( internal.mgmt.PREnvironment) INFO - Starts Initializing Search Infrastructure
(.internal.PRSearchProviderImpl) INFO - Initialized full text search functionality for this node.
( internal.mgmt.PREnvironment) INFO - Ends Initializing Search Infrastructure
( dnode.api.DNodeBootstrap) INFO - Starting D-Node service
Take a thread dump at this point. In this case, it shows a thread like the following:
"your_host" #148 prio=5 os_prio=0 tid=0x00007fea7413b800 nid=0x1d7e waiting on condition [0x00007fea8d660000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000074a04cf78> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:256)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1002)
At this point, DEBUG logging was enabled for the following packages in the prlogging.xml file. The logger.debug statement listed beside each package is the one of interest:
org.apache.cassandra.streaming - logger.debug("Requesting from {} ranges {}", source, StringUtils.join(ranges, ", "));
org.apache.cassandra.dht - logger.debug(String.format("Removed %s/%s as a %s source; remaining is %s", %n));
This produces the following output:
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.25/system_auth as a BOOTSTRAP source; remaining is 15
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.192/system_auth as a BOOTSTRAP source; remaining is 14
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.26/system_auth as a BOOTSTRAP source; remaining is 13
Grepping the log for IP addresses yields the following:
/your_ip.25/system_auth
/your_ip.192/system_auth
/your_ip.26/system_auth
/your_ip.191/system_auth
/your_ip.183/system_auth
/your_ip.42/system_auth
/your_ip.192/data
/your_ip.166/system_auth
/your_ip.43/system_auth
/your_ip.26/data
/your_ip.191/data
/your_ip.183/data
/your_ip.42/data
/your_ip.166/data
/your_ip.43/data
After eliminating duplicates and comparing the result with the known IP addresses of the nodes in the cluster, it was determined that your_ip.153 had not responded.
The debug log contained no entry indicating a response from that node, so it was never removed from the list of bootstrap sources.
Because that node did not respond, the node attempting to start waited indefinitely and appeared hung.
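The comparison described above can be sketched as a small script. This is only an illustration, assuming the DEBUG lines follow the "Removed /<host>/<keyspace> as a ... source" pattern shown earlier. The function names, the sample addresses (10.0.0.x), and the list of known nodes are hypothetical and must be replaced with values from your own cluster.

```python
import re

# Matches the RangeStreamer DEBUG lines quoted in this article, for example:
# "... DEBUG - Removed /10.0.0.25/system_auth as a BOOTSTRAP source; remaining is 15"
REMOVED_RE = re.compile(
    r"Removed /(?P<host>[^/\s]+)/(?P<keyspace>\S+) as a \S+ source"
)


def responded_hosts(log_lines):
    """Return the set of hosts that appear in 'Removed ... source' lines."""
    hosts = set()
    for line in log_lines:
        match = REMOVED_RE.search(line)
        if match:
            hosts.add(match.group("host"))
    return hosts


def silent_hosts(log_lines, known_hosts):
    """Return the known cluster hosts that never appear in the log, sorted."""
    return sorted(set(known_hosts) - responded_hosts(log_lines))


if __name__ == "__main__":
    # Hypothetical sample data: these addresses are placeholders, not real nodes.
    sample_log = [
        "cassandra.dht.RangeStreamer) DEBUG - Removed /10.0.0.25/system_auth "
        "as a BOOTSTRAP source; remaining is 15",
        "cassandra.dht.RangeStreamer) DEBUG - Removed /10.0.0.192/system_auth "
        "as a BOOTSTRAP source; remaining is 14",
        "cassandra.dht.RangeStreamer) DEBUG - Removed /10.0.0.192/data "
        "as a BOOTSTRAP source; remaining is 13",
    ]
    known_nodes = ["10.0.0.25", "10.0.0.192", "10.0.0.153"]  # assumed inventory
    print(silent_hosts(sample_log, known_nodes))
```

Any address the script prints is a candidate for the non-responding node. The sketch only checks whether a host appears at all. A host that responded for one keyspace but not another would need a per-keyspace comparison.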
Resolution
Perform the following local change:
Restart the node that was not responding (your_ip.153), and then restart the node that was hung. After this sequence, the hung node restarts successfully.
Published August 23, 2017 - Updated December 2, 2021