
This content has been archived and is no longer being updated. Links may not function; however, this content may be relevant to outdated versions of the product.

Support Article

Server down: cannot start D-Node server in production

SA-31790

Summary



The Marketing server does not start up. It stops responding while attempting to start the D-Node service.


Error Messages



No error messages are logged, but the JBoss application server remains in a waiting state.


Steps to Reproduce



Attempt to restart the D-Node server.


Root Cause



A defect or configuration issue in the operating environment:

During Cassandra initialization, the last messages written to the PegaRULES or application server log are:

( internal.mgmt.PREnvironment) INFO - Starts Initializing Search Infrastructure
(.internal.PRSearchProviderImpl) INFO - Initialized full text search functionality for this node.
( internal.mgmt.PREnvironment) INFO - Ends Initializing Search Infrastructure
( dnode.api.DNodeBootstrap) INFO - Starting D-Node service


Take a thread dump at this point. In this case, the dump contains a thread like the following:

"your_host" #148 prio=5 os_prio=0 tid=0x00007fea7413b800 nid=0x1d7e waiting on condition [0x00007fea8d660000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000074a04cf78> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:256)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1002)
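
The check described above (look in the thread dump for a thread parked inside the Cassandra bootstrap) can be sketched in a few lines of Python. This is only an illustration, not a Pega or Cassandra tool: the function name `find_stuck_bootstrap_threads` is hypothetical, and the sketch assumes a HotSpot-style dump in which thread entries are separated by blank lines.

```python
import re

# Frame taken from the stack trace above; it marks a bootstrap that is
# waiting for streaming from other nodes to finish.
BOOTSTRAP_FRAME = "org.apache.cassandra.dht.RangeStreamer.fetch"


def find_stuck_bootstrap_threads(dump_text):
    """Return the names of threads that are WAITING inside RangeStreamer.fetch."""
    stuck = []
    # Assumption: each thread entry is separated from the next by a blank line.
    for block in re.split(r"\n\s*\n", dump_text):
        header = re.match(r'\s*"([^"]+)"', block)
        if not header:
            continue  # not a thread entry (for example, the dump preamble)
        if "java.lang.Thread.State: WAITING" in block and BOOTSTRAP_FRAME in block:
            stuck.append(header.group(1))
    return stuck
```

If the function returns a non-empty list for a dump taken while startup appears frozen, the node is most likely waiting on another node during bootstrap, which is the situation this article describes.
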

At this point, DEBUG logging was enabled in the prlogging.xml file for the following packages (each shown with the log statement it emits):

org.apache.cassandra.streaming - logger.debug("Requesting from {} ranges {}", source, StringUtils.join(ranges, ", "));
org.apache.cassandra.dht - logger.debug(String.format("Removed %s/%s as a %s source; remaining is %s", %n));


This produces the following output:
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.25/system_auth as a BOOTSTRAP source; remaining is 15
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.192/system_auth as a BOOTSTRAP source; remaining is 14
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.26/system_auth as a BOOTSTRAP source; remaining is 13


Grepping the log for IP addresses yields the following:
/your_ip.25/system_auth
/your_ip.192/system_auth
/your_ip.26/system_auth
/your_ip.191/system_auth
/your_ip.183/system_auth
/your_ip.42/system_auth
/your_ip.192/data
/your_ip.166/system_auth
/your_ip.43/system_auth
/your_ip.26/data
/your_ip.191/data
/your_ip.183/data
/your_ip.42/data
/your_ip.166/data
/your_ip.43/data


After eliminating duplicates and comparing the result with the known IP addresses of the nodes in the cluster, it was determined that your_ip.153 had not responded.
No debug log entry indicated a response from that node, so it was never removed from the list of bootstrap sources.
Because that node did not respond, the node attempting to start hung.
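
The comparison described above (collect the addresses from the "Removed ... as a BOOTSTRAP source" lines, drop duplicates, and subtract them from the known cluster members) might be automated roughly as follows. This is a minimal sketch under stated assumptions: the function names `responding_peers` and `silent_peers` are hypothetical, the list of known nodes must be supplied by the operator, and the pattern is based only on the debug line format shown in this article.

```python
import re

# Matches the RangeStreamer debug line shown above, for example:
# "Removed /your_ip.25/system_auth as a BOOTSTRAP source; remaining is 15"
REMOVED_RE = re.compile(r"Removed /([^/\s]+)/(\S+) as a BOOTSTRAP source")


def responding_peers(log_lines):
    """Collect the unique peer addresses that were removed as bootstrap sources."""
    peers = set()
    for line in log_lines:
        match = REMOVED_RE.search(line)
        if match:
            peers.add(match.group(1))  # group 2 is the keyspace, unused here
    return peers


def silent_peers(log_lines, known_nodes):
    """Return the known cluster nodes that never appear in a 'Removed' line."""
    return sorted(set(known_nodes) - responding_peers(log_lines))
```

When calling `silent_peers`, leave the node that is being started out of `known_nodes`, since it does not stream from itself. Any address that remains in the result is a candidate for the node that failed to respond.
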


Resolution



Perform the following local change:

Restart the node that was not responding, your_ip.153. Then restart the node that was hung; this restart succeeds.

Published August 23, 2017 - Updated December 2, 2021
