Support Article

Server down: cannot start D-Node server in production

SA-31790

Summary

The Marketing server does not start up. It stops responding while attempting to start the D-Node service.

Error Messages

No error messages are logged, but the JBoss application server remains in a waiting state.

Steps to Reproduce

Attempt to restart the D-Node server.

Root Cause

A defect or configuration issue in the operating environment:

During Cassandra initialization, the last messages written to the PegaRULES or application server log are:

( internal.mgmt.PREnvironment) INFO - Starts Initializing Search Infrastructure
(.internal.PRSearchProviderImpl) INFO - Initialized full text search functionality for this node.
( internal.mgmt.PREnvironment) INFO - Ends Initializing Search Infrastructure
( dnode.api.DNodeBootstrap) INFO - Starting D-Node service

Take a thread dump at this point. In this case, the dump contains a thread like the following:

"your_host" #148 prio=5 os_prio=0 tid=0x00007fea7413b800 nid=0x1d7e waiting on condition [0x00007fea8d660000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000074a04cf78> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:256)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1002)

At this point, DEBUG logging was enabled in the prlogging.xml file for the following packages, shown here with the relevant logger statement from each:

org.apache.cassandra.streaming - logger.debug("Requesting from {} ranges {}", source, StringUtils.join(ranges, ", "));
org.apache.cassandra.dht - logger.debug(String.format("Removed %s/%s as a %s source; remaining is %s", %n));

This produces the following output:
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.25/system_auth as a BOOTSTRAP source; remaining is 15
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.192/system_auth as a BOOTSTRAP source; remaining is 14
cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.26/system_auth as a BOOTSTRAP source; remaining is 13

Grepping the log for IP addresses yields the following:
/your_ip.25/system_auth
/your_ip.192/system_auth
/your_ip.26/system_auth
/your_ip.191/system_auth
/your_ip.183/system_auth
/your_ip.42/system_auth
/your_ip.192/data
/your_ip.166/system_auth
/your_ip.43/system_auth
/your_ip.26/data
/your_ip.191/data
/your_ip.183/data
/your_ip.42/data
/your_ip.166/data
/your_ip.43/data

After eliminating duplicates and comparing the result with the known IP addresses of the nodes in the cluster, it was determined that your_ip.153 had not responded.
It did not appear in the debug log, which means that it never responded and was never removed from the list of bootstrap sources.
Because that node never responded, the node attempting to start hung while waiting for it.
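
The comparison described above can also be scripted. The following is a minimal Python sketch, not part of the original investigation. It assumes that the debug lines have the "Removed /<ip>/<keyspace> as a ... source" shape shown earlier and that the list of known cluster nodes is supplied by hand. The function names responding_peers and silent_peers are hypothetical.

```python
import re

# Matches RangeStreamer debug lines of the form shown in this article, e.g.
#   "... DEBUG - Removed /your_ip.25/system_auth as a BOOTSTRAP source; remaining is 15"
REMOVED_RE = re.compile(r"Removed /(?P<peer>[^/\s]+)/(?P<keyspace>\S+) as a \S+ source")


def responding_peers(log_lines):
    """Return the set of peer addresses that appear in 'Removed ...' debug lines."""
    peers = set()
    for line in log_lines:
        match = REMOVED_RE.search(line)
        if match:
            peers.add(match.group("peer"))
    return peers


def silent_peers(log_lines, known_nodes):
    """Return the known cluster nodes that never appear in the debug output, sorted."""
    return sorted(set(known_nodes) - responding_peers(log_lines))


if __name__ == "__main__":
    # Sample lines copied from the output above; the node list is an assumption
    # that stands in for the real cluster inventory.
    sample_log = [
        "cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.25/system_auth as a BOOTSTRAP source; remaining is 15",
        "cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.192/system_auth as a BOOTSTRAP source; remaining is 14",
        "cassandra.dht.RangeStreamer) DEBUG - Removed /your_ip.26/system_auth as a BOOTSTRAP source; remaining is 13",
    ]
    known = ["your_ip.25", "your_ip.192", "your_ip.26", "your_ip.153"]
    print(silent_peers(sample_log, known))  # prints ['your_ip.153']
```

Any address that the sketch reports is only a candidate. A node may also be missing from the output because debug logging was enabled after it had already responded.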

Resolution

Perform the following local change:

Restart the node that was not responding (your_ip.153), and then restart the node that was hung. The hung node then starts successfully.
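
To check whether a node is stuck in the state described under Root Cause, before or after the restart, a captured thread dump can be scanned for the same frames. The sketch below is illustrative only. It assumes that the dump has already been saved as text (for example, with the JDK jstack tool) and that each thread entry starts with a quoted thread name, as in the dump shown earlier. The function name find_stuck_bootstrap_threads is hypothetical.

```python
# Frame that appears in the hung thread shown in the Root Cause section.
FETCH_FRAME = "org.apache.cassandra.dht.RangeStreamer.fetch"


def find_stuck_bootstrap_threads(thread_dump):
    """Return the header lines of WAITING threads that are inside RangeStreamer.fetch."""
    threads = []  # each entry: [header, saw_waiting_state, saw_fetch_frame]
    for raw in thread_dump.splitlines():
        line = raw.strip()
        if line.startswith('"'):
            # A quoted thread name marks the start of a new thread entry.
            threads.append([line, False, False])
        elif threads and "java.lang.Thread.State: WAITING" in line:
            threads[-1][1] = True
        elif threads and FETCH_FRAME in line:
            threads[-1][2] = True
    return [header for header, waiting, in_fetch in threads if waiting and in_fetch]


if __name__ == "__main__":
    # Shortened sample modeled on the dump in this article.
    sample_dump = (
        '"your_host" #148 prio=5 os_prio=0 waiting on condition\n'
        "   java.lang.Thread.State: WAITING (parking)\n"
        "\tat java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)\n"
        "\tat org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:256)\n"
    )
    for header in find_stuck_bootstrap_threads(sample_dump):
        print(header)
```

An empty result suggests that the node is no longer waiting in the bootstrap streaming step, although it does not by itself confirm that startup has completed.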

Tags:

Pega Marketing

Pega Marketing 7.13

User Experience

Published August 23, 2017 - Updated December 2, 2021
