Support Article

JVM deadlock or a hang in some threads

SA-16116

Summary



A user reported a JVM deadlock in Pega 7.1.8: several threads were blocked and the PRPC instance hung.


Error Messages



The thread dump identified the following thread as the holder of the lock that the hung threads were waiting on:
"WebContainer : 17" Id=175646 in WAITING on lock=java.util.concurrent.locks.ReentrantLock$NonfairSync@8a1113b (running in native)
BlockedCount : 67, BlockedTime : -1, WaitedCount : 277930, WaitedTime : -1
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:845)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:878)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1208)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:225)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:301)
    at com.pega.pegarules.priv.factory.AbstractContainerFactory.releaseObject(AbstractContainerFactory.java:326)
    at com.pega.pegarules.priv.factory.AbstractContainerFactory.releaseObject(AbstractContainerFactory.java:294)
    at com.pega.pegarules.priv.factory.StringBuilderFactory.release(StringBuilderFactory.java:96)
    at com.pega.pegarules.data.internal.clipboard.PropertyReferenceImpl.toString(PropertyReferenceImpl.java:3331)
    at com.pega.pegarules.data.internal.clipboard.PropertyReferenceImpl.toString(PropertyReferenceImpl.java:3285)
    at com.pega.pegarules.data.internal.clipboard.PropertyReferenceImpl.toString(PropertyReferenceImpl.java:3269)
    at com.pega.pegarules.data.internal.clipboard.VirtualClipboardPropertyImpl.getReferenceObject(VirtualClipboardPropertyImpl.java:1562)


Steps to Reproduce



a) Create a node-level data page that calls an activity to load its data.
b) Log in to the instance with multiple operators simultaneously. The data page is loaded again for each of these operators.


Root Cause



There was a race condition in which one thread had finished loading the page and was about to do two things:

1. Release the lock.
2. Put the page in the directory.

As soon as the first thread releases the lock, another thread acquires it and searches the directory for the page. It does not find the page, because the first thread has not yet completed the PUT operation. The second thread therefore reloads the same page, and the reload succeeds.
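
The ordering problem described above can be reproduced in a minimal Java sketch. This is not Pega code: the class RacyPageLoader, its methods, the beforePublish hook, and the page name D_Sample are all hypothetical and exist only for illustration. The hook runs in the window between releasing the lock and putting the page in the directory, so a second thread can be started at exactly that point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the race; none of these names are Pega APIs.
class RacyPageLoader {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, String> directory = new ConcurrentHashMap<>();
    final AtomicInteger loadCount = new AtomicInteger();

    // beforePublish is a test hook that runs after the lock is released
    // but before the page is placed in the directory.
    String getPage(String name, Runnable beforePublish) {
        String page;
        lock.lock();
        try {
            page = directory.get(name);
            if (page != null) {
                return page;            // already published, no reload
            }
            page = load(name);          // load happens under the lock
        } finally {
            lock.unlock();              // step 1: release the lock
        }
        beforePublish.run();            // the race window
        directory.put(name, page);      // step 2: put the page in the directory
        return page;
    }

    private String load(String name) {
        loadCount.incrementAndGet();
        return "data for " + name;
    }

    // Runs a second thread inside the race window and reports how many
    // times the page was loaded.
    static int demo() {
        RacyPageLoader loader = new RacyPageLoader();
        loader.getPage("D_Sample", () -> {
            Thread second = new Thread(() -> loader.getPage("D_Sample", () -> { }));
            second.start();
            try {
                second.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return loader.loadCount.get();
    }
}
```

In this sketch, RacyPageLoader.demo() reports two loads of the same page, because the second thread finds the directory empty while the first thread is between its two steps.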

Resolution



Hotfix-24551 is provided. It treats the following two operations as a single atomic unit, completing both before the lock is released:

1. Load the data page.
2. Place the page into the directory.
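
The idea behind the fix can be sketched in Java as follows. This is only an illustration of the principle, not the contents of the hotfix: the class AtomicPageLoader, its methods, and the page name D_Sample are hypothetical. The lookup, the load, and the put into the directory all happen while the lock is held, so no other thread can observe a loaded but unpublished page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the fix; none of these names are Pega APIs.
class AtomicPageLoader {
    private final ReentrantLock lock = new ReentrantLock();
    // Accessed only while holding the lock, so a plain HashMap is enough.
    private final Map<String, String> directory = new HashMap<>();
    final AtomicInteger loadCount = new AtomicInteger();

    String getPage(String name) {
        lock.lock();
        try {
            String page = directory.get(name);
            if (page == null) {
                page = load(name);           // 1. load the data page
                directory.put(name, page);   // 2. place it into the directory
            }
            return page;
        } finally {
            lock.unlock();                   // released only after both steps
        }
    }

    private String load(String name) {
        loadCount.incrementAndGet();
        return "data for " + name;
    }

    // Requests the same page from several threads and reports how many
    // times it was actually loaded.
    static int demo(int threadCount) {
        AtomicPageLoader loader = new AtomicPageLoader();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> loader.getPage("D_Sample"));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return loader.loadCount.get();
    }
}
```

With this ordering, AtomicPageLoader.demo(8) reports a single load regardless of how the threads interleave, because every thread after the first finds the page already in the directory.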

 

Published January 31, 2016 - Updated October 8, 2020
