Node Level Data Page Causes Process To Hang
Summary
A section of the application displays a list sourced from a node-level data page. This screen sometimes hangs, and the problem occurs sporadically.
It usually occurs during peak hours and after the server is restarted.
Error Messages
A process hung exception appears in the WebSphere Application Server (WAS) logs.
Steps to Reproduce
- Configure a node-level data page.
- When the number of concurrent users of the data page increases (as during peak hours), the screen displaying the data page hangs.
Root Cause
A defect in Pegasystems’ code or rules. When a thread loads a node-level data page, it acquires a lock on a processing valve and releases the lock after loading. Every other thread that tries to load the same page must wait for the lock held by the first thread. After the page is ready, the first thread releases the lock and only then puts the page into the directory.
The put operation therefore happens outside the locked region.
Because the lock is now free, any waiting thread can acquire it and start loading the same page if the page is not yet present in the directory.
This creates a race condition in which one thread has finished loading the page and is about to do two things:
- Release the lock
- Put the page in the directory.
As soon as the first thread releases the lock, another thread acquires it and searches the directory for the page. It does not find the page, because the first thread has not yet completed the put operation.
The second thread therefore loads the same page again, and this redundant load succeeds.
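The interleaving above can be reproduced deterministically with a minimal Java sketch. It assumes a `ReentrantLock` stands in for the processing-valve lock and a `ConcurrentHashMap` stands in for the page directory; the class `RacyPageLoader`, the test hook `afterUnlockHook`, and the page name `D_Sample` are hypothetical and are not Pega engine APIs. The hook starts a second thread inside the window between the unlock and the put, which is where the defect lets a second load slip in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model of the defective ordering: lock, load, unlock, THEN put.
 * Not Pega engine code; all names are illustrative.
 */
public class RacyPageLoader {
    private final ReentrantLock lock = new ReentrantLock();                  // stands in for the processing-valve lock
    private final Map<String, String> directory = new ConcurrentHashMap<>(); // stands in for the page directory
    private final AtomicInteger loadCount = new AtomicInteger();
    // Hypothetical test-only hook, run in the window between unlock and put.
    volatile Runnable afterUnlockHook = () -> { };

    public String getPage(String name) {
        String page;
        lock.lock();
        try {
            page = directory.get(name);
            if (page != null) {
                return page;                 // already published by an earlier thread
            }
            page = load(name);               // load while holding the lock
        } finally {
            lock.unlock();                   // lock released here...
        }
        afterUnlockHook.run();               // ...race window: another thread may take the lock now
        directory.put(name, page);           // BUG: put happens outside the lock
        return page;
    }

    private String load(String name) {
        loadCount.incrementAndGet();         // count how many times the page is actually loaded
        return "page:" + name;
    }

    public int loadCount() {
        return loadCount.get();
    }

    /** Forces the interleaving described above and returns the number of loads. */
    public static int reproduceRace() {
        RacyPageLoader loader = new RacyPageLoader();
        AtomicBoolean fired = new AtomicBoolean(false);
        loader.afterUnlockHook = () -> {
            // Only the first thread fires: it starts a second thread inside the window and waits for it.
            if (fired.compareAndSet(false, true)) {
                Thread second = new Thread(() -> loader.getPage("D_Sample"));
                second.start();
                try {
                    second.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        loader.getPage("D_Sample");          // first thread (here, the main thread)
        return loader.loadCount();
    }

    public static void main(String[] args) {
        System.out.println("loads = " + reproduceRace()); // prints "loads = 2"
    }
}
```

Without the hook the window is only a few instructions wide, which fits the sporadic, load-dependent behavior described in the summary.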
The following two operations need to be made atomic, that is, both performed while the lock is held:
- Loading the data page
- Putting the page into the directory
Only then should the lock be released.
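A minimal sketch of the corrected ordering, under the same assumptions (hypothetical `AtomicPageLoader` class, a `ReentrantLock` as the lock, a `ConcurrentHashMap` as the directory): the directory is re-checked, the page is loaded, and it is put into the directory while the lock is held, and the lock is released only afterwards. This illustrates the principle and is not Pega's actual fix.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical loader in which load + put happen together under the lock. */
public class AtomicPageLoader {
    private final ReentrantLock lock = new ReentrantLock();                  // stands in for the processing-valve lock
    private final Map<String, String> directory = new ConcurrentHashMap<>(); // stands in for the page directory
    private final AtomicInteger loadCount = new AtomicInteger();

    public String getPage(String name) {
        String page = directory.get(name);   // fast path: page already published
        if (page != null) {
            return page;
        }
        lock.lock();
        try {
            page = directory.get(name);      // re-check under the lock
            if (page == null) {
                page = load(name);
                directory.put(name, page);   // FIX: publish before releasing the lock
            }
            return page;
        } finally {
            lock.unlock();                   // released only after load and put
        }
    }

    private String load(String name) {
        loadCount.incrementAndGet();         // count how many times the page is actually loaded
        return "page:" + name;
    }

    public int loadCount() {
        return loadCount.get();
    }

    /** Starts many threads at once against the same page and returns how many loads happened. */
    public static int stress(int threads) {
        AtomicPageLoader loader = new AtomicPageLoader();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await();               // all tasks begin at the same moment
                return loader.getPage("D_Sample");
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loader.loadCount();
    }

    public static void main(String[] args) {
        System.out.println("loads = " + stress(50)); // prints "loads = 1"
    }
}
```

Any thread that acquires the lock after the first one finds the page already in the directory, so the page is loaded exactly once however many threads arrive together. In plain Java code, `ConcurrentHashMap.computeIfAbsent` offers a comparable per-key guarantee (its mapping function is applied at most once per key); that is a swapped-in alternative, not what the engine does.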
Published July 22, 2016 - Updated July 29, 2016