SendCorr Agent hung and unresponsive in Production
The Pega-ProCom Send Corr agent, which runs every 30 seconds and sends correspondence queued in the PR_SYS_QUEUES table, was observed to be hung in the Production environment. Multiple thread dumps taken at 60-second intervals confirmed that the agent thread was stuck at the same point in processing each time.
All attempts to stop the agent were unsuccessful, including attempts to stop or interrupt the requestor via SMA. Eventually the JVM was restarted, after which the agent did not hang again and completed its processing of the queue.
In the thread dump, the agent thread is shown connecting to the email server over the smtps protocol. It is blocked in a socket read (SocketInputStream.socketRead0) while SMTPTransport.openServer waits for the server's response, and that wait never ends:
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/net/SocketInputStream.socketRead0(Native Method)
4XESTACKTRACE at java/net/SocketInputStream.read(SocketInputStream.java:140(Compiled Code))
4XESTACKTRACE at com/sun/mail/util/TraceInputStream.read(TraceInputStream.java:110(Compiled Code))
4XESTACKTRACE at java/io/BufferedInputStream.fill(BufferedInputStream.java:229(Compiled Code))
4XESTACKTRACE at java/io/BufferedInputStream.read(BufferedInputStream.java:248(Compiled Code))
4XESTACKTRACE at com/sun/mail/util/LineInputStream.readLine(LineInputStream.java:88(Compiled Code))
4XESTACKTRACE at com/sun/mail/smtp/SMTPTransport.readServerResponse(SMTPTransport.java:1589(Compiled Code))
4XESTACKTRACE at com/sun/mail/smtp/SMTPTransport.openServer(SMTPTransport.java:1369)
4XESTACKTRACE at com/sun/mail/smtp/SMTPTransport.protocolConnect(SMTPTransport.java:412)
4XESTACKTRACE at javax/mail/Service.connect(Service.java:288)
4XESTACKTRACE at javax/mail/Service.connect(Service.java:169(Compiled Code))
4XESTACKTRACE at com/pegarules/generated/SendEmailMessage_060301_QsQ8aHDPfHDWdFzadW7ALQ.SendEmailMessage06_03_01(SendEmailMessage_060301_QsQ8aHDPfHDWdFzadW7ALQ.java:653(Compiled Code))
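The top frames of this stack are a plain blocking socket read. By default a java.net.Socket has no read timeout (SO_TIMEOUT is 0), so a read against a peer that accepts the connection but never sends anything will wait indefinitely. The following is a minimal sketch of that behavior, not PRPC or JavaMail code: the class and method names are hypothetical, and a local ServerSocket stands in for a mail server that never sends its greeting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SilentServerDemo {
    // Hypothetical helper: connects to a local server that accepts the
    // connection but never writes a byte, then attempts a read with the
    // given SO_TIMEOUT. Returns true if the read timed out.
    // Passing 0 would mean "no timeout" and the read would block forever.
    static boolean readTimesOut(int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // waits for data the silent server never sends
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the read gave up after soTimeoutMillis
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

With a non-zero timeout the blocked thread regains control and can report the failure. Without one, as in the stack above, only closing the socket or stopping the JVM releases it, which is consistent with the agent ignoring stop and interrupt requests.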
Steps to Reproduce
Not known. The issue is assumed to be on the email server side, preventing a successful connection from being established.
The root cause of this issue was not determined; it is not known why the connection attempt hangs. However, a PRPC change has been implemented that allows stalled connection attempts to time out, so that the agent can exit and then retry at its next wakeup interval.
The root cause of this problem is in a third-party product integrated with PRPC. The connection between the app server and the mail server hangs for unknown reasons, and because the default wait period is unlimited, the connection attempt waits indefinitely.
This issue is addressed by hotfix item HFix-10070. The change introduces a new default timeout of 60 seconds for SMTP operations (connect and send), along with a Dynamic System Setting that can be adjusted to allow a longer timeout.
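For reference, JavaMail itself exposes connect and read timeouts as session properties: mail.smtp.connectiontimeout and mail.smtp.timeout, both in milliseconds and both infinite by default. When the transport is obtained under the smtps protocol name, the prefix is mail.smtps. instead. The sketch below shows how such properties could be built with the 60-second value mentioned above. It is an illustration under those assumptions, not the HFix-10070 implementation; the class and helper names are hypothetical, and the name of the Dynamic System Setting is not given here.

```java
import java.util.Properties;

class SmtpTimeoutSketch {
    // Hypothetical helper: builds JavaMail session properties that bound
    // both the socket connect and the socket read for the given protocol
    // name ("smtp" or "smtps"). Values are in milliseconds.
    static Properties withTimeouts(String protocol, int timeoutMillis) {
        Properties props = new Properties();
        String prefix = "mail." + protocol + ".";
        // Limit on establishing the TCP connection.
        props.setProperty(prefix + "connectiontimeout", String.valueOf(timeoutMillis));
        // Limit on each socket read, including waiting for the server greeting.
        props.setProperty(prefix + "timeout", String.valueOf(timeoutMillis));
        return props;
    }

    public static void main(String[] args) {
        Properties props = withTimeouts("smtps", 60_000);
        props.list(System.out);
        // With JavaMail on the classpath, these properties would be passed to
        // javax.mail.Session.getInstance(props) before calling
        // session.getTransport("smtps").connect(...).
    }
}
```

The hang in the thread dump occurs in a read after the socket is already connected, so the read timeout (mail.smtps.timeout in this sketch) is the property that would bound that particular wait; the connection timeout alone would not.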