Pega agent hangs and fails to do its job
Various agent threads are not doing their intended jobs. The agents do not go down completely, and administrators are unable to stop or interrupt them from the System Management Console (attempts to do so have no effect).
Steps to Reproduce
The root cause of this problem is in a third-party product integrated with PRPC. Inspection of thread dumps shows that the agent threads were stuck within the Rule-Utility-Function SendEmailMessage, specifically in the portion of that RUF that attempts to establish a connection to the SMTP email server. Within the JavaMail API, the smtp.connect() method has an unlimited default timeout value, so when the connect attempt hangs (for unknown reasons), the thread waits forever. Because the thread is blocked inside this call (it is a blocking operation), it is NOT in a state where it can be stopped or interrupted.
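The thread-dump inspection described above can also be approximated from inside a running JVM. The following is a minimal sketch, not a PRPC utility: the class StuckThreadFinder and its method are hypothetical, and it assumes that the RUF name SendEmailMessage appears somewhere in the class or method name of a stack frame, which may not match how PRPC names its generated code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of PRPC.
public class StuckThreadFinder {

    // Returns the names of threads whose stack contains a frame whose
    // "class.method" text includes the given marker string.
    public static List<String> threadsWithFrame(
            Map<Thread, StackTraceElement[]> dump, String marker) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                String qualified = frame.getClassName() + "." + frame.getMethodName();
                if (qualified.contains(marker)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Thread.getAllStackTraces() is the standard JDK call that returns
        // a snapshot of every live thread's stack.
        // Assumption: "SendEmailMessage" shows up in the frame names.
        for (String name : threadsWithFrame(Thread.getAllStackTraces(), "SendEmailMessage")) {
            System.out.println("Possibly stuck in email send: " + name);
        }
    }
}
```

If the same thread names are reported across several snapshots taken minutes apart, that is consistent with the hang described here, though a full thread dump remains the more reliable evidence.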
This issue is resolved by hotfix item HFix-7384, which changes the Rule-Utility-Function SendEmailMessage to apply a more reasonable timeout (default 60 seconds, configurable by a System Setting). With the hotfix applied, if the communication with the email server hangs, the PRPC thread times out and throws an exception. The expectation is that the communication problem is sporadic and that the send will succeed on the next retry of the same agent queue.
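For readers unfamiliar with how JavaMail timeouts are set, here is a minimal sketch of the general technique. It does not show the hotfix's actual code, which is not described here. The class SmtpTimeoutSettings, the helper withTimeouts, and the host smtp.example.com are hypothetical. The property names mail.smtp.connectiontimeout and mail.smtp.timeout are standard JavaMail SMTP settings, expressed in milliseconds.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of PRPC or HFix-7384.
public class SmtpTimeoutSettings {

    // Returns a copy of the given mail properties with finite SMTP timeouts.
    public static Properties withTimeouts(Properties base, int timeoutSeconds) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be positive");
        }
        Properties props = new Properties();
        props.putAll(base);
        String millis = String.valueOf(timeoutSeconds * 1000L);
        // Limit on how long the socket connect to the SMTP server may take.
        props.setProperty("mail.smtp.connectiontimeout", millis);
        // Limit on how long a socket read may block once connected.
        props.setProperty("mail.smtp.timeout", millis);
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("mail.smtp.host", "smtp.example.com"); // placeholder host
        Properties props = withTimeouts(base, 60);
        // These properties would then be passed to the JavaMail Session
        // that creates the SMTP transport.
        System.out.println(props.getProperty("mail.smtp.connectiontimeout"));
    }
}
```

With finite values in place, a hung connect fails with an exception instead of blocking the thread indefinitely, which matches the behavior the hotfix is described as providing.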