Support Article
Hazelcast timeout occurs on multiple nodes
SA-59955
Summary
Pega nodes hang sporadically after a multi-node cluster is configured. Hazelcast timeouts occur on multiple nodes.
Error Messages
[erClockSynchDaemon-0] [STANDARD] [ ] [ ] ( spi.impl.BasicInvocation) WARN - [xyz]:5701 [2daa41af6f9d6825ddcdf7697eb3f0ca] [3.4.1] No response for 120000 ms. BasicInvocationFuture{invocation=BasicInvocation{ serviceName='hz:impl:executorService', op=Operation{serviceName='hz:impl:executorService', callId=26516, invocationTime=1527167115304, waitTimeout=-1, callTimeout=60000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[xyz]:5701, backupsExpected=0, backupsCompleted=0}, response=null, done=false}
Steps to Reproduce
Configure a three-node cluster on Pega 7.2.2.
Root Cause
A defect or configuration issue in the operating environment.
Manually generated thread dumps showed multiple threads blocked in the Oracle JDBC driver code, as shown below:
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
...
at oracle.jdbc.driver.T4CTTIoping.doOPING(T4CTTIoping.java:50)
at oracle.jdbc.driver.T4CConnection.doPingDatabase(T4CConnection.java:5221)
- locked <0x00000006c9bf9658> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.PhysicalConnection.pingDatabase(PhysicalConnection.java:7015)
at oracle.jdbc.driver.PhysicalConnection.pingDatabase(PhysicalConnection.java:7036)
at oracle.jdbc.driver.OracleConnection.isValid(OracleConnection.java:222)
at org.apache.tomcat.dbcp.dbcp2.DelegatingConnection.isValid(DelegatingConnection.java:916)
at org.apache.tomcat.dbcp.dbcp2.PoolableConnection.validate(PoolableConnection.java:282)
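To check how many threads in a dump show the pattern above, the dump text can be scanned for the `doPingDatabase` frame. The following is a minimal sketch, not Pega or Oracle tooling: the class `ThreadDumpScan`, its method name, and the sample thread names are hypothetical, and it assumes the usual jstack layout in which each thread starts with a header line beginning with a double quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class ThreadDumpScan {

    // Frame taken from the stack trace in this article.
    static final String PING_FRAME = "oracle.jdbc.driver.T4CConnection.doPingDatabase";

    /**
     * Returns the header line of every thread whose stack contains the given frame.
     * Assumption: a thread header is any line that starts with a double quote,
     * and all lines up to the next header belong to that thread.
     */
    static List<String> threadsContaining(String dump, String frame) {
        List<String> matches = new ArrayList<>();
        String currentHeader = null;
        boolean alreadyMatched = false;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // New thread section begins.
                currentHeader = line;
                alreadyMatched = false;
            } else if (currentHeader != null && !alreadyMatched && line.contains(frame)) {
                // Record each thread at most once.
                matches.add(currentHeader);
                alreadyMatched = true;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample dump; real input would come from a jstack or kill -3 output file.
        String dump = String.join("\n",
            "\"worker-1\" #41 daemon prio=5",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat java.net.SocketInputStream.socketRead0(Native Method)",
            "\tat oracle.jdbc.driver.T4CConnection.doPingDatabase(T4CConnection.java:5221)",
            "",
            "\"worker-2\" #42 daemon prio=5",
            "   java.lang.Thread.State: TIMED_WAITING",
            "\tat java.lang.Thread.sleep(Native Method)");
        for (String header : threadsContaining(dump, PING_FRAME)) {
            System.out.println("Blocked in Oracle ping: " + header);
        }
    }
}
```

Comparing the matching threads across several dumps taken a few seconds apart helps distinguish a genuinely stuck ping from a validation call that happened to be in flight.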
The affected environment uses Oracle JDBC driver 12.1.0.2.0, which causes issues. For Oracle 12c, using this driver version can also cause synchronization failure when an offline attachment is added.
Resolution
Perform the following local change:
- Update the driver to Oracle 12.2.0.1.
- Add the validationQuery="select 1 from dual" parameter to prevent the Oracle driver's isValid method from being invoked.
- Add the following parameters to avoid potential issues with a firewall:
testOnBorrow=true, validationQueryTimeout=10, testWhileIdle=true, timeBetweenEvictionRunsMillis=30
<Resource name="jdbc/PegaRULES"
          auth="Container"
          type="javax.sql.DataSource"
          driverClassName="oracle.jdbc.OracleDriver"
          url="jdbc:oracle:thin:@localhost:1521/xgc.pega"
          username=""
          password=""
          maxActive="100"
          maxIdle="30"
          maxWait="10000"
          validationQuery="select 1 from dual"
          testOnBorrow="true"
          validationQueryTimeout="10"
          testWhileIdle="true"
          timeBetweenEvictionRunsMillis="30"
/>
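The effect of validationQuery and validationQueryTimeout can be pictured with plain JDBC. The sketch below approximates what a connection pool does when it validates with a query instead of calling isValid: it runs the query with a statement timeout in seconds and treats any SQL error or empty result as a bad connection. It is an illustration under those assumptions, not Tomcat's actual code. The class `ValidationSketch` and method `isUsable` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration; the real check is performed inside the connection pool.
final class ValidationSketch {

    /**
     * Runs the validation query with a timeout and reports whether the
     * connection answered. Returns false on any SQLException, including a
     * query timeout, or if the query returns no row.
     */
    static boolean isUsable(Connection conn, String validationQuery, int timeoutSeconds) {
        try (Statement stmt = conn.createStatement()) {
            // Corresponds to validationQueryTimeout="10" (seconds).
            stmt.setQueryTimeout(timeoutSeconds);
            try (ResultSet rs = stmt.executeQuery(validationQuery)) {
                return rs.next();
            }
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: ValidationSketch <jdbc-url> <user> <password>");
            return;
        }
        // Requires the Oracle JDBC driver on the classpath.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println("usable: " + isUsable(conn, "select 1 from dual", 10));
        }
    }
}
```

With testOnBorrow enabled, a check of this kind runs each time the application takes a connection from the pool, so a connection silently dropped by a firewall is detected and replaced before the application uses it.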
Published December 29, 2018 - Updated October 8, 2020