Support Article
One node crashed in production after upgrading to WAS 8 (Java 1.6)
Summary
Node crashed after upgrade to WAS 8.
Error Messages
[9/23/14 7:55:22:059 EDT] 0000001e ThreadMonitor W WSVR0605W: Thread "WebContainer : 42" (00000083) has been active for 694103 milliseconds and may be hung. There is/are 3 thread(s) in total in the server that may be hung.
"WebContainer : 42" J9VMThread:0x0000000047014300, j9thread_t:0x000001003D8D2140, java/lang/Thread:0x00000007267C7BD0, state:R, prio=5
(native thread ID:0x1B1036B, native priority:0x5, native policy:UNKNOWN)
Java callstack:
at java/net/SocketInputStream.socketRead0(Native Method)
at java/net/SocketInputStream.read(SocketInputStream.java:140(Compiled Code))
at oracle/net/ns/Packet.receive(Packet.java:283(Compiled Code))
at oracle/net/ns/DataPacket.receive(DataPacket.java:103(Compiled Code))
at oracle/net/ns/NetInputStream.getNextPacket(NetInputStream.java:230(Compiled Code))
at oracle/net/ns/NetInputStream.read(NetInputStream.java:175(Compiled Code))
at oracle/jdbc/driver/T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:104(Compiled Code))
at oracle/jdbc/driver/T4CMAREngine.getNBytes(T4CMAREngine.java:1490(Compiled Code))
at oracle/jdbc/driver/T4C8TTILobd.unmarshalLobData(T4C8TTILobd.java:450(Compiled Code))
at oracle/jdbc/driver/T4C8TTILob.readLOBD(T4C8TTILob.java:767(Compiled Code))
at oracle/jdbc/driver/T4CTTIfun.receive(T4CTTIfun.java:358(Compiled Code))
at oracle/jdbc/driver/T4CTTIfun.doRPC(T4CTTIfun.java:191(Compiled Code))
at oracle/jdbc/driver/T4C8TTILob.read(T4C8TTILob.java:146(Compiled Code))
at oracle/jdbc/driver/T4CConnection.getBytes(T4CConnection.java:2344(Compiled Code))
(entered lock: oracle/jdbc/driver/[email protected], entry count: 1)
at oracle/sql/BLOB.getBytes(BLOB.java:331(Compiled Code))
at oracle/sql/BLOB.getBytes(BLOB.java:217(Compiled Code))
at com/pega/pegarules/engine/database/PageDatabaseMapper.getStreamBytes(PageDatabaseMapper.java:2061(Compiled Code))
Root Cause
The root cause of both occurrences is the same: a thread hangs indefinitely while waiting on a database socket read, although the call stacks differ. In other words, the hang can occur in any part of the application that interacts with the database (for example, PRRequestorImpl or FUACache). Because various application methods are synchronized for thread safety, a hang is inevitable when a synchronized method waits on the database indefinitely: every other thread that needs the same lock is blocked behind it.
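The effect of a synchronized method that never returns can be reproduced in isolation. The following is a minimal sketch, not Pega or Oracle code: the class and method names (SynchronizedHangSketch, readFromDatabase, unrelatedWork) are hypothetical, and a CountDownLatch stands in for the database socket read that never completes.

```java
import java.util.concurrent.CountDownLatch;

public class SynchronizedHangSketch {
    private final CountDownLatch insideLock = new CountDownLatch(1);
    private final CountDownLatch dbResponded = new CountDownLatch(1);

    // Hypothetical stand-in for a synchronized application method that is
    // stuck on a database read. The monitor is held for the whole wait.
    synchronized void readFromDatabase() throws InterruptedException {
        insideLock.countDown();
        dbResponded.await(); // simulated read with no timeout
    }

    // Any other synchronized method on the same object needs the same monitor.
    synchronized void unrelatedWork() {
    }

    // Starts a "holder" and a "waiter" thread and reports the waiter's state
    // while the holder is still inside the synchronized method.
    static Thread.State observeWaiterState() throws InterruptedException {
        final SynchronizedHangSketch shared = new SynchronizedHangSketch();
        Thread holder = new Thread(new Runnable() {
            public void run() {
                try {
                    shared.readFromDatabase();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                shared.unrelatedWork();
            }
        }, "waiter");

        holder.start();
        shared.insideLock.await(); // holder now owns the monitor
        waiter.start();

        // Poll for up to about five seconds until the waiter is blocked.
        Thread.State state = waiter.getState();
        for (int i = 0; i < 500 && state != Thread.State.BLOCKED; i++) {
            Thread.sleep(10);
            state = waiter.getState();
        }

        dbResponded.countDown(); // simulate the database finally answering
        holder.join();
        waiter.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state while holder is stuck: " + observeWaiterState());
    }
}
```

In the sketch the waiter only proceeds once the simulated read returns. With a real socket read that has no timeout, nothing releases the monitor, which is why a single stalled database call can lead to several threads being reported as hung.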
The network and DB teams need to investigate this using network and database monitoring tools.
As a local change to avoid the hang, the following Oracle DB datasource property can be set.
- oracle.jdbc.ReadTimeout
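For illustration, the sketch below shows how this property could be supplied to the Oracle thin driver in plain JDBC code. It is an assumption-laden example rather than the WAS datasource configuration itself: the class name, user name, password, and the 300000 ms value are placeholders, and the right timeout depends on the longest legitimate query. The driver interprets the value in milliseconds.

```java
import java.util.Properties;

public class ReadTimeoutSketch {
    // Connection property named in this article; the value is in milliseconds.
    static final String READ_TIMEOUT_KEY = "oracle.jdbc.ReadTimeout";

    // Builds JDBC connection properties that bound how long a socket read may block.
    // The user, password, and timeout passed in are placeholders chosen by the caller.
    static Properties buildProperties(String user, String password, long readTimeoutMillis) {
        if (readTimeoutMillis <= 0) {
            throw new IllegalArgumentException("readTimeoutMillis must be positive");
        }
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty(READ_TIMEOUT_KEY, Long.toString(readTimeoutMillis));
        return props;
    }

    public static void main(String[] args) {
        // 300000 ms (5 minutes) is an example value only.
        Properties props = buildProperties("app_user", "changeit", 300000L);
        System.out.println(READ_TIMEOUT_KEY + "=" + props.getProperty(READ_TIMEOUT_KEY));
        // With the Oracle thin driver on the classpath, these properties could be
        // passed to java.sql.DriverManager.getConnection(url, props).
    }
}
```

With a read timeout in place, a stalled read fails with an exception instead of blocking forever, so the application must be prepared to handle the resulting SQLException. A value that is too low can also abort slow but healthy operations such as large BLOB reads.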
Although the hung thread shows that it is waiting on a socket read from the database, it is difficult to tell whether this is due to a lost connection or a slow response from the database. The following artefacts need to be collected the next time this issue occurs.
- netstat -ano | grep "JVM process ID"
- Thread dump via kill -3
- AWR and ADDM reports from the database covering a 30-minute period around the hang timestamp
- Confirmation from the DBA of whether a session from the hung JVM is open on the database and, if so, any details that can be collected for all such sessions
Published January 31, 2016 - Updated October 8, 2020