Javacore can be a powerful tool to diagnose why an application powered by IBM JVM is not very responsive. With javacore, we can see which thread is currently running at the time of dumping.
As an example, I analyzed a hanging syncNode command. The syncNode refreshes a dependent profile information from the 'master' deployment manager profile.
Typically, syncNode command succeeds within a minute.
In my case, 20 minutes had passed and it was still running.
To analyze, I produced 3 javacores every two minutes by sending a SIGQUIT (by running kill -3) signal to the syncNode process.
These javacores were then used as inputs to IBM Thread and Monitor Dump Analyzer for Java bundled with IBM Support Assistant.
![]() |
| Threads comparison of 3 javacores in IBM Thread and Monitor Dump Analyzer |
From the picture above, you can see that the main thread is getting the localhost information with InetAddress.getLocaHost. The getLocalHost() method, in turn, resolves the hostname from the IP address.
Why does the getLocalHost() span over 6 minutes snapshot time?
If we take a closer look, we can see that the look up exploits dual-IP stack implementation of IBM JVM. In this case, the syncNode command tries to lookup its own hostname. However, it starts with IPv6 look up and the DNS does not respond. So this command hangs. This is the DNS problem but I worked around by configuring resolver to only use IPv4 lookup i.e. set netsvc.conf to hosts=local4,bind4 instead of hosts=local,bind.
Backup plan: native core file + jextract to the rescue
Now we see the value of the javacore.
Unfortunately, there are cases when a java process hangs but kill -3 doesn't produce a javacore.
(Examples of these cases include a native memory issue where we simply do not have sufficient memory to satisfy just dumping a javacore).
With IBM, you are not running out of tools. IBM JVM equips you with a jextract tool. The jextract command analyzes native (core) dump of the JVM process and generate the output file together with the dump itself.
As the analysis is JVM build-specific; ideally, we should run jextract on the same problematic machine which generates the core dump. Of course, the dump must be complete for jextract to be useful. On AIX, you should enable full core dump.
Please allow me to show the value of jextract in the real-world. It comes to help my customer whose application intermittently hangs from time to time for no apparent reason.
We ran ps -ef command to confirm the process is alive.
We checked SystemOut.
> Nothing is printed.
We tried to kill -3 but no javacore was produced.
We followed the mustgather when kill -3 produces no javacore.
We have to wait for the problem to surface again to continue with this mustgather. Then, we had jextract output ready to proceed.
We use "IBM Monitoring and Diagnostic Tools - Dump Analyzer" (DTFJ) to analyze jextract output. We are able to see from all jextract snapshots which thread was hung. In our case, the current thread is always 2162735.
Java Thread: SoapConnectorThreadPool : 1 (com.ibm.ws.util.ThreadPool$Worker@0x070000003960EA20)
System thread ID: 2162735 JNI: 0x000000011788DC00
State: 0x00000005 (ALIVE,RUNNABLE) Priority: 5
WAS thread-ID: 0x00000024
Currently waiting on: (none)
Currently holding monitors: NLS hash table ID=0x0000000110014A30 ((Raw monitor - no Java object))
Currently holding monitors: (un-named monitor @0x700000005d65500 for object @0x700000005d65500) ID=0x0700000005D65500 (java.net.InetAddress$Lock@0x0700000005D65500)
Java Stack Area: 0x0000000117B7D208-0x0000000117B81A37 (18 KB) "JavaStackSection for JavaThread: SoapConnectorThreadPool : 1"
Java Stack Frames:
ip=0x00000006/sp=0x0000000117B80A68: java.net.Inet6AddressImpl.lookupAllHostAddr()
ip=0x000000011173FEE0/sp=0x0000000115BF28B0: java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
ip=0x000000011166CED8/sp=0x0000000115BF2940: java.net.InetAddress.getAddressFromNameService(InetAddress.java:1297)
ip=0x0000000111765074/sp=0x0000000117B812C8: java.net.InetAddress.getLocalHost(InetAddress.java:1475)
ip=0x00000001117B2418/sp=0x0000000117B993D0: org.apache.soap.util.mime.MimeUtils.getUniqueValue(MimeUtils.java:85)
ip=0x00000001117B2238/sp=0x0000000117B99448: org.apache.soap.rpc.SOAPContext.setRootPart(SOAPContext.java:383)
ip=0x00000001117B2098/sp=0x0000000117B99548: org.apache.soap.rpc.SOAPContext.setRootPart(SOAPContext.java:426)
ip=0x0000000111719014/sp=0x00000000: org.apache.soap.rpc.SOAPContext.setRootPart(SOAPContext.java:401)
ip=0x0000000111719014/sp=0x0000000117B99650: com.ibm.ws.management.connector.soap.SOAPResponse.writeSOAPResponse(SOAPResponse.java:74)
ip=0x000000011171328C/sp=0x0000000117B99838: com.ibm.ws.management.connector.soap.SOAPConnector.service(SOAPConnector.java:280)
ip=0x07000000423F0BE9/sp=0x0000000117B99858: com.ibm.ws.management.connector.soap.SOAPConnection.handleRequest(SOAPConnection.java:64)
ip=0x07000000423EE3C1/sp=0x0000000117B99978: com.ibm.ws.http.HttpConnection.readAndHandleRequest(HttpConnection.java:742)
ip=0x000000011170A628/sp=0x0000000117B99A60: com.ibm.ws.http.HttpConnection.run(HttpConnection.java:528)
ip=0x0700000040EDE902/sp=0x0000000113CFB490: com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1497)
Native Thread: 2162735
Associated Java Thread: SoapConnectorThreadPool : 1 (com.ibm.ws.util.ThreadPool$Worker@0x070000003960EA20)
Properties:
processor on which I'm bound: 0
wait channel: 0
current/last signal taken: 0
scheduling policy: 52
current effective priority: 0
suspend count: 1
thread state: 5
type of thread wait: 1
thread flags: c10404
processor usage: ffff0002
Native Stack Area: 0x0000000110000000-0x000000011CE6C55F (211377 KB) "stack section at 110000000"
Native Stack Frames:
ip=0x09000000004972E4/sp=0x0000000118543260: /usr/lib/libpthreads.a(shr_xpg5_64.o)::._validate_local_lock+0x3c
ip=0x0900000000052C90/sp=0x00000001185433A0: /usr/lib/libc.a(shr_64.o)::_$STATIC+0x48
ip=0x090000000004FFA4/sp=0x0000000118543480: /usr/lib/libc.a(shr_64.o)::+0x4c
ip=0x09000000028E1774/sp=0x00000001185434F0: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::_$STATIC+0x14c
ip=0x09000000028E1088/sp=0x0000000118543560: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.j9mem_startup+0x38
ip=0x09000000028E1348/sp=0x0000000118543600: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.checkTagSumCheck+0x28
ip=0x09000000028EEBDC/sp=0x00000001185436A0: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.parse_catalog+0x47c
ip=0x09000000028EF9B4/sp=0x0000000118543C00: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.j9nls_set_catalog+0x24
ip=0x09000000028EFED0/sp=0x0000000118543CB0: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.j9nls_get_variant+0x10
ip=0x09000000028EFA44/sp=0x0000000118543D50: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.j9nls_set_catalog+0xb4
ip=0x09000000029736F0/sp=0x0000000118543DC0: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9dmp23.so::.rasDumpEnableHooks_34_29+0xd8
ip=0x090000000288C29C/sp=0x0000000118544100: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9vm23.so::.generateSystemDump+0xac
ip=0x09000000028EC0B8/sp=0x00000001185451E0: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.masterSynchSignalHandler+0xd0
ip=0x090000000288C594/sp=0x0000000118545560: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9vm23.so::.writeVMInfo_18_2+0x13c
ip=0x09000000028EC32C/sp=0x0000000118545630: /usr/IBM/WebSphere/AppServer/java/jre/bin/libj9prt23.so::.initializeSignalTools+0x138
ip=0x000044F4/sp=0x0000000118547340: <unknown location>
ip=0xE9E1FFD8/sp=0x09000000028852CC: <unknown location>
From the above stack trace, it is crystal clear that the hanging code is not application code. It's time to get help from IBM.
Using the above information, IBM support further identifies the incident is from AIX defects below:
IZ66709 : CORE DUMP AFTER REFERENCING THE ADDR AFTER GETADDRINFO
IZ68793 : GETGRENT_R CAUSES SIGSEGV WHEN INVOKED BY MULTIPLE THREADS
After we upgraded AIX, the problem is gone.

No comments:
Post a Comment