
12.12.2011

PRSTAT Solaris – Pinpoint high CPU Java VM Threads fast

The Solaris OS prstat command is widely used but are you really using it to its full potential?

Are you struggling with high CPU utilization of your middleware Java VM processes but unable to understand what is going on?

How about providing proof and facts about the root cause to your clients and / or application development team?

If you are a Java EE production middleware or application support individual then this article is mainly for you; it will demonstrate why you should immediately add this command to your list of OS command “friends” and learn this analysis approach.

This article will provide you with a step by step tutorial along with an example to achieve those goals.

For an AIX OS prstat command equivalent and strategy, please visit the other article below:

Step #1 – Generate a PRSTAT snapshot of your affected Java process

Execute the prstat command on the Java process after extracting the Java PID.
You can also run it for several iterations in order to identify a pattern.
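
If you do not already know the Java PID, you can extract it from a standard process listing; a minimal example, assuming the pgrep utility is available on your Solaris release and that “java” is enough to match your process:

pgrep -fl java

You can also use ps -ef | grep java and pick the PID from the second column.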

prstat -L -p <PID> 1 1

Example: prstat -L -p 9116 1 1

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
  9116 bea      3109M 2592M sleep   59    0  21:52:59 8.6% java/76
  9116 bea      3109M 2592M sleep   57    0   4:28:05 0.3% java/40
  9116 bea      3109M 2592M sleep   59    0   6:52:02 0.3% java/10774
  9116 bea      3109M 2592M sleep   59    0   6:50:00 0.3% java/84
  9116 bea      3109M 2592M sleep   58    0   4:27:20 0.2% java/43
  9116 bea      3109M 2592M sleep   59    0   7:39:42 0.2% java/41287
  9116 bea      3109M 2592M sleep   59    0   3:41:44 0.2% java/30503
  9116 bea      3109M 2592M sleep   59    0   5:48:32 0.2% java/36116
……………………………………………………………………………………

-         PID corresponds to your Solaris Java process ID
-         CPU corresponds to the CPU utilization % of a particular Java Thread
-         PROCESS/LWPID corresponds to the Light Weight Process ID, i.e. your native Java Thread ID

In our example, prstat did identify Thread #76 as our top CPU contributor with 8.6% utilization.
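
If a single snapshot is not conclusive, you can take several samples and look for a LWP that stays at the top; as a sketch (the interval and count values below are just an illustration):

prstat -L -p 9116 5 3

This samples the process every 5 seconds, 3 times, which makes it easier to separate a consistently hot Thread from a short CPU spike.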

** Please note that if you use regular thread libraries (i.e., you do not have /usr/lib/lwp in the LD_LIBRARY_PATH) on Solaris, an LWP is not mapped directly to an OS thread. In that situation, you will also have to take a pstack snapshot (in addition to prstat) from the process.
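
The pstack snapshot can be captured against the same native process ID; for example (the output file name is just an illustration):

pstack 9116 > pstack_9116.txt

The pstack output shows the native stack of each LWP along with its thread number, which lets you map the prstat LWPID back to a Java Thread.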

Step #2 – Generate a Thread Dump snapshot of your affected Java process

Great, you now have one or a list of high CPU Thread contributors from your Java process, but what should you do next?

PRSTAT data of your Java VM process is nothing without a Thread Dump data snapshot.

It is very important that you generate a Thread Dump via kill -3 <Java PID> at the same time or immediately after the prstat execution.

The Thread Dump will allow you to correlate the native Thread Id with the Java Thread Stack Trace and to understand what processing this Thread is involved in that is driving the high CPU consumption (infinite or heavy looping, GC Thread, high IO / disk access etc.).

Example: kill -3 9116

The command will generate a JVM Thread Dump to your Java process or middleware standard output log, typically starting with Full thread dump Java HotSpot(TM)… (HotSpot VM format).
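
A minimal capture sequence could look as per below (the file name is just an illustration and 9116 is the example PID used above; remember that the Thread Dump itself goes to the process standard output log, not to the console):

prstat -L -p 9116 1 1 > prstat_9116.txt
kill -3 9116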

Step #3 – Thread Dump and PRSTAT data correlation

It is now time to correlate the generated prstat and Thread Dump data.

First, you have to convert the PRSTAT Thread Id from decimal format to HEXA format.

In our example, Thread ID #76 (decimal format) corresponds to 0x4C (HEXA format).
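
If you prefer not to do the conversion by hand, any POSIX shell can do it for you, e.g.:

printf "0x%x\n" 76

which prints 0x4c.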

The next step is to simply search within your Thread Dump data for a Thread with a native Id corresponding to the PRSTAT Thread Id (HEXA format).

nid=<prstat Thread ID HEXA format>
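
For example, assuming the Thread Dump was saved to a file (threaddump_9116.txt is a hypothetical name; in practice it will be inside your middleware standard output log):

grep "nid=0x4c" threaddump_9116.txt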

In our example, such Thread was found as per below (complete case study is also available from this Blog):

"pool-2-thread-1" prio=10 tid=0x01952650 nid=0x4c in Object.wait() [0x537fe000..0x537ff8f0]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at weblogic.common.CompletionRequest.getResult(CompletionRequest.java:109)
        - locked <0xf06ca710> (a weblogic.common.CompletionRequest)
        at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.readRecord(PersistentStoreDataArchive.java:657)
        at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.readRecord(PersistentStoreDataArchive.java:630)
        at weblogic.diagnostics.archive.wlstore.PersistentRecordIterator.fill(PersistentRecordIterator.java:87)
        at weblogic.diagnostics.archive.RecordIterator.fetchMore(RecordIterator.java:157)
        at weblogic.diagnostics.archive.RecordIterator.hasNext(RecordIterator.java:130)
        at com.bea.wli.monitoring.alert.AlertLoggerImpl.getAlerts(AlertLoggerImpl.java:157)
        at com.bea.wli.monitoring.alert.AlertLoggerImpl.updateAlertEvalTime(AlertLoggerImpl.java:140)
        at com.bea.wli.monitoring.alert.AlertLog.updateAlertEvalTime(AlertLog.java:248)
        at com.bea.wli.monitoring.alert.AlertManager._evaluateSingleRule(AlertManager.java:992)
        at com.bea.wli.monitoring.alert.AlertManager.intervalCompleted(AlertManager.java:809)
        at com.bea.wli.monitoring.alert.AlertEvalTask.execute(AlertEvalTask.java:65)
        at com.bea.wli.monitoring.alert.AlertEvalTask.run(AlertEvalTask.java:37)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
        at java.util.concurrent.FutureTask.run(FutureTask.java:123)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)

Step #4 – Analyze the identified Thread Stack trace for root cause

At this point you have 50% of the work done and should have one or a list of Threads contributing to your high CPU problem.

The final step is for you to analyze the Thread Stack Trace (from bottom to top) and determine what type of processing the culprit Thread(s) are involved in.

Find below common high CPU Thread scenarios:

-         Heavy or excessive garbage collection (the Threads identified are the actual GC Threads of the HotSpot VM; see the quick check after this list)
-         Heavy or infinite looping (application or middleware code problem, corrupted & looping non Thread safe HashMap etc.)
-         Excessive IO / disk activity (Excessive Class loading or JAR file searches)
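
For the garbage collection scenario, a quick check (a sketch, reusing the hypothetical threaddump_9116.txt file name from Step #3) is to list the HotSpot internal GC worker Threads, which also appear in the Thread Dump with their own nid values, and compare them against your top prstat LWPID values; the exact Thread names depend on the garbage collector in use:

grep "GC task thread" threaddump_9116.txt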

Depending on how your organization and client are structured, you can also provide this data directly to your middleware vendor (Oracle support, IBM support, Red Hat etc.) or to your development team for further analysis.

Final words

I hope this article has helped you understand how you can combine the powerful Solaris prstat command and Thread Dump analysis to pinpoint high CPU Java Thread problems.

Please feel free to post any comment or question if you need any help.
