
2.01.2011

Connection pool leak using Weblogic 9.0

This case study describes the complete steps from root cause analysis to resolution of a Weblogic 9.0 production JDBC Pool leak problem. A summary of the general guidelines to prevent and resolve this type of problem is also provided at the end of this post.

Environment specifications

·         Java EE server: Weblogic Server 9.0
·         Hardware: Sun Fire E25K
·         OS: Solaris 9
·         JDK: Java 1.5.0_03, Sun Microsystems Inc.
·         JVM settings: -XX:MaxPermSize=256m -Xms256m -Xmx512m
·         RDBMS: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
·         Platform type: Middle tier - Web Service provider & batch processing


Monitoring and troubleshooting tools

·         JVM Thread Dump (Sun JDK 1.5 format)
·         Weblogic – Connection Profile Leak (runtime debugging flag)
·         Remote JMX client (to trigger JDBC profile data logging)
·         Eclipse 3.3 IDE (code analysis)
·         AWR report (Oracle10g reporting tool)

Problem overview

-          Problem type: <BEA-000627> <Reached maximum capacity of pool "<Pool name>">

Maximum number of JDBC connections reached from the primary application JDBC Pool causing major timeout and slowdown to the upstream applications (client applications invoking the Weblogic Web Services).

<BEA-000627> <Reached maximum capacity of pool "<Pool name>", making "-48" new resource instances instead of "1".>

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering technical and non-technical facts so we can either derive other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·         Recent change of the platform? No, the problem had been present for more than 1 year; the workaround involved restarting all Weblogic servers on a daily basis.
·         What does the <Reached maximum capacity of pool> description indicate? It indicates the Weblogic JDBC Pool is reaching its maximum capacity, currently set at 15. This was also confirmed by looking at the Weblogic console; JDBC Data Source monitoring tab.
·         Any upstream and/or downstream system problem reported? No problem was reported by any downstream application. Also, no problem or slowdown was reported for the database used by this JDBC Pool. The Oracle10g AWR report did not show any problem with long running queries etc.
·         Is the JDBC Pool usage reaching its maximum value very quickly or slowly? It was observed that the JDBC Pool usage (Active Connections Current Count) was increasing slowly and on a regular basis e.g. ~1-2 connections every hour.


Thread Dump analysis

Several Thread Dump snapshots were captured before restarting the Weblogic server and were analysed. Additional findings and facts were derived from thread dump analysis:

·         Total Weblogic created Thread count: 137
·         Total Weblogic active Thread count: 127 (~93% of current capacity)
·         Weblogic idle Thread count: 10
·         127 threads waiting to acquire a JDBC Connection from the primary application JDBC Pool


"[ACTIVE] ExecuteThread: '40' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x02da09a8 nid=0x60 in Object.wait() [0xbaa7d000..0xbaa7fb10]
                at java.lang.Object.wait(Native Method)
                - waiting on <0xda1674b0> (a weblogic.jdbc.common.internal.ConnectionPool)
                at weblogic.common.resourcepool.ResourcePoolImpl.reserveResourceInternal(ResourcePoolImpl.java:446)
                - locked <0xda1674b0> (a weblogic.jdbc.common.internal.ConnectionPool)
                at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:295)
                at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:450)
                at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:341)
                at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:83)
                at weblogic.jdbc.jta.DataSource.getXAConnectionFromPool(DataSource.java:1429)
                at weblogic.jdbc.jta.DataSource.refreshXAConnAndEnlist(DataSource.java:1249)
                at weblogic.jdbc.jta.DataSource.getConnection(DataSource.java:430)
                at weblogic.jdbc.jta.DataSource.connect(DataSource.java:386)
                at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:351)
………………………………………………………………………………………………………………………

Thread Dump analysis, facts and conclusions

Thread dump analysis did confirm the additional facts below:

·         93% of the Weblogic Threads were waiting to acquire a JDBC Connection
·         Not a single Thread involved in a database transaction was found; quite surprising, as we would normally expect a 1:1 relationship between an active thread and an active JDBC Connection

·         Conclusion #1: The reported HTTP timeouts from the upstream applications are caused by the Web Service Weblogic Threads waiting / hanging for too long to acquire a JDBC Connection from the primary application JDBC Pool.
·         Conclusion #2: The fact that no active Thread involved in a database transaction was found from Thread Dump analysis, combined with a full JDBC Pool depletion, clearly suggests a JDBC Pool leak at the root of the problem.

JDBC Pool leak analysis


A JDBC Pool leak problem can be quite tricky to pinpoint. The good news is that Weblogic provides a JDBC Profile Connection Leak functionality: a mechanism that records a Thread Stack Trace snapshot at JDBC Connection reservation time. This mechanism was used successfully in our situation to pinpoint the source of the JDBC Pool leak.

Step #1: Enable JDBC Profile Connection Leak

1)       Navigate in the WL console to Domain > Services > JDBC > Data Sources > DS Name > Connection Pool > Advanced
2)       Check the Profile Connection Leak option



Step #2: Monitor and gather data during problem replication


In order to view the Weblogic JDBC profile data, we used our in-house enterprise JMX client to trigger a remote dump of such runtime data into the managed server log. This was achieved via the MBean operation dumpPool of the JDBCDataSourceRuntimeMBean.

** Please note that WL Diagnostic Framework (WLDF) can also be used for that purpose **

// Initialize JDBCServiceRuntime MBean query
CanonicalObjectName canonicalObjName = CanonicalObjectNameObjectCacheFactory.getInstance().
getCanonicalObjectName("com.bea:ServerRuntime="+instanceName+",Name="+instanceName+",Type=JDBCServiceRuntime", null);

// obtain list of JDBC Data Source MBeans
Object []test = (Object[])adapter.getJMXService().getJMXConnection().getMBeanAttribute(canonicalObjName, "JDBCDataSourceRuntimeMBeans");

// finally, filter to proper JDBC Pool name and invoke the JDBC Profile Data dump method remotely
adapter.getJMXService().getJMXConnection().invokeMBeanMethod(canonicalObjNameTmp, "dumpPool", null, null);

** You can consult the Oracle online MBean reference guide for more detail on how to access and invoke remote WL MBeans **
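
For readers who do not have such an in-house client, the same dump can also be triggered with the standard javax.management.remote API. Below is a minimal sketch of this approach; the host, port, credentials, server name and pool name are placeholders, and it assumes weblogic.jar is on the classpath to provide the t3 protocol support:

import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

public class JdbcPoolDumper {

    public static void main(String[] args) throws Exception {
        String instanceName = "ManagedServer1"; // placeholder managed server name

        // Connect to the per-server runtime MBean server over t3
        JMXServiceURL url = new JMXServiceURL("t3", "localhost", 7001,
                "/jndi/weblogic.management.mbeanservers.runtime");
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.SECURITY_PRINCIPAL, "weblogic");    // placeholder user
        env.put(Context.SECURITY_CREDENTIALS, "password");  // placeholder password
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        MBeanServerConnection connection = connector.getMBeanServerConnection();

        // Same JDBCServiceRuntime MBean query as the in-house client above
        ObjectName jdbcServiceRuntime = new ObjectName("com.bea:ServerRuntime="
                + instanceName + ",Name=" + instanceName + ",Type=JDBCServiceRuntime");

        // Obtain the list of JDBC Data Source runtime MBeans
        ObjectName[] dataSources = (ObjectName[]) connection.getAttribute(
                jdbcServiceRuntime, "JDBCDataSourceRuntimeMBeans");

        // Filter to the proper JDBC Pool name and trigger the JDBC Profile Data dump
        for (ObjectName dataSource : dataSources) {
            if ("MyAppDataSource".equals(dataSource.getKeyProperty("Name"))) { // placeholder pool name
                connection.invoke(dataSource, "dumpPool", null, null);
            }
        }
        connector.close();
    }
}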
You can then expect the following data generation in the Weblogic managed server log. Such Thread Stack Trace snapshot data provides you with crucial information on what application and/or third-party code to look at for potential JDBC coding issues:

<Dumping Resource Pool:<Pool name>>
<Resource Pool:<Pool name>:dumpPool currCapacity = 15>
<Resource Pool:<Pool name>:dumpPool dumping reserved resources, #entries = 15>

                at weblogic.jdbc.common.internal.ConnectionEnv.setup(ConnectionEnv.java:246)
                at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:307)
                at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:285)
                at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:448)
                at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:341)
                at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:83)
                at weblogic.jdbc.jta.DataSource.getXAConnectionFromPool(DataSource.java:1429)
                at weblogic.jdbc.jta.DataSource.refreshXAConnAndEnlist(DataSource.java:1249)
                at weblogic.jdbc.wrapper.JTAConnection.getXAConn(JTAConnection.java:191)
                at weblogic.jdbc.wrapper.JTAConnection.checkConnection(JTAConnection.java:68)
                at weblogic.jdbc.wrapper.Statement.checkStatement(Statement.java:250)
                at weblogic.jdbc.wrapper.Statement.doClose(Statement.java:342)
                at weblogic.jdbc.wrapper.Statement.close(Statement.java:329)
                at <application code>.close()



Analysis of the Thread Stack Trace revealed the following:

·         The JDBC Pool leak did originate from our application code
·         A closer look at the Thread Stack Trace revealed that Weblogic itself triggered the creation of a new XA JDBC Connection during the closure process of the original JDBC Connection. This suggested that no active XA Connection was present at the time our application code attempted to close the JDBC Statement object.

Root Cause

Review of the affected application code (exposed by the above JDBC Profile Data) revealed the following:

·         Our application code was attempting to close the JDBC related objects (ResultSet, Statement, Connection) twice in some scenarios
·         The actual attempt of our application code to close the associated Statement a second time was triggering the leaked JDBC Connection. This is explained by the fact that Weblogic requires an active XA Connection (associated with the active JTA transaction) in order to explicitly close any ResultSet/Statement. Since the XA Connection object had already been closed by the first application code call, Weblogic was forced to create a new XA Connection in order to close the Statement object, triggering the actual JDBC Connection leak; a simplified illustration follows below
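
To illustrate the above behaviour, here is a simplified and hypothetical version of the double-close anti-pattern; the class, method, query and variable names are made up for this illustration and are not the actual application code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class DoubleCloseExample {

    public void runQuery(DataSource ds) throws SQLException {
        Connection conn = ds.getConnection();
        PreparedStatement stmt = conn.prepareStatement("SELECT 1 FROM DUAL");
        ResultSet rs = stmt.executeQuery();
        // ... process the ResultSet ...

        // First closure: the JDBC wrappers are closed and the underlying
        // XA Connection is released
        rs.close();
        stmt.close();
        conn.close();

        // ... more processing logic ...

        // Second closure (e.g. leftover in another code path): the original
        // XA Connection is gone, so Weblogic reserves a brand new XA Connection
        // just to service this close() call; that connection is never returned
        // to the pool, leaking one JDBC Connection per invocation
        stmt.close();
    }
}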

Solution

2 areas were looked at during the code fix exercise:

·         Removal of any duplicate ResultSet, Statement and Connection object closures
·         JDBC resource closure was moved and centralized in a finally{} block at the end of the try/catch block of the Java method, as per the snippet below

finally {
       if (rs != null)   try {rs.close();}catch(Exception ignore){}
       if (stmt != null) try {stmt.close();}catch(Exception ignore){}
       if (conn != null) try {conn.close();}catch(Exception ignore){}
}
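
For completeness, here is a hypothetical end-to-end version of the corrected pattern; the class, method, query and variable names are illustrative only and not the actual application code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class SingleCloseExample {

    public void runQuery(DataSource ds) {
        Connection conn = null;
        PreparedStatement stmt = null;
        ResultSet rs = null;
        try {
            conn = ds.getConnection();
            stmt = conn.prepareStatement("SELECT 1 FROM DUAL");
            rs = stmt.executeQuery();
            // ... process the ResultSet ...
        } catch (SQLException e) {
            // application error handling / logging
        } finally {
            // single, centralized closure point; no other close() calls in the method
            if (rs != null)   try {rs.close();}catch(Exception ignore){}
            if (stmt != null) try {stmt.close();}catch(Exception ignore){}
            if (conn != null) try {conn.close();}catch(Exception ignore){}
        }
    }
}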

The code fix was tested and deployed to our production environment one week ago. Since then, JDBC Pool monitoring is no longer showing any increase of connection usage during the day. The JDBC Pool leak is now resolved.

JDBC Pool leak general guidelines and best practices

The following are high level best practices to prevent and address JDBC Pool leaks:

·         Perform regular code walkthroughs of any new code, with an increased focus on any new code dealing with JDBC, and ensure proper resource closure as per the above code snippet (preventive measure).
·         Perform regular monitoring of your production Weblogic JDBC Pool utilization and ensure no JDBC Pool leak is present (proactive monitoring); see the monitoring sketch after this list. The same exercise should be done in your load testing environment before deploying a major release.
·         If any JDBC Pool leak is found, take advantage of the WL Profile Connection Leak mechanism to pinpoint the root cause as per the above explanations.
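
As an illustration of such proactive monitoring, below is a minimal sketch that polls the JDBC Pool usage via JMX. It reuses the same runtime MBean connection approach shown earlier in this post; the attribute names and threshold are assumptions and should be verified against the Oracle MBean reference guide for your Weblogic release:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JdbcPoolMonitor {

    // 'connection' and 'dataSource' can be obtained exactly as in the dumpPool sketch above
    public static void checkPoolUsage(MBeanServerConnection connection,
                                      ObjectName dataSource) throws Exception {
        int activeCount = ((Integer) connection.getAttribute(
                dataSource, "ActiveConnectionsCurrentCount")).intValue();
        int leakedCount = ((Integer) connection.getAttribute(
                dataSource, "LeakedConnectionCount")).intValue();

        // Alert when usage keeps climbing or any leaked connection is detected
        // (threshold of 12 is a placeholder, e.g. 80% of a maximum capacity of 15)
        if (activeCount > 12 || leakedCount > 0) {
            System.err.println("JDBC Pool warning: active=" + activeCount
                    + ", leaked=" + leakedCount);
        }
    }
}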
