/ July 2011 ~ Java EE Support Patterns

7.30.2011

Weblogic authentication denied – WLS 10

Problem description

The Weblogic admin server and / or managed server(s) are unable to start properly and throwing an authentication denied error message.

Weblogic errors observed

Error #1

<Jul 30, 2011 3:13:18 AM PST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed.

Reason: weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid;
The user name and/or password from the boot identity file (boot.properties) is not valid.
The boot identity may havenbeen changed since the boot identity file was created. Please edit and update
the boot identity file with the proper values of username and password. The first time the updated boot identity file
is used to start the server, these new values are encrypted.

weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid;
The user name and/or password from the boot identity file (boot.properties) is not valid.
The boot identity may have been changed since the boot identity file was created. Please edit and update the boot identity file
with the proper values of username and password. The first time the updated boot identity file is used to start the server,
these new values are encrypted.

at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.doBootAuthorization(Unknown Source)
at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.initialize(Unknown Source)
at weblogic.security.service.SecurityServiceManager.initialize(Unknown Source)
at weblogic.security.SecurityService.start(SecurityService.java:141)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
Truncated. see log file for complete stacktrace

Error #2

<Jul 30, 2011 5:11:55 AM PST> <Critical> <Security> <BEA-090403> <Authentication for user <user> denied>
<Jul 30, 2011 5:11:55 AM PST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed.
Reason: weblogic.security.SecurityInitializationException: Authentication for user <user> denied
weblogic.security.SecurityInitializationException: Authentication for user <user> denied

at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.doBootAuthorization(Unknown Source)
at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.initialize(Unknown Source)
at weblogic.security.service.SecurityServiceManager.initialize(Unknown Source)
at weblogic.security.SecurityService.start(SecurityService.java:141)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
Truncated. see log file for complete stacktrace

Possible root causes and solutions


Root cause #1

The Weblogic boot.properties file is corrupted or contains invalid principal and credentials

Solution >> boot.properties reset

·         Backup and clear the cache and data directories under <WL Domain>/servers/<Admin & Managed server>
·         Recreate boot.properties (put back your plain text username and password) under <WL Domain>/servers/<Admin & Managed server>/security directory and restart the affected server(s)

Root cause #2

The Weblogic boot.properties file is valid but the security realm is corrupted or in an invalid state

Solution >> Weblogic Admin username and password reset

·         Backup your Weblogic server domain
·         Rename or delete <WL Domain>/security/DefaultAuthenticatorInit.ldift
·         Run the following Java command:
        java weblogic.security.utils.AdminAccount <new-admin-user-name> <new-admin-user-pwd> <<WL Domain>/security >
·         Delete the contents inside the file boot.properties under <WL Domain>/servers/< AdminServer>/security
·         Add the following contents inside the boot.properties
        username=<new-admin-user-name>
        password=<new-admin-user-pwd>
·         Backup and delete the folder: <WL Domain>/servers/<AdminServer>/data/ldap
·          Restart your Weblogic server 

Free Thread Dump analyzer

This post is to inform you that I currently offer free Thread Dump analysis via email or real-time case study from my Blog. I currently have 10 years of experience with Java EE and Thread Dump analysis from different JVM vendors (Sun, IBM, JRockit etc.).

A JVM Thread Dump provides crucial information on your Java EE server (Weblogic, JBoss, WAS etc.) and application Thread processing and health. Thread dump analysis is one of the most important skills to acquire for any Java EE production support person.

Why is my consulting service free? My goal is to simply create a solid knowledge base of Thread Dump problem patterns on my Blog and share with Java EE production support individuals.

Thread Dump analysis – a simple 3 steps process

1)       Email me @phcharbonneau with your Thread Dump snapshot data (captured during your incident). Please also provide the specifications of your environment (Java EE server vendor & version, Java VM vendor and version etc.) and a high level description of your problem
2)       Wait for my initial reply. I will do my best to provide you with a quick answer in less than 24 hours along with possible root causes
3)       With your approval, I may create a Thread Dump problem pattern case study on my Blog to share your Java EE problem along with its resolution to my other readers with no detail on your application name, code or client information

Thank you and looking forward to hear from your problems!

Regards,
P-H

Free Thread Dump analysis

This post is to inform you that I currently offer free Thread Dump analysis via email or real-time case study from my Blog. I currently have 10 years of experience with Java EE and Thread Dump analysis from different JVM vendors (Sun, IBM, JRockit etc.).

A JVM Thread Dump provides crucial information on your Java EE server (Weblogic, JBoss, WAS etc.) and application Thread processing and health. Thread dump analysis is one of the most important skills to acquire for any Java EE production support person.

Why is my consulting service free? My goal is to simply create a solid knowledge base of Thread Dump problem patterns on my Blog and share with Java EE production support individuals.

Thread Dump analysis – a simple 3 steps process

1)       Email me @phcharbonneau with your Thread Dump snapshot data (captured during your incident). Please also provide the specifications of your environment (Java EE server vendor & version, Java VM vendor and version etc.) and a high level description of your problem
2)       Wait for my initial reply. I will do my best to provide you with a quick answer in less than 24 hours along with possible root causes
3)       With your approval, I may create a Thread Dump problem pattern case study on my Blog to share your Java EE problem along with its resolution to my other readers with no detail on your application name, code or client information

Thank you and looking forward to hear from your problems!

Regards,
P-H

7.26.2011

Weblogic PermGen space

This article will provide you some background on the Java HotSpot PermGen space and a 4 steps process on how you can monitor and tune your PermGen space for your Weblogic Java EE server.

Java HotSpot PermGen overview

The HotSpot PermGen (Permanent Generation) is a memory space specific to the Sun Java HotSpot VM and also applicable for any other Java EE server using the HotSpot VM such as Red Hat JBoss or IBM WAS. It is a special memory space for Class descriptors and other internal JVM objects.

Please refer to my other Blog article on the Java HotSpot VM memory space breakdown for a quick visual overview of all HotSpot memory spaces.

If you are facing any PermGen related problem such as OutOfMemoryError, please refer to my Blog article OutOfMemoryError PermGen patterns for description and troubleshooting of the most common Weblogic PermGen issues.

PermGen tuning step #1 – Identify your JVM vendor

First, please make sure that PermGen space is actually applicable for your Java EE environment. JVM’s such as IBM SDK and Oracle JRockit do not have an actual PermGen space. For those VM’s, the memory is allocated either to the Java Heap or the native C Heap.

** PermGen space is only applicable for the Sun Java HotSpot VM 1.4, 1.5 & 1.6. There are actually some requests for enhancements for JDK 1.7 that may actually eliminate the PermGen and use the native C memory instead  for the VM's representation of the Java program (Class descriptors / code).  **

PermGen tuning step #2 – Review your current PermGen space settings

The minimum and maximum PermGen space configuration is achieved via the following parameters:

## Example: PermGen space with a minimum size of 256 MB
-XX:PermSize=256m

## Example: PermGen space with a maximum size of 256 MB
-XX:MaxPermSize=512m

The default Weblogic PermGen space settings are normally found from the commEnv.sh / commEnv.cmd scripts and located under:

<WL11g HOME>/wlserver_10.3/common/bin/




When using Weblogic Nodemanager to start and stop your managed servers (used by most Weblogic production systems), PermGen settings can be updated via the Weblogic console under >>> Home / Managed server 1...n / Server Start / Arguments field:


PermGen tuning step #3 – Review your current PermGen runtime footprint

PermGen space capacity and footprint can be analyzed by turning on verbose GC and monitoring the Full GC collections on a daily basis. This will allow you to determine your current PermGen footprint and identify any potential issue such as PermGen memory leak.

Verbose GC for the HotSpot VM can be enabled by simply adding the parameters below to your start-up VM parameters:

## Example: Enable verbose GC and print the output to a custom log
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/apps/domains/app/gc/gc_s1.log


You can also use the free open source tool GC Viewer to graphically review your Java Heap memory footprint from the GC output data.

A bigger PermGen space requirement is to be expected for applications using a significant amount of code and class-loader trees e.g. many EAR files and / or applications using frameworks that depend heavily on dynamic class-loading / Refection API.

PermGen tuning step #4 – Tune your PermGen space, if required

Increasing your maximum capacity of the PermGen space is often required but you first need to ensure that you are not facing a memory leak such as class-loader leak. Please review your verbose GC output very closely and ensure that it is not growing constantly on a daily basis. You may need several days to confirm a PermGen memory leak since applications such Weblogic portal using dynamic JSP reload may increase the PermGen utilization over time (until all your JSP’s are compiled; loaded to their class-loader).

In order to increase your PermGen space to let’s say 512 MB, simply edit the Weblogic commEnv.sh or commEnv.cmd script and change the default from 128m to 512m or add / modify the MaxPermSize to 512m (-XX:MaxPermSize=512m) via the Server Start arguments (as per step #2 Weblogic console snapshot) if using Nodemanager.

** A Weblogic managed server(s) restart will be required in order to activate your new PermGen space settings. **

7.24.2011

ORA-01115 IO error reading solution

This case study describes the complete root cause analysis and resolution of an Oracle 10gR2 database problem (ORA-01115) affecting a Weblogic Integration 9.2 Java EE application.

Environment specifications

-          Java EE server: Oracle Weblogic Integration 9.2 MP2
-          Database server: Oracle 10gR2
-          Middleware OS: Sun Solaris 5.10
-          Database server OS: IBM AIX 5.3 TL5
-          JDK: Sun Java HotSpot(TM) Server VM Version 1.5.0_10-b03
-          Platform type: BPM

Problem overview

-      Problem type: java.io.IOException: [BEA][Oracle JDBC Driver][Oracle]ORA-01115: IO error reading block from file 30 (block # 255462)

An IO Exception error was observed from the Weblogic Integration server logs. This Exception was thrown from the remote Oracle database when trying to insert data within WLI internal tables.

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering of technical and non technical facts so we can either derived other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·       What is the client impact? HIGH (full outage of our application)
·       Recent change of the affected platform? No
·       Any recent traffic increase to the affected platform? No
·       Since how long this problem has been observed?  This error suddenly appeared for the first time early in the morning
·       What is the health of the Oracle database server? Our Oracle DBA did confirm a problem regarding access to one of our Oracle data file (/oracle/db/oracle_data01.dbf')
·       Did a restart of the Weblogic Integration server resolve the problem? No

·         Conclusion #1: The problem seems to be isolated on the Oracle database server side and related to a possible IO problem accessing one of the Oracle internal data files.

Weblogic log file analysis: ORA-01115!

A first analysis of the problem was done by reviewing the Weblogic Integration managed server log errors.

java.io.IOException: [BEA][Oracle JDBC Driver][Oracle]ORA-01115: IO error reading block from file 30 (block # 255462)
ORA-01110: data file 30: '/oracle/db/oracle_data01.dbf'
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 110: Media surface error
Additional information: 7
Additional information: 255461
Additional information: -1

        at weblogic.jdbc.base.BaseBlobOutputStream.write(Unknown Source)
        at weblogic.jdbc.base.BaseBlobOutputStream.write(Unknown Source)
        at com.bea.wli.knex.runtime.core.bean.BMPContainerBean$OracleTableAccess.doStoreByInsert(BMPContainerBean.java:860)
        at com.bea.wli.knex.runtime.core.bean.BMPContainerBean.doInsert(BMPContainerBean.java:1835)
        at com.bea.wli.knex.runtime.core.bean.BMPContainerBean.ejbStore(BMPContainerBean.java:1787)
        at com.bea.wli.bpm.runtime.ProcessContainerBean.ejbStore(ProcessContainerBean.java:204)
        at com.bea.wli.bpm.runtime.ProcessContainerBean.ejbStore(ProcessContainerBean.java:191)
        at com.bea.wlwgen.PersistentContainer_sq78z1_Impl.ejbStore(PersistentContainer_sq78z1_Impl.java:191)
        at weblogic.ejb.container.manager.ExclusiveEntityManager.beforeCompletion(ExclusiveEntityManager.java:609)
        at weblogic.ejb.container.manager.ExclusiveEntityManager.beforeCompletion(ExclusiveEntityManager.java:580)
        at weblogic.ejb.container.internal.TxManager$TxListener.beforeCompletion(TxManager.java:1028)
        at weblogic.transaction.internal.ServerSCInfo$CallBeforeCompletionsAction.run(ServerSCInfo.java:1529)
        at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
        at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:147)
        at weblogic.transaction.internal.ServerSCInfo.doBeforeCompletion(ServerSCInfo.java:1161)
        at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletions(ServerSCInfo.java:1143)
        at weblogic.transaction.internal.ServerSCInfo.startPrePrepareAndChain(ServerSCInfo.java:115)
        at weblogic.transaction.internal.ServerTransactionImpl.localPrePrepareAndChain(ServerTransactionImpl.java:1288)
        at weblogic.transaction.internal.ServerTransactionImpl.globalPrePrepare(ServerTransactionImpl.java:2100)
        at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:259)
        at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:227)
        at weblogic.ejb.container.internal.MDListener.execute(MDListener.java:463)
        at weblogic.ejb.container.internal.MDListener.transactionalOnMessage(MDListener.java:335)
        at weblogic.ejb.container.internal.MDListener.onMessage(MDListener.java:291)
        at weblogic.jms.client.JMSSession.onMessage(JMSSession.java:4072)
        at weblogic.jms.client.JMSSession.execute(JMSSession.java:3962)
        at weblogic.jms.client.JMSSession$UseForRunnable.run(JMSSession.java:4490)
        at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
        at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)

As you can see, the ORA-01115 error was thrown (along with ORA-01110 ORA-27091 ORA-27072) during the insert operation of WLI data. An AIX related error was also thrown (Media surface error).

-          Conclusion #2: The problem seems to be related to a physical inaccessibility (OS and / or disk) of this critical Oracle data file. A logical file corruption problem was also kept in mind as a possible root cause.

OS problem replication: The media surface is damaged!

After the initial log review and problem isolation, an AIX system admin was engaged to further troubleshoot this problem.  A simple copy of the affected Oracle data file was attempted:

:> cp /oracle/db/oracle_data01.dbf .
cp: /oracle/db/oracle_data01.dbf: The media surface is damaged.


This simple problem replication step did confirm that the AIX OS also had lost control of the file.
The following AIX command was also executed in order to review the AIX OS error report:

errpt –a
…………………………………………………………………………………..
Description
DISK OPERATION ERROR

Probable Causes
MEDIA

User Causes
MEDIA DEFECTIVE

        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES

Failure Causes
MEDIA
DISK DRIVE

        Recommended Actions
        FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
        PERFORM PROBLEM DETERMINATION PROCEDURES
…………………………………………………………………………………..

-          Conclusion #3: At this point, it was determined that the problem was very likely related to a physical problem with the SAN / RAID cluster. A technician was dispatched to perform further analysis of our RAID (Redundant Array of Independent Disks) environment.

Root cause: RAID disk failure!

Health check of the RAID environment did confirm that one of the 7 available disks failed earlier in the morning; just a few minutes before the first occurrence of the ORA-01115 error in the Weblogic logs.

Unfortunately, the disk failure triggered corruption of the RAID parity; affecting one of our Oracle data file. New files could be written to the RAID (6 disks left) but some files were not available due to parity corruption.

A disk failure has negative impact on parity. A disk can fail for various reasons such as the disk may be overused and due to excess variation in the power supply, it may result in the failure of the disk. Failure to replace it immediately can cause adverse affects on the system performance and may even damage the disks which are used in backup. The RAID parity provides protection against drive failures but if it encounters a parity error, it will start writing a bad data on the drive.

In our case, the technician was able to resolve the parity corruption without having to restore any data from the backup tapes. Resolution of the parity automatically allowed Oracle to gain back control of the affected date file; without any Oracle listener or database restart.

A complete Weblogic restart was performed as the final resolution step in order to re-initialize the affected WLI JDBC Connection pools.

Conclusion and recommendations

-          When facing ORA-01115 with Oracle, please ensure to do a fast and complete root cause analysis
-          Avoid taking unnecessary and non efficient resolution steps such as an early Weblogic restart or Oracle database restart