Wednesday, February 2, 2011

Database hang: how to collect data for troubleshooting?

Here are some good Metalink Notes related to this hang issue:


A. How to collect 2 Hanganalyze and 2 Systemstate dumps.?
Using SQL*Plus connect as SYSDBA using the following command:
sqlplus " / as sysdba"
If there are problems making this connection then in 10gR2 and above, the sqlplus "preliminary connection" can be used :
sqlplus -prelim " / as sysdba"



Do this 2 times in 2 separate windows, creating 2 sqlplus sessions (SQL1 and SQL2)
In SQL1 gather the hanganalyze by executing the following:

SQL1> oradebug setmypid SQL1> oradebug unlimit; SQL1> oradebug hanganalyze 3
In SQL2 gather the systemstates by executing the following:

SQL2> oradebug setmypid SQL2> oradebug unlimit; SQL2> oradebug dump systemstate 266
Gather a second hang analyze having waited at least 1 minute to give time to identify process state changes. In SQL1 execute the following:

SQL1> oradebug hanganalyze 3


In SQL2 execute the following to collect a second systemstate dump:

SQL2> oradebug setmypid SQL2> oradebug unlimit; SQL2> oradebug dump systemstate 266
If you are using systemstate level 266 and it is taking much longer than expected to generate the dump file, then end this systemstate dump and try level 258.


Purpose

This bulletin describes the methods and tools used in diagnosing database hanging issues and performance issue).
These issues may be due to tuning problems, design problems or Oracle bugs such as latching issues.
We determine the cause by narrowing down the problem and getting as much information as possible WHILE THE DATABASE IS HANGING.

The following diagnostic steps are discussed.

1. Describe the Problems. Under all situations go through this sections and describe the problem.
2 Look for errors.
3. Do the requested queries.
4. Gather OS level data.
5. Get systemstate and hanganalyze dumps.
6. Get BSTAT-ESTAT OR STATPACK output.
7. Generate a core dump and run a debugger (contact support first).
8. If certain process get PROCESSSTATE dumps
9. Check List for diagnostics.

Note: It may not be necessary to shutdown the database to stop the hanging. But if you MUST shutdown before speaking to support, please attempt to get these diagnostics so that we can debug the problem.
Without these diagnostics it may be impossible for us to determine the cause of the problem.




Introduction

HANGFG (Hang file generator) is a series of unix shell scripts used to automate the generation and collection of hanganalyze and  systemstate trace files. HANGFG generates and collects hang trace files based on the impact of taking diagnostic traces on a system which is already in a degraded state. The overall decision on what level of impact the user can afford is left up to the user when he runs HANGFG, as the level of impact is passed in as an argument to the tool. HANGFG is also capable of making this decision for the user if the user selects light or medium impact (option 1 or 2) as an argument to the tool.  HANGFG is RAC aware and can run in either a RAC or non RAC environment.

Overview

One of the major problems with obtaining the necessary diagnostic information to diagnose database hanging problems is having the necessary diagnostic data collected while the problem is actually occurring. Additionally, the necessary diagnostic data is seldom collected because of the time it takes to react to recognizing there is a problem, trying to determine what kind of data to collect, and knowing how to collect the data. Frequently, the problem has passed or the database has to be shutdown to correct the problem. HANGFG automates the generation and collection of hang diagnostic traces. The user need only run one diagnostic, HANGFG, during the database hang. The commands to issue the generation of diagnostic traces are performed by HANGFG, thus freeing the user from having to know and issue the individual cryptic commands to the database to generate these traces. HANGFG will also look to see how badly a system is degraded before issuing any commands that could degrade the system even further. Finally HANGFG will collect all the trace files that are produced from issuing these commands as well as any other oracle trace file that gets updated from the time the first hang diagnostic is performed. This greatly increases the probability that all the necessary hang related diagnostic traces are collected at one time thereby reducing the amount of pinging between the user and Oracle support for requests for additional diagnostic data requests.


Labels: ,

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home