Oracle Explorer: Database hang: how to collect data for troubleshooting?

Here are some good Metalink Notes related to this hang issue:

Required Support Diagnostics for Hanging Databases [ID 452358.1]

When database hangs, following information will be required by Oracle Support:

A. 2 Hanganalyze and 2 Systemstate dumps.
B. Provide snapshots of database performance
C. Provide an up to date RDA.

A. How to collect 2 Hanganalyze and 2 Systemstate dumps.?
Using SQL*Plus connect as SYSDBA using the following command:

sqlplus " / as sysdba"

If there are problems making this connection then in 10gR2 and above, the sqlplus "preliminary connection" can be used :

sqlplus -prelim " / as sysdba"

Do this 2 times in 2 separate windows, creating 2 sqlplus sessions (SQL1 and SQL2)
In SQL1 gather the hanganalyze by executing the following:

     SQL1> oradebug setmypid  
     SQL1> oradebug unlimit;
     SQL1> oradebug hanganalyze 3

In SQL2 gather the systemstates by executing the following:

     SQL2> oradebug setmypid
     SQL2> oradebug unlimit;
     SQL2> oradebug dump systemstate 266

Gather a second hang analyze having waited at least 1 minute to give time to identify process state changes. In SQL1 execute the following:

     SQL1> oradebug hanganalyze 3

In SQL2 execute the following to collect a second systemstate dump:

     SQL2> oradebug setmypid
     SQL2> oradebug unlimit;
     SQL2> oradebug dump systemstate 266

If you are using systemstate level 266 and it is taking much longer than expected to generate the dump file, then end this systemstate dump and try level 258.

Diagnosing Database Hanging Issues [ID 61552.1]

Purpose

This bulletin describes the methods and tools used in diagnosing database hanging issues and performance issue).
These issues may be due to tuning problems, design problems or Oracle bugs such as latching issues.
We determine the cause by narrowing down the problem and getting as much information as possible WHILE THE DATABASE IS HANGING.

The following diagnostic steps are discussed.

1. Describe the Problems. Under all situations go through this sections and describe the problem.
2 Look for errors.
3. Do the requested queries.
4. Gather OS level data.
5. Get systemstate and hanganalyze dumps.
6. Get BSTAT-ESTAT OR STATPACK output.
7. Generate a core dump and run a debugger (contact support first).
8. If certain process get PROCESSSTATE dumps
9. Check List for diagnostics.

Note: It may not be necessary to shutdown the database to stop the hanging. But if you MUST shutdown before speaking to support, please attempt to get these diagnostics so that we can debug the problem.
Without these diagnostics it may be impossible for us to determine the cause of the problem.

HANGFG User Guide [ID 362094.1]

Introduction

HANGFG (Hang file generator) is a series of unix shell scripts used to automate the generation and collection of hanganalyze and systemstate trace files. HANGFG generates and collects hang trace files based on the impact of taking diagnostic traces on a system which is already in a degraded state. The overall decision on what level of impact the user can afford is left up to the user when he runs HANGFG, as the level of impact is passed in as an argument to the tool. HANGFG is also capable of making this decision for the user if the user selects light or medium impact (option 1 or 2) as an argument to the tool. HANGFG is RAC aware and can run in either a RAC or non RAC environment.

Overview

One of the major problems with obtaining the necessary diagnostic information to diagnose database hanging problems is having the necessary diagnostic data collected while the problem is actually occurring. Additionally, the necessary diagnostic data is seldom collected because of the time it takes to react to recognizing there is a problem, trying to determine what kind of data to collect, and knowing how to collect the data. Frequently, the problem has passed or the database has to be shutdown to correct the problem. HANGFG automates the generation and collection of hang diagnostic traces. The user need only run one diagnostic, HANGFG, during the database hang. The commands to issue the generation of diagnostic traces are performed by HANGFG, thus freeing the user from having to know and issue the individual cryptic commands to the database to generate these traces. HANGFG will also look to see how badly a system is degraded before issuing any commands that could degrade the system even further. Finally HANGFG will collect all the trace files that are produced from issuing these commands as well as any other oracle trace file that gets updated from the time the first hang diagnostic is performed. This greatly increases the probability that all the necessary hang related diagnostic traces are collected at one time thereby reducing the amount of pinging between the user and Oracle support for requests for additional diagnostic data requests.

Taking Systemstate Dumps when You cannot Connect to Oracle [ID 121779.1]
Interpreting HANGANALYZE trace files to diagnose hanging and performance problems [ID 215858.1]
Data Gathering for Troubleshooting RAC Issues [ID 556679.1]
CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide [ID 330358.1]
Script to Collect RAC Diagnostic Information (racdiag.sql) [ID 135714.1]
- This script is intended to provide a user friendly guide to troubleshoot RAC hung sessions or slow performance scenerios. The script includes information to gather a variety of important debug information to determine the cause of a RAC session hang. The script will create a file called racdiag_.out in your local directory while dumping hang analyze dumps in the user_dump_dest(s) and background_dump_dest(s) on all nodes. If you are using OpenVMS, see Note 316838.1 for a VMS specific script.
  
  This script is a one-time data capture. Procwatcher ( Note: 459694.1 ) is a more powerful tool written that could be used to monitor your system regularly (instead of a one-time data capture).
Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes [ID 459694.1]
- Procwatcher is a tool to examine and monitor Oracle database and clusterware processes at an interval. The tool will collect stack traces of these processes using Oracle tools like oradebug short_stack and/or OS debuggers like pstack, gdb, dbx, or ladebug and collect SQL data if specified.
- This tool is for Oracle representatives and DBAs looking to troubleshoot a problem further by monitoring processes. This tool should be used in conjunction with other tools or troubleshooting methods depending on the situation.
- Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes
  - # This script will find clusterware and/or Oracle Background processes and collect
  - # stack traces for debugging. It will write a file called procname_pid_date_hour.out
  - # for each process. If you are debugging clusterware then run this script as root.
  - # If you are only debugging Oracle background processes then you can run as root or oracle.

Labels: db hang, RAC

Oracle Explorer

Wednesday, February 2, 2011

Database hang: how to collect data for troubleshooting?

Purpose

Introduction

Overview

0 Comments:

Post a Comment

About Me

Previous Posts