RAC database hang issue
One node of a two-node Oracle 10.2.0.4 RAC database hung and restarted around 1:20 AM this morning. The trace files, alert logs, and other log files reported the following error messages:
- Received ORADEBUG command 'dump errorstack 1' from process Unix process pid: 16209, image:
*** 2011-02-02 10:21:13.667
ksedmp: internal or fatal error
- Tue Feb 1 16:52:47 2011
IPC Send timeout detected. Receiver ospid 10555
MMNL absent for 2001 secs; Foregrounds taking over
MMNL absent for 2001 secs; Foregrounds taking over
Tue Feb 1 17:40:21 2011
Errors in file /oracle/admin/imsr/bdump/imsr1_lmd0_10555.trc:
- Tue Feb 1 23:15:27 2011
Trace dumping is performing id=[cdmp_20110201162127]
Wed Feb 2 01:10:51 2011
Errors in file /oracle/admin/imsr/bdump/imsr1_pmon_10547.trc:
ORA-00482: LMD* process terminated with error
Wed Feb 2 01:21:46 2011
PMON: terminating instance due to error 482
Wed Feb 2 01:21:46 2011
System state dump is made for local instance
System State dumped to trace file /oracle/admin/imsr/bdump/imsr1_diag_10549.trc
Wed Feb 2 01:21:51 2011
Instance terminated by PMON, pid = 10547
Wed Feb 2 01:21:52 2011
Instance terminated by USER, pid = 16125
Wed Feb 2 01:21:54 2011
Starting ORACLE instance (normal)
- Wed Feb 2 09:52:33 2011
Thread 1 advanced to log sequence 12344 (LGWR switch)
Current log# 2 seq# 12344 mem# 0: /oradata/imsr/redo02.log
Thread 1 cannot allocate new log, sequence 12345
Checkpoint not complete
Current log# 2 seq# 12344 mem# 0: /oradata/imsr/redo02.log
Wed Feb 2 09:52:42 2011
Thread 1 advanced to log sequence 12345 (LGWR switch)
Current log# 1 seq# 12345 mem# 0: /oradata/imsr/redo01.log
Wed Feb 2 10:08:26 2011
IPC Send timeout detected. Receiver ospid 16215
Wed Feb 2 10:21:13 2011
Errors in file /oracle/admin/imsr/bdump/imsr1_lmd0_16215.trc:
Wed Feb 2 10:27:38 2011
Trace dumping is performing id=[cdmp_20110202094318]
- Waiting for instances to leave:
- IPC Send timeout detected.Sender: ospid 11471
Receiver: inst 1 binc 1824903189 ospid 1621
- Wed Feb 2 10:16:22 2011
MMNL absent for 1807 secs; Foregrounds taking over
Wed Feb 2 10:16:33 2011
Waiting for instances to leave:
Here are some Metalink notes and articles I found related to the above error messages:
- Queue Monitor Process: Architecture and Known Issues [ID 305662.1]
- CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide [ID 330358.1]
- Script to Collect RAC Diagnostic Information (racdiag.sql) [ID 135714.1]
- 'IPC Send Timeout Detected' errors between QMON Processes after RAC reconfiguration [ID 458912.1]
- Receive Messages MMNL Absent for 4159 Secs; Foregrounds Taking Over in Alert.log [ID 567562.1]
- "MMNL absent for %u secs; Foregrounds taking over" Messages in Alert.log [ID 465891.1]
- Message In Alert Log: Mmnl Absent For XXXX Secs [ID 462402.1]
- DATABASE CRASH WITH SGA_TARGET [ID 747812.1]
- "Waiting for instances to leave" message appearing in a RAC environment (Part 2)
Based on the above links, the following are the likely causes of this database hang:
1 MAXBYTES is smaller than BYTES
set lines 300
col file_name format a50
select file_name, tablespace_name, bytes/1024/1024, maxbytes/1024/1024 from dba_data_files;
2 Hit Oracle bugs (very likely)
3 Automatic SGA setting caused crash
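To narrow down causes 1 and 3, the checks can be made a bit more targeted. The following is only a sketch, not a verified diagnosis: it assumes that non-autoextensible files report MAXBYTES as 0 (so they are filtered out), and that a non-zero sga_target means automatic SGA management is in use. The column aliases size_mb and max_mb are just illustrative names.

```sql
-- Sketch only: list autoextensible data files whose MAXBYTES is below BYTES.
-- Assumption: non-autoextensible files report MAXBYTES = 0, so they are
-- excluded here to avoid false positives.
set lines 300
col file_name format a50
select file_name,
       tablespace_name,
       bytes/1024/1024    as size_mb,
       maxbytes/1024/1024 as max_mb
from   dba_data_files
where  autoextensible = 'YES'
and    maxbytes < bytes;

-- Check on every RAC instance whether automatic SGA management is enabled
-- (a non-zero sga_target value means it is).
select inst_id, name, value
from   gv$parameter
where  name in ('sga_target', 'sga_max_size')
order  by inst_id, name;
```

If the first query returns no rows, cause 1 can probably be ruled out. The second query only shows the current settings; it does not by itself prove that automatic SGA tuning triggered the crash.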
To be continued.....................