Wednesday, February 2, 2011

RAC database hang issue

One instance of a two-node Oracle 10.2.0.4 RAC database hung and restarted around 1:20 AM this morning. The trace files, alert logs and other log files reported the following error messages:

  • Received ORADEBUG command 'dump errorstack 1' from process Unix process pid: 16209, image: *** 2011-02-02 10:21:13.667
  • ksedmp: internal or fatal error
    In alert_node1.log file:
       
    • Tue Feb  1 16:52:47 2011
    • IPC Send timeout detected. Receiver ospid 10555

      MMNL absent for 2001 secs; Foregrounds taking over

      MMNL absent for 2001 secs; Foregrounds taking over

      Tue Feb 1 17:40:21 2011

      Errors in file /oracle/admin/imsr/bdump/imsr1_lmd0_10555.trc:
    • Tue Feb 1 23:15:27 2011

      Trace dumping is performing id=[cdmp_20110201162127]

      Wed Feb 2 01:10:51 2011

      Errors in file /oracle/admin/imsr/bdump/imsr1_pmon_10547.trc:

      ORA-00482: LMD* process terminated with error

      Wed Feb 2 01:21:46 2011

      PMON: terminating instance due to error 482

      Wed Feb 2 01:21:46 2011

      System state dump is made for local instance

      System State dumped to trace file /oracle/admin/imsr/bdump/imsr1_diag_10549.trc

    • Wed Feb 2 01:21:51 2011

      Instance terminated by PMON, pid = 10547

      Wed Feb 2 01:21:52 2011

      Instance terminated by USER, pid = 16125

      Wed Feb 2 01:21:54 2011

      Starting ORACLE instance (normal)
    • Wed Feb 2 09:52:33 2011

      Thread 1 advanced to log sequence 12344 (LGWR switch)

      Current log# 2 seq# 12344 mem# 0: /oradata/imsr/redo02.log

      Thread 1 cannot allocate new log, sequence 12345

      Checkpoint not complete

      Current log# 2 seq# 12344 mem# 0: /oradata/imsr/redo02.log

      Wed Feb 2 09:52:42 2011

      Thread 1 advanced to log sequence 12345 (LGWR switch)

      Current log# 1 seq# 12345 mem# 0: /oradata/imsr/redo01.log

      Wed Feb 2 10:08:26 2011

      IPC Send timeout detected. Receiver ospid 16215

      Wed Feb 2 10:21:13 2011

      Errors in file /oracle/admin/imsr/bdump/imsr1_lmd0_16215.trc:

      Wed Feb 2 10:27:38 2011

      Trace dumping is performing id=[cdmp_20110202094318]
    In alert_node2.log file:

    • Waiting for instances to leave: 
    • IPC Send timeout detected.Sender: ospid 11471

      Receiver: inst 1 binc 1824903189 ospid 1621
    • Wed Feb 2 10:16:22 2011


      MMNL absent for 1807 secs; Foregrounds taking over


      Wed Feb 2 10:16:33 2011


      Waiting for instances to leave:
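
    The excerpts above were picked out of the alert logs by hand. As a rough sketch only, a short Python script could do the same first pass. It assumes the 10g layout in which each timestamp sits on its own line; the function name scan_alert_log and the PATTERNS watch list are made up for illustration, not part of any Oracle tool.

```python
import re
from datetime import datetime

# Assumption: a timestamp such as "Wed Feb  2 01:21:46 2011" occupies a whole line.
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)

# Hypothetical watch list, taken from the messages quoted in this post.
PATTERNS = ("ORA-", "IPC Send timeout", "terminating instance", "Instance terminated")


def scan_alert_log(lines):
    """Return (timestamp, message) pairs for lines that match PATTERNS."""
    events = []
    current = None  # most recent timestamp seen, None before the first one
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # split/join collapses the double space used for single-digit days
            current = datetime.strptime(" ".join(line.split()), "%a %b %d %H:%M:%S %Y")
        elif any(p in line for p in PATTERNS):
            events.append((current, line))
    return events


sample = """Wed Feb 2 01:10:51 2011
Errors in file /oracle/admin/imsr/bdump/imsr1_pmon_10547.trc:
ORA-00482: LMD* process terminated with error
Wed Feb 2 01:21:46 2011
PMON: terminating instance due to error 482
""".splitlines()

for when, message in scan_alert_log(sample):
    print(when, message)
```

    Running the same function over both nodes' alert logs and sorting the combined result by timestamp would give a single timeline of the incident across the cluster.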


    Here are some Metalink notes and articles I found related to the above error messages:

    Based on the above links, the likely causes of this database hang are:
    1 MAXBYTES is smaller than BYTES for a datafile
    -- List the current and maximum size (in MB) of every datafile
    set lines 300
    col file_name format a50
    select file_name, tablespace_name, bytes/1024/1024 as size_mb, maxbytes/1024/1024 as max_mb from dba_data_files;

    2 An Oracle bug was hit (very likely)

    3 The automatic SGA setting caused the crash

    To be continued.....................

