Friday, November 4, 2011

Some posts about RAID 5 vs. RAID 10

Recently there is an interesting discussion in Linkin about the impact of using RAID 5 or RAID 10 in Oracle RAC environment, here is the Linkin post: RAID 5 and RAID 10 intermixed. To understand more about differences and pros & cons between RAID 5 and 10, I googled and found some posts for further readings:

  1. RAID 10 vs RAID 5: Performance, Cost, Space, and HA
  2. RAID 5 vs RAID 10: Recommended RAID For Safety and Performance
  3. RAID 0, RAID 1, RAID 5, RAID 10 Explained with Diagrams
  4. Configuration Considerations for Improving PowerVault 220S Performance
  5. More Spindles Means More IOPS? RAID10 vs RAID50

Labels:

2 More Good Blog Posts from Enkitec about Oracle Database Appliance

Here are two more good blog posts from Enkitec engineers about the new Oracle Database Appliance:

  1. Oracle Database Appliance (ODA) Installation / Configuration
  2. Inside the Oracle Database Appliance - Part 1
Hopefully the budget request for Oracle Databae Appliance will be approved for the next fiscal year, then I would have a chance to play with it. 

Labels:

Enkitec Blog: "Make bash scripts crontab-compatible" by Toby Marks

Here is another helpful blog post from Enkitec Blog:
Make bash scripts crontab-compatible

Labels: ,

Paul Elbow's crsstat script: Improved Formatting of crs_stat on 10g and 11g

Today I found a very helpful script from Paul Elbow's blog to format crsctl output:
My crsstat script: Improved Formatting of crs_stat on 10g and 11g


Labels: , , ,

Wednesday, November 2, 2011

ORA-12520: TNS:listener could not find available handler for requested type of server

Today I encountered a strange intermittent database connection problem when trying to connect to a database via SQL*Plus, the error I got is:
ORA-12520: TNS:listener could not find available handler for requested type of server

According to this Oracle note: Intermittent TNS-12516 or TNS-12519 Errors When Connecting Via Oracle Net [ID 240710.1] , it's likely that lack of server processes is the cause of the problem.  So I checked the system parameters settings of this database:
SQL> show parameters process;


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
processes                            integer     150        


SQL> select * from v$resource_limit where resource_name in ('processes','sessions');


RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION                       LIMIT_VALUE
------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
processes                                       78             150        150                                      150
sessions                                        93             172        264                                      264


The process value is set to the default one by another DBA who created this database.  And also at the same time, there is an import running in the database, and it's using parallelism of 8. Do the settings of parallel related parameters have impact on generating this error? I don't know, I need to do more research about that.

SQL> show parameters parallel


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     FALSE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     135
parallel_min_percent                 integer     0


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
parallel_min_servers                 integer     0
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_servers_target              integer     384
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0


Anyway, it seems the low value of "process" parameter a likely cause of this problem.







Labels: , ,

Tuesday, November 1, 2011

ORA-28002: the password will expire within 1 days

A user complained that his password was to expire in 1 day, so I changed the "PASSWORD_LIFE_TIME" to be "unlimted", but the user still got following message when logged into his account:


SQL> conn  user1/passwd@db1
C:\Users\pc1>sqlplus user1/passsd@db1


SQL*Plus: Release 11.2.0.1.0 Production on Tue Nov 1 16:22:54 2011


Copyright (c) 1982, 2010, Oracle.  All rights reserved.


ERROR:
ORA-28002: the password will expire within 1 days


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


I was puzzled by this error message. I thought the "PASSWORD_LIFE_TIME" value changed to "unlimited" would resolve this problem. After searching Oracle Support, I found this note which explains why it still happens even after "PASSWORD_LIFE_TIME" value changed to "unlimited":

ORA-28002 On User Connection Immediately After PASSWORD_LIFE_TIME Changed [ID 162818.1]


"The PASSWORD_LIFE_TIME limit of a profile is measured from the last time an account's password was changed or the account creation time if the password has never been changed. These are the dates USER$.PTIME and USER$.CTIME respectively.

It is NOT measured from the current time, as may be initially thought.
Therefore, any accounts affected by the changed profile whose last password change time was more than PASSWORD_LIFE_TIME days ago immediately expire and enter their grace period on their next connection, issuing the ORA-28002 warning.
"

SQL> ALTER PROFILE test LIMIT PASSWORD_LIFE_TIME UNLIMITED;



This change only affects accounts who have not entered their grace period (and started getting ORA-28002 on connection). Accounts that have entered their grace period will have to change their passwords."


Here is what happend in one of my 11202 RAC database:
1 Check user account status:

SQL> select username, profile,account_status,expiry_date, created 
from dba_users 
where default_tablespace  not in ('SYSAUX','SYSTEM') 
--and expiry_date is not null 
and account_status not like '%LOCKED' order by expiry_date;





2 Change profile
SQL> alter profile default LIMIT PASSWORD_LIFE_TIME UNLIMITED;


3 Re-check account status

SQL> select username, profile,account_status,expiry_date, created 
from dba_users 
where default_tablespace  not in ('SYSAUX','SYSTEM') 
--and expiry_date is not null 
and account_status not like '%LOCKED' order by expiry_date;







Related Notes:

  1. Oracle Password Management Policy [ID 114930.1]
  2. BUG:1231172 - ORA-28003 WHEN CHANGING PASSWORD FOR A USER
  3. NOTE:1079860.6 - ORA-28011 Password Expiry Date is Reached But Reset to NULL
  4. NOTE:1083889.6 - ORA-00931: missing identifier when PASSWORD_VERIFY_FUNCTION = UNLIMITED
  5. NOTE:124648.1 - ORA-28003, ORA-20001, ORA-20002, ORA-20003, ORA-20004 After Running utlpwdmg.sql
  6. NOTE:139676.1 - ORA-28007: the password cannot be reused
    NOTE:139748.1 - Demonstrates the use of the new Oracle OCI8 OCIPasswordChange function (SCR 631)
  7. NOTE:162818.1 - ORA-28002 On User Connection Immediately After PASSWORD_LIFE_TIME Changed
  8. NOTE:260111.1 - How to Interpret the ACCOUNT_STATUS Column in DBA_USERS
  9. NOTE:98481.1 - How to Keep the Same Password when Expiry Time is Reached and Change is Required


Labels: ,

Check 11202 RAC system recorded network interface name


Recently I was required to update our 11202 two nodes RAC on OEL 5.5 public interface name. After interface name change, here are the 2 steps I did to check if the system got the interface name correctly:
[oracle@node2 ~]$
[oracle@node2 ~]$ oifcfg getif
eth3  192.168.206.20  global  cluster_interconnect
bond0  192.168.100.0  global  public


[oracle@node2 ~] more /opt/app/11.2.0/grid/gpnp/linkin2/profiles/peer/profile.xml
IP="192.168.206.20" Adapter="eth3" Use="cluster_interconnect"/>Network id="net2" IP="192.168.100.0" Adapter="bond0" Use="public"/>
:ASM-Profile id="asm" DiscoveryString="/dev/mapper/*" SPFile="+OCR_VOTE/linkincluster/asmparameterfile/registry.253.764469115"/>
CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
ithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
t="gpnp orcl xsi"/>DO5mYZjrYv+5t15UH8zedF0o3O4=
nfo>fyCiD46yDKsjcjpyoFYIuoZ1+sEvE72XPKAOu5gSfLjllUFVgNNM3EW8EiyUhvbmhHgxLcQ5ZL8iTdmhDtun53gbcNVc5RVP82B32WH1qzJUaDLx3MXQsZcO3/wmkyxUo7azsCosrk2WWwYeSqUl4YiArTbv3WPMXwfU9TdFyHM=
reValue>

Labels: ,

Rman backup retention policy and recovery window

This is a place holder for me to understand RMAN backup retention policy and recovery window. I configured daily incrementatl level 1 backup from Monday to Friday, and incremental level 0 backup on Saturday, and retention policy is set to 7 days but when I checked the RMAN backup files on OS level, there are files older than 7 days still existing in the server.  I think this Oracle notes may explain why there are backup files older than the 7 days retention policy existed in the server:

  1. Rman backup retention policy [ID 462978.1]
  2. Frequently asked questions on Rman backup retention policy [ID 463875.1]
  3. NOTE:206862.1 - How does the RMAN Retention Policy Obsolete Incremenatal Backupsets and Archivelog Files
  4. NOTE:351455.1 - Oracle Suggested Strategy & Backup Retention
  5. NOTE:434345.1 - How Do I Set A Retention Policy For Tape Backups And Disk Backups Differently?
  6. NOTE:452529.1 - Recovery catalog for RMAN backup
I will further research and modify this post to add more information later. 

Labels:

Differences between "crsctl .. crs" and "crsctl ... cluster"

I had been puzzled by the differences between "crsctl ... crs" and "crsctl ... cluster", so I decided to look into the differences between them, it turns out that they are clearly explained in the documentation:
Oracle® Clusterware Administration and Deployment Guide
11g Release 2 (11.2)

Part Number E16794-14

E CRSCTL Utility Reference

According to the above doc:

CRSCTL Overview

CRSCTL is an interface between you and Oracle Clusterware, parsing and calling Oracle Clusterware APIs for Oracle Clusterware objects.
Oracle Clusterware 11g release 2 (11.2) introduces cluster-aware commands with which you can perform check, start, and stop operations on the cluster. You can run these commands from any node in the cluster on another node in the cluster, or on all nodes in the cluster, depending on the operation.
You can use CRSCTL commands to perform several operations on Oracle Clusterware, such as:
  • Starting and stopping Oracle Clusterware resources
  • Enabling and disabling Oracle Clusterware daemons
  • Checking the health of the cluster
  • Managing resources that represent third-party applications
  • Integrating Intelligent Platform Management Interface (IPMI) with Oracle Clusterware to provide failure isolation support and to ensure cluster integrity
  • Debugging Oracle Clusterware components



Clusterized (Cluster Aware) Commands

You can run clusterized commands on one node to perform operations on another node in the cluster. These are referred to as remote operations. This simplifies administration because, for example, you no longer have to log in to each node to check the status of the Oracle Clusterware on all of your nodes.
Clusterized commands are completely operating system independent; they rely on the OHASD (Oracle High Availability Services daemon). If this daemon is running, then you can perform remote operations, such as the starting, stopping, and checking the status of remote nodes.
Clusterized commands include the following:
  • crsctl check cluster
  • crsctl start cluster
  • crsctl stop cluster

---crsctl command help shows the following .
crsctl stop cluster -h
Usage:
crsctl stop cluster [[-all]|[-n [...]]] [-f]
Stop CRS stack
where
Default Stop local server
-all Stop all servers
-n Stop named servers
server [...] One or more blank-separated server names
-f Force option


In simple words it work like this.

crsctl stop cluster is equal to crsctl stop crs
crsctl stop cluster -all --> stops clsuterware on ALL nodes.
crsctl stop cluster -n racnode1 stops clusterware on node racnode1.

Labels: , , ,

How to change the public interface name of 11202 RAC


Recently I was asked to work SA to change the public interface name of our 11202 two nodes RAC on Linux 5.5.  After researching Oracle Support notes and confirmation with Oracle Support, following are the steps required to perform this change:
Note: unfortunately at last minute, this change was cancelled, so I didn't have a chance to really test it out. I tested another set of similar steps but failed due to wrong steps execution.

1) set new public network interface name with oifcfg setif -global <..ifname.>
$ $ORA_CRS_HOME/bin/oifcfg setif -global eth0/192.168.100.0:public

2) delete old public network interface name with oifcfg delif -global
$ $ORA_CRS_HOME/bin/oifcfg delif -global bond0

3) voting disks permission needs to stay permenent to be "oracle:oinstall", even after server reboot

4) SHUTDOWN crs
crsctl stop crs

5) perform changes at OS level

6) reboot both servers
Note: Reboot is not essentail. Needed only if SA need that.

7) After server reboot, check voting disks permission to make sure they belong to "oracle:oinstall".
If not, change it back to oracle:oinstall.
Note: this step is not necessary. I put it here because after SA made the interface name change on OS level and rebooted the server, CRS couldn't be started.  After long hours of hard work, it turned out to be the voting disks permission issue. 

8) Restart the clusterware crsctl start crs
Check crs status, if not all online, shutdown force: crsctl stop crs -f, then start it up node by node: crsctl start crs
Note: what are the differences between "crsctl stop/start crs" and "crsctl stop/start cluster"? 
"crsctl stop/start crs" only stop or start one node. 

9) To modify the VIP change while all cluster related services are online (CRS, ASM, db)
a) Gather the existing setup
$ srvctl config nodeapps -a

b) Verify VIP status
$ crsctl stat res -t
- it should show VIPs are ONLINE

$ ifconfig -a

- VIP logical interface is bind to the public network interface

c) Stop the nodeapps resources and all dependent resources (stop ASM/DB instance only required for full outage):
$ srvctl stop instance -d -n
$ srvctl stop vip -n -f

d) Verify VIP is now OFFLINE and the interface is no longer bound to the public network interface

$ crsctl stat res -t

$ ifconfig -a

e) Modifying VIP and Its Associated Attributes
Determine the new VIP IP/subnet/netmask or VIP hostname, make the network change on OS first, ensure the new VIP is registered in DNS or modified in /etc/hosts
(for Unix/Linux). If the network interface is changed, ensure the new interface is available on the server before proceeding with the modification.

f) Modify the VIP resource
network resource can be modified directly via srvctl modify network command.

as root user:
# srvctl modify network -k ] [-S /[/if1[|if2...]]

For example:
New VIP is: 110.11.70.11 racnode1-nvip
new subnet is 110.11.70.0
new netmask is 255.255.255.0
new interface is eth0

# srvctl modify network -k 1 -S 110.11.70.0/255.255.255.0/eth0

Note:
How to tell if VIP is bound or not bound to the public interface from the output of "ifconfig -a"?
Answer: If the public interface is 'eth0' then vip should bound as 'eth0:1'

g) Verify the change
$ srvctl config nodeapps -a

h) Start the vip and other resources
$ srvctl start vip { -n
$ srvctl start instance -d -i

i) Verify the new VIP is ONLINE and bind to the public network interface

$ crsctl stat res -t
$ ifconfig -a

Related Oracle Support Notes:

  1. How to Change Interconnect/Public Network (Interface or Subnet) in Oracle Clusterware [ID 283684.1]
  2. How to Modify Private Network Interface in 11.2 Grid Infrastructure [ID 1073502.1] 
  3. Troubleshooting CRSD Start up Issue [ID 1323698.1]
  4. CRS can not Start After Node Reboot [ID 733260.1]
  5. 10gR2, 11gR1 and 11gR2 Oracle Clusterware (CRS / Grid Infrastructure) & RAC Command (crsctl, srvctl, cluvfy etc) Syntax and Reference [ID 1332452.1]



Labels: , ,