Monday, April 11, 2011

HOWTO (Oracle DB) - Troubleshooting Mutex Concurrency problems in the Library Cache


Several times we've seen concurrency event "Library cache: mutex X" on database sessions across the cluster database.

A session waiting on Library cache Mutex causes

  • High CPU usage on the instance node
  • Can potentially evict of a RAC node in case loadavg is very high
  • Large number of locked sessions waiting on "Library Cache: mutex X" event

Definition of Library cache MUTEX
The library cache mutex is acquired for similar purposes that the library cache latches were acquired in prior versions of Oracle. In 10g, mutexes were introduced for certain operations in the library cache. Starting with 11g, the library cache latches were replaced by mutexes, hence this new wait event.

Mutexes are a lighter-weight and more granular concurrency mechanism than latches. Mutexes take advantage of CPU architectures that offer the compare and swap instructions (or similar). The reason for obtaining a mutex in the first place, is to ensure that certain operations are properly managed for concurrency. E.g., if one session is changing a data structure in memory, then another session must wait to acquire the mutex before it can make a similar change - this prevents unintended changes that would lead to corruptions or crashes if not serialised.

This wait event is present whenever a library cache mutex is held in exclusive mode by a session and other sessions need to wait for it to be released. There are many different operations in the library cache that will require a mutex, so its important to recognise which "location" (in Oracle's code) is involved in the wait. "Location" is useful to Oracle Support engineers for diagnosing the cause for this wait event.

This query will show sessions waiting on Concurrency Mutex event
select
s1.sid||','||s1.serial#||',@'||s1.inst_id kill,
machine,p1,p1text,p2,p2text,p3,p3text,
(select p.spid from gv$process p
where p.addr=s1.paddr and p.inst_id=s1.inst_id) spid,s1.process,
s1.username,substr(s1.program,1,20) program,substr(s1.module,1,20) module,
substr(s1.action,1,20) action,
substr(s1.event,1,25)event,s1.wait_class,s1.sql_id, last_call_et,seq#
from gv$session s1
where
event like '%mutex%'
order by last_call_et desc;

Identify the Blocking session holding the MUTEX in the library cache
select *
from gv$mutex_sleep_history
where
mutex_identifier=121516;

Note: Use P1 value from the query above, IDN - Mutex identifier. Session sid=7250 on instance=1 is holding a MUTEX identifier=121516 

Check Blocking session from the GV$SESSION table
select
s1.sid||','||s1.serial#||',@'||s1.inst_id kill,machine,
(select p.spid from gv$process p where p.addr=s1.paddr and p.inst_id=s1.inst_id) spid,s1.process,
s1.username,substr(s1.program,1,20) program,substr(s1.module,1,20) module,substr(s1.action,1,20) action,substr(s1.event,1,25) event,s1.wait_class,s1.sql_id,
last_call_et,seq#
from gv$session s1
where
sid = 7250 and
inst_id = 2
order by last_call_et desc;

Note: the blocking session is waiting on the Network event. This could indicate the client is still consuming data. Killing the blocking session will realease the locks. 


Get the SQL statement being run by the blocking session
select *
from gv$sqltext
where
sql_id = 'ajx51mvr52a60' and
inst_id=2; 

Get the SQL statement to kill all sessions that belong to a particular module
select 'echo '||inst_id||';kill -9 '||spid from gv$process where (addr,inst_id) in (select paddr,inst_id from gv$session where module like '%Fogli%');

'ECHO'||INST_ID||';KILL-9'||SPID
--------------------------------
echo 1;kill -9 5367
echo 1;kill -9 9151
echo 1;kill -9 19199
echo 1;kill -9 9981
echo 1;kill -9 8478
echo 1;kill -9 8486


Reference: WAITEVENT "library cache: mutex X" (Doc ID 727400.1)

Tuesday, March 8, 2011

HOWTO (eBiz) - Tracking file changes after applying patches

We normally use adident utility to find what is the current version of a particular file in EBS, however in case you want to see home many patches changed the same file there is no tools that provides this information.

The query below will help you to identify not only the current file version, but the history of changes on the file after being modified by a particular patch.

/* 
-- Filename: check_file_version_with_patch_number.sql
-- Description: file change history tracking 
-- Author: Christian Ricci
-- Created on: 09/Mar/2011
-- Modified on: 20110309 - CR - Initial version
*/
col APP_SHORT_NAME for a8
col filename for a25
col subdir for a30
col version for a40
set lin 300 pages 1000 trim on trimspool on verify off
set colsep '|'
--
alter session set nls_date_format='dd-mm-yyyy hh24:mi:ss';
select instance_name from v$instance;
--
SELECT
  ab.bug_number         ,
  f.app_short_name      ,
  f.subdir              ,
  f.filename            ,
  v.file_id             ,
  v.creation_date       ,
  v.last_update_date    ,
  v.version
FROM applsys.AD_FILE_VERSIONS v,
  applsys.ad_files f,
  applsys.AD_PATCH_RUN_BUGS pr,
  applsys.AD_PATCH_RUN_BUG_ACTIONS pra,
  applsys.ad_bugs ab
WHERE
    v.file_id=f.file_id
and pra.file_id=v.file_id
and pra.patch_file_version_id=v.file_version_id
and pr.patch_run_bug_id=pra.patch_run_bug_id
and ab.bug_id=pr.bug_id
and f.filename='&1'
ORDER BY
  v.last_update_date DESC,
  f.filename             ,
  v.creation_date;


This below query will show the files modified by a particular patch.

/* 
-- Filename: check_file_on_patch_number.sql
-- Description: Files changes by a particular patch
-- Author: Christian Ricci
-- Created on: 09/Mar/2011
-- Modified on: 20110309 - CR - Initial version
*/col APP_SHORT_NAME for a8
col filename for a25
col subdir for a30
col version for a40
set lin 300 pages 1000 trim on trimspool on verify off

set colsep '|'
--
alter session set nls_date_format='dd-mm-yyyy hh24:mi:ss';
select instance_name from v$instance;
--
SELECT
  ab.bug_number         ,
  f.app_short_name      ,
  f.subdir              ,
  f.filename            ,
  v.file_id             ,
  v.creation_date       ,
  v.last_update_date    ,
  v.version
FROM applsys.AD_FILE_VERSIONS v,
  applsys.ad_files f,
  applsys.AD_PATCH_RUN_BUGS pr,
  applsys.AD_PATCH_RUN_BUG_ACTIONS pra,
  applsys.ad_bugs ab
WHERE
    v.file_id=f.file_id
and pra.file_id=v.file_id
and pra.patch_file_version_id=v.file_version_id
and pr.patch_run_bug_id=pra.patch_run_bug_id
and ab.bug_id=pr.bug_id
and ab.bug_number='&1'
ORDER BY
  v.last_update_date DESC,
  f.filename             ,
  v.creation_date;
 

Sunday, January 23, 2011

HOWTO (eBiz) - E-Business Suite R12 Apps DBA full implementation project

During 2007/2008 I had the chance to work on a full implementation of eBiz version R12. I was the main DBA in site and my scope was the delivery of the Production eBiz cluster and rest of support environments. I was involved in the architecture design, installation of the OS via PXE boot (process that I built), installation of R12 in cluster both on the database tier and application tier, initial configuration of eBiz, HA test and documentations, and the cloning process on a RAC and MULTINODE configuration.
The interesting setup of this eBiz is the utilisation of the Shared APPL_TOP on a READ-ONLY file system. This is possible thanks to the new R12 features of the INST_TOP which is the separation of the config and log files outside the APPL_TOP.
Another interesting setup was the way I configured Concurrent Managers in HA mode or Parallel Concurrent Processes, ICM and FNDSM are the key components of this setup, I had lots of problems with TNS session timeout during fail-over tests which I managed to fix by with the help a lot of older metalink notes. I used OCFS2 to shared key directory path locations needed by the MULTINODE setup. Among these locations were APPLCSF, APPLPTMP, APPLTMP, FNDREVIVER, BNE and the list keep growing, this is not very well documented by Oracle and it was difficult to troubleshoot.
These days OCFS2 has been certified to be used on shared APPL_TOP configuration and there is not need to use READ-ONLY file system implementation anymore, however this configuration works perfectly. Other solutions are to use an external NAS device over NFS to mount APPL_TOP (please do not use local NFS server in apps eBiz nodes since this will introduce a single point of failure, some people don't think about HA all the time, so please speak to your System Architect before taking any architectural decision).

Cloning 8 nodes eBiz was really a challenge, specially when you use RAC and MULTINODE setup on a Shared APPL_TOP. The process is not very well documented, and ADCLONE tool are not fully prepared to manage a non-standard setup. I decided to use a manual clone process, where I had to run FND_SETUP.CONC_CLEAN and adfcfclone dbTechStack and adcfgclone appsTechStack. When you use MULTINODE and RAC there are other things to consider like running ADCONFIG which very tricky. 
 
If you have 4 database node, you need to run adconfig in the follwing order node1 , node2, node3 and node4, at this stage people normally think adconfig was run in all the nodes and that's it, but that is not really true... If you have a look at the tnsname.ora for example on the node1, you will see there is no reference of the rest of the nodes in the “ADDRESS LIST”, Why? This is because of a limitation of ADCONFIG which grabs the information of the nodes from the FND tables which are populated as ADCONFIG runs. 
 
To workaround this problem, your need to re-run ADCONFIG twice. For example: run ADCONFIG in node1, node2, node3 and node4, at this stage node4 know about (node 1,2,3,4), node3 know about (1,2,3 but not 4), node2 knows about (1,2 but not 3,4) and node1 knows about (1 but not 2,3,4). So you need to re-run ADCONFIG in this order node3, node2 and node1 to fix the problem. After running ADCONFIG in the remaining nodes TWICE check tnsnames.ora and you will see “ADDRESS LIST” contain reference to all the cluster nodes. This workaround applies both for APP TIER and DB TIER. Hopefully Oracle will fix their documentation soon and properly document this process.

Hardware/Software Specs:
HP class 7000 blade chassis. Composed of 16 blades.
Citrix Netscaler Load Balancers
Redhat 4 EL x86_64
Oracle CRS/ASM/RDBMS V10.2.0.4 and EBS R12.0.4

Contributions:
Thanks to Neil and Dipak for reviewing this documentation and helping all the way through the project.

Full documentation:
This post has references to step-by-step build documentation for the E-Business Suite OS build, Database RAC and Shared APPL_TOP read-only implementation, Multinode setup using Parallel concurrent processing. These documents are very long, see the document index below to have an idea of the content.

Please post a comment including your email address and I will send you the password.
ebsr12_os_build.pdf
ebsr12_oracle_rac_build.pdf
ebsr12_oracle_multinode_build.pdf
ebsr12_clone.pdf

I am asking for a minimum donation via PayPal logo located on the right hand side of my blog (this can come as part of your department's training budget). Alternatively, you can engage my services directly, please contact me on my email.

Click link to download full documentation ebsr12_multinode_rac_full_cluster.tgz.gpg (password will be emailed to you when you Donate).

Powered by
http://www.gnupg.org/download
Hardware Setup, Linux Red Hat EL 4 Build using PXE Boot, Backup Configuration, Maintenance tasks and Procedures
1 Document Control 4
1.1 Signoff 4
1.2 Document Owner 4
1.3 Distribution 4
1.4 Change History 4
2 Purpose 5
2.1 Scope 5
3 Hardware Setup 6
3.1 Overview of HP Blade System c7000 Enclosure 6
3.2 HP Blade System c7000 Enclosure Configuration 8
3.3 ProLiant c-Class Server Blade architecture 11
3.4 Network/SAN Interconnect Bays 13
3.5 Power and Thermal Management 14
4 Linux Red Hat OS Build using PXE Boot process 16
4.1 PXE boot server configuration using Linux 17
4.2 PXE boot server configuration using Microsoft SMS 19
4.3 Configuring the NFS server to store Red Hat Source packages 21
4.4 Start the OS build process 23
5 Configure Kernel Huge Pages for Oracle SGA 26
5.1 HugePages Features 26
5.2 HugePages setup 26
6 Configure CUPS unix printing 29
6.1 Configure cups web access 29
6.2 Add a printer to unix 30
7 Account set up and password policies 31
7.1 Define UNIX password policies 31
7.2 Set up sudo access 32
7.3 Set ACLs on JAVA_TOP and DEE_TOP 32
8 Install and configure CommVault UNIX File System agent 34
8.1 Install the file system agent 34
9 Install and configure CommVault Oracle agent 36
9.1 Install the Oracle agent 36
9.2 Configure the Oracle dataset backup on tape 37
9.3 Perform a disaster recovery of the database from a tape backup 42
10 Setting up backup storage policy and schedules in CommVault 54
10.1 CommVault Backup schedules 54
10.2 CommVault Storage Policies 55
10.3 Current backup policy 56
11 Appendix 57
11.1 FILE: ks-linux-oracle.cfg 57
11.2 FILE: fix_ip.sh 63
11.3 FILE: /etc/sudoers 66
11.4 FILE: create_accounts.sh 68
11.5 FILE: pfile.uat 69

Oracle 10g Release 2 Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 12, Configuring Application Tier using READ-ONLY file system
1 Document Control 4
1.1 Signoff 4
1.2 Document Owner 4
1.3 Distribution 4
1.4 Change History 4
2 Purpose 5
2.1 Scope 5
3 Oracle 10g Release 2 Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 12 6
3.1 Network Configuration 6
3.2 Disk configuration 7
3.3 EBS-R12 installation (non-cluster) 8
3.4 Install Oracle Clusterware 10.2.0.1 30
3.5 Install Oracle Clusterware patchset 10.2.0.4 54
3.6 Configure ONS to support multiple accounts configuration 60
3.7 Install Oracle ASM software 10.2.0.1 62
3.8 Install Oracle ASM patchset 10.2.0.4 69
3.9 Clone Oracle Database home from existing DB Tech Stack home 74
3.10 Create ASM Instances and Disk Groups 78
3.11 Convert 10gR2 Database to Oracle RAC using rconfig 99
3.12 Enable AutoConfig on Database Tier 104
3.12.1 Enable AutoConfig in First DB TIER node 104
3.12.2 Enable AutoConfig on all other DB TIER nodes 111
3.12.3 AutoConfig Post actions on all DB TIER nodes 115
3.13 Establish Applications Environment for Oracle RAC 118
4 Configuring Application Tier cluster using Read-Only file system implementation 120
4.1 Prepare existing node 120
4.2 Make the Applications files accessible 120
4.3 Configure the node to be added 120
4.4 Finishing Tasks 122
4.4.1 DNS Load Balancing with Single Web Entry Point 122
4.4.2 Implement Parallel Concurrent Processing 122
5 APPENDIX 125
5.1 FILE: rac-prereqs.sh 125
5.2 FILE: asm_build.sh 129
5.3 FILE: UAT_ConvertToRAC.xml 131
5.4 FILE: tnsnames.ora (DB TIER Node1) 133
5.5 FILE: tnsnames.ora (DB TIER Node2) 136
5.6 FILE: listener.ora (DB TIER node1) 139
5.7 FILE: listener.ora (DB TIER node2) 141
5.8 FILE: rac-enabled-pfile.ora 143
5.9 FILE: spfile (for the ASM) 161
5.10 FILE: ons_config_db.tmp 162
5.11 FILE: ons_config – CRS_HOME 163

E-Business Suite R12 Multi-Node configuration sharing the Application Tier File System as Read-Only, Network Load Balancing configuration and Parallel Concurrent Processing
1 Document Control 3
1.1 Signoff 3
1.2 Document Owner 3
1.3 Distribution 3
1.4 Change History 3
2 Purpose 4
2.1 Scope 4
3 Multi-Node and shared application tier file system concepts 5
4 Network Load balancing concepts 6
5 Parallel Concurrent Processing Concepts 8
6 Maintaining a Shared Application Tier File System 9
7 Add a node to a Shared Application Tier file system (Read-Only Implementation) 10
7.1 Prepare the Source System 11
7.2 Make the application files accesible to the Secondary node 11
7.3 Configure the node to be added 12
8 Network Load balancing for the Web Entry Point services 17
8.1 Configure Load Balancers with Single Web Entry Point 19
8.2 Enable OC4J clustering on the application tier nodes 22
9 Configuring Parallel Concurrent Processing 24
9.1 Configure a shared file system for PCP 25
9.2 Enabling Concurrent Processing service on App Tier nodes 29
10 APPENDIX 37
10.1 FILE: igccmprf.sh 37
10.2 FILE: mod_oc4j_conf_1013.tmp 40
10.3 FILE: build_ocfs2.sh 41

E-Business Suite R12 environment cloning procedures
1 Document Control 3
1.1 Signoff 3
1.2 Document Owner 3
1.3 Distribution 3
1.4 Change History 3
2 Purpose 4
2.1 Scope 4
3 Cloning process in summary 5
4 Clone DBTIER from source system RAC to target system RAC 6
4.1 Prerequisites 7
4.2 Prepare the Source System 8
4.3 Perform clone steps in DB TIER Target System 9
4.3.1 Enable AutoConfig on Applications Database Tier 26
4.3.2 Enable AutoConfig on all other DB TIER nodes 31
4.3.3 AutoConfig Post actions on all DB TIER nodes 35
4.4 Perform clone steps in APP TIER Target System 38
4.5 Perform post task clone actions (all app tiers) 42
5 APPENDIX 43
5.1 FILE: rac-enabled-pfile.ora 43
5.2 FILE: tnsnames.ora (node2) 59
5.3 FILE: listener.ora (node2) 62
5.4 FILE: listener.ora (node1) 64
5.5 FILE: tnsnames.ora (node1) 66
5.6 FILE: ons_config_db.tmp 69
5.7 FILE: adlsnr10RAC.ora 70
5.8 FILE: /tmp/clone.patch 71
5.9 FILE: /staging/oracle/cmclean.sql 74
5.10 FILE: /staging/oracle/coins-ebs.sql 79

Friday, January 21, 2011

HOWTO (eBiz) - Automatic cloning of AppTier and DbTier. Making Apps DBA's life easier


One of the most tedious tasks a DBA has to execute is a clone of an eBiz environment. Cloning can take from a few hours to days, depending the level of automation you have and how many additional extra post steps there are after AppTier has been cloned.

Most of the companies I worked with has a large number of environment, DEV, TEST, UAT, PRE-PROD, PROD. Sometimes the list gets bigger since each individual project requires a replica of PROD. This, of course double the number of environments. You normally find at least 10+ environments during the implementation of eBiz. But, I've seen companies with 30+ environments, and that is a crazy number for an Apps DBA at the time of cloning them.

Back in 2007, I started developing a script with the intention of doing the full eBiz clone from a SOURCE to TAGET environment, including both dbTier and appTier. Most of the AD utilities allow you to pass parameter via the command line, and with a little bit of shell scripting help plus Unix skill, I managed to create a set of scripts which perform a full clone from end-to-end. Just press the button and after a few hours you TARGET will be cloned from your SOURCE.
The script works like this:

  • It uses SSH public keys to connect to app and db servers. Make sure to create pub keys and add them to destination servers.
  • Remotely executes commands via ssh.
  • It backup SOURCE database to tape or disk. If you using disk, you need to manually copy RMAN backup pieces to TARGET server.
  • It stop database and app process both in TARGET db and app tier.
  • It deletes both database and app tier files from TARGET, in case the environment already exists. Make sure to double check script environment variables “clone.env” before running it.
  • It uses RMAN to duplicate database from the SOURCE to TARGET.
  • It copies the app tier files and Oracle Home, then runs adcfgclone for dbTechStack and appsTier.
  • It uses expect to automate adclone utilities. Thanks to Ned G. for this contribution.
  • In the last steps it will perform a set of post additional clone steps, like enable/disable SSO, change profile options, run adconfig, datamasking, end-date users, fix symbolic links, etc.

Limitations:

  • At the moment the script does not clone a TARGET RAC or MULTINODE environment. It used to work before, but I removed some steps for RAC and MULTINODE, due to complexity of each different sites. I'm planing to work on full RAC and MULTINODE feature sometime this year.
  • It assumes you know what you are doing and that Apps DBA has tested the script in a non-PROD environment. I will not assume any responsibility if the script breaks anything. Please READ the instructions and understand what the scripts does before running it.
  • It assumes ssh keys and sudo commands are properly setup and that some directory path already exists. Checkout manual in the README.txt

Copyright:

GNU GENERAL PUBLIC LICENSE
clone_ebiz is a script that can be used to automate your eBiz clone end-to-end without human intervention.

Copyright (C) 2007 Christian Daniel Ricci – email: ricci.christian@gmail.com

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

Download:

I am asking for a minimum donation via PayPal logo located on the right hand side of my blog (this can come as part of your department's training budget). Alternatively, you can engage my services directly, please contact me on my email.

To download the script click this link clone_ebiz.tgz.gpg (password will be emailed to you when you Donate).
Powered by
http://www.gnupg.org/download
Contributions:
  • Thanks to Neil for his inputs and changes.
  • Many thanks to Ned G. for his contribution with expect.

Enjoy!!! Any questions or help required, please post a comment.