Monday, April 11, 2011

HOWTO (Oracle DB) - Troubleshooting Mutex Concurrency problems in the Library Cache


Several times we've seen concurrency event "Library cache: mutex X" on database sessions across the cluster database.

A session waiting on Library cache Mutex causes

  • High CPU usage on the instance node
  • Can potentially evict of a RAC node in case loadavg is very high
  • Large number of locked sessions waiting on "Library Cache: mutex X" event

Definition of Library cache MUTEX
The library cache mutex is acquired for similar purposes that the library cache latches were acquired in prior versions of Oracle. In 10g, mutexes were introduced for certain operations in the library cache. Starting with 11g, the library cache latches were replaced by mutexes, hence this new wait event.

Mutexes are a lighter-weight and more granular concurrency mechanism than latches. Mutexes take advantage of CPU architectures that offer the compare and swap instructions (or similar). The reason for obtaining a mutex in the first place, is to ensure that certain operations are properly managed for concurrency. E.g., if one session is changing a data structure in memory, then another session must wait to acquire the mutex before it can make a similar change - this prevents unintended changes that would lead to corruptions or crashes if not serialised.

This wait event is present whenever a library cache mutex is held in exclusive mode by a session and other sessions need to wait for it to be released. There are many different operations in the library cache that will require a mutex, so its important to recognise which "location" (in Oracle's code) is involved in the wait. "Location" is useful to Oracle Support engineers for diagnosing the cause for this wait event.

This query will show sessions waiting on Concurrency Mutex event
select
s1.sid||','||s1.serial#||',@'||s1.inst_id kill,
machine,p1,p1text,p2,p2text,p3,p3text,
(select p.spid from gv$process p
where p.addr=s1.paddr and p.inst_id=s1.inst_id) spid,s1.process,
s1.username,substr(s1.program,1,20) program,substr(s1.module,1,20) module,
substr(s1.action,1,20) action,
substr(s1.event,1,25)event,s1.wait_class,s1.sql_id, last_call_et,seq#
from gv$session s1
where
event like '%mutex%'
order by last_call_et desc;

Identify the Blocking session holding the MUTEX in the library cache
select *
from gv$mutex_sleep_history
where
mutex_identifier=121516;

Note: Use P1 value from the query above, IDN - Mutex identifier. Session sid=7250 on instance=1 is holding a MUTEX identifier=121516 

Check Blocking session from the GV$SESSION table
select
s1.sid||','||s1.serial#||',@'||s1.inst_id kill,machine,
(select p.spid from gv$process p where p.addr=s1.paddr and p.inst_id=s1.inst_id) spid,s1.process,
s1.username,substr(s1.program,1,20) program,substr(s1.module,1,20) module,substr(s1.action,1,20) action,substr(s1.event,1,25) event,s1.wait_class,s1.sql_id,
last_call_et,seq#
from gv$session s1
where
sid = 7250 and
inst_id = 2
order by last_call_et desc;

Note: the blocking session is waiting on the Network event. This could indicate the client is still consuming data. Killing the blocking session will realease the locks. 


Get the SQL statement being run by the blocking session
select *
from gv$sqltext
where
sql_id = 'ajx51mvr52a60' and
inst_id=2; 

Get the SQL statement to kill all sessions that belong to a particular module
select 'echo '||inst_id||';kill -9 '||spid from gv$process where (addr,inst_id) in (select paddr,inst_id from gv$session where module like '%Fogli%');

'ECHO'||INST_ID||';KILL-9'||SPID
--------------------------------
echo 1;kill -9 5367
echo 1;kill -9 9151
echo 1;kill -9 19199
echo 1;kill -9 9981
echo 1;kill -9 8478
echo 1;kill -9 8486


Reference: WAITEVENT "library cache: mutex X" (Doc ID 727400.1)

No comments:

Post a Comment