Tuesday, August 21, 2012

(HOWTO Oracle DB) - Performance problems due Un-Balanced CPU RAC nodes


Background:
Have you seen your production RAC database using lots of CPU in 1 node and NONE on other nodes?  Node1 seems to be particularly busy comparing to Node2, LAVG reaches 40+ in Node1, comparing to LAVG 2+ in Node2.


Question is Why?

This is because the database does not consider CPU utilization at the time of connecting a session. As result we have seen unbalanced CPU utilization across nodes in the cluster. In order to fix this problem, Oracle provides the ability to change the RAC load balancing algorithm. This is done by configuring the default service to use two load balancing features “Load Balancing Advisory” (LBA) and “Connection Load Balancing” (CLB). This change can be done on the fly with using the API “dbms_service.modify_service”.

According to the Oracle documentation DBMS_SERVICE uses the following parameters to modify load balancing mechanism.

DBMS_SERVICE.MODIFY_SERVICE(
   service_name        IN VARCHAR2,
   goal                IN NUMBER DEFAULT NULL,
   dtp                 IN BOOLEAN DEFAULT NULL,
   aq_ha_notifications IN BOOLEAN DEFAULT NULL,
   failover_method     IN VARCHAR2 DEFAULT NULL,
   failover_type       IN VARCHAR2 DEFAULT NULL,
   failover_retries    IN NUMBER DEFAULT NULL,
   failover_delay      IN NUMBER DEFAULT NULL,
   clb_goal            IN NUMBER DEFAULT NULL);

GOAL_NONE = 0 (Disables Load Balancing Advisory)


GOAL_SERVICE_TIME = 1 (Load Balancing Advisory is based on elapsed time for work done in the service plus available bandwidth to the service)
 

GOAL_THROUGHPUT = 2 (Load Balancing Advisory is based on the rate that work is completed in the service plus available bandwidth to the service)

CLB_GOAL_SHORT = 1 (Connection load balancing uses Load Balancing Advisory, when Load Balancing Advisory is enabled (either goal_service_time or goal_throughput). When GOAL=NONE (no load balancing advisory), connection load balancing uses an abridged advice based on CPU utilizatio)


CLB_GOAL_LONG = 2 (Balances the number of connections per instance using session count per service. This setting is recommended for applications with long connections such as forms. This setting can be used with Load Balancing Advisory when the connection pool is sized to accommodate gravitation within the pool itself (without adding or removing connections). The latter is the most efficient design).

Recommendation:
 In order to use CPU load balancing default service need to be changes to consider CPU connection load balancing on CLB parameter (set value to CLB_GOAL_SHORT or 1), and also set the load balancing goal to throughput (set value to GOAL_THROUGHPUT or 2).

Current setup is not using CLB

SQL> col name for a25
SQL> select inst_id,name,goal,clb_goal from gv$services;

   INST_ID NAME         GOAL         CLB_G
---------- -------------------- ------------
     1 PROD             NONE         LONG
     1 PROD_PRIM        NONE         LONG
     1 SYS$BACKGROUND   NONE         SHORT
     1 SYS$USERS        NONE         SHORT
     2 PROD             NONE         LONG
     2 PROD_PRIM        NONE         LONG
     2 SYS$BACKGROUND   NONE         SHORT
     2 SYS$USERS        NONE         SHORT

In order to change CLB goal as statements as per below
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD',goal=>2,clb_goal=>1);
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD_PRIM',goal=>2,clb_goal=>1);
 
GOODNESS Indicates how attractive a given instance is with respect to processing the workload that is presented to the service. A lower number is better. This number is internally computed based on the GOAL (LONG or SHORT) that is specified for the particular service.
SQL> select end_time,inst_id,service_name,cpupercall,goodness from gv$servicemetric where service_name like 'PROD%' order by end_time desc,inst_id;
 

END_TIME           INST_ID SERVICE_NAME         CPUPERCALL   GOODNESS
--------------- ---------- -------------------- ---------- ----------
03-JUL-12                2 PROD              2647.74074        800
03-JUL-12                1 PROD              406.694915       2300
03-JUL-12                1 PROD              1882.94935       2300
03-JUL-12                2 PROD              2024.03252        800

Testing:

In order to test these changes I will be using SwingBench tool which can simulate  Sales Order Entries transactions with a large number of sessions. SwingBench tool is open source and run on JAVA.

Swingbench will be setup as follows:
  • Two desktop running swingbench will be used as load generators.
  • A database called EBSTEST will be used. This is a 2 node RAC 11gR1 database.
  • First desktop load generator will be runnig swingbench with 500 sessions, each session will reconnect after 5000 transaction have been completed. All this sessions will be forced to connect EBSTEST1 (instance 1) of the RAC cluster. Doing this will simulate the unbalanced situation, where 500+ sessions are connect to node1 and less sessions connected to node2. Also CPU utilization will be unbalanced.
  • Second desktop load generator will be using the service_name EBSTEST which can conect either node of the cluster. Connecting to this server will consider LBA and CLB balancing algorithm. 500 session will be connected and each session will re-connect every 5000 transactions are completed.Next step is to un-assing the plan from the window


Note: First load generator can be seen here running 500 session and putting load in first node.

 Note: Load can be seen here first node LAVG is 31 where second node LAVG is 1.65.

 Note: sessions number is also unbalanced. 642 in instance 1 and 140 in instance 2.


Note: service load balancing is changed on the fly to consider CPU utilization. As you can see all new sessions are connected straight to instance 2 as you can see instance 1 still hold 644 sessions where instance 2 have gone from 140 to 338 sessions.

Note: as you can see node 2 LAVG has incresed from 1.6 to 4.39, confirming new  load balancing goal is now considering CPU utilization.

(HOWTO: Oracle DB) - Ebiz database running slow due to RESOURCE MANAGER feature is enabled


Ebiz database running slow due to RESOURCE MANAGER feature is enabled


RESOURCE MANAGER feature is enable in the database by default when you install 11g. This feature limit the amount of resources a process can use. This impact all payroll processes and other batch processes and cause session to wait on an event “resmgr:cpu quntum”.

This will hopefully remove the CPU limit that is causing “resmgr:cpu quntum” waits in your production database affecting concurrent programs mainly running during business hours.

Check database parameters

SQL> show parameter resource;

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
resource_limit boolean FALSE
resource_manager_cpu_allocation integer 8
resource_manager_plan string SCHEDULER[0x7AFFC]:DEFAULT_MAINTENANCE_PLAN

Disable resource manager from spfile

SQL> alter system set resource_manager_plan='' scope=both;
System altered.

Then, check resource manager windows

SQL> select window_name, resource_plan from dba_scheduler_windows;

WINDOW_NAME RESOURCE_PLAN
------------------------------ ------------------------------
WEEKNIGHT_WINDOW DEFAULT_MAINTENANCE_PLAN
WEEKEND_WINDOW DEFAULT_MAINTENANCE_PLAN
MONDAY_WINDOW DEFAULT_MAINTENANCE_PLAN
TUESDAY_WINDOW DEFAULT_MAINTENANCE_PLAN
WEDNESDAY_WINDOW DEFAULT_MAINTENANCE_PLAN
THURSDAY_WINDOW DEFAULT_MAINTENANCE_PLAN
FRIDAY_WINDOW DEFAULT_MAINTENANCE_PLAN
SATURDAY_WINDOW DEFAULT_MAINTENANCE_PLAN
SUNDAY_WINDOW DEFAULT_MAINTENANCE_PLAN

Next step is to un-assign the plan from the window


SQL> select 'execute dbms_scheduler.set_attribute('''||window_name||''',''RESOURCE_PLAN'','''');' from DBA_SCHEDULER_WINDOWS;

'EXECUTEDBMS_SCHEDULER.SET_ATTRIBUTE('''||WINDOW_NAME||''',''RESOURCE_PLAN'','''
--------------------------------------------------------------------------------
execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');

9 rows selected.

SQL> execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN','');

PL/SQL procedure successfully completed.
...
...
SQL> commit;

Finally, disableauto tasks


SQL> BEGIN
DBMS_AUTO_TASK_ADMIN.DISABLE(
client_name => 'sql tuning advisor',
operation => NULL,
window_name => NULL);
END;
/

2 3 4 5 6 7
PL/SQL procedure successfully completed.

SQL> SQL> commit;

All Done.

Reference: High "Resmgr:Cpu Quantum" Wait Events In 11g Even When Resource Manager Is Disabled [ID 949033.1] 

Monday, August 6, 2012

HOWTO (Oracle DB) - Oradebug to trace session, hanganalyze and statedump

ORADEBUG Session Trace

The following steps are required to trace a user session with oradebug:

Obtain the SPID from v$process

SQL> select username, spid from v$process;

Start the debug session with the SPID of the process that needs traced

SQL> oradebug setospid 2280
SQL> oradebug unlimit

·         Select the appropriate trace level. There are four different options when specifying a tracing level.
·         Level 1 – provides “base set” tracing information. Bind variables are displayed as variables (:b1).
·         Level 4 – provides Level 1 data and the actual data values of bind variables.
·         Level 8 – provides Level 1 data and information on wait events when the elapsed time is greater than the CPU time.
·         Level 12 – combines levels 1, 4 and 8 tracing information. A Level 12 trace contains base set, bind variable values and wait events.

The oradebug command below will enable the maximum tracing possible:

SQL> oradebug event 10046 trace name context forever, level 12

Turn tracing off

SQL> oradebug event 10046 trace name context off

Obtain the trace file name. The oradebug facility provides an easy way to obtain the file name

SQL> oradebug tracefile_name
c:\oracle9i\admin\ORCL92\udump\mooracle_ora_2280.trc

Format the trace file with tkprof (as described in the earlier section on tkprof)
The result will be a trace file that contains more trace file information. Viewing wait events and bind variable values can be critical to diagnosing performance issues.

System state dump

In RAC system this need to be executed on all the instance

# Logon to sqlplus as sysdba

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 10
... Wait at least 1 min
SQL> oradebug dump systemstate 10
... Wait at lease 1 min
SQL> oradebug dump systemstate 10

Hanganalyze dump

Syntax hangaanalyze for single instance

# Logon to sqlplus as sysdba

SQL> oradebug setmypid;
SQL> oradebug unlimit;
SQL> oradebug hanganalyze 3;

Syntax for hangaanalyze for RAC

# Logon to sqlplus as sysdba

SQL> oradebug setmypid;
SQL> oradebug unlimit;
SQL> oradebug setinst all
SQL> oradebug -g def hanganalyze 3

Connecting to SGA in case sqlplus does not work

$ sqlplus -prelim '/as sysdba
SQL> oradbug ...