Tuesday, August 21, 2012

(HOWTO Oracle DB) - Performance problems due Un-Balanced CPU RAC nodes


Background:
Have you seen your production RAC database using lots of CPU in 1 node and NONE on other nodes?  Node1 seems to be particularly busy comparing to Node2, LAVG reaches 40+ in Node1, comparing to LAVG 2+ in Node2.


Question is Why?

This is because the database does not consider CPU utilization at the time of connecting a session. As result we have seen unbalanced CPU utilization across nodes in the cluster. In order to fix this problem, Oracle provides the ability to change the RAC load balancing algorithm. This is done by configuring the default service to use two load balancing features “Load Balancing Advisory” (LBA) and “Connection Load Balancing” (CLB). This change can be done on the fly with using the API “dbms_service.modify_service”.

According to the Oracle documentation DBMS_SERVICE uses the following parameters to modify load balancing mechanism.

DBMS_SERVICE.MODIFY_SERVICE(
   service_name        IN VARCHAR2,
   goal                IN NUMBER DEFAULT NULL,
   dtp                 IN BOOLEAN DEFAULT NULL,
   aq_ha_notifications IN BOOLEAN DEFAULT NULL,
   failover_method     IN VARCHAR2 DEFAULT NULL,
   failover_type       IN VARCHAR2 DEFAULT NULL,
   failover_retries    IN NUMBER DEFAULT NULL,
   failover_delay      IN NUMBER DEFAULT NULL,
   clb_goal            IN NUMBER DEFAULT NULL);

GOAL_NONE = 0 (Disables Load Balancing Advisory)


GOAL_SERVICE_TIME = 1 (Load Balancing Advisory is based on elapsed time for work done in the service plus available bandwidth to the service)
 

GOAL_THROUGHPUT = 2 (Load Balancing Advisory is based on the rate that work is completed in the service plus available bandwidth to the service)

CLB_GOAL_SHORT = 1 (Connection load balancing uses Load Balancing Advisory, when Load Balancing Advisory is enabled (either goal_service_time or goal_throughput). When GOAL=NONE (no load balancing advisory), connection load balancing uses an abridged advice based on CPU utilizatio)


CLB_GOAL_LONG = 2 (Balances the number of connections per instance using session count per service. This setting is recommended for applications with long connections such as forms. This setting can be used with Load Balancing Advisory when the connection pool is sized to accommodate gravitation within the pool itself (without adding or removing connections). The latter is the most efficient design).

Recommendation:
 In order to use CPU load balancing default service need to be changes to consider CPU connection load balancing on CLB parameter (set value to CLB_GOAL_SHORT or 1), and also set the load balancing goal to throughput (set value to GOAL_THROUGHPUT or 2).

Current setup is not using CLB

SQL> col name for a25
SQL> select inst_id,name,goal,clb_goal from gv$services;

   INST_ID NAME         GOAL         CLB_G
---------- -------------------- ------------
     1 PROD             NONE         LONG
     1 PROD_PRIM        NONE         LONG
     1 SYS$BACKGROUND   NONE         SHORT
     1 SYS$USERS        NONE         SHORT
     2 PROD             NONE         LONG
     2 PROD_PRIM        NONE         LONG
     2 SYS$BACKGROUND   NONE         SHORT
     2 SYS$USERS        NONE         SHORT

In order to change CLB goal as statements as per below
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD',goal=>2,clb_goal=>1);
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD_PRIM',goal=>2,clb_goal=>1);
 
GOODNESS Indicates how attractive a given instance is with respect to processing the workload that is presented to the service. A lower number is better. This number is internally computed based on the GOAL (LONG or SHORT) that is specified for the particular service.
SQL> select end_time,inst_id,service_name,cpupercall,goodness from gv$servicemetric where service_name like 'PROD%' order by end_time desc,inst_id;
 

END_TIME           INST_ID SERVICE_NAME         CPUPERCALL   GOODNESS
--------------- ---------- -------------------- ---------- ----------
03-JUL-12                2 PROD              2647.74074        800
03-JUL-12                1 PROD              406.694915       2300
03-JUL-12                1 PROD              1882.94935       2300
03-JUL-12                2 PROD              2024.03252        800

Testing:

In order to test these changes I will be using SwingBench tool which can simulate  Sales Order Entries transactions with a large number of sessions. SwingBench tool is open source and run on JAVA.

Swingbench will be setup as follows:
  • Two desktop running swingbench will be used as load generators.
  • A database called EBSTEST will be used. This is a 2 node RAC 11gR1 database.
  • First desktop load generator will be runnig swingbench with 500 sessions, each session will reconnect after 5000 transaction have been completed. All this sessions will be forced to connect EBSTEST1 (instance 1) of the RAC cluster. Doing this will simulate the unbalanced situation, where 500+ sessions are connect to node1 and less sessions connected to node2. Also CPU utilization will be unbalanced.
  • Second desktop load generator will be using the service_name EBSTEST which can conect either node of the cluster. Connecting to this server will consider LBA and CLB balancing algorithm. 500 session will be connected and each session will re-connect every 5000 transactions are completed.Next step is to un-assing the plan from the window


Note: First load generator can be seen here running 500 session and putting load in first node.

 Note: Load can be seen here first node LAVG is 31 where second node LAVG is 1.65.

 Note: sessions number is also unbalanced. 642 in instance 1 and 140 in instance 2.


Note: service load balancing is changed on the fly to consider CPU utilization. As you can see all new sessions are connected straight to instance 2 as you can see instance 1 still hold 644 sessions where instance 2 have gone from 140 to 338 sessions.

Note: as you can see node 2 LAVG has incresed from 1.6 to 4.39, confirming new  load balancing goal is now considering CPU utilization.

No comments:

Post a Comment