Background:
Have you seen your production RAC database using lots of CPU in 1 node and NONE on other nodes? Node1 seems to be particularly busy comparing to Node2, LAVG reaches 40+ in Node1, comparing to LAVG 2+ in Node2.
Question is Why?
This is because the database does not consider CPU utilization at the time of connecting a session. As result we have seen unbalanced CPU utilization across nodes in the cluster. In order to fix this problem, Oracle provides the ability to change the RAC load balancing algorithm. This is done by configuring the default service to use two load balancing features “Load Balancing Advisory” (LBA) and “Connection Load Balancing” (CLB). This change can be done on the fly with using the API “dbms_service.modify_service”.
Have you seen your production RAC database using lots of CPU in 1 node and NONE on other nodes? Node1 seems to be particularly busy comparing to Node2, LAVG reaches 40+ in Node1, comparing to LAVG 2+ in Node2.
Question is Why?
This is because the database does not consider CPU utilization at the time of connecting a session. As result we have seen unbalanced CPU utilization across nodes in the cluster. In order to fix this problem, Oracle provides the ability to change the RAC load balancing algorithm. This is done by configuring the default service to use two load balancing features “Load Balancing Advisory” (LBA) and “Connection Load Balancing” (CLB). This change can be done on the fly with using the API “dbms_service.modify_service”.
According to the Oracle documentation DBMS_SERVICE uses the following parameters to modify load balancing mechanism.
DBMS_SERVICE.MODIFY_SERVICE(
service_name IN VARCHAR2,
goal IN NUMBER DEFAULT NULL,
dtp IN BOOLEAN DEFAULT NULL,
aq_ha_notifications IN BOOLEAN DEFAULT NULL,
failover_method IN VARCHAR2 DEFAULT NULL,
failover_type IN VARCHAR2 DEFAULT NULL,
failover_retries IN NUMBER DEFAULT NULL,
failover_delay IN NUMBER DEFAULT NULL,
clb_goal IN NUMBER DEFAULT NULL);
GOAL_NONE = 0 (Disables Load Balancing Advisory)
GOAL_SERVICE_TIME = 1 (Load Balancing Advisory is based on elapsed time for work done in the service plus available bandwidth to the service)
GOAL_THROUGHPUT = 2 (Load Balancing Advisory is based on the rate that work is completed in the service plus available bandwidth to the service)
CLB_GOAL_SHORT = 1 (Connection load balancing uses Load Balancing Advisory, when Load Balancing Advisory is enabled (either goal_service_time or goal_throughput). When GOAL=NONE (no load balancing advisory), connection load balancing uses an abridged advice based on CPU utilizatio)
CLB_GOAL_LONG = 2 (Balances the number of connections per instance using session count per service. This setting is recommended for applications with long connections such as forms. This setting can be used with Load Balancing Advisory when the connection pool is sized to accommodate gravitation within the pool itself (without adding or removing connections). The latter is the most efficient design).
service_name IN VARCHAR2,
goal IN NUMBER DEFAULT NULL,
dtp IN BOOLEAN DEFAULT NULL,
aq_ha_notifications IN BOOLEAN DEFAULT NULL,
failover_method IN VARCHAR2 DEFAULT NULL,
failover_type IN VARCHAR2 DEFAULT NULL,
failover_retries IN NUMBER DEFAULT NULL,
failover_delay IN NUMBER DEFAULT NULL,
clb_goal IN NUMBER DEFAULT NULL);
GOAL_NONE = 0 (Disables Load Balancing Advisory)
GOAL_SERVICE_TIME = 1 (Load Balancing Advisory is based on elapsed time for work done in the service plus available bandwidth to the service)
GOAL_THROUGHPUT = 2 (Load Balancing Advisory is based on the rate that work is completed in the service plus available bandwidth to the service)
CLB_GOAL_SHORT = 1 (Connection load balancing uses Load Balancing Advisory, when Load Balancing Advisory is enabled (either goal_service_time or goal_throughput). When GOAL=NONE (no load balancing advisory), connection load balancing uses an abridged advice based on CPU utilizatio)
CLB_GOAL_LONG = 2 (Balances the number of connections per instance using session count per service. This setting is recommended for applications with long connections such as forms. This setting can be used with Load Balancing Advisory when the connection pool is sized to accommodate gravitation within the pool itself (without adding or removing connections). The latter is the most efficient design).
Recommendation:
In order to use CPU load balancing default service need to be changes to consider CPU connection load balancing on CLB parameter (set value to CLB_GOAL_SHORT or 1), and also set the load balancing goal to throughput (set value to GOAL_THROUGHPUT or 2).
Current setup is not using CLB
SQL> col name for a25
SQL> select inst_id,name,goal,clb_goal from gv$services;
INST_ID NAME GOAL CLB_G
---------- -------------------- ------------
1 PROD NONE LONG
1 PROD_PRIM NONE LONG
1 SYS$BACKGROUND NONE SHORT
1 SYS$USERS NONE SHORT
2 PROD NONE LONG
2 PROD_PRIM NONE LONG
2 SYS$BACKGROUND NONE SHORT
2 SYS$USERS NONE SHORT
SQL> select inst_id,name,goal,clb_goal from gv$services;
INST_ID NAME GOAL CLB_G
---------- -------------------- ------------
1 PROD NONE LONG
1 PROD_PRIM NONE LONG
1 SYS$BACKGROUND NONE SHORT
1 SYS$USERS NONE SHORT
2 PROD NONE LONG
2 PROD_PRIM NONE LONG
2 SYS$BACKGROUND NONE SHORT
2 SYS$USERS NONE SHORT
In order to change CLB goal as statements as per below
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD',goal=>2,clb_goal=>1);
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD_PRIM',goal=>2,clb_goal=>1);
SQL> EXEC DBMS_SERVICE.MODIFY_SERVICE(service_name=>'PROD_PRIM',goal=>2,clb_goal=>1);
GOODNESS Indicates how attractive a given instance is with respect to processing the workload that is presented to the service. A lower number is better. This number is internally computed based on the GOAL (LONG or SHORT) that is specified for the particular service.
SQL> select end_time,inst_id,service_name,cpupercall,goodness from gv$servicemetric where service_name like 'PROD%' order by end_time desc,inst_id;
END_TIME INST_ID SERVICE_NAME CPUPERCALL GOODNESS
--------------- ---------- -------------------- ---------- ----------
03-JUL-12 2 PROD 2647.74074 800
03-JUL-12 1 PROD 406.694915 2300
03-JUL-12 1 PROD 1882.94935 2300
03-JUL-12 2 PROD 2024.03252 800
END_TIME INST_ID SERVICE_NAME CPUPERCALL GOODNESS
--------------- ---------- -------------------- ---------- ----------
03-JUL-12 2 PROD 2647.74074 800
03-JUL-12 1 PROD 406.694915 2300
03-JUL-12 1 PROD 1882.94935 2300
03-JUL-12 2 PROD 2024.03252 800
Testing:
In order to test these changes I will be using SwingBench tool which can simulate Sales Order Entries transactions with a large number of sessions. SwingBench tool is open source and run on JAVA.
Swingbench will be setup as follows:
- Two desktop running swingbench will be used as load generators.
- A database called EBSTEST will be used. This is a 2 node RAC 11gR1 database.
- First desktop load generator will be runnig swingbench with 500 sessions, each session will reconnect after 5000 transaction have been completed. All this sessions will be forced to connect EBSTEST1 (instance 1) of the RAC cluster. Doing this will simulate the unbalanced situation, where 500+ sessions are connect to node1 and less sessions connected to node2. Also CPU utilization will be unbalanced.
- Second desktop load generator will be using the service_name EBSTEST which can conect either node of the cluster. Connecting to this server will consider LBA and CLB balancing algorithm. 500 session will be connected and each session will re-connect every 5000 transactions are completed.Next step is to un-assing the plan from the window
Note: First load generator can be seen here running 500 session and putting load in first node.
Note: Load can be seen here first node LAVG is 31 where second node LAVG is 1.65.
Note: sessions number is also unbalanced. 642 in instance 1 and 140 in instance 2.