Wednesday, May 22, 2013

A 'Better' Execution Plan Graph

One crucial number that is missing from the standard plan generated by 'EXPLAIN PLAN' is the actual rows flowing in each node of the graph.  Without it, it is close to impossible to judge the efficiency of the plan. How do you decide if the 'optimizer has underestimated/overestimated the row source' at any point of the execution?

A pretty neat feature which first appeared in v9.7 and unfortunately still largely unknown in v10.1, is the 'section actuals', which can be set at the database level as  SECTION_ACTUALS, or at the individual application using WLM_SET_CONN_ENV.

You will also need to capture the stats using activity event monitor, eg: ALTER WORKLOAD WL1 COLLECT ACTIVITY DATA WITH DETAILS,SECTION, etc etc .... 

But assuming that all you want to do is to do simple test on certain query, there is an easier way. Hidden somewhere in the sea of binaries is a utility called 'db2caem'. It's really easy to use. 

Let's illustrate this using a simple example. Assuming that you want to test your query 'select count(*) from t1', all you need to do is issue this:

db2caem -d db2test -st "select count(*) from t1"


      ____________________________________________________________________

                      _____     D B 2 C A E M     _____

                DB2 Capture Activity Event Monitor data tool
                              I      B      M


          The DB2CAEM Tool is a utility for capturing the activity event
         monitor data with details, section and values, as well as actuals.
          which could be used when analyze the SQL statement performance.
      ____________________________________________________________________


________________________________________________________________________________

DBT7041I  "db2caem": The db2caem utility is connecting to the following database: "db2test".
DBT7038I  "db2caem": The db2caem utility successfully connected to the following database: "db2test".
DBT7042I  "db2caem": The SQL statement "select count(*) from t1" is being issued.
DBT7043I  "db2caem": The db2caem utility is disconnecting from the following database: "db2test".
DBT7039I  "db2caem": The db2caem utility successfully disconnected from the following database: "db2test".
DBT7000I  db2caem completed. Output path: "C:\\DB2CAEM_2013-05-22-16.58.09.762000".



That's pretty much to it. The execution plan you want is in the directory ( as shown on the last line) "C:\\DB2CAEM_2013-05-22-16.58.09.762000"


Look for the file named 'db2caem.exfmt' inside, and you can see something like below:


-------------------------------
< some stuff removed for clarity>


Original Statement:
------------------
select
  count(*)
from
  t1


Optimized Statement:
-------------------
SELECT
  Q3.$C0
FROM
  (SELECT
     COUNT(*)
   FROM
     (SELECT
        $RID$
      FROM
        XXXXXX.T1 AS Q1
     ) AS Q2
  ) AS Q3

Explain level:    Explain from section

Access Plan:
-----------
Total Cost: 78.1959
Query Degree: 1

      Rows
   Rows Actual
     RETURN
     (   1)
      Cost
       I/O
       |
        1
        1
     GRPBY
     (   2)
     78.1957
       NA
       |
      10001
      10001
     IXSCAN
     (   3)
     77.8301
       NA
       |
       -1
       NA
 INDEX: XXXXXXX
      T1X2
       Q1

<more stuff removed> ...


Note that now there are *two* numbers, one is the standard estimated rows, and the one immediately below it is the *actual* rows.

Easy eh? :)

While this is all good, unfortunately the interpretation of the numbers can be very confusing for certain execution plan. I shall leave that for another post ...

No comments:

Post a Comment