Saturday, July 6, 2013

DB2 10.5 : Is ORGANIZE BY COLUMN a magic cure for all?

Unless your system is 100% pure OLAP, the quick answer to this is a big 'NO'.

This short demo tells it all:

CREATE TABLE T1 ( C1 BIGINT NOT NULL PRIMARY KEY) ORGANIZE BY COLUMN;

So, we have a single-column table built with the 'COLUMN ORGANIZED' method. Since it is a primary key, there exists an index on the same column. So, do we get the 'best of both worlds'? Let's take a look.

Populate about 100 thousands rows into this table, and issue this simple query:

SELECT * FROM T1 WHERE C1 = 3;

We would 'expect' that the engine would use the primary key index to locate this value. Let's see the execution plan:

Access Plan:
-----------
Total Cost: 440.126
Query Degree: 2

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
          1
       CTQ  
       (   2)
       440.126
         480
         |
          1
       TBSCAN
       (   3)
       440.125
         480
         |
       195040
 CO-TABLE: ADMIN  
         T1
         Q1



One thing that stands out is that the cost is over 400 for a single row search. Not good. 

So, in a system where we anticipate mixed workload,  a COLUMN ORGANIZED table could potentially flood your buffer cache when there are plenty of simple singleton query. Of course, if you have a PERFECT foresight that you know 100% for sure that certain table will always be OLTP access pattern and certain table always be OLAP, then choosing ROW/COLUMN organize table is a breeze. 

I'm just curious, who has that ability? ;)

Having said that, I do see this feature lends itself naturally to a pure data warehouse system. For the rest of us, stay alert!