Monday, September 9, 2013

ORGANIZE BY COLUMN - Part 2

In an earlier post, I demonstrated that DB2 would do a TABLE SCAN for a simple equality predicate on a primary key column. While that looks bad ( ya, looks REALLY BAD), something else is taking place under the hood. In most cases, the TABLE SCAN is not what we think it is.

While a TABLE SCAN is understood to scan the entire table, it is not really necessary if certain condition is met in an ORGANIZE BY COLUMN table. If the data type is datetime/boolean/numeric, the storage engine is able to perform DATA SKIPPING, which avoid scanning data ranges that *do not qualify* the predicate. Any pages picked up by the SCAN operation are guaranteed to contain data of interest. These pages might contain non-qualifying AND qualifying data, depending on the distribution. As a result, a TABLE SCAN pick up much less pages than necessary, and not the entire table is being processed.

Note that as of this writting ( 9-Sep-2013) , varchar/char column do not allow DATA SKIPPING, unless that column is an (enforced) primary key/unique.

Non-enforced uniqueness do not allow DATA SKIPPING as well.

There you go.