Tuesday, April 9, 2013

Datatype Space Usage: SMALLINT, BIGINT, etc

When a datatype (such as BIGINT) is chosen for a column, is space being allocated based on the 'value' , or is it a fixed allocation regardless of the value?

This is what I mean:

If a value of '5' being insert into the table, does it consume the same amount of space compared to value '9876543' ? Or does a '5' occupy less space than '9876543'? 

Let's find out. First, we create a single column table with type BIGINT. Then we insert roughly 150000 rows of value '1' into it.

This is the space used:

TAB             NPAGES
--------------- --------------------
BIGINT                           771


Let's repeat this test using value '99999999' instead. This is the space used:


TAB             NPAGES
--------------- --------------------
BIGINT                           771


That suggests there is no difference. 

That also means, if you know the range of the values ( number type, that is) for the column, you should pick the smallest possible datatype that would fit the range.

Below is a table showing the range for each numerical type:


Limit
Smallest SMALLINT value-32 768
Largest SMALLINT value+32 767
Smallest INTEGER value-2 147 483 648
Largest INTEGER value+2 147 483 647
Smallest BIGINT value-9 223 372 036 854 775 808
Largest BIGINT value+9 223 372 036 854 775 807
Largest decimal precision31
Maximum exponent (Emax) for REAL values38
Smallest REAL value-3.402E+38
Largest REAL value+3.402E+38
Minimum exponent (Emin) for REAL values-37
Smallest positive REAL value+1.175E-37
Largest negative REAL value-1.175E-37
Maximum exponent (Emax) for DOUBLE values308
Smallest DOUBLE value-1.79769E+308
Largest DOUBLE value+1.79769E+308
Minimum exponent (Emin) for DOUBLE values-307
Smallest positive DOUBLE value+2.225E-307
Largest negative DOUBLE value-2.225E-307
Maximum exponent (Emax) for DECFLOAT(16) values384
Smallest DECFLOAT(16) value1-9.999999999999999E+384
Largest DECFLOAT(16) value9.999999999999999E+384
Minimum exponent (Emin) for DECFLOAT(16) values-383
Smallest positive DECFLOAT(16) value1.000000000000000E-383
Largest negative DECFLOAT(16) value-1.000000000000000E-383
Maximum exponent (Emax) for DECFLOAT(34) values6144
Smallest DECFLOAT(34) value1-9.999999999999999999999999999999999E+6144
Largest DECFLOAT(34) value9.999999999999999999999999999999999E+6144
Minimum exponent (Emin) for DECFLOAT(34) values-6143
Smallest positive DECFLOAT(34) value1.000000000000000000000000000000000E-6143
Largest negative DECFLOAT(34) value-1.000000000000000000000000000000000E-6143

Tuesday, March 26, 2013

DB2 Subquery Removal Test 1


Does DB2 do subquery removal? Let's do a test and find out:

We will use a simplified TPC-H query, and faking 100000 rows for both tables:

Original Statement:
------------------
SELECT
  SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM
  lineitem,
  part
WHERE
  p_partkey = l_partkey AND
  p_brand = '[BRAND]' AND
  p_container = '[CONTAINER]' AND
  l_quantity <
  (SELECT
     0.2 * AVG(l_quantity)
   FROM
     lineitem
   WHERE
     l_partkey = p_partkey
  )



The outcome? See the optimized statement:

Optimized Statement:
-------------------
SELECT
  (Q7.$C0 / 7.0) AS "AVG_YEARLY"
FROM
  (SELECT
     SUM(DECFLOAT(Q6.L_EXTENDEDPRICE, 34))
   FROM
     (SELECT
        Q5.L_EXTENDEDPRICE
      FROM
        (SELECT
           SUM(DECFLOAT(Q2.L_QUANTITY, 34)),
           COUNT(DECFLOAT(Q2.L_QUANTITY, 34))
         FROM
           (SELECT
              Q1.L_QUANTITY
            FROM
             LINEITEM AS Q1
            WHERE
              (Q1.L_PARTKEY = Q4.P_PARTKEY)
           ) AS Q2
        ) AS Q3,
        PART AS Q4,
        LINEITEM AS Q5
      WHERE
        (Q5.L_QUANTITY < (0.2 * (Q3.$C0 / Q3.$C1))) AND
        (Q4.P_CONTAINER = '[CONTAINER]') AND
        (Q4.P_BRAND = '[BRAND]') AND
        (Q4.P_PARTKEY = Q5.L_PARTKEY)
     ) AS Q6
  ) AS Q7




Notice that the subquery has vanished, but LINEITEM Q1 is still being used as a lateral view.
Can it do better than this? Can it reduce the access to LINEITEM from 2 to 1?

More on this later ....




Wednesday, March 6, 2013

"<> ALL" Multi-column Problem

For single column, everything is fine, but when you try multi-column set comparison, 'it' gets confused.

Let's start with a single column comparison:


db2 => select * from no_null where c1 <> all ( select c1 from some_null) ;

C1                       C2                    
------------------------ ------------------------

  0 record(s) selected.


So far so good.

Now, let's try two columns, c1 and c2:



db2 => select * from no_null where (c1, c2) <>ALL ( select c1, c2 from some_null) ;

SQL0104N  An unexpected token "(" was found following "".  Expected tokens may
include:  "".  SQLSTATE=42601


Fun eh?  ;)




Wednesday, February 13, 2013

VARCHAR, Trailing Spaces, and Unique Constraint


Another look at VARCHAR. This time, the implication of the 'important-ness' of trailing space(s) and constraint.

Let's start from a table with 3 rows:

db2 => select c1, length(c1) from smallt ;

C1   2
---- -----------
B              1
B              2
B              3

  3 record(s) selected.


As you can see, these three rows are 'unique' in the sense that they have different length. In Oracle, they are three distinct rows with all the 'expected' properties.

Can we create a unique index on it?

Let's try:


db2 => create unique index uqi on smallt (c1 );
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0603N  A unique index cannot be created because the table contains data
that would result in duplicate index entries.  SQLSTATE=23515


NOPE!

Another example of the 'useless' trailing spaces in VARCHAR.

VARCHAR and Equality Comparison

This is a follow-up on the previous post.

Now that we know trailing space(s) is 'not that important' as far as DB2 is concerned, what would you expect in a simple '=' comparison?

Let's do a little test:



db2 => create table smallt (c1 varchar(4)) ;
DB20000I  The SQL command completed successfully.

db2 => insert into smallt values 'B', 'B ', 'B  ';
DB20000I  The SQL command completed successfully.

db2 => select length(c1), c1 from smallt ;

1           C1
----------- ----
          1 B
          2 B
          3 B

  3 record(s) selected.


We have three rows with different 'length' due to the trailing space(s) we inserted. If we compare this column to itself, ie, c1 = c1, how many rows would you expect?

Three?

Let's see ...



db2 => select * from smallt t1, smallt t2 where t1.c1 = t2.c1 ;

C1   C1
---- ----
B    B
B    B
B    B
B    B
B    B
B    B
B    B
B    B
B    B

  9 record(s) selected.


Need I say more? ;)


Varchar: When a limit is unlimited

How much do you know about 'varchar' type in DB2?

I thought I knew, until I ran a little test and got a nasty surprise.

Here's a simple test case to illustrate:


db2 => create table smallt ( c1 varchar(1));
DB20000I  The SQL command completed successfully.

db2 => insert into smallt values('B             ') ;
DB20000I  The SQL command completed successfully.


db2 => select * from smallt ;

C1
--
B

  1 record(s) selected.



See the 'problem'?


This behaviour can be very disturbing for those used to Oracle DB. Another potential 'hidden' bug for Oracle developer doing DB2.

Wednesday, January 16, 2013

BLEVEL: How High Can You Go?

While investigating index splits, I encountered a fairly 'mind-numbing' number. But first, here's the basic set up on a database with default page size 4K:

CREATE TABLE LONGT (V1 CHAR(254)) ;
CREATE INDEX IDX_LT ON LONGT ( V1);

And populate this table with 50,000 rows of ascending 'numbers', starting with 1.

Guess what BLEVEL you get?

Drumroll please ... it is ... 5!

I'm beginning to wonder if this BLEVEL has the same meaning as the BLEVEL I have been exposed to ... ;)