Skip to main content

skip to main content

developerWorks  >  Information Management  >

DB2 Storage Trivia

developerWorks
Document options

Document options requiring JavaScript are not displayed


New site feature

Check out our new article design and features. Tell us what you think.


Rate this page

Help us improve this content


Level: Introductory

Blair AdamacheIBM

11 Aug 2001

Read this article for insights, tips and techniques for managing data in a DB2 storage system. You'll learn about storing data for optimum efficiency, EEE vs. non-partitioned databases, page size, system-managed vs. database-managed table spaces, micro-managing data storage, and more. Numerous examples are included.

Macro-managing DB2 Storage

A relational database presents a logical view of data only - so why do we care about the physical placement of data? There are a few reasons to care. First, not all disks in the storage system may have the same access speed. Given that all hardware we can assign to DB2 to manage may not be equal, then the data alaso may not be equal. If that's the case, it may be worthwhile to put the least-accessed data on the slowest devices.

Second, you may know things about your data that DB2 may never know:

  • what's going to happen in the future, such as projected sales in a given month,
  • phone numbers that will change when the local area code splits (thus altering a potential primary key),
  • or a company you plan to acquire, with its own data which now must be merged into an existing schema.

If none of these things are on the horizon, you'll have to be content with a cliche to justify reading the rest of this article: if you can't measure it, you can't manage it.

Let's start with some basics: how big can things get with DB2? Your first reference to answer these questions is Appendix A of the SQL Reference manual, which discusses common limits. Below is the list of storage limits, some of which are in the SQL Reference manual. These are also absolute limits. For limits that specific page sizes enforce, see the final table in Appendix A, "Database Manager Page Size Specific Limits". Note that some limits are "per partition" or "per node". These are absolute limits for DB2 Personal, Workgroup and Enterprise Editions, but they can be multiplied by 999 for Enterprise - Extended Edition. (It allows 999 nodes, also known as partitions.)

Most tables in an SMS table space65 534
Most tables in a DMS table space51 000
Most table spaces in a database4096
Most columns in a table500-1012 (varies by page size)
Most columns in a view5 000
Maximum length of a row including all overhead4005-32 677 bytes (varies by page size)
Longest varchar32 672 bytes
Longest long field (CLOB, BLOB)2 GB
Maximum size of a table per partition64-512 GB (varies by page size)
Maximum size of an index per partition64-512 GB (varies by page size)
Most rows in a table per partition4,000,000,000
Longest index key including all overhead1024 bytes
Most columns in an index key 16
Most indexes on a table32 767 or storage
Most tables referenced in an SQL statement/viewstorage
Most host variable references in an SQL statement32 767
Longest SQL statement (in bytes)65 535
Most elements in a select list1012
Maximum number of constraints on a tablestorage
Most values in an INSERT statement1012
Most columns in a UNIQUE constraint
(supported with a UNIQUE index)

16
Maximum combined length of columns in a UNIQUE
constraint (supported with a UNIQUE index)

1012
Maximum combined length of referencing1024
Maximum number of columns in a
partitioning key

500
Maximum concurrent users of server64 000
Maximum run-time depth of cascading triggers16
Maximum size of a regular DMS table space512 GB
Maximum size of a long DMS table space2 TB
Maximum size of a temporary DMS table space2 TB
Maximum number of concurrent users per instance64 000
Maximum partition number999
Most table objects in DMS table space51 000
Largest database (excluding long fields)4096*65,535*0.5 TB
Largest EEE database (excluding long fields)4096*65,535*0.5 TB * 999 nodes

There are three basic choices you make which guide the limits above:

  • EEE versus non-partitioned databases (such as Workgroup or Enterprise Edition)
  • Choosing a page size of 4K, 8K, 16K or 32K
  • SMS versus DMS table spaces (System Managed versus Database Managed)

EEE versus non-partitioned databases
Choosing Enterprise - Extended Edition over another DB2 database edition increases most physical limits by three orders of magnitude, so this is your easier weapon to break the sound barrier (or whatever the equivalent is for your users). Implications and advantages of choosing EEE are discussed in two other articles:

  1. http://www.ibm.com/developerworks/db2/library/techarticle/adamache/0430_adamache2.html
  2. http://www.ibm.com/developerworks/db2/library/techarticle/adamache/0529_adamache.html

Choosing a page size of 4K, 8K, 16K or 32K
4K pages are the default. For each additional page size you want, you must create bufferpools, table spaces, and temporary table spaces with matching page sizes. The primary benefits of larger page sizes are tables that can contain:

  • More data (64 GB with a 4K page size, doubling at 8K, 16K and 32K to 512 GB)
  • More columns (500 with a 4K page size, 1012 with all larger page sizes)
  • Wider rows (4005, 8101, 16 293, or 32 677 bytes)
  • Wider varchars (roughly the same as the row width)


Back to top


SMS versus DMS table spaces (System Managed versus Database Managed): The Raw and the Cooked

SMS stands for system managed storage: all DB2 objects are managed by the operating system using the file system. This is ideal for table spaces that grow and shrink rapidly (such as temporary table spaces), or all data in databases where you cannot perform a lot of maintenance or monitoring. DMS (Database Managed Storage) requires you to predict the growth of data, and/or be ready to add containers to table spaces that are running short on disk space. The containers can be either files in the file system or raw devices. DMS table spaces on raw devices (and raw logs) are called "raw". SMS table spaces and DMS table spaces that use the file system are called "cooked" (perhaps in homage to the Fine Young Cannibals or Claude Levi-Strauss - you pick).

Let's first position SMS versus DMS:

SMSDMS
StripingYesYes
(both can stripe data for parallel I/O)
Object ManagementOperating system (with unique file names)DB2
Space AllocationGrows/shrinks on demandPre-allocated
Ease Of AdministrationGoodDBA-intensive
PerformanceGoodExcellent

As mentioned in the 2nd referenced article above (2), System Managed Storage (SMS) table spaces are appropriate for system data (temporary table spaces and the catalog table space) and databases that have to be low-maintenance because they are too numerous to warrant a lot of investment in design and maintenance. The rapid allocation and de-allocation of disk space that temporary table spaces require makes SMS the right choice

DMS gives you the ability to have a table's data, long fields (such as LOBs) and indexes in up to three separate table spaces. This gives you the option of putting long fields on slower disks and indexes on faster disks. You also get more granular backup and restore options. For long fields (LOBs or LONG VARCHARs), use SMS or DMS with file system containers. Note that LONG VARCHARs should be avoided: a VARCHAR can be just as long, and a LONG VARCHAR is an absolute subset of a LOB in terms of function and standards adherence. LONG VARCHAR is just a data type left around from DB2 Version 1 which DB2 honors for backward compatibility.

Placing long fields in containers on the file system (whether SMS or DMS) allows them to benefit from operating system file system caching. DB2 will not use its own memory (in the buffer pool) to cache long fields. For the catalog table space: use SMS or DMS with file system containers and a small extent size (2 or 4 pages). There are a lot of relatively small tables in the catalog tablespace, and DMS requires twice as many overhead extents per table (two versus one for SMS). Also, several LOB columns exist in the catalog tables, and these benefit from file system caching in the same manner as user LOBs described above.

Choosing SMS versus DMS for regular user data depends on a few things. Avoid DMS on the file system, as it combines the weaknesses of SMS and DMS (the potential slower speed of the file system compared to raw devices, but the same requirement for extra administration to predict and measure growth for the addition of containers required by the table as it grows). To maximize convenience and achieve very good performance, choose SMS, especially if you're on a sophisticated operating system with an advanced file system like recent releases of AIX and Solaris. These operating systems are good at caching critical files into memory. To maximize performance, choose DMS with raw containers. Raw containers generally outperform file containers because they avoid the extra path through the operating system and its file system for DB2 to read the data.

Deciding how to allocate tables across table spaces depends on a few things. DB2 for OS/390 DBA's tend to like putting each table in its own table space. This is right for DB2 on Unix and Windows only some of the time. Tables related through referential integrity should be in the same table space if you ever want to use rollforward point in time and avoid a check pending state (the way DB2 freezes tables in an RI relationship if they're at different points in time, with a possible referential constraint violation). On the other hand, tables which are the targets of LOAD should be in their own table space, as LOAD requires exclusive access to the table space in DB2 Version 7.2 and earlier (this restriction will be lifted in the future). What if you LOAD into tables that require RI? These should be in the same table space. Since they're linked with RI, when one of them is offline during a LOAD, you probably don't want transactions against the others anyway. If you want to permit SELECTs against the tables in an RI relationship that are not being LOADed, put them in different table spaces.



Back to top


Micro-managing DB2 Storage

The Macro-managing DB2 Storage section above discusses controlling where tables go. Now let's look at how we can micro-manage data, and decide where rows and columns go. The danger of trying to control which sector every byte occupies on disk is that you'll make a poorer choice than DB2 would. If you know things that DB2 can never know, you may be able to optimize performance or usage of space. For example, DB2 does not have range partitioning. Instead, in clustered machine environments (or with multiple logical partitions on a large SMP), rows are hashed across all partitions to ensure an even distribution of data. This is ideal for ad hoc queries. For data that gets rolled in and out of a warehouse, range partitioning has advantages. In particular, if you need to keep only the last n decades, years, months, weeks, days, hours, minutes, or seconds of data online, the ability to add July 2001 data when you delete July 2000 data is powerful. It's a good way to conserve disk space. However, if all July data is clustered, the hardware that controls it can become overworked (consider the number of queries that will request July data when bank customers receive their July statements in the mail in the first week of August). Overworking the node with July data defeats the purpose of paying for multiple machines to build a data warehouse: you will have bought more hardware than you're using if four processors are getting saturated with requests for July data, while 44 processors do little with the other eleven months of data. How many customers phone with a question on their January statement in August?

Range partitioning can also be extra work for the DBA. If you compare benchmark kits in the FDR's (Full Disclosure Reports) at http://www.tpc.org/tpch/results/h-ttperf.idc, you'll see that creating the schema in database management systems with range partitioning takes about four times the number of DDL lines as needed to create the schema with DB2's hash partitioning. In the predecessor to TPC-H, TPC-D, the difference was even more dramatic: 10:1. TPC-D allowed materialized views, summary tables and multi-column indexes (and lots of them), so other factors contributed to the difference as well.

With DB2, if you want all the July data on one partition, you need to create a table for each month, and use UNIONs to bring all the data together. This is not optimal on EEE (EEE is shared nothing, which works better when data is randomly distributed), and UNIONs are not as efficient as other methods for combining answer sets. A better choice is to let DB2 use hash partitioning to spread the data across all nodes, and take advantage of the arrival of data to achieve the benefits of range partitioning while still preserving the benefits of hash partitioning. After all, the July data tends to arrive in July. What better time to issue an ALTER TABLE statement that will keep all the July data contiguous on each node?



Back to top


Home on the Range

Before we reveal this diabolic magic, let's all recall that rows in a relational table have no inherent order. Our users and application programmers must understand that an ORDER BY is required to get data in the order they want. The clustering of data by month (or which ever other range you choose) is a happy accident (one that will not be repeated if the July 2001 rows outnumber the July 2000 rows, and you didn't reserve enough additional space). This is also a good time to recall that columns also have no inherent order in a relational table. We'll have to wait until the end of this article to subvert that principle.

We start by choosing a range. Since we've been using month, let's start with that. In our warehouse, we maintain 13 months of data. When July 2001 arrives, it will replace June 2000. The July 2001 data should be larger: apart from the fact that July has one more day than June, we normally expect our business to grow with the economy (or faster, if we have stock options). Let's assume a growth of 15%, and that before we have to worry about 2002, and the compound growth of subsequent years, we'll have a chance to reorganize the table. We create the table with 15% free space reserved over each page to save space for future growth. Normally this is for inserts, but in our case it will be to have the July 2001 rows replace the June 2000 rows when we roll data in and out. The example below will use a single node table to demonstrate. After creating the table, reserve space with an index on the range we care about. PCTFREE is where we specify to save 15%:

            CREATE TABLE RECEIPTS
                           (RECEIPT_DATE DATE NOT NULL, CUST_NUM INT NOT NULL,
                           RECEIPT_KEY TIMESTAMP NOT NULL, AMOUNT DEC(10,2),
                             PRIMARY KEY(CUST_NUM, RECEIPT_KEY))
                             PARTITIONING KEY (CUST_NUM, RECEIPT_KEY)

            CREATE INDEX DATE_IND ON RECEIPTS 
                         (RECEIPT_DATE) PCTFREE 15

Now data is inserted month by month:

           INSERT INTO RECEIPTS VALUES ('2000-06-06',1,CURRENT TIMESTAMP,1)
           ….
           INSERT INTO RECEIPTS VALUES ('2001-06-16',40,CURRENT TIMESTAMP,33)

When July 2001 data arrives, we delete June 2000 data:

          DELETE FROM RECEIPTS 
                      WHERE MONTH(RECEIPT_DATE)=6 
                      AND YEAR(RECEIPT_DATE)=2000

Now we can begin inserting July 2001 data:

          INSERT INTO RECEIPTS VALUES ('2001-07-01',1,CURRENT TIMESTAMP,1)

If July 2001 has more data than the month it's overlaying (June 2000) we'll have to alter the PCTFREE value. Note that while initial PCTFREE is specified on the CREATE INDEX, DB2 has no ALTER INDEX statement. You establish the initial PCTFREE on the CREATE INDEX, but after that you adjust it with ALTER TABLE:

          ALTER TABLE RECEIPTS PCTFREE 10
   Or
          ALTER TABLE RECEIPTS PCTFREE 0

The examples above are extremely simple. We have not looked at the definition of the nodegroup (which would specify which partitions RECEIPTS was spread over) or how load would be handled if monthly data were loaded in batch. We have also avoided clustering indexes, and we are storing redundant data (the timestamp is a superset of the RECEIPT_DATE column). Why have we gone through all this pain? Because at any month, all the data for a particular month is contiguous on disk within each node, but all data for that month is spread equally over all nodes using the hash key on the partition key (CUST_NUM, RECEIPT_KEY). Note the column we query for the month (RECEIPT_DATE) is not part of the partition key. When we ask for July 2001 rows in a query, each node supplies an equal number of rows (thus putting all hardware to work, and getting maximum benefit out of distributing the data) but within each node, the July 2001 rows are contiguous on disk: they'll end up in the buffer pools together, and probably stay in memory throughout August 2001, making queries faster as customers phone in with questions on their July statements. Printing and querying the July statements should be faster too, as the July rows being contiguous on disk may result in faster I/O. Finally, we've saved disk space by plunking down the July 2001 rows in the space vacated by the June 2000 rows, which were deleted when they became obsolete. And I'll say it a third time: July 2001 can be 15% bigger than June 2000 because we reserved free space with PCTFREE on the index. We can put rows into this free space by altering the table to set PCTFREE lower.



Back to top


Haste Makes Waste

The example above takes advantage of DB2's willingness to scan through a table on an INSERT to put the new row into an empty slot. In this case, we delete June, and DB2 finds the empty slots when inserting July 2001 data. This makes INSERTs slower so that SELECTs can be faster. What about the opposite condition: when we want INSERTs to be fast, but don't anticipate many SELECT statements on the data (and perhaps don't care about disk space). We continue our banking example: the data warehouse keeps 13 months of data online, but the bank wouldn't actually delete the June 2000 data. Auditors love to go back in time seven or ten years and ask why you didn't report that $100 you found on the street and put into your bank account. Since 99% of the requests are just for the previous 13 months, our application will create a table for old data in a table space built on some old slow disks:

CREATE TABLE OLD_RECEIPTS LIKE RECEIPTS IN SLOW_DISK_TBSP

Before we delete the June 2000 rows, we use an INSERT with a subselect to move them to the OLD_RECEIPTS table:

	INSERT INTO OLD_RECEIPTS 
                          SELECT * FROM RECEIPTS 
                                       WHERE MONTH(RECEIPT_DATE)=6 
                                        AND YEAR(RECEIPT_DATE)=2000

Now we can delete the June 2000 rows:

	DELETE FROM RECEIPTS 
                               WHERE MONTH(RECEIPT_DATE)=6 
                               AND YEAR(RECEIPT_DATE)=2000

Note that in this scenario, we may not care about saving disk space for the devices used by the table space SLOW_DISK_TBSP. Consider that since we just move data into this table, and less than 1% of our queries go against it, a few things are likely true:

  1. The table grows as long as our bank is in business
  2. There's no point having DB2 search for empty slots when inserting new rows, since rows just get inserted into this table, and never get deleted or updated. How often does a bank change a record for a transaction that occurred more than thirteen months ago?

We could have told DB2 to never search for empty slots when performing INSERTs by altering the table to switch APPEND ON:

ALTER TABLE OLD_RECEIPTS APPEND ON

This will make INSERTs faster. APPEND can be turned off, should our bank ever need to quickly do some creative accounting, and hit the OLD_RECEIPTS table with a bunch of "discovered" INSERTs, UPDATEs and DELETEs for the late 1990's.

The Fifth Column
You've learned some clever things about how to order rows on disk, but remember that if a user cares about order, they must specify this with an ORDER BY. DB2 will never guarantee the delivery of rows in any order without an ORDER BY. Access plan changes that result from maintenance or new releases, or changes in the speed in which individual processors can deliver rows to a coordinator node make the order of any result set arbitrary unless you specify an ORDER BY. Columns are the same but different. Specify the order you want the columns in with the SELECT statement. A SELECT * brings back the columns in the order they were specified when the table was created. Does the order of columns matter? Maybe. Please note that the information that follows applies to DB2 on Windows, Unix and OS/2 only. Column handling is different on other platforms. Further, what's discussed below could change in any future DB2 release (although if it ever did change, major engineering feats would be required to move all tables created in DB2 Version 7.2 and earlier to the new format).

To have faster SELECT operations, there is no optimal way to order columns when you create the table. However, there is an advantage to listing the most frequently updated columns first if you know what they will be when the table is created. When a row is UPDATEd, DB2 must write the results of the UPDATE to the log and immediately, or eventually, to the disk where the row is stored (typically when the buffer pool is flushed). The entire row is not written to the log; only the columns that change need to be written. What gets written to the log is the first column in the row, and every row thereafter until DB2 reaches the final column that changed. If the customer's social insurance number never changes, put it near the end of the row. Stuff that changes all the time, like account balance, should be at or near the beginning. Stuff that changes once in a while, such as address, should be between the account balance and the social insurance number.

Let's look at how things are laid out on disk. If you create a table with only fixed length columns, they are laid out exactly as the table is specified in the CREATE statement:

           CREATE TABLE TESTORD (COL1 INT, COL2 CHAR(5), COL3 DEC(10,2), COL4 FLOAT)

Figure 1.

If a table has variable length columns such as VARCHARs, these are still ordered in the order specified in the CREATE TABLE, but the variable data itself is at the end of the row:

           CREATE TABLE TESTORD (COL1 INT, COL2 VARCHAR(5), COL3 DEC(10,2))

Figure 2.

If the table has a long field, it will not be inline with each row. Since row width is limited by the page size (4K to 32K), the row has only a pointer to the long field, which is placed on database pages apart from the row:

          CREATE TABLE TESTORD (COL1 INT, COL2 CLOB(100 K), COL3 DEC(10,2))

Figure 3.



Back to top


One Stop Shopping

Indexes create some data redundancy, but hide this from the user (the user never has to worry about updating the index when they update the table - that's DB2's job). DB2 relies more heavily on indexes that many other database management systems, mainly due to its sophisticated optimizer. "One stop shopping" is known as index-only access in the database world: if all the data you need for a SELECT is in the index, DB2 has to read the index only. Consider the following table and its index on NAME, which is the primary key:

                 CREATE TABLE ADDRESS
                                        (NAME CHAR(20) not null,
                                        STREET CHAR(20),
                                        CITYPROV CHAR(20),
                                        POSTCODE CHAR(7),
                                        HOMEPHONE CHAR(12),
                                        WORKPHONE CHAR(12),
                                        NOTES CHAR(20))
 
	   CREATE UNIQUE INDEX NAME_INDX ON ADDRESS (NAME)

If you issue:

SELECT NAME FROM ADDRESS 
or 
SELECT NAME FROM ADDRESS WHERE NAME = 'Adamache, Blair' 

DB2 will gather the results of the query by scanning the index only. If you issue:

SELECT NAME, WORKPHONE FROM ADDRESS 

DB2 will scan the table: the index provides no advantage. You can test these access paths by issuing all of the above commands from the Command Center, and using the "Create Access Plan" icon to see the path taken by the DB2 optimizer to the data. All three SELECTs above do one stop shopping: either the table or the index alone can satisfy the query. What about when the optimizer wants to use an index, but then needs to read from the table to pick up other columns? This SELECT will require both an index scan and reading the table:

SELECT NAME, WORKPHONE FROM ADDRESS WHERE NAME = 'Adamache, Blair'

If this is a common access method (people want to see the phone number for someone and start by searching on the name), you can tell DB2 to include the extra column in the index. WORKPHONE will not be part of the primary key: it will just be included in the index so that more queries can be satisfied with index-only access. Start by getting rid of the single column index:

DROP INDEX NAME_INDX

Now recreate the index and include the extra column or columns users typically want to see:

	  CREATE UNIQUE INDEX NAME_INDX ON ADDRESS (NAME) 
                               INCLUDE (WORKPHONE, HOMEPHONE)

Remember, the more extra columns that tag along, the longer UPDATES and INSERTS will take: each included column now has to be changed in both the table and the index. With the include index, the following query can now be run with index-only access:

	  SELECT NAME, WORKPHONE FROM ADDRESS WHERE NAME = 'Adamache, Blair'



Back to top


Conclusion

Having shown you a few tricks to determine where rows and columns end up, let's remember that the human's job is to be the DBA - the Database Administrator. DB2's job is to be the Database Manager. In most cases DB2 will make sound decisions based on the schema and indexes created against it. I've always stressed taking advantage of DB2 EEE's shared nothing architecture to get your money's worth out of your hardware. The same applies to the software. You paid money for the hardware, and you also paid something (I hope) for the software: DB2's optimizer, data manager, and its relational engine. Some of the techniques above are worth pursuing for the critical tables in a database application. You don't need to pursue all or any of them for every table created in your database. If you have time to do this, you could make a career out of writing articles and books on DB2 that tell other people how to tune DB2.



About the author

Blair is a seventeen-year veteran of the IBM Toronto Lab. He uses names for primary keys in his address book, and consequently will never have two friends with the same name.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top


IBM, Informix, Informix Dynamic Server and DataBlade are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both. Windows and Windows NT are registered trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product, and service names may be trademarks or service marks of others. Other company, product, or service names may be trademarks or service marks of others.