Performance & Tuning Development Training Publications Seminars Company
 

Data Partitioning in ASE 15

 

Published by Wordware

ABOUT THE AUTHORS

 

Jeff Garbus’s background includes a bachelor’s degree from Rensselaer Polytechnic Institute, and work experience from PCs to mainframes and back again. He has many years of Microsoft and Sybase SQL Server experience, with a special emphasis on assisting clients in migrating from existing systems to pilot and large-scale projects. He is well known in the industry, having spoken at user conferences and to user groups for many years, written articles and columns for many magazines nationally and internationally, as well as having written several books. Recently his focus has been on very large databases, data warehousing, training, and remote database administration. Jeff has been in consulting for fourteen years, training for nine, and in the software business for eighteen. He has a demonstrated talent for staying at the leading edge of technology, as well as transferring his knowledge to others. Jeff currently is CEO of Soaring Eagle Consulting, an RDBMS consulting and training firm based in Tampa, which specializes in solving business problems as well as performance problems. Soaring Eagle now offers remote database administration services.

 

 

 

Ashish Gupta is currently a database administrator with Mizuho Securities in Japan. He has more than nine years of database administration experience with various global investment banks and five years of software design and development experience. He received a bachelor of science degree in computer science from the Indian Institute of Technology, Roorkee, India.

 

You may read the rest of this book by ordering online on AMAZON.com or on the Publishers site

WORDWARE.com

 

WHAT IS DATA PARTITIONING?

Data partitioning breaks up large tables and indexes into smaller pieces that can reside on separate partitions, allowing DBAs to have a very fine level of control over data placement, maintenance, and management.

A segment is a portion of a device that is defined within ASE. It is used for the storage of specific types of data such as system data, log data, and the data itself. Partitions can be placed on individual segments and multiple partitions can be placed on a single segment. In turn, a segment or segments can be placed on any logical or physical device, thus isolating I/O and aiding performance and data availability.

Data in a table or index on one partition can be managed and processed separately from data in other partitions. Queries only need to access partitions that contain the required data.

DBAs can perform management and maintenance tasks much more quickly on individual smaller partitions rather than on huge tables and indexes. To save even more time, some common tasks can be run in parallel on multiple partitions. Additionally, a DBAcan automate tasks on partitions. As the size of the data grows, more partitions can be added.

There are four methods of data partitioning offered in ASE 15. The first is called round- obin partitioning, the only method used in pre-ASE 15 versions. This method places data sequentially on partitions. There is no way to specify which data goes on which partition, and all partitions will be involved in query processing.

The next three methods are referred to as “semantic data partitioning” methods because the placement of data on partitions can be specified. They  are range, list, and hash partitioning.

The partition method expected to be most widely used in ASE 15 is range partitioning.  With this method, ranges of values can be specified to be placed on a given partition.

With list partitioning, individual values can be placed on separate partitions.

In hash partitioning, the data is placed on partitions based on the columns specified and and internal hashing algorithm.

WHY USE DATA PARTITIONING?

By using data partitioning in ASE 15, businesses lower the cost of keeping and using databases of any size on ASE, while improving the performance of their applications and making their data more available.

REDUCING THE COST OF MANAGING and
MAINTAINING DATABASES

The most costly aspect of operating a database of any size is the time and effort required to manage and maintain it. Databases can become so large that there simply isn’t enough time to perform all of the necessary maintenance tasks. This not only has a direct impact on application performance but also puts data in danger by making backups less practical due to the time required.

By making the use of various utilities much more efficient, data partitioning in ASE 15 has a positive impact on the time and resources spent maintaining and managing large databases.

Imagine that multiple activities are happening on a large table. A large complex DSS-style query is running, OLTP processing is occurring, bcp is being used to load data, and update statistics is running to freshen statistics. In an unpartitioned table, the activities under way are enough to slow operations to an unacceptable level if not bring many of them to a halt. Now imagine the same table, but this time it is data partitioned. Each of the operations that are running are doing so on separate partitions without affecting the others.

By partitioning a table or index into smaller pieces, DBAs can run utilities on a per-artition basis. This results in the utilities running faster, allowing other operations to work efficiently on data in other partitions and ensuring that the bulk of the data in the table is available for applications. In this way, the use of data partitioning in ASE 15 increases the operational scalability of applications using VLDBs.

With partitioned tables and indexes, a DBA can schedule maintenance tasks and cycle through one partition of a table at a time, or perform tasks simultaneously on multiple partitions if preferred. If maintenance on a given partition is not practical at a certain time, that partition can be skipped and returned to later. The maintenance schedule can also be tightly integrated with ASE’s Job Scheduler, thus freeing up the DBA for other important activities.

Two commonly run utilities are also two of the most resource and time consuming — update statistics and reorg. Both need to be run on tables and indexes on a regular basis in order to assure good performance from queries.

The update statistics utility gathers statistics about the table, its indexes, and the distribution of data in its columns. This information is needed by the query processing engine’s optimizer to determine the most efficient way to access the required data. The reorg rebuild utility is used to reclaim empty space in tables and indexes. This operation is important for space management and for efficient query performance. Some tables are so large that it is virtually impossible to run either of these two important utilities without exceeding the allotted time and impeding regular business operations.

As an example, many databases contain a date/time column used when recording a transaction. It’s important to keep the distribution statistics for such columns up to date so that queries run as efficiently as possible. But to update statistics for an unpartitioned table and index, the entire column has to be read. With huge tables, this can have a major impact on performance and data availability.

The same operations on a partitioned table and index will take considerably less time and have minimal impact on overall performance and data availability. This is because the distribution statistics only have to be gathered from the partition that contains the most recent date/time values. The same applies to the reorg utility. When run on a table and index that are partitioned, only the target partition will be affected, keeping the rest of the table and index free for use.

Another time-consuming but necessary task is the running of common table and index diagnostic utilities such as the DBCCs that check for data consistency. Running these diagnostics on a per-partition basis makes this critical task more efficient.

Still another time- and resource-intensive operation is the archiving off or deletion of large amounts of data that are no longer needed in the table. The bcp (bulk copy program) can be run on individual partitions, allowing data to be put into or taken out of a table without interfering with other operations that are using that same table. Only the partition whose data is being imported or exported via bcp will be affected. By partitioning tables, data that “ages out” can be managed more easily. bcp can also run in parallel on multiple partitions simultaneously.

The truncate table..partition command is used to perform mass deletions of data from a table. Often this is data copied off of the table via bcp or other backup methods. In the past, truncating the table deleted all data from it and could take a long time. Now, however, data can be deleted on a per-partition basis.

DATA AVAILABILITY

The availability of data to applications is vital for the functioning of any business. If applications can’t get to the data they require, work doesn’t get done.

As we have seen, the purpose of partitioning tables and indexes is to divide them into separate and smaller pieces so that operations on each piece can be isolated from other operations on other pieces, thus ensuring that more data can be accessed by applications more of the time.

With a partitioned index, the reorg utility can be running on partition 2 while statistics on partition 3 are being updated. Either of these utilities would make the index unavailable on an unpartitioned index. In this case, both are running simultaneously while data on partition 1 is still available to an application.

Similarly, if queries are changing the data, indexes can be updated on one partition without affecting operations on others.

In a mixed workload environment, DBAs can design partitions so that transactional operations occur on one partition while DSS queries run on others.

The data availability that partitioning offers is critical to the performance of applications running in a mixed workload environment.

Partitions also enhance availability by being placed on various physical devices. If one device fails, the remaining partitions will still be available for use.

 

INDEX PARTITIONING

There are two methods of index partitioning: global and local. A global index has its entire structure (one B-tree) on one partition. When queries on partitioned tables use a global index, they will follow pointers from the index to the data on the table’s data partition(s). The index structure covers all data partitions of the table.

The second type of index is a local index. Here the index’s structure is broken up into pieces (multiple B-trees), which are stored together on a single partition. Since the partitioned index has to be based on the same column order of values as the data partitions, a local index is similar to having smaller  individual indexes pointing to data in corresponding data partitions. If a query is searching for only a fraction of the data in the table, it will only have to read one piece of the index. This increases concurrency by allowing different queries to simultaneously access different pieces of the same index. Aquery that is retrieving data from one partition will not interfere with a query that is using another partition. The result is less blocking, less I/O, more data availability, and higher performance. Another advantage to using partitioned indexes is that the size of each is based on the number of rows in its corresponding table partition. Smaller indexes take less time to scan.

HOW TO AND WHEN TO USE DATA PARTITIONING

Partitioning tables and indexes is relatively simple. Atable can be partitioned at its creationor later. Once partitioned, data put into the table will follow the  rules set by the partitioning method, whether it’s done with an insert or update or by using bcp to import data. An index can also be partitioned when it’s created, but an existing index cannot be partitioned. It will need to be dropped and recreated with partitioning.

Here are some situations where DBAs might consider using the three semantic methods of partitioning described earlier:

  • Range partitioning — This method allows DBAs to specify what data will be contained in each partition based on a range of values in the key column. For example, the values 1, 2, 3, and 4 could be on partition 1, while 5, 6, 7, and 8 are on partition 2, and so on. Below is an example of the syntax used to create a table that uses range partitioning on adate/time column:

  • create table customer (ord_date datetime not null,

  • name varchar(20) not null,

  • address varchar(40) not null, other columns ...)

  • partition by range (ord_date)

  • (ord_date1 values <= (3/31/05) on segment1,

  • ord_date2 values <= (6/30/05) on segment2,

  • ord_date3 values <= (9/30/05) on segment3,

  • ord_date4 values <= (12/31/05) on segment4)

Range partitioning is especially useful for tables with constant updates, inserts, and deletes that contain a column or columns with sequential data in them such as a customer ID or a transaction date. Such tables require the most attention from DBAs for maintenance and management. These tables are also commonly used in decision support style queries, making them excellent candidates for range partitioning. While transactions are  occurring on a single partition, DBA tasks and DSS queries can access other partitions. At the same time, it is critical that the DBAkeep the distribution statistics up to date on the active transactional partition so that new data can be described to the query processing engine. As we have seen, the time it takes to update the statistics is considerably shorter when it only has to be run on a single partition.

List partitioning—List partitioning is similar to range partitioning, but here the actual values to be placed on a partition are specified. List partitioning is useful for controlling where specific values are stored, even if the column itself is not sorted, and in cases where the order of values in the partition isnot important. Below is an example of the syntax used to

create a table with list partitioning:

  • create table nation (nationkey integer not null, name char(25) not

  • null, regionkey varchar(30) not null, comment varchar(152) not null)

  • on segment 1

  • partition by list (n_regionkey)

  • (region1 values ('Americas'),

  • region2 values ('Asia'),

  • region3 values ('Europe'),

  • region4 values ('Australia', 'Other'))

Hash partitioning—This method distributes values to partitions based on the column(s) specified and an internal hashing mechanism. There is  no need to specify a list or range of values. If the column key contains unique values or values that have little duplication, hash partitioning will balance the data across all the partitions. However, if there is extensive duplication of values, the partitions can become skewed, with some containing more rows than others. Hash partitions are useful when many partitions are desired for a largetable and the values in the key column are not in a sorted order. They also help equality searches done by the query processing engine run more efficiently.

  • create table lineitem (l_orderkey integer not null, l_partkey integer

  • not null, l_suppkey integer not null, l_linenumber integer not null,

  • l_quantity double not null, l_extendedprice double not null, other

  • columns ...)

partition by hash (l_orderkey, l_linenumber)

  • (litem_hash1 on segment1,

  • litem_hash2 on segment2,

  • litem_hash3 on segment3,

  • litem_hash4 on segment4)

 

SUMMARY

If you have a huge amount of data and need to break it up to manage I/O issues or to manage a smaller subset of data, partitions are for you. ASE 15 introduces three methods of partitioning: range, list, and hash.

 
 
 
©Soaring Eagle 2006