Performance & Tuning Development Training Publications Seminars Company
 

Sybase ASE 12.5
Performance and Tuning

Physical Database Design Issues

Published by Wordware

ABOUT THE AUTHORS

 

Jeff Garbus’s background includes a bachelor’s degree from Rensselaer Polytechnic Institute, and work experience from PCs to mainframes and back again. He has many years of Microsoft and Sybase SQL Server experience, with a special emphasis on assisting clients in migrating from existing systems to pilot and large-scale projects. He is well known in the industry, having spoken at user conferences and to user groups for many years, written articles and columns for many magazines nationally and internationally, as well as having written several books. Recently his focus has been on very large databases, data warehousing, training, and remote database administration. Jeff has been in consulting for fourteen years, training for nine, and in the software business for eighteen. He has a demonstrated talent for staying at the leading edge of technology, as well as transferring his knowledge to others. Jeff currently is CEO of Soaring Eagle Consulting, an RDBMS consulting and training firm based in Tampa, which specializes in solving business problems as well as performance problems. Soaring Eagle now offers remote database administration services.

 

Eric Miner has been with Sybase since October 1992, starting as an associate technical support engineer. After a couple of years on the phones working with customers, he began to study the optimize, and moved through the ranks of Technical Support and into the “back line” work of Product Support Engineering. After leaving Sybase for nine months—and realizing that the grass isn’t always greener on the other side—he returned and joined the Optimizer Group in ASE Engineering. Eric has spoken at four Techwave conferences, usually giving optimizer-related presentations. He has written several optimizer-related articles for the ISUG Technical Journal and has spoken to various ISUG groups in Europe. He has contributed to the Sybase Performance and Tuning course offered by Sybase Education and done many internal training sessions for both support and consulting staff, as well as made technical presentations on site for customers and spoken on two webcasts. As an escape from work, Eric does digital stereo photography and is very involved in the online community of stereo photographers worldwide.

 

Joel DuPlessis is a senior database consultant working in the New York metropolitan area. Eclectic by nature, he holds advanced degrees in history and business and has been working in information technology since 1983. During his tenure he has worked on a wide variety of databases, both relational and non-relational.

 

Alvin Chang is a technical trainer, author, and consultant working for Soaring Eagle Consulting, a Tampa-based consulting and training firm. He has taught Adaptive Server Enterprise throughout the United States. Currently specializing in Sybase Adaptive Server and Microsoft SQL Server System Administration, Alvin started as a technical trainer for products like Microsoft Office, Lotus Smartsuite, and Lotus Notes beforemoving into RDBMSs. This is his sixth  book. He can be reached at alvin@soaringeagleltd.biz.

Yuval Malchi is  a database administrator working for Investment Technology Group (ITG) Israel. Yuval currently manages over 100 of ITG Europe’s ASE databases throughout its European offices.  Prior to joining the office in Israel, Yuval worked at ITG headquarters in New York. He also worked for Sybase Professional Services during 2000-2001 and managed the NY Life Insurance project, overseeing several professional services consultants, and was responsible for ASE and SQL Anywhere  systems used by over 10,000 clients and users. Yuval designed databases for several of NY Life’s projects and tuned several of their ASE’s systems. He was also the vice president of WonderBase Consulting, where he designed databases for several web companies. Yuval’s current field of expertise is in performance tuning of ASE systems and SQL. He currently serves as the president of ISUG Israel and lectures at Sybase Israel conferences.

 

You may read the rest of this book by ordering online on AMAZON.com or on the Publishers site

WORDWARE.com

 

  

From a purely practical standpoint, you’ll find that 95 percent of query tuning is index selection. In order to do this correctly, you have to understand (1) How the physical storage structures work (including how space is managed) and (2) How the server chooses indexes.

PHYSICAL STORAGE STRUCTURES

Adaptive Server manages space with allocation pages, GAM pages, and OAM pages. Somewhat out of the scope of this book (see our systems administrations book, Administrator’s Guide to SybaseŽ ASE 12.5), allocation pages track used pages within database fragments, GAM pages manage allocation pages, and OAM pages are used on a per-object basis to identify which allocation pages contain allocated extents for an object.

The goal of the data storage structure is to improve performance by reducing   enough of the I/O necessary to retrieve data. Adaptive Server supports two basic storage structures:

Linked pages (used for data) and B-tree structures with pointers up, down, and across levels (used for indexes).

Both of these structure types take different physical forms if you opt to use the Data–Only Locking (DOL) schemes available since version 11.9.2; evaluating the use and costs of DOL is also outside of the scope of this book. We will assume that tables and indexes are using the installation default of the All Pages Locking (APL) scheme.

 

 

Note: There is a trade-off between CPU and I/O; storage structures are much more important if the application is I/O bound.

 

Space is initially defined for use by Adaptive Server with the disk init command. It is allocated for data with the create database command. Although space is defined by number of pages (2k by default, but sizes up to 16k are available in 12.5), Adaptive Server divides this space into allocation units of 256 data pages each. Each allocation unit is controlled through an allocation page.lization Percent

ASE provides for configuration of the “page utilization percent” to allow Adaptive Server to allocate new extents to a table or index rather than search the OAM page chain to find available pages. Page utilization percent is the ratio of used pages to allocated pages; the default value is 95. When the ratio of used pages to allocated pages exceeds the page utilization percent, Adaptive Server simply allocates a new extent to the table or index. This saves a sometimes exhaustive search of OAM pages to find allocated but unused pages. As databases get large (tens to hundreds of gigabytes),this is a performance benefit, as the server does not need to walk all of the extents to find free space.

Lowering the page utilization percent can speed up processes performing a large amount of insert activity on a table, but it will increase the number of pages allocated to the table.

USAGE

sp_configure ‘page utilization percent’, 85

OAM and GAM Pages

In brief, the database is broken into logical units of 256 pages called allocation units; the first page of each of these units is known as the allocation page and does not contain table or index data; it only contains information about the contents of the allocation unit (including which pages are in use). Thus, every 256th page of the database (starting from page 0) is one of these allocation pages.

The allocation unit is further divided into logical structures known as extents, which are each eight pages long (with 256 pages/allocation units, this makes 32 extents).

Extents are important because they provide allocation and deallocation of space to a table or index. All eight pages of an extent are assigned to the same table or index, although they may not all be in use right now. A special exception applies to the first extent of an allocation unit, where only seven pages are available for allocation to objects.

While the allocation page tracks the use of space for a given region of the database, the object allocation map page determines where in a database a given object is assigned space. OAM pages contain various details about a table or index, but we are most concerned now with the allocation page entries; if an object is assigned any amount of space in an allocation unit (from the minimum of one extent to the maximum of 32 extents), the OAM for the object will contain a pointer to that unit’s allocation page. Reading the OAM page for an object will provide a list of allocation pages, which will each provide pointers to the particular used pages for the object. These scans can also be used to find pages belonging to an object that are currently unused.

Lastly, the global allocation map page provides a quick list of allocation units with unassigned extents. When it’s necessary to find an extent available for allocation to a table or index, one method is to read the GAM page.

The diagram above reflects the default 2k data page. ASE allocates at the extent level and stores at the page level. Every page has a header, which contains information about the page (page type, locking–specific details, available space, and other details) and the body, which contains data. Here we’re diagramming a data page. If we are locked with the default (and pre-11.9.2 only) locking mechanism, allpages, we have a 32-byte header and 2016 useable bytes on a page. If you are locking DOL, then we have a 46-byte header and 2004 bytes available on the page.

Note the word “useable;” for APL tables, we can’t actually have a 2016-byte row because when we log the row on a log page, we require an additional 54 bytes of overhead. This would split the row across pages, which is not permitted. Also note that for all data pages, there is a series of 2–byte pointers at the end of

the page known as the row offset table. These entries indicate the location of rows on the page.

Additional information you need to know about pages:

  • Rows will not cross page boundaries; note max row and page sizes.

  • Each row has at least 4 bytes of additional overhead (itemized later).

 

ESTIMATING TABLE SIZE

The sp_estspace stored procedure exists to allow you to estimate how much space your table will need for storage. In order to use this procedure to the best effect, you need to first create the tables and indexes so that the procedure can read the structures and sizes from the system tables for its calculations.

In the above figure, we are sizing the table pt_tx_CiamountNCamount. We are instructing the server to size this table for 1,000,000 rows, assuming a fillfactor of 75 percent. The fillfactor indicates that the pages will be initially set to approximately 75 percent full to allow for some growth; a smaller fillfactor value will allow more rows to be added before additional pages need to be assigned but will require many more initial pages to hold the same number of rows.

 

Estimating Table Size for an Existing Table

For an existing table, use the stored procedure sp_spaceused or dbcc checktable.

dbcc checktable (sample)

The total number of data pages in this table is 244.

Table has 5772 data rows.

DBCC execution completed. If DBCC printed error messages, contact a user with System Administrator (SA) role.

Table size may also be calculated by hand, but as these calculations are also estimates, using these tools is far simpler and likely to be as accurate or more accurate.

sp_spaceused sample

name rowtotal reserved data index_size unused

------ -------- -------- ------ ---------- ------

sample 5772 494 KB 488 KB 0 KB 6 KB

 

Estimating Performance

A table scan is a sequential read of every data page in a table. Table scans are used if there is no appropriate index or if an index would slow down the process. The time required for a table scan is directly proportional to the size of the table in pages.

To get exact logical counts, set “statistics I/O” on prior to running a query.

 


Note:
Knowing the amount of time a table scan will take is useful for index selection, as

well as determining maximum expected query execution time.

TABLE SCAN TIME

It is useful to be able to accurately measure the page access speed of your server.

Here is one technique:

  • Create a large table.

  • Cycle the server, so that there are no pages in cache.

set statistics io on

set statistics time on

select count(*) from table_name

 

  • Look at the physical page reads reported, and divide by total elapsed time. As of publication, typical page access speeds are between 200 and 3000 pages persecond.

  • At this point, you can estimate how long a scan would take for a specific table, although the typical values quoted above will give us a very large range of possibilities.

For a 1,333,334-page table:

1,333,334/ 200 = 6667 seconds = ~ 111 minutes

OR

1,333,334/ 3000 = 1333 seconds = ~ 7.5 minutes

 

TABLE SCAN PERFORMANCE

There are several ways to improve table scan time. Adding an index is typically the first and easiest choice, in the hopes of avoiding the table scan in the first place. In fact, it may be reasonably stated that the purpose of an index is to avoid a table scan. When the table scan cannot be avoided, it frequently helps to spread I/O across multiple disks and controllers. This may include horizontal or vertical partitioning to reduce the amount of data to be scanned. Cache can be increased so that more I/O is logical, rather than physical (a 20:1 performance ratio, at publication).

Faster drives (Solid State Accelerators, for example) or faster disk access methods might help (raw partitions may be 40-50% faster than file systems on a UNIX box).

A 16k buffer pool in the appropriate cache may have a dramatic, positive effect on scan performance.


Note:
The server will table scan when it is cheaper than accessing via an index, a threshold around 20% of total data (calculations later).

INDEXES and B-TREE STRUCTURES

We’ve talked all around indexes, so we must finally describe them. Indexes are storage structures separate from the data pages. Their benefits include:

  • Providing faster access to data.

  • Helping to avoid table scans.

  • Setting boundaries for a table scan.

  • Helping to avoid some sorts.

  • May be used to enforce uniqueness.

INDEX STRUCTURES

All ASE indexes are B-Tree Structures.

 

INDEX STRUCTURES NOTES

There may be many intermediate levels, depending on key width, row count, and type of index. Different design books count levels from different directions; it is the number of levels that is important.

Storage structures are different for DOL locking mechanisms. Although DOL may lock data at either the page level or row level,depending on the scheme chosen, it will not lock index entries at all. Instead, a new structure known as a latch (or, more formally, a Nested Top Level Action) will permit consistent access and change of index values. Also note that DOL indexes don’t have sibling links, except on the leaf.

Index Types (APL):

Clustered indexes — The data is physically sorted in index order, and the data page is the leaf level of the index. There is a maximum of one clustered index per table (why?).

Non-clustered indexes — Index is independent of the physical sort order. There is a maximum of 249 non-clustered indexes per table. Tables can (and typically do) have both types of indexes. Either index is limited to a maximum key width, dependent upon the size of the data pages. Because you can create tables with columns wider than the limit for the index key, you will be unable to index these columns. For example, if you perform the following on a 2k page server and try to create an index on col3, the command fails and Adaptive Server issues an error message because column col3 is larger than the index row-size limit (600 bytes).

create table tab1 (

col1 int

col2 int

col3 char(700))

“Unable to index” does not mean that you cannot use these columns in search clauses. Even though a column is non-indexable (as in col3, above), you can still create statistics for it. Also, if you include the column in a where clause, it will be evaluated during optimization.

Statistics Maintained on the First 255 Bytes of Data

Optimizer statistics are various measures of table size (page and row counts, row size, etc.) and data distribution used to determine the most efficient access method for a query (index, table scan, etc.). Different statistics are measured and maintained by different means, some by the server automatically and some by the action of an owner or administrator. Regardless of how you gather statistics, they are maintained only for the first 255 bytes of data in a column. If you use wide columns, any information after the first 255 bytes is considered statistically insignificant by the optimizer.

INDEX TYPES (DOL)

The data is still logically sorted in index order, but it may not be sorted on the page (this means that clustered indexes aren’t, so we will often refer to them as “placement indexes” instead). The storage structures are identical for both clustered and non-clustered indexes, and there are no longer sibling pointers at non-leaf index levels.

Index pages now have row offset tables to determine the order of keys, since they are not necessarily stored in order on the index page. Ordinarily, a clustered index is identified by Index ID 1; for DOL tables, this value is no longer used. The one DOL CI is identified with 0x200 (decimal 512) in the status2 column of the sysindexes table.

DOL indexes also have some structural changes designed to improve their storage efficiency: Duplicate keys in non–unique indexes are stored only once per index page, and key values on non-leaf pages are reduced to the minimum possible byte length by using suffix compression.

Clustered Indexes

Clustered Indexes are used to direct placement of data rows in sorted order. Rowsmay not be sorted on the page. The data pages are not leaf pages.

Advantages

  • Data modifications are faster.

  • Less movement on page.

  • DOL index B-Trees are more efficient.

Disadvantages

  • Retrieval may be slower (more I/O to get a row).

  • Large I/O may not be as effective.

  • May need 40-50% more storage.

For APL clustered indexes, the leaf level is the data level, and the prior index level needs only one entry per page of data.

 

Non-Clustered Indexes

The non-clustered index requires an extra level, as it needs pointers for every row of data

Clustered

Index Detail

(allpages locking)

Index levels point to succeeding index levels. The B-Tree is traversed by following

these pointers.

 

Non-Clustered

Index Detail

 

DESCENDING INDEXES

Create index i1

on sales (id asc, purchasedate desc)

This will create an index on the sales table, and if you look frequently for the most recent purchase, you will avoid scanning through older purchases. This can help avoid backward scans, which are a frequent cause of deadlocks.

NEW ROWS

APL tables with a clustered index will place rows in the table according to the clustered index order. APL heap tables will add rows in the last position of the last pageof the table.

DOL tables without a placement index have new rows added to a designated insert page. Unlike APL tables, we cannot say that this page is at the end of the table, since DOL tables do not maintain page pointers or a page order. The last logical page number (on APL) or insert page number (on DOL) is stored in sysindexes as the root page.

ROW REMOVAL

When the last row is removed from an APL data or leaf page, that page is removed from its current page chain and marked as “available” on the allocation page. When rows are removed from the middle of a page, the remaining rows are adjusted upward on the page so that free space is left at the end of the page. Removing rows causes deletions to cascade into all non-clustered indexes that point to rows on that page. Rows that are deleted from DOL tables are marked as logically deleted, but they are not physically removed from the page immediately (some of this space will be recovered with time and some when maintenance commands, like “reorg,” are used). Tables with a placement index have rows added on the appropriate page, if there is room; otherwise, the row will be placed on the nearest available page. DOL leaf-level index pages will split, since the entries must remain in index order.

IMPLICATIONS OF MODIFICATION

Data modification statements can have a variety of adverse impacts on performance in regard to indexes. Updates and deletes will frequently cause rows to move, both in the data and in the index. If an update or delete causes an APL row to move, all non-clustered indexes on that table will need to be updated to reflect the new row position. By contrast, a DOL row that moves will not update its index entries, but it will leave behind a forwarding address to the new physical location. Likewise, inserts into an APL data page of a table with a clustered index will frequently cause rows on the page to be moved down, or they will cause the page to split. All non-clustered indexes on the table will need to be updated to reflect the new row position(s).

Setting Fill Factor

The default fill factor is set at the server level, but it can be overridden when indexes are created.

Server-wide default:

sp_configure 'fill factor',N

Overriding at index creation:

create index idx_name on table (column)

with fillfactor=N

In both cases, N is the percentage full, except

  • N = 100 Data and Index pages full, except root.

  • N = 0 Full data pages, indexes at 75%.

CHOOSING INDEXES

Sometimes, the clustered indexes are created based on the primary key of the tables. In a way, this makes sense. A clustered index tends to be one I/O faster than a non-clustered index for a single-row lookup. So many database design tools automatically make this selection: “If this is the primary way we go after the data, we’ll place the clustered index on this one and save the I/O whenever we retrieve the data.”

However, there are other reasons to choose: Remember, clustered indexes have the data physically sorted in index-order, so that makes it a great choice for any range searches or queries with an order by clause.

Non-clustered indexes are a bit slower and take up much more disk space, but they are the next best alternative to a table scan. For some queries, known as covered queries, non-clustered indexes can be faster than clustered indexes.

When creating a clustered index, you need free space in your database equal to approximately 120 percent of the total table size.

SUMMARY

Ninety-five percent of tuning is based upon index selection. Choose your indexes wisely, based upon your understanding of physical storage.

 

 

 
 
 
©Soaring Eagle 2006