The index-only scan executes an SQL statement using only the redundant data stored in the index. The original data in the heap table
is not needed. If we take that concept to the next level and put all
columns into the index, you may wonder why we need the heap table.
Although the table access might become a bottleneck, it is still limited to one read operation per row because the index has the
Accessing an index-organized table via a secondary index is very
inefficient, and it can be prevented in the same way one prevents a
table access on a heap table: by using an index-only scan—in
this case better described as “secondary-index-only scan”. The
performance advantage of an index-only scan is even bigger because it
not only prevents a single access but an entire
Using this example we can also see that databases exploit all the
redundancies they have. Bear in mind that a secondary index stores the
clustering key for each index entry. Consequently, we can query the
clustering key from a secondary index without accessing the
index-organized table:
When selecting other columns, the database has to run an
Tables with more indexes can often benefit from heap tables. You can still use index-only scans to avoid the table access. This gives you the select performance of a clustered index without slowing down other indexes.
Database support for index-organized tables and clustered index is
very inconsistent. The following overview explains the most important
specifics.
Some databases can indeed use an index as primary table store. The Oracle database calls this concept index-organized tables (IOT), other databases use the term clustered index. In this section, both terms are used to either put the emphasis on the table or the index characteristics as needed.
An index-organized table is thus a B-tree index
without a heap table. This results in two benefits: (1) it saves the
space for the heap structure; (2) every access on a clustered index is
automatically an index-only scan. Both benefits sound promising but are
hardly achievable in practice.
The drawbacks of an index-organized table become
apparent when creating another index on the same table. Analogous to a
regular index, a so-called secondary index
refers to the original table data—which is stored in the clustered
index. There, the data is not stored statically as in a heap table but
can move at any time to maintain the index order. It is therefore not
possible to store the physical location of the rows in the
index-organized table in the secondary index. The database must use a
logical key instead.
The following figures show an index lookup for finding all sales on May 23rd 2012. For comparison, we will first look at Figure 5.2 that shows the process when using a heap table. The execution involves two steps: (1) the INDEX RANGE SCAN
; (2) the TABLE ACCESS BY INDEX ROWID
.Figure 5.2. Index-Based Access on a Heap Table
Although the table access might become a bottleneck, it is still limited to one read operation per row because the index has the
ROWID
as a direct pointer to the table row. The database can immediately load
the row from the heap table because the index has its exact position.
The picture changes, however, when using a secondary index on an
index-organized table. A secondary index does not store a physical
pointer (ROWID
) but only the key values of the clustered index—the so-called clustering key. Often that is the primary key of the index-organized table.
Accessing a secondary index does not deliver a
ROWID
but a logical key for searching the clustered index. A single access,
however, is not sufficient for searching clustered index—it requires a
full tree traversal. That means that accessing a table via a secondary
index searches two indexes: the secondary index once (INDEX RANGE SCAN
), then the clustered index for each row found in the secondary index (INDEX UNIQUE SCAN
).Figure 5.3. Secondary Index on an IOT
Figure 5.3 makes it clear, that the B-tree of the clustered index stands between the secondary index and the table data.
INDEX UNIQUE SCAN
.Important
Accessing an index-organized table via a secondary index is very inefficient.SELECT sale_id
FROM sales_iot
WHERE sale_date = ?;
------------------------------------------------- | Id | Operation | Name | Cost | ------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | |* 1 | INDEX RANGE SCAN| SALES_IOT_DATE | 4 | ------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("SALE_DATE"=:DT)
The table
SALES_IOT
is an index-organized table that uses SALE_ID
as clustering key. Although the index SALE_IOT_DATE
is on the SALE_DATE
column only, it still has a copy of the clustering key SALE_ID
so it can satisfy the query using the secondary index only.When selecting other columns, the database has to run an
INDEX UNIQUE SCAN
on the clustered index for each row:SELECT eur_value
FROM sales_iot
WHERE sale_date = ?;
--------------------------------------------------- | Id | Operation | Name | Cost | --------------------------------------------------- | 0 | SELECT STATEMENT | | 13 | |* 1 | INDEX UNIQUE SCAN| SALES_IOT_PK | 13 | |* 2 | INDEX RANGE SCAN| SALES_IOT_DATE | 4 | --------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("SALE_DATE"=:DT) 2 - access("SALE_DATE"=:DT)
Index-organized tables and clustered indexes are, after all, not as useful as it seems at first sight. Performance improvements on the clustered index are easily lost on when using a secondary index. The clustering key is usually longer than a
ROWID
so that the secondary indexes are larger than they would be on a heap
table, often eliminating the savings from the omission of the heap
table. The strength of index-organized tables and clustered indexes is
mostly limited to tables that do not need a second index. Heap tables
have the benefit of providing a stationary master copy that can be
easily referenced.Important
Tables with one index only are best implemented as clustered indexes or index-organized tables.Tables with more indexes can often benefit from heap tables. You can still use index-only scans to avoid the table access. This gives you the select performance of a clustered index without slowing down other indexes.
- By
default SQL Server uses clustered indexes (index-organized tables)
using the primary key as clustering key. Nevertheless you can use
arbitrary columns for the clustering key—even non-unique columns.
To create a heap table you must use theNONCLUSTERED
clause in the primary key definition:
CREATE TABLE ( id NUMBER NOT NULL, [...] CONSTRAINT pk PRIMARY KEY NONCLUSTERED (id) );
Dropping a clustered index transforms the table into a heap table.SQL Server’s default behavior often causes performance problems when using secondary indexes.
No comments:
Post a Comment