Recall from “Tables, Columns, and Rows” in Chapter 2 that rows stored in a table are unordered, as required by the relational model. This lack of order makes it easy for the DBMS to INSERT, UPDATE, and DELETE rows quickly, but its unfortunate side effect is that it makes searching and sorting inefficient. Suppose that you run this query:
WHERE au_lname = 'Hull';
To execute this query, the DBMS must search the entire table authors sequentially, comparing the value in each row’s au_lname column to the string Hull. Searching an entire table in a small database is trivial, but production database tables can have millions of rows.
DBMSs provide a mechanism called an index that has the same purpose as its book or library counterpart: speeding data retrieval. At a simplified level, an index is a sorted list in which every distinct value in an indexed column (or set of columns) is stored with the drive address (physical location) of the rows containing that value. Instead of reading an entire table to locate specific rows, the DBMS scans only the index for addresses to access directly. Indexed searches typically are orders of magnitude faster than sequential searches, but some tradeoffs are involved, as explained in this chapter.
Indexes are complex; their design and effects on performance depend on the idiosyncrasies of your DBMS’s optimizer. I’ll provide guidelines in this section, but search your DBMS documentation for index to learn how your DBMS implements and uses indexes. In general, indexes are appropriate for columns that are frequently:
In general, indexes are inappropriate for columns that:
When you’re creating an index, some important considerations are:
You can create multiple composite indexes that use the same columns if you specify distinctly different combinations of the columns. The following two statements, for example, specify valid combinations for the same table:
CREATE INDEX au_name_idx1 ON authors (au_fname, au_lname);
CREATE INDEX au_name_idx2 ON authors (au_lname, au_fname);
Indexes aren’t part of the SQL standard, so index-related SQL statements vary by DBMS, although the syntax for the minimal CREATE INDEX statement is the same for the DBMSs covered in this book.
To create an index:
CREATE [UNIQUE] INDEX index
ON table (index_columns);
index is the name of the index to create and is a valid SQL identifier. Index names must be unique within a table. For Oracle, Db2, and PostgreSQL, index names must be unique within a database.
table is the name of the table to create the index for, and index_columns is a list of one or more comma-separated names of the columns to index.
Specify UNIQUE to create a unique index. UNIQUE causes the DBMS to check for duplicates in index_columns. If table already contains rows with duplicates in index_columns, then the DBMS won’t create the index. If you attempt to INSERT or UPDATE duplicate values in unique index_columns, the DBMS generates an error and cancels the operation.
Listing 12.1 creates a simple index named pub_id_idx on the column pub_id for the table titles. pub_id is a foreign key and is a good candidate for an index because:
CREATE INDEX pub_id_idx
ON titles (pub_id);
Listing 12.2 creates a simple unique index named title_name_idx on the column title_name for the table titles. The DBMS will create this index only if no duplicates already exist in the column title_name. This index also prohibits nondistinct title names from being INSERTed or UPDATEd in titles.
CREATE UNIQUE INDEX title_name_idx
ON titles (title_name);
Listing 12.3 creates a composite index named state_city_idx on the columns state and city for the table authors. The DBMS uses this index when you sort rows in state plus city order. This index is useless for sorts and searches on state alone, city alone, or city plus state; you must create separate indexes for those purposes.
CREATE INDEX state_city_idx
ON authors (state, city);
Microsoft SQL Server and Db2 consider multiple nulls to be duplicates when UNIQUE is specified and allow no more than one null in columns with a unique index. Microsoft Access, Oracle, MySQL, and PostgreSQL allow multiple nulls in such columns.
Some DBMSs let you create indexes on views (Chapter 13) as well as tables.
Use the DROP INDEX statement to destroy an index. Because an index is logically and physically independent of the data in its associated table, you can drop the index at any time without affecting the table (or other indexes). All SQL programs and other applications will continue to work if you drop an index, but access to previously indexed data will be slower.
The usual reasons for dropping an index are:
The SQL standard omits indexes, so index-related SQL statements vary by DBMS. This section describes how to drop an index for each DBMS covered in this book. If you’re using a different DBMS, then search the documentation for index to learn how to drop an index.
In Oracle, Db2, and PostgreSQL, index names must be unique within a database, so you don’t specify a table name when you drop an index. In Microsoft Access, Microsoft SQL Server, and MySQL, index names must be unique within a table but can be reused in other tables, so you must specify a table along with the index to be dropped. The examples in this section drop the index created by Listing 12.1 in the preceding section.
To drop an index in Microsoft Access or MySQL:
DROP INDEX index
index is the name of the index to drop, and table is the name of the index’s associated table (Listing 12.4a).
DROP INDEX pub_id_idx
To drop an index in Microsoft SQL Server:
DROP INDEX table.index;
index is the name of the index to drop, and table is the name of the index’s associated table (Listing 12.4b).
DROP INDEX titles.pub_id_idx;
To drop an index in Oracle Database, IBM Db2 Database, or PostgreSQL:
DROP INDEX index;
index is the name of the index to drop (Listing 12.4c).
DROP INDEX pub_id_idx;