Predicates: The WHERE Clause

The WHERE clause is the one that determines whether or not indexes can be used efficiently. One side of each predicate must be as specified in the index(es) for Oracle to be able to use any index. Whether it is the left-hand side or the right-hand side is irrelevant, although typically it is the left-hand side because SQL is written from the left to the right. Note that the order sometimes matters though: col_name LIKE 'ABC%' is not the same as 'ABC%' LIKE col_name. The former searches for col_name entries that begin with ABC, whereas the latter is the same as the filter col_name = 'ABC%', that is the % is not interpreted as a wild card at all.

Indexes can only be used when predicates are sargable, or search-argument-able, which admittedly is a horrible phrase. Functions on columns in the index can prohibit index use, particularly when the index is not a function-based index. Apart from that, some operators are sargable and optimizable (i.e. allow the use of an index): =, <, >, >= IS NULL; some operators are sargable yet not optimizable: <> and its equivalents (i.e. != and ^=) and NOT; and LIKE with a leading wild card is not sargable and hence not optimizable. Sargable predicates can be pushed down, which means that a predicate in a statement that references a view or derived table can be ‘pushed down’ to the view or derived table itself, which avoids having to scan the entire underlying data structure only to filter out a few relevant rows later. Sargable, non-optimizable predicates can still benefit from the optimizer’s efforts; non-sargable predicates cannot though.

A SQL statement that links several sargable predicates with an OR cannot be optimized when the predicates involve different columns. If, however, the predicates can be rewritten as an equivalent IN-list, which Oracle does internally as a part of its predicate transformations, then Oracle can indeed optimize the statement and therefore utilize existing indexes.

Important is, as always, that the data type of each search term matches the data type of the indexed column or expression; it is best that you convert search terms on the right-hand side if necessary but leave the left-hand side as is. Unnecessary use of TO_CHAR() and TO_NUMBER() (on the left-hand side) is not only sloppy but it can hamper index use. The same goes for NVL() and the like.

If you often encounter fixed expressions or formulas in your predicates, you can create function-based indexes on these expressions. Make sure that the columns referenced appear in the index in exactly the same way as they appear in the predicates, and make sure that the right-hand side does not contain the columns from the index: Oracle does not solve your equations for you.

Predicates that are often badly coded include operations on dates. Yes, it is possible to create a function-based index on TRUNC ( expiry_date ) and use same expression in the database. However, all predicates on the column expiry_date must include TRUNC() for Oracle to be able to use the index in all cases. A simple and elegant solution is to provide ranges, either with >= TO_DATE(...) and <= TO_DATE(...) or with BETWEEN TO_DATE(...) AND TO_DATE, which is inclusive. Should you not want it to be inclusive subtract a minimal interval like so: TO_DATE(...) - INTERVAL '1' SECOND'.

Why not the literal 1/86400 or 1/(24*60*60)? Well, it may be easy for you to understand something like that because you wrote it (and perhaps added a comment), but it is not always easy to fathom such literals, especially if developers simplify their fractions as in 7/10800, which is 56 seconds by the way. The index may not care about how you write your literals but the other developers in the team do care. Let the code speak for itself!

Since we’re on the topic of dates: never write TO_CHAR ( expiry_date, 'YYYY-MM-DD' ) = '2014-01-01'. Leave the DATE column as is and write expiry_date >= TO_DATE ( '2014-01-01','YYYY-MM-DD' ) and expiry_date < TO_DATE ( '2014-01-01','YYYY-MM-DD' ) + INTERVAL '1' DAY instead. [1] Yes, it’s a bit more typing, but that way an index range scan can be performed and you do not need a function-based index.

‘But what if I need only products from the fridge that expire in February?’ Since repetition is the mother of learning, here comes: specify ranges from the first day of February to the last day of February.

‘But I want to show the total number of products by the year and month of the expiry date.’ You could use the EXTRACT ( YEAR FROM expiry_date ) and similarly for the month, TRUNC( expiry_date, 'MM' ) or TO_CHAR ( expiry_date, 'YYYY-MM' ). However, since you are pulling in all data from the table, a full table scan really is your best option. Yes, you read that right: a full table scan is the best alternative; we’ll say more about full table scans in a few moments. Furthermore, if you already have an index on expiry_date and it is stored in order (i.e. it is not a HASH index on a partitioned table), then the GROUP BY can make use of the index without any additional function-based indexes.

The LIKE comparison operator is also often a cause for performance problems because applications tend to allow wild cards in strings, which means that a search condition à la WHERE col_name LIKE '%SOMETHING%' is not uncommon. Obviously, you cannot create a sensible index for a predicate like that. It is tantamount to asking a dictionary to provide you with a list of all possible sequences of characters in any position.

The INDEX hint, as described by Laurent Schneider, is — contrary to what is claimed by the said author — not always beneficial for predicates with leading and trailing wild cards, so be sure to try it out. An index is, however, used when such a predicate is specified with bind variables:

 VARIABLE l_like VARCHAR2(20);
 EXEC :l_like := '%SOMETHING%';

   col_name LIKE :l_like;

If you always look for things ending with a series of characters, such as LIKE '%ABC' you can use an index. Just create the index on REVERSE ( col_name ) and reverse the string you are looking for itself, and voilà, it works: WHERE REVERSE ( col_name ) LIKE 'CBA%'.

To search in a case-insensitive manner you have to create a function-based index, say, UPPER(col_name). You could have gone with LOWER(col_name) and whatever you prefer is really up to you. All that matters is that you are thrifty and consistent: switching back and forth between UPPER() and LOWER() is a bad idea because the database has to maintain two indexes instead of one, and you really only need one index. Which function you choose for case-insensitive searches is irrelevant but document whichever you choose, so it is clear to all developers on the team.

In an international setting you may want to use NLS_UPPER( col_name, 'NLS_SORT = ...' ), so that for instance — for ... = XGERMANß and ss are seen as equivalent. The parameters NLS_SORT and NLS_COMP can be made case- or accent-insensitive by appending _CI or _AI to their sort name values respectively. The NLS_SORT parameter can be used to alter a session or the entire system.

For purely linguistic rather than binary searches of text, you can set the system’s or session’s NLS_COMP = LINGUISTIC. The performance of linguistic indexes can thus be improved: CREATE INDEX ix_col_name_ling on tab_name ( NLSSORT( col_name, 'NLS_SORT = FRENCH' ) ), for French for example.

We have already seen that with function-based indexes it is important to have the exact same expression save for irrelevant spaces. A functionally equivalent expression that is syntactically different prohibits the use of an index, so writing REGEXP_LIKE() in your WHERE clause when you have used INSTR() in the index means that the optimizer will ignore the index.

For Oracle Database 11g there is a good book on expert indexing, if you want to learn more about indexes.


[1]The INTERVAL function has one major disadvantage: SELECT TO_DATE ( '2014-01-31', 'YYYY-MM-DD' ) + INTERVAL '1' MONTH FROM dual leads ORA-01839: date not valid for month specified error. The function ADD_MONTHS() solves that problem.