Subquery Unnesting

The Oracle query optimizer has basically two types of transformations at its disposal:

  1. Cost-based transformations, which are performed only when the optimizer believes it will improve the performance;
  2. Heuristic transformations, which are generally applied regardless of what the optimizer believes.

Some heuristic transformations are really no-brainers because they cannot impact the performance negatively:

  • COUNT conversion: whenever a COUNT(*) is equivalent to a COUNT(col_not_null) Oracle can use a BITMAP CONVERSION COUNT to speed up the SQL statement.
  • DISTINCT is eliminated whenever Oracle knows the operation is redundant, for instance when all columns of a primary key are involved.
  • Columns in a SELECT-list of a subquery are removed whenever they are not referenced, which is known as select-list pruning.
  • Filters can be pushed down into subqueries when appropriate, so that the amount of data is minimized as soon as possible.
  • Predicates can also be moved around from one subquery to another whenever an inner or natural join of two (or more) query blocks is performed where only one of the blocks has a particular predicate, but transitive closure guarantees that is also applies to the other subqueries. Aggregations and analytic functions are known to throw a spanner in the works for filter push-downs and predicate move-arounds.

Subquery unnesting is an interesting beast in the sense that is always applied irrespective of the impact on the performance. As such, it should be classified as a heuristic transformation were it not for the fact that it can be disabled with a hint.

Oracle automatically unnests subqueries (in the WHERE clause) when possible and merges the body of the subquery into the body of the statement or rewrites the nested subquery as an inline view, which of course opens up new optimization avenues, unless the nested subquery:

  • contains the ROWNUM pseudocolumn;
  • contains a set operator;
  • contains an aggregate function or GROUP BY clause;
  • contains a correlated reference to a query block that is not the immediate outer query block of the subquery;
  • is a hierarchical subquery.

Why is unnesting important? Without it, Oracle needs to evaluate a (correlated) subquery for each row of the outer table. With it, it has a whole array of access paths and join options it can go through to improve the execution.

When you execute SELECT * FROM dual WHERE 0 NOT IN (NULL) you will receive no rows, as expected. After all, null may be 0, it may not be. Before Oracle Database 11g, a column that could be null would prevent Oracle from unnesting the subquery. With the null-aware anti-join in 11g and above this is no longer the case, and Oracle can unnest such subqueries.

As of Oracle Database 12c there is the so-called null-accepting semi-join, which extends the semi-join algorithm, indicated by SEMI NA in the execution plan. This is relevant for correlated subqueries that have a related IS NULL predicate, like so: WHERE col_name IS NULL OR EXISTS ( SELECT 1 FROM ... ) The null-accepting semi-join checks for null columns in the join column of the table on the left-hand side of the join. If it is null, the corresponding row is returned, otherwise a semi-join is performed.

So, you may be wondering, ‘If Oracle already unnests correlated subqueries, is there any reason to use correlated subqueries instead of joins?’

A correlated subquery is perfectly acceptable when your outer query already filters heavily and the correlated subquery is used to find corresponding matches. This often happens when you do a simple lookup, typically in a PL/SQL (table) function in an API.

Beware of nulls in the subquery of anti-joins though: whenever one or more rows return a null, you won’t see any results. A predicate such as col_name NOT IN (NULL, ...) always evaluates to null. Analogously, it is important that you inform Oracle of nulls, or the absence thereof, in case you decide to explicitly rewrite a nested or correlated subquery as a join, as it may assist Oracle in determining a better execution plan. Remember: the more information the optimizer has, the better its decisions.

Combined Nested Subqueries

Sometimes you need to filter for two different (aggregated) values from a subquery. Basically, you have two options.

Option 1:

1
2
3
4
5
6
7
 SELECT
   ...
 FROM
   tab_name
 WHERE
   col_name = ( SELECT ... FROM sub_tab_name ... )
 AND another_col_name = ( SELECT ... FROM sub_tab_name ... );

Option 2:

1
2
3
4
5
6
7
8
9
 SELECT
   ...
 FROM
   tab_name
 WHERE
   ( col_name, another_col_name ) =
   (
     SELECT aggregation(...), another_aggregation(...) FROM sub_tab_name ...
   );

The second option is to be preferred because the number of lookups in sub_tab_name is minimized: col_name and another_col_name are retrieved in the same round trip, potentially for each relevant row of tab_name.

Subqueries with DISTINCT

Let’s take a look at two queries:

1
2
3
4
5
 SELECT
   DISTINCT
   some_fancy_function(col_name) AS col_alias
 FROM
   tab_name;
1
2
3
4
5
6
 SELECT
   some_fancy_function(col_name) AS col_alias
 FROM
   (
     SELECT DISTINCT col_name FROM tab_name
   );

Which one will run faster?

Well, in the first case, a full-table scan is done to fetch the columns, after which the function some_function is applied to each column, and finally Oracle looks for distinct values. In the second case, Oracle scans the table tab_name, returns only distinct values for col_name, and then applies the function to the results of the inline view. The function is invoked for every row of tab_name in the former query, whereas in the latter it is only called for every distinct col_name. Therefore, the bottom query will have better runtime performance.

Important to note is that the result sets of both may not be the same though. Suppose col_name contains the following distinct entries: 0, 1, and 2. Let’s also assume that the function we want to apply is SIN(col_name*gc_pi), where gc_pi can be a global (PL/SQL) constant defined in a package as ACOS(-1). In case you have already forgotten geometric functions from basic calculus: the sine function is zero at all multiples of \(\pi\). The former query will therefore return only one row with col_alias equal to zero, whereas the latter will return three rows, all zero.

Functions that lead to the same result set in both cases are known as bijective maps in mathematical circles. They map distinct input (domain) to distinct output (range); there is a one-to-one correspondence between the domain and the range of the function. A non-mathematical example that shows similar behaviour as our sine function is SUBSTR(col_name, 1, 1). It takes the first character of each col_name, which means that 'Jack' and 'Jill' are both mapped to 'J'.

So, when you know that the function you apply is a bijection, then you can rewrite your original query in the format that typically runs faster.

Sometimes you can even avoid a DISTINCT (with the associated costly sort operation) in a main query’s SELECT-list altogether by opting for a semi-join (i.e. EXISTS) instead. This is common when you want unique entries from the main table but only when there is a match in another table for which there are multiple rows for one original row, that is, there is a one-to-many relationship from the main to the other (subquery) table.