Now that we are equipped with a better appreciation and understanding of the intricacies of the various join methods, let’s revisit the queries from the introduction.
Queries 1a and 1b are logically the same and Oracle will treat them that way.
First, let’s assume there there is an index on
department_id in both tables.
Such an index is only beneficial to nested loops because that particular column is in the join clause.
As such, the
employees table is likely to become the driving row source, for a filter like
LIKE last_name = 'X%' is probably very selective in many instances, which means that the number of iterations will be relatively low.
While accessing the
employees table, Oracle will apply the filter because it knows that single-column join conditions in the
ON clause of inner joins are the same as predicates in the
The database will do so either with a lookup if the relevant index on
employees is selective enough or by means of a full table scan if it is not highly selective.
It will then use the index on
departments to access its data by ROWID, thereby joining it to the data from the leading row source.
When the filter on
last_name is not as selective, especially when the cardinality of the
departments table is lower than the cardinality of the
employees table after the filter has been applied, the roles of driving and probe row sources are reversed.
If no such indexes exist at all, then a hash join seems logical.
employees table is used to generate an in-memory hash cluster depends on what table Oracle believes will be best based on the cardinality estimates available.
We basically have the same logic as before: for highly selective filters, Oracle will use the
employees table as the driving row source, otherwise it will pick (on) the
departments table to take the lead.
Queries 2 and 3 yield different result sets, so it’s more or less comparing apples and oranges.
Nevertheless, with an appropriate, selective index on
last_name Oracle will probably settle for nested loops for Query 2 (i.e. the one with the
WHERE clause), and a hash join for Query 3 (i.e. the one with the
If the index on
last_name is not selective at all and its clustering factor is closer to the number of rows than the number of blocks, then Query 2 may also be executed with a hash join, as we have discussed earlier.
Should the SQL engine decide on nested loops for Query 3, it is to be expected that the
departments table be promoted to the position of driving row source because Oracle can use the single-column join condition on
last_name as an access predicate.
Please note that a sort-merge join is possible in all instances. The sort-merge join is rarely Oracle’s first choice when faced with equality join conditions, especially when the tables involved are not sorted to start with.
So, what you should take away from this section is that even though the
WHERE clause is technically a post-join filter, it can be and often is used by Oracle when it fetches the data of the leading row source, analogously to single-column predicates specified in the
ON clause, thereby reducing the number of main iterations (i.e. over the driving row source) or the number of index lookups (in the probe row source) for nested loops, or the size of the in-memory hash cluster for a hash join.
For sort-merge joins these predicates can be used to minimize the size of the tables to be sorted, if one or both tables require reordering.