What is an optimizer and why do we need it?
As SAP developers, virtually all the ABAP programs we write will contain one or more Open SQL select statements, so we are constantly accessing the database in increasingly complex ways. As the database matures and naturally grows, it becomes progressively more important to identify and implement the most efficient data access modes we can.
To help us with this task, Oracle provides an Optimizer as an integral part of the database. The optimizer figures out how best to retrieve data from the database by calculating the most efficient way to execute an SQL statement. While this is a great feature to have, the optimizer is bound by the quality of the SQL statement written. The more we know about the optimizer, the better our SQL statements will be.
For any SQL statements processes by Oracle, the optimizer does the following:
- Evaluation of expressions and conditions: Translates certain syntactic constructs like (EQ, IN, LIKE, NOT etc) into equivalent constructs.
- Statement transformation: For a complex statement involving, for e.g. correlated sub queries, the optimizer may transform the original statement into an equivalent join statement.
- View Merging: For an SQL statement that accesses a view, the optimizer often merges the query with that in the view and then optimizes the result.
- Choice of Optimization Approaches: The Optimizer chooses either a cost-based or rule-based approach to optimization and determines the goal of optimization.
- Choose of Access paths: For each of the tables accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table’s data.
- Choose of Join Orders: For a join statement that joins more than 2 tables, the optimizer chooses which pair of tables is joined first, and then which tables is joined to the result and so on.
- Choice of Join Operations: For any statement, the optimizer chooses an operation to use to perform the Join (i.e Nested Loops Join, Sort-Merge Join, Cluster Join)
Different Optimizers Approaches
There are 2 approaches to the Optimizer:
1. Rule Based Approach
2. Cost Based Approach
The preferred approach is determined by a system parameter which can even be set to ‘CHOOSE’ .When this setting is Chosen the optimizer chooses between a cost-based approach and a rule-based approach based on whether statistics are available for the cost-based approach. Statistics quantify the data distribution and storage characteristics of tables, columns, indexes and partitions. Read more about these 2 approaches below.
Rule Based Approach
Here, the optimizer chooses an execution plan based on the access paths available and the ranks of these access paths.
Oracle’s ranking of the access paths is heuristic (evaluation using predefined knowledge or experience a.k.a the thumb rule).If there is more than one way to execute an SQL statement, the rule-based approach always uses the operation with the lower rank. Usually, operations of the lower rank execute faster than those associated with constructs of higher rank.
Rank Access Path
1 Single row by row id
2 Single row by cluster join
3 Single row by hash cluster key with unique or primary key
4 Single row by unique or Primary key
5 Cluster join
6 Hash cluster key
7 Indexed Cluster Key
8 Composite Key
9 Single-Column indexes
10 Bounded range search on indexes
11 Unbounded range on indexes
12 Sort-merge join
13 MAX or MIN of indexed column
14 ORDER BY on indexed Columns
15 Full table scan
The Access paths are described in detail in the posts “SQL Optimizer Access Methods” and “Join Operations”.
Cost Based Approach
Here the Optimizer determines which execution plan is most efficient by considering available access paths and the factoring information available for the schema objects(Tables or indexes) accessed by the SQL statement. This also takes HINTS into account, which is optimization suggestions placed in the SQL statement.
Steps in the Cost-Based approach :
- The optimizer generates a set of potential execution plans for the SQL statement based on its available access paths and hints.
- The optimizer estimates the cost of each execution plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes and partitions accessed by the statement. The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular execution plan. The optimizer calculates the cost of each possible access method and joins order based on the estimated computer resources, including I/O, CPU time and memory that are required to execute the statement using the plan.
- The optimizer compares the cost of the execution plans and chooses the one with the smallest cost.
Statistics for Cost-Based Optimization:
The cost-based approach uses statistics to calculate the selectivity of predicates (a.ka. the WHERE clause comparison values) and estimate the cost of each execution plan. Selectivity is the fraction of rows in a table that the SQL statement’s predicate chooses. The optimizer uses the selectivity of a predicate to estimate the cost of a particular access method and to determine the Optimal Join order. Statistics, as explained before, quantify the data distribution and storage characteristics of tables, columns, indexes and partitions.The optimizer uses these statistics to estimate how much I/O, CPU time, and memory are required to execute an SQL statement using a particular execution plan.