Cost Based Optimizer
A cost-based optimizer decides which access strategy is to be used based on the costs to be expected. Costs are as already mentioned above, for example, the CPU load or the read and write load (I/O).
For processing an SQL statement a cost-benefit plan is created for the different access options. The best strategy is chosen to execute the SQL command depending on the values defined in the WHERE condition. If the values are transferred as parameters, the final strategy can be determined at the time of execution.
In detail, the optimization there consists of the steps listed in the following. They apply in case the SQL statement only accesses one table.
- First, an SQL statement is processed by the parser. This performs a syntactic and semantic analysis. The syntactical analysis checks whether the SQL statement meets the SQL standard and whether the necessary authorizations are available. In the semantic analysis, tables and their column data are checked. The entire SQL statement is used, including the values in the WHERE condition. Following that, the SQL statement is converted to a special, low-level format that can be processed by the database core. The process of the syntactical and semantic check including the format transfer is also referred to as the parsing of the statement.
Each parsed SQL statement is assigned a parse ID. Identical SQL statements within a session only have to be parsed once and can be processed using the already assigned
- The optimizer determines which primary keys and indexes are available for the table and checks whether a corresponding key can be used to search for values.
This is the point at which it is decided whether the ideal access strategy can already be determined now (if concrete values are specified directly in the SQL statement) or only later at the time of execution when the exact values are transferred as parameters.
- The evaluation of the SQL statement then follows, see section #Evaluating. The strategy that causes the least read or write cost is selected on the basis of this analysis.
The cost-based optimizer can work with statistics. However, this function is not required for the SQL statements above. Here, the one-table optimizer, which does not require statistics, makes the decision. In contrast, statistics are used for SQL statements that require access to several tables (joins). The join transitions affect performance significantly. The best access strategy is determined by means of statistical information on the size of the tables and the values within the table columns.
For each join statement, the optimizer seeks out the most suitable access path for each table, as described above. Then it has to be decided in which order the tables will be processed and connected with each other (join transitions). The generation of temporary result sets should be avoided if possible. This is achieved by means of a new join implementation (nested loop join, as of MaxDB Version 7.5.00). The result is the same irrespective of the order in which the tables are linked. However, the order is significant for performance. Table statistics, in particular column statistics, are used to determine this order. These are especially important at this point, since no concrete values are known at the join transitions of the corresponding table columns before the SQL statement is executed.
Hence, the performance of the join optimizer essentially depends on current statistics, especially column statistics.
In a MaxDB database instance, the statistics are generated with the update statistics statement and stored in the system table optimizerstatistics. If the statement has been executed successfully, this system table contains information on columns and indexes, the number of different values for each column as well as the number of rows for each table that exists in the database instance.
Which predicates does the optimizer take into account?MaxDB Optimizer does not take into account predicates that are not listed in table above.
Single Table Select
Column = value
Table1.column = Table2.column
Column <, <=, >=, > value
Column BETWEEN value AND value
Column IN ( value, value, ... )
Column LIKE string value
Column = (ANY) <subquery> or
Column IN <subquery>
Column is NULL