Queries on large fact indexes cause high memory consumption and network load. This leads to non optimal performance.
Large fact indexes are normally split into parts and distributed over the blades of a BWA landscape. Now if a dimension index is joined against this fact index, the parts of these indexes first have to be transferred to one single blade so that the full dimension can be joined against the full fact index locally on this blade. This may cause high memory consumption on one blade and generates network load. The performance of the query may not be optimal.
The splitting of fact index and dimension index is not done randomly but in a smart way, optimized for joining the two indexes. This is called “Partitioning”. Joining a dimension index and a fact index means that the dimension index keys get related with those rows of the fact index that have the same values in the corresponding dimension column. So, partitioning a fact index by a given dimension means that:
- All rows of the fact index that have the same value for this dimension column go to one part index (on one blade).
- The dimension index is also split: in such a way that the rows that go to a certain blade have the same key values (=Dimension IDs ) as the rows of the fact index part on this blade.
With such a distribution of index rows it is possible to do the complete join of a fact and a dimension index by joining each part dimension index to the corresponding part fact index.
These part-joins can be done locally on each blade. The high memory and network load can get avoided, the query performance is improved.
Setting the partitioning manually
With the standard settings, a fact index will be split if it exceeds the size given in the Standalone TREXAdmin tool (“Landscape” -> “Configuration”, tab “Index”).
For large indexes there is an automatic partitioning with regards to the largest dimension. This dimension index will also be split, so that the values in the parts of fact and dimension indexes match on each blade. (Like described above.)
This setup fits for most situations.
Note 1242217 describes how the partitioning can be set manually. Another dimension (not the biggest!) can be selected and the fact index will be partitioned by this other dimension. However, please note: The corresponding dimension index will be split and partitioned only if it is bigger than the split threshold set in the Standalone TREXAdmin tool. Before the partitioning of a fact index, this InfoProviders indexes should first get deleted (in transaction RSDDV). Then the entries in RSDDTREXADMIN should be set and then the InfoProvider should get indexed again.
In very special cases, please consult note 1313260 for more detailed settings.
When should a fact table get partitioned?
There are no strict rules for partitioning, however the following guidelines can help:
- For most cases, the standard settings should not be changed. This means partitioning by the largest dimension.
- If there is no (or no critical) query that uses the largest dimension table, then partitioning by this dimension will bring no benefit. In this case, the second largest dimension index can be used for partitioning. A query uses a dimension if there are selections (filters) or drill downs that use characteristics from this dimension. In doubt, please consult your BW administrator.
- A dimension may be a Line-Item dimension (also known as Flat Dimension), then the dimension only holds one single characteristic. In this case, key in the fact index is directly the SID of this characteristic. If this single SID table is very large, the fact index can get directly partitioned according to the SID index. For details see here.
- In case of doubts we recommend just to try different ways of partitioning. No permanent damage can be done by setting an inappropriate partition attribute. The execution of some queries may become slower or in the worst case they become temporarily un-executable. But: All bad effects can always be removed by just un-doing the settings and re-indexing the InfoProvider!
- Partitioning means that the location on the blades for every data-row will be determined by the value of the partitioning column. This means a fact index can never be partitioned by two dimensions.
- If a fact index is partitioned by one dimension, the value distribution according to another dimension is random. However, this can never make the performance for other characteristics worse, since no partitioning at all also means random distribution.
- Partitioning of a split table leads to a specific distribution of the rows to the parts. But the number of rows in each part (i.e. the load on each blade) will not change. There is no danger of un-balanced load to the blades.
Exception: Over time, when more data gets loaded into an index, the load distribution can get unbalanced. Then the index should be re-build in order to ensure even distribution.