
Let's discuss the difference between SAP HANA and a traditional RDBMS like Oracle. Blogs about in-memory database computing are sometimes confusing, and what they understand by the term can be surprising. Even if a traditional RDBMS processed all of its data in memory (all data in place), you would still not have an in-memory database!

How Traditional RDBMSs Work

When traditional RDBMSs were introduced, memory was very expensive and disk space was a lot cheaper by comparison, so the disk-based RDBMS was born. Looking back from today, this was always meant to be an intermediate solution.

Understanding The Buffer Cache And Its Nature

Even though the data of a traditional RDBMS is disk-based, the only place to operate on that data is in the CPU registers, so the buffer cache is needed to bring a subset of the data nearer to the CPU without the need for I/O on every block access.

The buffer cache itself is nothing other than a small, virtual and logical memory window onto the complete disk-based data. Data blocks that are read into the buffer cache replace other cached data blocks (ones already flushed), blocks changed in the buffer cache are written down to disk at checkpoint, and committed changes are continuously logged as a byte stream by the log writer.

Because the buffer cache is a virtual window onto file-block-oriented data, there is no direct memory access to the data or, more precisely, to a specific row of a table once it is loaded into the buffer cache. You need to organize a lot of lists, semaphores and memory address translation to get a specific row from the buffer cache, because the unique identifier of a row is the rowid. The rowid is not a memory-based construct but a file-based one: it contains no direct information about where a specific row is located in the buffer cache, that is, the row's starting address in memory. A lot of CPU cycles are needed to translate this virtual file cache into a memory-addressable one.
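To make the lookup path described above a bit more tangible, here is a minimal Python sketch of a buffer cache. It is a toy model, not how Oracle or any other product implements it: the class `BufferCache`, the three-part rowid `(file_no, block_no, slot)` and the `disk` dictionary are all assumptions made for illustration, and the latches/semaphores a real cache needs are left out entirely.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache: a bounded LRU window over disk blocks (illustrative only)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # hypothetical "disk": (file_no, block_no) -> list of rows
        self.capacity = capacity      # number of blocks that fit into the cache
        self.frames = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0           # counts simulated physical reads

    def get_block(self, file_no, block_no):
        key = (file_no, block_no)
        if key in self.frames:
            # Cache hit: still costs a hash lookup plus LRU bookkeeping.
            self.frames.move_to_end(key)
            return self.frames[key]
        # Cache miss: simulated physical read, evicting the LRU block if full.
        self.disk_reads += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)
        self.frames[key] = self.disk[key]
        return self.frames[key]

    def fetch_row(self, rowid):
        # The rowid is a file address, so it has to be translated
        # into a cached block and a slot on every single access.
        file_no, block_no, slot = rowid
        return self.get_block(file_no, block_no)[slot]


disk = {(1, 0): ["row A", "row B"], (1, 1): ["row C", "row D"]}
cache = BufferCache(disk, capacity=2)   # big enough to hold everything
first = cache.fetch_row((1, 1, 0))      # miss: block is read from "disk"
second = cache.fetch_row((1, 1, 0))     # hit: no I/O, but the translation still runs
```

Note that the second call avoids the disk read but still goes through the hash lookup and the LRU update. That per-access overhead stays even when the whole database fits into the cache.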

Back to the intro: if you resized the buffer cache to hold the complete data, you would still have all these virtual file-based mechanisms. There is no direct memory access to a row, and you are still dealing with an RDBMS that behaves like a disk-based one. This is not in-memory databasing!

Real In-Memory Databasing - SAP HANA

Now that CPU capabilities and memory capacity have grown at a stellar rate, even larger amounts of data can be held directly and completely in memory.

Hence, on startup of an SAP HANA database all data is loaded into memory, and there is no longer any need to check whether data is already in memory or a read from disk is necessary. Thanks to the column store (vertical, column-wise storage, meaning the values of one attribute are stored sequentially in memory) the data is CPU-aligned: no expensive virtual calculation of LRU, logical block addresses and so on, but direct (pointer) addressing of data.
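The column-wise layout can be sketched in a few lines of Python. This is only an illustration of the idea, not SAP HANA's internal format: the sample table, the column names `order_id` and `amount`, and the helper functions are made up for this example.

```python
from array import array

# Hypothetical sample table: (order_id, city, amount)
rows = [(1, "Berlin", 120), (2, "Paris", 80), (3, "Berlin", 200)]

# Column store: one contiguous array per attribute,
# so all values of one attribute sit next to each other in memory.
order_id = array("i", [r[0] for r in rows])
amount = array("i", [r[2] for r in rows])


def total_amount_row_store(records):
    # Row-wise scan: every record is touched just to read one attribute.
    return sum(record[2] for record in records)


def total_amount_column_store(amount_column):
    # Column-wise scan: one sequential pass over a single array.
    return sum(amount_column)


def amount_at(position):
    # Direct positional access: no block lookup, no LRU list, no rowid.
    return amount[position]


row_total = total_amount_row_store(rows)
column_total = total_amount_column_store(amount)
```

Both scans return the same total, but the column-wise version reads only the data it needs, and a value is reached by its position in the array rather than by translating a file address.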

Additionally, with SAP HANA the data is dictionary-compressed, which means the table itself is modelled as a micro star schema: the table's data contains only integers (CPU-friendly and compact) or bitmaps that reference the values maintained in the column's dictionary. On top of that, native advanced CPU features such as SIMD (single instruction, multiple data) are supported.
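As a rough sketch of dictionary compression for a single column, consider the Python below. The function names `dictionary_encode` and `scan_equal` are hypothetical, and keeping the dictionary sorted is an assumption of this sketch, not a statement about SAP HANA's exact implementation. SIMD is not shown here; the point is only that the column itself ends up as plain integers.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Split a column into a sorted dictionary and a list of integer value IDs."""
    dictionary = sorted(set(values))
    value_id = {value: i for i, value in enumerate(dictionary)}
    ids = [value_id[value] for value in values]
    return dictionary, ids


def scan_equal(dictionary, ids, wanted):
    """Return the row positions whose value equals `wanted`."""
    # Look the value up once in the dictionary (binary search, since it is sorted) ...
    pos = bisect_left(dictionary, wanted)
    if pos == len(dictionary) or dictionary[pos] != wanted:
        return []
    # ... then compare plain integers while scanning the column.
    return [row for row, vid in enumerate(ids) if vid == pos]


city = ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Berlin"]
dictionary, ids = dictionary_encode(city)
berlin_rows = scan_equal(dictionary, ids, "Berlin")
```

Here the dictionary plays the role of the small "dimension table" and the integer list plays the role of the "fact table" in the micro star schema: each distinct string is stored once, and the scan works on compact integers only.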

Source: Tekslate

The main database storage is now RAM instead of disk. With this in mind, SAP HANA is able to be many times faster than a traditional RDBMS, even if the data of the old-style RDBMS would fit completely in the buffer cache.

In a real in-memory database you won't find any rowids anymore :)
