So we said, a normalized data model is not suitable because of the joins. But how expensive is a join actually?
For a join, the database reads one row after the other of the first (driving) table and for each row the matching partner in the second (so called "driven") table has to be found. Thus the database will walk through the database index to get the address of the matching row and then this row will be read.
So what, how expensive is it now? 5%? 10%? 40%?
For sure there is the overhead of walking through the Index tree. Reading the Index from disk should be negligible and in most cases it will be cached in memory anyway. What will have more effect is the random access on the disk. Instead of streaming through one table, it is now reading one table row, index, read one row from the other table, etc...
Also, keep an eye on the overhead in absolute numbers. If the overhead would be just 1 ms per row, for 100'000 rows that would be 1'40 secs!
To finally come to a conclusion regarding the costs we can do some tests. We can read one table, and then read the same table but via its primary key index. With the SQL statement
the database is forced to read the entire table, cannot use any shortcuts like for a count(*) it can read the index instead of the table, and the processing overhead is minimal.
This, executed on my notebook, show the following execution plan and execution time:
In order to force the database to read the table via the Index, we can use a database hint:
So what is the cost of the join? For the join as such we do not know. But we know when reading an entire table via an index, the overhead is 600%!!!!! It takes six times longer. and a (nested loop) join is nothing else than reading driving table row by row and reading the driven table via an index. So the cost of join can only be bigger.
Keep in mind, this is just an example to show a join is not for free. If you need one or at least a few rows only, a nested loop join like described above is perfect. But for joining huge numbers of rows it certainly is not. Therefore databases support other join methods too, like hash-joins, sort-merge-joins, etc. They are more effective than a nested loop join for large numbers of rows but still, the argument stands, a join is surprisingly expensive.