The advantage is, not the entire comparison table has to be read, only the rows actually required. On the other hand, for every single row we have to execute the SQL wait for the database to perform the query and fetch the found data.
The SQL generated for the row-by-row mode is like
select columns_required.... from comparison_table where key = :parameter order by generated_column desc
So the SQL statement is preparsed and the where clause contains all the columns listed in the "Input Primary Key Columns". The additional order by is required in case the comparison table is a slow changing dimension table with one logical key having multiple versions.
The performance of this transform can never be in the same league as an engine-only transform and also depends on the database performance. Btw, it might be a good idea to check if the comparison table does have an index on all columns of the "Input Primary Key" list, otherwise the database will have to perform a full table scan for every single input row.
Nevertheless, this transform was highly tuned, therefore the performance is not that bad with still a throughput of 3600 rows per second. Also, setting a DegreeOfParallelization on the dataflow might be a good test on multi-CPU servers as the wait for the lookup can be reduced, especially in case the jobserver and the database are not on the same server - the network latency has to be added.
Also, it might be a very good idea to use DegreeOfParallelism setting in this mode as there will be a lot of waiting for Data Integrator and the database.
One interesting remaining question is, what is the difference if a column is omitted in the input schema vs. not listed in the compare column list?
For new rows, this is simple, the value will be null. For update rows, the Table Comparison will copy all the found column values of the comparison table into the before image buffer and the after image buffer of the row. After that, the input schema values will be copied into the after image buffer. If the input does not contain a certain column, it obviously is not compared - there is no value to compare with. But if any column listed in the compare column has changed, the transform outputs the update row including the before and after image values. So if the input column is omitted, the update will have the old values in the column. If the input schema did contain the column, it will be updated to the new value. In the example, the CURRENT_IND remains what it was for the update row, the insert row did not get any value, so the CURRENT_IND is null.
Job_TableComparison_rowbyrow.zip (4.4 KB)