An ETL tool is primarily used to load data - to insert or update, in other words. But sometimes a delete is required, and here I explain how that actually works and what options you have.
In the simplest version you use a script with the sql('datastorename', 'delete from xxx where ....') function.
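As a sketch, such a script step might look like the following (the datastore name DS_TARGET is a placeholder; the table and filter come from the example used later in this article):

```
# Data Services script step - hypothetical datastore name:
sql('DS_TARGET', 'delete from CO_TRANS where DESCRIPTION = \'ABC\'');
```

Note the embedded single quotes around the literal have to be escaped inside the script string.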
The classic way is to build a DataFlow with the Map Operation Transform in it. Let's assume we want to delete all records of a given kind from the target table - in this example, all records where the column DESCRIPTION equals 'ABC'.
Although this looks like one delete statement being executed after the other, we benefit from the fact that source and target table can be accessed from one database (the datastores are in fact identical), and hence the Data Services optimizer performs a pushdown.
So the optimized statement will be one pushdown - in the case of Oracle, a PL/SQL script.
DECLARE
  commit_count NUMBER;
BEGIN
  commit_count := 0;
  :processed_row_count := 0;
  FOR record IN (SELECT "CO_TRANS"."CO_TXN_ID", "CO_TRANS"."DESCRIPTION"
                 FROM "CA_DWH"."CO_TRANS" "CO_TRANS"
                 WHERE "CO_TRANS"."DESCRIPTION" = 'ABC') LOOP
    DELETE FROM "CA_DWH"."CO_TRANS" WHERE ("CO_TXN_ID" = record.CO_TXN_ID);
    commit_count := commit_count + SQL%ROWCOUNT;
    :processed_row_count := :processed_row_count + SQL%ROWCOUNT;
    IF (commit_count >= 1000) THEN
      COMMIT;
      commit_count := 0;
    END IF;
  END LOOP;
  COMMIT;
END;
So the result is positive in a way: one statement is pushed down.
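To make the difference between the two approaches tangible, here is a small standalone sketch (plain Python with an in-memory SQLite table, not Data Services code) that mimics the generated loop: select the matching keys, then delete row by row by key. A SQL developer would instead issue one set-based delete; both end up with the same table contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CO_TRANS (CO_TXN_ID INTEGER PRIMARY KEY, DESCRIPTION TEXT)")
conn.executemany("INSERT INTO CO_TRANS VALUES (?, ?)",
                 [(i, "ABC" if i % 2 == 0 else "XYZ") for i in range(10)])

# Row-by-row, mimicking the pushed-down loop: fetch keys, delete each one.
keys = [row[0] for row in conn.execute(
    "SELECT CO_TXN_ID FROM CO_TRANS WHERE DESCRIPTION = 'ABC'")]
for k in keys:
    conn.execute("DELETE FROM CO_TRANS WHERE CO_TXN_ID = ?", (k,))
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM CO_TRANS").fetchone()[0]
print(remaining)  # only the rows that did not match the filter are left
```

The set-based equivalent would be the single statement `DELETE FROM CO_TRANS WHERE DESCRIPTION = 'ABC'` - one round trip instead of one delete per matching row.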
But what would you do as a SQL developer? You would execute a single delete statement with the appropriate where clause, just as we suggested with the sql() function.
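Using the tables from the example above, that hand-written statement would be:

```sql
DELETE FROM CA_DWH.CO_TRANS WHERE DESCRIPTION = 'ABC';
```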
This can be done with a DataFlow as well! We want ONE statement, so the source should be the Row Generation Transform, followed by a Query, the Map Operation Transform and the target table.
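Sketched as a pipeline, the DataFlow looks like this (transform names as used above):

```
Row_Generation (one row) -> Query -> Map_Operation (Normal -> Delete) -> target table
```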
In the first example, thanks to the Map Operation Transform, the target table loader executed a "delete" statement. With what "where" clause? Why was the "where" clause in the previous solution "where CO_TXN_ID=..."? Because this column was marked as the primary key!
If that is the solution, we can do the same in the Query: we output just one column, DESCRIPTION, we mark it as primary key, and the mapping of this column is the value of the where condition - 'ABC' in our case.
When you validate this DataFlow you will get an error like "Column CO_TXN_ID is key but not provided" - for Data Services, the primary key is still the primary key of the table. We have to change that. One option would be to fake the table - open it from the object library and modify it - but this would have issues of its own, e.g. what if that table is used somewhere else? Fortunately we do not have to do that: there is a flag in the table loader called "Use Input Keys", which tells this loader to take the primary key as provided from the input and forget about the table's own key. So check that flag and execute the job with the trace option "trace sql loader" turned on to get the delete statement printed in the trace log of the execution.
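With "Use Input Keys" checked, the loader builds the where clause from the input's primary key column - DESCRIPTION in our case - so the statement you should see in the trace log is one set-based delete along these lines (exact quoting and owner prefix depend on the database):

```sql
DELETE FROM "CA_DWH"."CO_TRANS" WHERE "DESCRIPTION" = 'ABC'
```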