We call Components all WorkFlows that can or should be reused, e.g. a Workflow loading an entire dimension table. So in other words, we never reuse DataFlows or other objects, we always embedded them in a Workflow called C_...... A Component consists of the actual DataFlow (maybe inside a Conditional in case initial and delta load is different) but before, all other Components required to be loaded successfully, are listed. In the attached example, C_CostCenter_SAP Component requires the Controlling Area table to be loaded first.
As Components might be called in several different places, e.g. Dimension Tables belonging to different Star Schemas and therefore are part of Sections, each such Component WorkFlow should have its property Execute Only Once set.
Component Checklist
Common DI Design considerations |
|
---|---|
10 |
What other Components are required to be loaded upfront? Add those as the first objects in this Workflow and connect them to ensure they are loaded prior execution of our DataFlow. |
20 |
Is the flag "execute only once" set for this Workflow (Component)? Otherwise, it would be loaded again if it is a dependant object to others. |
30 |
Does the Component have a Description to document what it is doing? |
Data Considerations |
|
10 |
In Initial Load mode, is the table being truncated? |
20 |
Does the target table have a primary key and do you make sure it cannot be violated regardeless of the source data? |
30 |
What will happen if this Component failed with the load in the middle and is restarted again? It has to be ensured that it will deal with that situation automatically, e.g. truncate the table prior the load, load with autocorrect load/table comparision or having a sql() script that deletes the data going to be loaded. Also consider that problem for all other supporting tasks as well like dropping and creating an index. |
40 |
For all colums used as foreign key or used in filter conditions in end user queries, make sure the column value is never NULL. |
50 |
What columns are marked as NOT-NULL in the target table? These should have the guarantee that there will never ever be a NULL value. |
60 |
If joins between source tables are required, join them in the first query if possible to ensure the join is executed in the source database if possible. |
70 |
Use lookup_ext function calls rather than joins if - there is no unique constraint (primary key) on the lookup condition - the cache setting has to be controlled directly - the lookup table is small (less than 100'000 rows) - the colums returned by the lookups are not key to understand the dataflow (lookup is hidden inside a query, a join is obvious when looking at the dataflow) |
80 |
Inside nvl() functions and as default values of the lookups, use the global variables $G_DEFAULT_DATE, $G_DEFAULT_NUMBER, $G_DEFAULT_TEXT instead of hardcoding a value unless it is really required. |