DataServices is an ETL tool: it pumps data from the source to the target. So asking for the right-sized hardware is like asking, "What should the fire-hose diameter be to get enough water through?"
Pretty sure that will depend on many other factors as well:
- How much water pressure is provided by the fire hydrant?
- How long is the hose?
- Is the fire on the 20th floor or at ground level?
The diameter of the hose is the least important part of the equation in this analogy, assuming that both the source system speed (the water pressure at the hydrant) and the target database speed (the height of the building, representing how much data queues up in the loader) are given. It only has to be "enough".
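The analogy can be put into numbers with a minimal sketch: the end-to-end rate of a pipeline is capped by its slowest stage. The function name and all rows-per-second figures below are illustrative assumptions, not measurements of any real system.

```python
def pipeline_throughput(source_rps, engine_rps, target_rps):
    """Return (rows per second, bottleneck stage) for a three-stage pipeline.

    Hypothetical helper: the pipeline can never run faster than its
    slowest stage, so the overall rate is the minimum of the three.
    """
    stages = {"source": source_rps, "engine": engine_rps, "target": target_rps}
    bottleneck = min(stages, key=stages.get)
    return stages[bottleneck], bottleneck


# Made-up figures: a fast engine between a slower source and target.
rate, stage = pipeline_throughput(
    source_rps=80_000, engine_rps=2_000_000, target_rps=50_000
)
print(rate, stage)
```

With these made-up figures the target is the limiting stage, so adding hardware to the engine would not change the overall rate.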
In the ETL Speed chapter we will show throughput numbers for typical small to mid-scale systems with typical databases and Data Warehousing dataflows.
Data Quality speed
The fire-hose analogy above works because DataServices itself can process millions of rows per second, including typical transformations. That is much faster than any source can provide the data or any target can save it.
Data Quality transforms are an example of transforms that require a lot of processing power, so suddenly the ETL tool is the slowest part of the pipeline. Furthermore, these transforms need CPU only, so scaling is simple: twice the CPU power gives twice the throughput, as you will see in the Address Cleanse and Geocoding chapter.
Other transforms cannot be parallelized. For those, sizing is simple as well: you will get the same throughput no matter how many CPUs your system has; see the Match and Universal Data Cleanse chapter.
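The two sizing rules above can be sketched as a rough model. It assumes ideal linear scaling for CPU-bound, parallelizable transforms; the function name and the per-CPU rate are hypothetical placeholders, not figures taken from the referenced chapters.

```python
def estimated_throughput(rows_per_sec_per_cpu, cpus, parallelizable):
    """Estimate transform throughput in rows per second.

    Assumption: a parallelizable, CPU-bound transform scales linearly
    with the CPU count, while a non-parallelizable transform runs at
    the single-CPU rate regardless of how many CPUs are available.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if parallelizable:
        return rows_per_sec_per_cpu * cpus
    return rows_per_sec_per_cpu


# Hypothetical per-CPU rate of 1,000 rows/sec for both kinds of transform.
for cpus in (4, 8):
    scaled = estimated_throughput(1_000, cpus, parallelizable=True)
    fixed = estimated_throughput(1_000, cpus, parallelizable=False)
    print(cpus, scaled, fixed)
```

Doubling the CPU count from 4 to 8 doubles the estimate for the parallelizable case and leaves the other unchanged, which is the whole sizing argument in two lines of arithmetic.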
Text Data Processing speed
The Text Data Processing transform (TDP transform) takes free-form text, analyzes it for keywords, semantics, and relationships within sentences, and outputs this information as multiple entities. The size of the input is less important; it is the number of entities that counts, hence we use the number of output rows as the key measure. From a performance perspective, the TDP transform is just like the Data Quality transforms: it requires a lot of CPU power.
- DataServices Performance example - ETL Speed
- Data Services Performance example - Address Cleanse and Geocoding
- Data Services Performance example - Match and Data Cleanse
- How to use the DataServices sizing dashboard
- Data Services Performance example - Text Data Processing