This page shows the input, output, and option tabs available to the Data Cleanse transform in Data Services and describes what each contains.
The Data Cleanse transform parses, standardizes, and enhances customer and operational data.
- Parsing identifies individual data elements and breaks them down into their component parts. It rearranges data elements in a single field or moves multiple data elements from a single data field to multiple discrete fields. Data Cleanse parses name, title, firm, phone, US social security number, date, e-mail, and user-defined patterns.
- Standardization applies business rules for formats, abbreviations, acronyms, punctuation, greetings, casing, order, and pattern matching. You control each of these elements to meet your business requirements and to prepare the data for validation, correction, and accurate record matching.
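The two steps above can be illustrated with a minimal Python sketch. It is not the Data Cleanse implementation, which relies on cleansing packages and configurable rules. The regular expressions, the `parse_line` and `standardize_phone` helpers, and the output keys are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only; the real transform uses
# cleansing packages and rules, not these regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}


def parse_line(line):
    """Parsing: pull typed data elements out of one free-form line."""
    found = {}
    rest = line
    for kind, pattern in PATTERNS.items():
        match = pattern.search(rest)
        if match:
            found[kind] = match.group(0)
            # Remove the matched element so later patterns cannot reuse it.
            rest = rest[:match.start()] + rest[match.end():]
    # Whatever no pattern claimed is kept, with whitespace collapsed.
    found["UNPARSED"] = " ".join(rest.split())
    return found


def standardize_phone(raw, fmt="({0}) {1}-{2}"):
    """Standardization: rewrite a 10-digit phone number in one format."""
    digits = re.sub(r"\D", "", raw)
    return fmt.format(digits[:3], digits[3:6], digits[6:10])


parsed = parse_line("John Smith 608.555.1234 jsmith@example.com")
phone = standardize_phone(parsed["PHONE"])
```

Here the `fmt` argument plays the role of a business rule: changing it changes how every phone number is presented without touching the parsing step.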
Source input fields must be mapped correctly to the Data Cleanse transform for it to generate correct output. Fields from the input source can be mapped to Data Cleanse in pure multiline format (MULTILINE1-12 fields), in discrete format (FIRM_LINE1, DATE1, etc.), or in a combination of multiline and discrete fields, also known as hybrid format. Data Cleanse attempts to parse all types of data from MULTILINE fields, whereas for a discrete field it looks only for the specific data type associated with that field.
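As a rough sketch of the three mapping formats, the Python snippet below classifies a mapping from source columns to Data Cleanse input fields. The source column names (`ADDR_BLOCK_1`, `COMPANY`), the `mapping_format` helper, and the exact spelling `MULTILINE1` through `MULTILINE12` are assumptions for illustration; in practice the mapping is done in the transform's input tab, not in code.

```python
import re

# Assumed spelling of the multiline input fields: MULTILINE1 .. MULTILINE12.
MULTILINE_FIELD = re.compile(r"^MULTILINE(\d+)$")


def mapping_format(mapping):
    """Classify a {source_column: input_field} mapping."""
    multiline = discrete = 0
    for field in mapping.values():
        match = MULTILINE_FIELD.match(field)
        if match:
            if not 1 <= int(match.group(1)) <= 12:
                raise ValueError(f"unknown input field: {field}")
            multiline += 1
        else:
            # Anything else is treated as a discrete field (FIRM_LINE1, DATE1, ...).
            discrete += 1
    if multiline and discrete:
        return "hybrid"
    if multiline:
        return "multiline"
    if discrete:
        return "discrete"
    return "unmapped"


hybrid = mapping_format({"ADDR_BLOCK_1": "MULTILINE1", "COMPANY": "FIRM_LINE1"})
```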
After processing the source data, the Data Cleanse transform populates the parsed and standardized results in the applicable output fields. Data mapped to MULTILINE fields will be output to the appropriate field for the data type that was identified and parsed. Name data that is input as a single line can be output as discrete given names, family name, and postnames. Data Cleanse can identify gender based on the name and add a name designator (e.g. Mr.) based on gender.
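The name handling described above can be pictured with a toy Python example. The lookup tables stand in for a cleansing package and contain made-up entries. The `split_name` helper and the lowercase output keys are hypothetical, not the transform's actual output field names.

```python
# Toy lookup tables standing in for a cleansing package (assumed values).
GENDER_BY_GIVEN_NAME = {"john": "MALE", "mary": "FEMALE"}
PRENAME_BY_GENDER = {"MALE": "Mr.", "FEMALE": "Ms."}
POSTNAMES = {"jr", "sr", "ii", "iii"}


def split_name(name_line):
    """Split a single-line name into discrete parts and derive a prename."""
    tokens = name_line.replace(",", " ").split()
    postname = ""
    if tokens and tokens[-1].rstrip(".").lower() in POSTNAMES:
        postname = tokens.pop()
    given = tokens[:-1]
    family = tokens[-1] if tokens else ""
    gender = "UNKNOWN"
    if given:
        gender = GENDER_BY_GIVEN_NAME.get(given[0].lower(), "UNKNOWN")
    return {
        "given_name1": given[0] if given else "",
        "given_name2": " ".join(given[1:]),
        "family_name": family,
        "postname": postname,
        "gender": gender,
        # No prename is assigned when the gender cannot be determined.
        "prename": PRENAME_BY_GENDER.get(gender, ""),
    }


result = split_name("John Q. Smith Jr.")
```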
Brief description of some attributes of Data Cleanse output fields:
- The Field Class for a field specifies if the field contains standardized data or if it only contains original parsed data.
- Type identifies the data type of the output field as well as the maximum length of the field.
- Content Type can be set to identify the content contained in the field so downstream transforms such as Match can automatically identify the data contained in those fields.
All major Data Cleanse option groups are depicted in the image below. Various options are available for controlling how data is parsed and standardized. Several country-specific predefined Data Cleanse configurations can be used when setting up a dataflow. As of version 4.1, all domain-specific (regional/country) cleansing packages have been combined to form a single global SAP-supplied person and firm cleansing package.
Options for 4.2
Options for 4.1
Options for 4.0
1596776 - Can one Data Cleanse utilize multiple language packages?