
The Global Address Cleanse transform takes any address information as input and matches it, using different engines, against its Address Dictionary. The transform uses its built-in knowledge of how address lines are written to parse the input into its segments, and it also corrects typing errors.


Input Option 1 - distinct fields

If the source has distinct fields for street, city, name, etc., use the corresponding input fields:

ADDRESS_LINE contains the street-related data

FIRM contains the company name, as some companies have a postal code of their own

LOCALITY1 contains the city

LASTLINE serves as a backup in case city, postal code and possibly country are in one field
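The mapping for this option can be pictured with a small Python sketch. It is only an illustration outside of Data Services: the source column names (street, company, city, city_zip_country) and the helper map_discrete_fields are hypothetical, and only the target field names are taken from the list above.

```python
# Hypothetical source column -> transform input field.
# Only the target names (ADDRESS_LINE, FIRM, LOCALITY1, LASTLINE) come from the text.
FIELD_MAPPING = {
    "street": "ADDRESS_LINE",
    "company": "FIRM",
    "city": "LOCALITY1",
    "city_zip_country": "LASTLINE",
}


def map_discrete_fields(source_row):
    """Return a dict keyed by input field name, skipping empty source values."""
    mapped = {}
    for source_col, input_field in FIELD_MAPPING.items():
        value = (source_row.get(source_col) or "").strip()
        if value:
            mapped[input_field] = value
    return mapped


row = {
    "street": "3410 Hillview Ave",
    "company": "SAP Labs",
    "city": "Palo Alto",
    "city_zip_country": None,  # not needed here, city is already a separate column
}
print(map_discrete_fields(row))
```

LASTLINE stays unmapped in this example because the city arrives in its own column; it would only be filled when the source keeps city and postal code together.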

Performance: see Global Address Cleanse - Global Engine

Input Option 2 - Multiline fields

If the entire address comes in a single field, or it is not obvious which input column actually holds the city, put everything into the Multiline1-12 columns and let the transform figure it out.
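As a rough sketch of how a free-form address could be spread over the twelve multiline columns before it reaches the transform, consider the following Python helper. The function name to_multiline_fields, the upper-case MULTILINE1 to MULTILINE12 keys, and the choice to fold overflow lines into the last column are assumptions made for illustration, not documented behavior.

```python
MAX_MULTILINE_FIELDS = 12  # the text above mentions Multiline1-12


def to_multiline_fields(raw_address):
    """Split a free-form address on line breaks and assign MULTILINE1..12.

    Blank lines are dropped. If more than twelve lines remain, the surplus
    is joined into the twelfth field (an assumption of this sketch).
    """
    lines = [line.strip() for line in raw_address.splitlines()]
    lines = [line for line in lines if line]
    if len(lines) > MAX_MULTILINE_FIELDS:
        head = lines[:MAX_MULTILINE_FIELDS - 1]
        tail = " ".join(lines[MAX_MULTILINE_FIELDS - 1:])
        lines = head + [tail]
    return {f"MULTILINE{i}": line for i, line in enumerate(lines, start=1)}


fields = to_multiline_fields("SAP Labs\n\n3410 Hillview Ave\nPalo Alto CA 94304")
print(fields)
```

No attempt is made here to decide which line is the city or the street; that is exactly the work left to the transform.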


The address information is written to multiple output fields to suit different requirements. You might want to identify the portion of the address that contains the street only, without any checking. In that case you would take the fields with GENERATED_FIELD_CLASS = PARSED. If you use the fields of class BEST, spell checking is performed in addition. An address can be a delivery address only or be an actual location.
If these distinctions are not enough, there are multiple GENERATED_FIELD_NAMES to choose from. For example, LOCALITY1_ALTERNATE contains the city in a version that is close to what was typed, e.g. "Washington DC" or "New York City", whereas LOCALITY1_OFFICIAL is much stricter and uses the postal code to return the official name of the city, "Washington" or "New York". If no postal code was provided, it returns a NULL value. Finally, there is a field LOCALITY1_NAME, which holds either the ALTERNATE or the OFFICIAL city, depending on the transform option settings.
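Because LOCALITY1_OFFICIAL can be NULL, a downstream consumer may want a fallback. The Python sketch below shows one possible way to handle that after the transform has run. The helper pick_city and the "official first, otherwise alternate" rule are assumptions of this example and do not describe how the transform itself fills LOCALITY1_NAME.

```python
def pick_city(output_row):
    """Prefer the official city name, fall back to the alternate one.

    output_row is a dict of transform output fields; a missing or None
    LOCALITY1_OFFICIAL stands in for the NULL described above.
    """
    official = output_row.get("LOCALITY1_OFFICIAL")
    if official:
        return official
    return output_row.get("LOCALITY1_ALTERNATE")


with_postcode = {
    "LOCALITY1_ALTERNATE": "New York City",
    "LOCALITY1_OFFICIAL": "New York",
}
without_postcode = {
    "LOCALITY1_ALTERNATE": "Washington DC",
    "LOCALITY1_OFFICIAL": None,  # no postal code was provided
}
print(pick_city(with_postcode))
print(pick_city(without_postcode))
```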

Other output fields provide insight into the cleansing process. The most important is ACTUAL_ASSIGNMENT_LEVEL, which tells you to what level the data can be trusted: was the country recognized, or even the city? With STATUS_CODE you can figure out which parts of the address were changed.

Options - Needs update to reflect Data Services 4.0 changes

The transform works by identifying the country first. Depending on the country, the engine responsible for it is used to cleanse the address, utilizing its detailed address dictionary. If no such local engine is available, the global engine is used for this country. So the first and most important step is to define which engines can be used, that is, which Address Directories have been installed. By default, all engines in the Global_AddressCleanse configuration are enabled except the Japanese one, and when the job starts it reads the detailed dictionaries from the reference_data folder. If a dictionary is not installed, a runtime error is thrown.
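The selection logic described above can be summarized in a few lines of Python. This is a conceptual sketch only: the engine labels, the country codes, and the select_engine helper are hypothetical and do not correspond to real Data Services option names.

```python
GLOBAL_ENGINE = "GLOBAL"  # hypothetical label for the global engine


def select_engine(country_code, enabled_engines, installed_directories):
    """Pick the engine for a country, falling back to the global engine.

    enabled_engines: dict mapping country code -> local engine label
    installed_directories: set of engine labels whose directories exist
    Raises RuntimeError when the chosen engine has no directories installed,
    mirroring the runtime error mentioned above.
    """
    engine = enabled_engines.get(country_code, GLOBAL_ENGINE)
    if engine not in installed_directories:
        raise RuntimeError(f"address directories for engine {engine} are not installed")
    return engine


enabled = {"US": "USA", "CA": "CANADA"}   # hypothetical configuration
installed = {"USA", "GLOBAL"}             # CANADA directories are missing

print(select_engine("US", enabled, installed))  # local engine is used
print(select_engine("BR", enabled, installed))  # no local engine, global engine is used
```

Calling select_engine("CA", enabled, installed) raises the RuntimeError in this setup, which is why an engine without installed directories should be disabled in the configuration.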



Additional Resources


