Skip to end of metadata
Go to start of metadata

You have millions of customer records in your ERP system but are you absolutely certain one record represents one individual? Or are you sending the new catalog via postal mail to all four members of the family instead of just one for all? The Match Consolidate Transform in its different Customer configurations helps to identify the duplicates and who belongs to whom.

Take this example here:

Here, each house is populated by two families named "Daehn" and "Miller", each family has three members with first name "Werner", "Jack", "Diana". But in the system, the record for "Werner" is a duplicate, there are two rows each in the source.
So when are two customer records identical? That really depends on the use case. If you consider a customer to be an individual, you would say the two "Werner" records of each family are duplicates. In the catalog example, you might send one per address or one for each family of that address?
In a completely different case all you have from you customer is name and date of birth. Duplicates are records where these two match. Or company names, or....many other options. For the most common scenarios we have created different predefined Configurations of Match, the one picked in this example is "ConsumerHouseholdResFamInd_MatchBatch", a version where Residence, Family and Individual - all three at once - is tried to be matched.
And the result is that each input record gets three keys to identify what record belongs together. So for the 8 records with the same address, the Residence_Group_Number will be the same number. The four records with same family name within that group will share the same Family_Group_Number. On the level of the Individual, those records with the same first name will share the same Individual_Group_Number. In case one record cannot be matched to any other as it is a true unique record, the group number is NULL.

One impression I gave here is that the text has to match down to the very character, but actually this is not a requirement. The match transform can have complex rules included to decide when records are considered to be a match, not only between one field but also combination of fields. So with that "WERNER DAEHN" is the same as "Werner Daehn" as "Daehn Werner" etc. The one thing the Match Transform does not have are the Address and Name dictionaries, so saying that "Elisabeth Tylor" and "Liz Tayler" are actually the same Individual is beyond its capabilities. For that reason you will find a Data Cleanse Transform (Data Services 3.x) and or an Global Address Cleanse Transform being placed before the Match to generate that information upfront and the Match transform is prepared to take their input into account.


for the ConsumerHouseholdResFamInd_MatchBatch configuration

PERSON1_GIVEN_NAME1 as the first name

PERSON1_FAMILY_NAME1 as the last name


ADDRESS_PRIMARY_NUMBER the street number

POSTCODE1 the postal code

all the other fields are partially optional, so I mapped them all to a query column created before and named "EMPTY". Also, as the street information I had was not cleansed via an Address Transform first, the street contained name and number, so the same field was placed as input for ADDRESS_PRIMARY_NAME and _NUMBER. This decreases the match capabilities as the street name might have some flexibility configured to when names are similar, the number will have less. Certainly not optimal use of this transform...
Some of these additional fields are effectively prepared to consume the upfron Data Cleanse transform, e.g. the PERSON1_GIVEN_NAME1_MATCH_STD1 is the first alias of the given name (or empty) created by the Data Cleanse transform and named identical. Just to help the user.


Performance: see Match Consolidate - Household Data


For each of the tree levels (Residence, Family, Individual) there are

_GROUP_NUMBER where the shared key of duplicates is stored

_MATCH_STATUS telling what record is the leading one, the others are the duplicates of that one



_MATCH_SCORE returns the confidence level of the transform this actually is a duplicate


and some more fields.



In the options there should not be the need to change anything as that was the entire idea of creating such a configuration in first place instead of using the Base_Match Transform. Nevertheless, you can configure the options completely.

  • No labels