The intent of this Wiki is to provide information regarding performing a Majority Posting to master records using Best Record functionality in the Match transform in Data Services.
Majority Posting uses Best Record functionality in a Match transform to post the value in a field that occurs the most often within the duplicate group back to the master record. This field would not be used in determining the duplicate group so the values may differ within the duplicate group. A single Match transform with Best Record enabled will not be able to perform a Majority Posting. However, it is possible to do so with multiple Match transforms where the first transform performs the initial duplicate group forming and each subsequent Match transform performs the Best Record for each field to be Majority Posted.
To better illustrate what a group may look like and what the desired output would be, refer to the following screenshot. In this example the dupe groups will form by matching on the Source field and the value that occurs most often in each of the attr fields will be posted to the master record of the group.
In the group where Source=1, the expected output in the attr fields of the master record would be:
As you can see these are the values that occur the most times for each attr field where Source=1. Note that in the attr3 column the values for “xyz” and “abc” both occur twice. In cases where there is a tie for number of maximum occurrences, the first of these values found in the dupe group will be posted.
How the Dataflow Works
The following is a sample layout of a dataflow that performs a match and then does a majority posting on three different fields. For reference, a sample atl file for this dataflow is attached to this page under Related Documents.
The Match_Source transform is a multi-level Match transform that forms the initial duplicate groups and also creates a group count for the first field to be majority posted. In this example the level 1 match is using the Source field to create the duplicate groups. Note in the screenshot only one custom rule is used.
In Post Match Processing the GroupStats are being generated and then the output is sorted in ascending order by Group Number and Group Rank. This ensures the dupe groups are sorted in the desired order prior to being passed on to the second level of matching.
The second level of matching is performed on the first field that will have majority posting done on it. In this case this is the Attr1 field. This second level of matching receives the level 1 dupe groups and then forms duplicates based on the contents of the Attr1 field. The largest group formed at this level is the value that occurred the most times within the level 1 dupe group.
From the match on Attr1, the group count that is generated will be used to identify what value occurs the most times in that field. This group count information will be used to perform the Best Record posting in the next Match transform in the dataflow.
Once matching is complete in the first Match transform, the following Case transform routes duplicate records to the next Match transform and any unique records bypass the matching.
Match Transform Attr1_post
The Match transform Attr1_post is used to perform the Best Record posting of the value in the attr1 input field that occurs the most times in the dupe groups formed in the Match_Source Match transform. Input to this transform are the shown in the below screenshot.
Breaking in the Attr1_post transform will be done on the Source_GROUP_NUMBER field which maintains the same match groups that were formed in the initial matching. Note that the Input Already Sorted option is set to YES so the input data is not sorted again since it is already in dupe group order.
There are again two levels of matching in the transform. The first level matches on the Source_GROUP_NUMBER field to reform the dupe groups. This level also performs the Best Record posting of the Attr1 field.
For the Best Record posting, the PRIORITY_STRING strategy is chosen and priority order is set as DESCENDING. The strategy field used is the Attr1_GROUP_COUNT field which is what allows the largest group count (value occurring the most) from subordinate records to be posted to the master record.
The second match level named Attr2_Match receives the duplicate groups formed in the first level and matches within those groups on the next field to be majority posted, Attr2. This matching generates the group count that will be used for the Best Record posting in the following Match transform.
Match Transform Attr2_post
The next Match transform, Attr2_post will be setup in similar fashion to the Attr1_post transform. This transform will have the attr3 field also mapped to input for matching to create the group count for the following Best Records processing.
The breaking in the transform is setup the same as it was in the previous transform on the same break field and with no sortation performed.
Again the transform incorporates two levels of matching with the first level match being done again on the Source_GROUP_NUMBER field to reform the dupe groups for the Best Record process to be performed on.
The Best Record posting is done in this match level and is set up the same way as in the previous Match transform. The difference being the Attr2 field is being posted this time.
The second level matches on the final field to be majority posted, Attr3, to create the group count that will be used for the Best Record posting in the last Match transform.
Final Match Transform Attr3_post
For the final Match transform there will only be one match level since there are no additional fields to be majority posted so no additional group count needs to be created. The match will be setup with the same breaking and matching as used in the previous two Match transforms.
The Best Record posting will be setup the same as in previous transforms except the source field will be the Attr3 field.
The results of the matching process, and the unique records that were bypassed, are then merged back together prior to being output to the temp table. The final output shows the groups that were formed and the majority value for each of the Attr fields posted back to the master records.
Sample version 4.0 atl and input file Majority_Post.zip