Summary

The Global Suggestion Transform is used either to look up data or to return a list of all possible entries for a given input: for example, when a postal code is provided, all matching streets are returned. In other words, this transform looks up data or generates a list of records matching the input.


 

Input

The input columns of this transform are

COUNTRY with the 2 character ISO country code

plus whichever of the following are available:

POSTCODE with the postal code

LOCALITY1 with the city name

PRIMARY_NAME1 with the street name

PRIMARY_NUMBER with the house number
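
To make this concrete, here is a minimal sketch of such an input record written as a Python dict; the column names come from the transform, while the values are made up for illustration.

# Illustrative input row for the Global Suggestion Transform (values are invented).
input_row = {
    "COUNTRY": "DE",          # 2 character ISO country code (or set a default country in the options, see below)
    "POSTCODE": "69190",      # the postal code we want suggestions for
    "LOCALITY1": None,        # city name, not known in this example
    "PRIMARY_NAME1": None,    # street name, not known
    "PRIMARY_NUMBER": None,   # house number, not known
}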


 

Output

The most important output fields are

SUGGESTION_LIST, a long string containing all possible entries

STATUS, telling whether the suggestion list could be built or, if not, why

DATA_TYPE, indicating what data is missing for complete processing

SYSTEM_ERROR_DESCRIPTION, containing information in case something went wrong


 

When using this transform, keep in mind that it usually does not cause the dataflow to fail if an error occurs; instead, the error information is saved in the data. This makes sense because the transform is often used in RealTime Jobs, where arbitrary input may be provided and errors should be returned to the caller rather than cause the DI process to fail.
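
As a rough sketch of that pattern in plain Python (not Data Services syntax; the column names are taken from the output list above, while the routing logic is just an assumption about how one might handle errors downstream):

# Route a record depending on whether the transform reported a problem.
def route_by_error(row):
    # SYSTEM_ERROR_DESCRIPTION is filled when something went wrong.
    if row.get("SYSTEM_ERROR_DESCRIPTION"):
        return "error"      # send to an error output / return it to the caller
    return "ok"             # continue normal processing

print(route_by_error({"SUGGESTION_LIST": "<LIST>...</LIST>", "SYSTEM_ERROR_DESCRIPTION": ""}))    # ok
print(route_by_error({"SUGGESTION_LIST": "", "SYSTEM_ERROR_DESCRIPTION": "some error text"}))     # error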

Options

The first question to answer is which engine to use: a country-specific one or the Multi-Country one.
In the default country field you can enter a country's 2 character ISO code and then omit the COUNTRY input column.
The option "Return Single Item Lists" controls whether lists containing only one suggestion are returned as well.
With the "Suggestion List" options, the content of the suggestion list return column is controlled. This is somewhat different from the other transforms, where the selected output columns control what kind of data is generated and returned. Here, the output column SUGGESTION_LIST is a varchar(30000) that contains a string in XML format with the configured fields.


 

Option Scenario 1 - Lookup city and state with a zip code

In this scenario the only input the transform ever gets is the country, specified either in the options as the default country or via the input column, plus the postal code. Do not forget that the country has to be the 2 character ISO code; anything else will raise an error. Consider using the CountryID Transform upfront if this information is not available.
The engine to use in the options is the Multi-Country one, and "Return Single Item Lists" has to be set to Yes. The information we want in the SUGGESTION_LIST column consists only of Lastline Components of the address, so:

SELECTION as the counter of each returned line

LOCALITY1, the city name for the provided zip code

REGION1, which will contain the state name

are set to Yes. All other fields of this component are not important for this use case and will be empty most of the time (possibly always). All options of the "Primary Name Components" and "Address Components" have to be set to No.
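
Written out as a plain summary, the option settings for this scenario look roughly like the sketch below. This is only an illustration; the actual configuration is done in the transform's option editor, and the default country value "DE" is just an example.

# Sketch of the option settings described above (illustrative, not actual syntax).
scenario1_options = {
    "Engine": "Multi-Country",
    "Default country": "DE",                # or provide the COUNTRY input column instead
    "Return Single Item Lists": "Yes",
    "Lastline Components": {
        "SELECTION": "Yes",
        "LOCALITY1": "Yes",
        "REGION1": "Yes",
        # all other Lastline Components are not needed for this use case
    },
    "Primary Name Components": "all set to No",
    "Address Components": "all set to No",
}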

Performance: see Global Suggestion - Lookup City

The usual question around this transform is how to decode the XML string into its fields. For this task there is a conversion function called "extract_from_xml", but as it returns an entire schema, it has to be called as a New Function Call.
But before doing that, we need a DTD or XML Schema describing the structure of this XML. Without it, the XML could contain any fields with any names, and it is hard to define field-based mappings if none of those are known. So I created a text file with the extension .DTD in an arbitrary directory to define the fields of the XML:

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT ROOT (LIST*)>

<!ELEMENT LIST (SELECTION, LOCALITY1, LOCALITY2, LOCALITY3, CITY_ADDITION, REGION1, POSTCODE1, PRIMARY_NAMES_AVAILABLE) >

<!ELEMENT SELECTION (#PCDATA)>
<!ELEMENT LOCALITY1 (#PCDATA)>
<!ELEMENT LOCALITY2 (#PCDATA)>
<!ELEMENT LOCALITY3 (#PCDATA)>
<!ELEMENT CITY_ADDITION (#PCDATA)>
<!ELEMENT REGION1 (#PCDATA)>
<!ELEMENT POSTCODE1 (#PCDATA)>
<!ELEMENT PRIMARY_NAMES_AVAILABLE (#PCDATA)>
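
To give an impression of the data, a SUGGESTION_LIST string with two suggestions for one zip code might look roughly like the following sketch. The element names follow the DTD above; the values are invented and most of the optional elements are left out.

# Made-up example of a SUGGESTION_LIST value with two entries and no common root node.
suggestion_list = (
    "<LIST><SELECTION>1</SELECTION><LOCALITY1>Example City</LOCALITY1>"
    "<REGION1>Example State</REGION1><POSTCODE1>12345</POSTCODE1></LIST>"
    "<LIST><SELECTION>2</SELECTION><LOCALITY1>Other City</LOCALITY1>"
    "<REGION1>Example State</REGION1><POSTCODE1>12345</POSTCODE1></LIST>"
)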

In the object library, under Formats, a new DTD was created. I chose the name "GlobalSuggestionLastLine", specified the file name, and set the root node to ROOT. (Note: we need that file only to import it into the Designer; from then on the file itself is not used anymore, just the information stored in the repository.)

Now we add a new Query behind the transform and carry all its source fields to the output side except SUGGESTION_LIST, the column we want to apply the extract_from_xml function to. A right-click on the output pane and selecting "New Function Call"...

[Screenshot: "New Function Call" in the output pane's right-click menu]
...allows us to pick the extract_from_xml function, found in the Conversion Functions category (note: only those functions that can be used as a New Function Call are shown here, not mapping functions).


[Screenshot: function picker with extract_from_xml under Conversion Functions]

The function requires three parameters. The first is the varchar field containing the XML text, so we pick the SUGGESTION_LIST input field via the drop-down button. But as this list lacks a common root node, we have to concatenate one around it (wrapping the string in the ROOT element defined in the DTD). The second parameter is the previously created DTD, and the third, "enable validation", is set to 0 because we do not want the XML to be validated against the strict rules of the DTD - there are no further rules in the DTD anyway.
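
The same idea can be sketched outside of Data Services in a few lines of Python; this only mimics what the mapping does here (wrap a ROOT element around the string, then break it into records), it is not the extract_from_xml function itself.

import xml.etree.ElementTree as ET

# A made-up suggestion list with a single entry and no common root node.
suggestion_list = ("<LIST><SELECTION>1</SELECTION><LOCALITY1>Example City</LOCALITY1>"
                   "<REGION1>Example State</REGION1></LIST>")

# Concatenate the root node around the string, just like the function call input does.
root = ET.fromstring("<ROOT>" + suggestion_list + "</ROOT>")

# Every LIST element becomes one row of the nested schema.
rows = [{child.tag: child.text for child in list_element} for list_element in root.findall("LIST")]
print(rows)   # [{'SELECTION': '1', 'LOCALITY1': 'Example City', 'REGION1': 'Example State'}]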


[Screenshot: extract_from_xml function call parameters]

What we get as a result is shown in this screenshot.


[Screenshot: data preview with the nested LIST output column]
The input is a long string that looks like XML, and the output contains a column called LIST whose datatype is effectively "an entire table/array of data". By clicking into one of these LIST values, the data preview drills into the details, where we can see the schema's data.


[Screenshot: data preview drilled into the LIST schema]

To get rid of the schema level and bring its information down, we have to join the base row with the schema's rows; in other words, we unnest it.


[Screenshot: Unnest option on the LIST schema]

As long as the XML contains one entry only, this join of one base row with the single row of the schema still returns one row; in other words, the unnest does not change anything. The problem is: what if the zip code matches two cities, so the suggestion list contains more than one row? We join one row with a table of two rows, hence the output will be two rows - we get the row for one zip code twice, with the two different cities. Not good; we have to deal with that somehow.
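
The multiplication effect can be illustrated with a small Python sketch (values invented): one base row unnested against a nested table of two suggestions yields two output rows.

# One base record (the zip code row) with a nested LIST containing two suggestions.
base_row = {"POSTCODE": "12345", "SUGGESTION_COUNT": 2}
nested_list = [
    {"SELECTION": 1, "LOCALITY1": "Example City", "REGION1": "Example State"},
    {"SELECTION": 2, "LOCALITY1": "Other City", "REGION1": "Example State"},
]

# A plain unnest joins the base row with every nested row.
unnested = [{**base_row, **suggestion} for suggestion in nested_list]
print(len(unnested))   # 2 - the zip code row now appears twice, once per city
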
First and most important, we want this join, i.e. the unnest, not to change the base row count, so we add a where clause - a where clause on the nested structure, not on the base row. To do that, we right-click the LIST schema and select Make Current.


[Screenshot: Make Current on the LIST schema]

Note that while the query itself is the current object, no WHERE clause is shown for the nested schema. Once the LIST schema is made the current object, the FROM, WHERE, GROUP BY and ORDER BY all display that schema's settings. There we type a where clause to pick just the SELECTION = 1 row from the schema.


[Screenshot: WHERE tab of the LIST schema with the SELECTION = 1 condition]

With that we made sure each zip code takes only the first row from the suggestion list, and during the unnest we do not duplicate the zip code row. We might use the column SUGGESTION_COUNT to deal with the situation further: if SUGGESTION_COUNT is greater than one, we know the zip code ambiguously matched multiple cities, and then we do not want to return any city. The easiest way of doing that is to enhance the where clause above to "SELECTION=1 and SUGGESTION_COUNT=1". For an ambiguous zip code we then join zero nested rows with the base row, so no row is returned for it at all.
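
Continuing the small Python sketch from above (an illustration of the logic only, not Data Services code): the where clause filters the nested rows before the join, and a base row whose filtered list ends up empty produces no output row.

# Mimic the unnest with the where clause SELECTION = 1 and SUGGESTION_COUNT = 1.
def unnest_with_where(base_row, nested_list):
    kept = [s for s in nested_list
            if s["SELECTION"] == 1 and base_row["SUGGESTION_COUNT"] == 1]
    # Joining the base row with zero nested rows returns zero rows.
    return [{**base_row, **s} for s in kept]

# Unambiguous zip code: one suggestion survives, one output row.
print(len(unnest_with_where({"POSTCODE": "54321", "SUGGESTION_COUNT": 1},
                            [{"SELECTION": 1, "LOCALITY1": "Example City"}])))    # 1

# Ambiguous zip code: SUGGESTION_COUNT is 2, nothing survives, no row is returned.
print(len(unnest_with_where({"POSTCODE": "12345", "SUGGESTION_COUNT": 2},
                            [{"SELECTION": 1, "LOCALITY1": "Example City"},
                             {"SELECTION": 2, "LOCALITY1": "Other City"}])))      # 0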

 

 


1 Comment

  1. Former Member

    Hi Tammy,

    Nice, helpful article. Could you please explain more about the unnesting part?

     

    "To get rid of the schema level and bring its information down, we have to join the base row with the schema's rows; in other words, we unnest it."

    I see you are joining back to the original record. Could you please post a screenshot of the FROM clause where you are doing Quer_1.List.selection = 1?

     

    I have also tried the same but am not getting any output. The realtime job is working fine but not giving any result.