Page tree
Skip to end of metadata
Go to start of metadata

The basic problem with the previous lookup_ext() example was that there is no OR-condition possible in a lookup_ext(). Otherwise the lookup table could contain e.g. a '' character for the cancel flag and we lookup where CANCELED = :value: or CANCELED = ''. So one solution would have been to allow the lookup to contain boolean operators between the conditions, but then the function would quickly get very complex to use.

Another thought is, if we want to fix that somehow, we should implement a method to be more flexible with the conditions as well, mixing >=, <, =, etc operators like we did in the originating decode() condition.

What we did in DI 12.1 to provide this required additional flexibility by supporting a new operand ~.

If this operand is chosen, we do not expect a value in the field of the lookup table but instead a pattern. And depending on the pattern syntax, you can do all kind of conditions: ranges, match-all, in-lists, equals,...etc. And now the condition is not defined inside the lookup function, it is stored inside the lookup table itself.

As Data Integrator supports two pattern matching functions already, match_pattern() and match_regex(), you have to chose which one to be used. And actually, for many cases both functions might be overly complex to use, so we created a third one, the match_simple(). Our lookup function therefore expects the column value to be either one of "mp(....)", "mr(....)" or "ms(....)" to indicate which of the three pattern matching functions to be used.

Let's walk through our example: We had a mapping of

decode(ORDER_TYPE = 'R', REVENUE * (-1),
       ORDER_TYPE = 'X' and CANCELED = 'Y', 0,

We can rewrite it so that each condition has both columns, ORDER_TYPE and CANCELED in it.

decode(ORDER_TYPE = 'R' and CANCELED like '%', REVENUE * (-1),
       ORDER_TYPE = 'X' and CANCELED = 'Y', 0,

As CANCELED like '%' is true for every value of CANCELED it does not change the logic. And now we can replace the different conditions with pattern matching functions doing the same.

decode(match_simple(ORDER_TYPE, 'R') and match_simple(CANCELED, '*'), REVENUE * (-1),
       match_simple(ORDER_TYPE, 'X') and match_simple(CANCELED, 'Y'), 0,

Here we are, we know the lookup table:





ms( * )





For the lookup_ext() function everything is still the same except the operand is a ~ now.

Let us recap what above lookup will do assuming the input values are an 'R' for the ORDER_TYPE and a 'Y' for the CANCELED flag. DI will read the the first line of the lookup table, use the current field value 'R' as input parameter and the value stored in the lookup table as second argument - as the pattern. Then DI calls the appropriate pattern matching function match_simple('R', 'R') and evaluate the return. Does 'R' match 'R'? Yes. So we have a first candidate. Next it will evaluate the second argument match_simple('Y', '*') and the function does return a true as well, yes, the character 'Y' matches the pattern any-string. We found a row in the lookup table where both conditions apply.

This is exactly the same thing as we do we regular lookups, just there the only condition we can evaluate is in the form 'R' = 'R' and 'Y' = '???'.....there is no any-character in an equal.

In our example we could have used the = operand instead of the ~ operand for the ORDER_TYPE, so a mixture of different operands is fully supported. But with the pattern matching version the conditions can be more flexible. In the lookup table we require multiple lines, one for each of A,B,C,D,E because the condition should be
ORDER_TYPE in ('A', 'B', 'C', 'D', 'E')
now we can have just one line with the pattern

Even ORs are possible
ORDER_TYPE = ' ' or ORDER_TYPE is null
is the pattern
And if match_simple does not support a certain condition, we simply change to another pattern matching function, in worst case match_regex() - everything can be expressed as regular expression. Another neat trick is to keep the conditions overlapping, like if ORDER_TYPE < 'W' is one condition, the other is ORDER_TYPE = 'R'. Which one is taken? The first one, so we use an additional PRIORITY column, use the MIN policy and assign ORDER_TYPE = 'R' the top priority 1, the other gets a 2,.... When executing the dataflow I got the error "The type for pattern <Y> specified in the lookup_ext table is invalid. Valid pattern types are mr(), mp(), ms()." What does that mean? The data in my lookup table is: Very simple, we use the ~ operator, this needs the information which pattern matching function to use and hence requires the text mr, ms or mp. We have to replace the constants with a pattern matching code. For ORDER_TYPE we could stick to the = operand instead of the pattern but the second row in the cancelled column has to be changed to ms(Y).

  • No labels