
It is time for the coding - finally!

The sample test case: we need to read a really, really weird file. It uses a character encoding never seen before. It is not ASCII. Instead, a byte with value 0x01 means 'A', 0x02 means a lowercase 'f', or some other mapping so uncommon that it cannot be handled by anything other than a programming language.
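A minimal sketch of such a decoder in Java, assuming the file is a plain byte stream in which every byte maps to exactly one character. Only the 0x01 = 'A' and 0x02 = 'f' mappings come from the example. The 0x0A line separator and the class name `WeirdEncodingDecoder` are hypothetical. The lookup is an array indexed by the unsigned byte value, and an unmapped byte raises an error instead of being silently replaced.

```java
/**
 * Hypothetical decoder for a custom single-byte encoding.
 * Only 0x01 -> 'A' and 0x02 -> 'f' come from the example; the rest is assumed.
 */
final class WeirdEncodingDecoder {
    // Lookup table indexed by the unsigned byte value; null means "no mapping".
    private static final Character[] TABLE = new Character[256];

    static {
        TABLE[0x01] = 'A';
        TABLE[0x02] = 'f';
        // Assumption: 0x0A is kept as a line separator so the file can be split into rows.
        TABLE[0x0A] = '\n';
    }

    /** Decodes raw bytes into a String and fails on any byte without a mapping. */
    static String decode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            int unsigned = b & 0xFF; // Java bytes are signed, so mask to 0..255
            Character c = TABLE[unsigned];
            if (c == null) {
                throw new IllegalArgumentException(
                        String.format("No mapping for byte 0x%02X", unsigned));
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x01, 0x02, 0x02}));
    }
}
```

Here `decode(new byte[] {0x01, 0x02, 0x02})` returns `"Aff"`. Failing loudly on unknown bytes is a design choice: in a nightly batch load, a wrong character is usually worse than an aborted job.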

The first important decision is whether this really is a TableRead Adapter. When should the file be read: whenever DI starts a particular batch job, or whenever a new file arrives, which then triggers the load into DI? In other words, does DI trigger the read, or does the Adapter trigger DI? We want to load that file like any other regular file, in a batch job every night. Therefore we can rule out the PollOperation: there is nothing for the Adapter to wait for. That leaves DocumentSource, FunctionCall or TableSource.

Is the data nested, i.e. schemas with sub-schemas, or is it just a list of scalar datatypes such as chars and numbers? It is the latter, so we could use DocumentSource, but there is no reason to.

We could implement the read from the flat file like a lookup_ext() function call: for every incoming row, we read the entire file or part of it. Nonsense, we simply want to read the file.

So TableSource is the only one left. SQLEnabled, maybe? Do we need to apply highly restrictive "where" clauses to the file? We have to read the entire file anyway, so the only benefit would be sending less data from the Adapter to DI. Do we need to join two instances inside the Adapter rather than in DI? There is no reason for anything like that. So it is decided: a TableSource, not SQLEnabled.
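The decision steps above can be condensed into a small function. This is only an illustration of the reasoning, assuming four yes/no questions; the enum constants mirror the operation types named in the text but are hypothetical, not classes from the Adapter SDK.

```java
/**
 * Hypothetical encoding of the decision steps. The enum constants mirror the
 * operation types discussed in the text; they are NOT Adapter SDK classes.
 */
final class OperationChooser {
    enum OperationType {
        POLL_OPERATION, DOCUMENT_SOURCE, FUNCTION_CALL, TABLE_SOURCE, TABLE_SOURCE_SQL_ENABLED
    }

    static OperationType choose(boolean adapterTriggersDi, boolean nestedData,
                                boolean readPerInputRow, boolean needsPushdown) {
        if (adapterTriggersDi) {
            // The Adapter waits for an event (e.g. a new file) and then triggers DI.
            return OperationType.POLL_OPERATION;
        }
        if (nestedData) {
            // Schemas with sub-schemas call for a document-style source.
            return OperationType.DOCUMENT_SOURCE;
        }
        if (readPerInputRow) {
            // Invoked once per input row, similar to lookup_ext().
            return OperationType.FUNCTION_CALL;
        }
        // Flat data read once: pushdown (where clauses, joins) is the only reason for SQLEnabled.
        return needsPushdown ? OperationType.TABLE_SOURCE_SQL_ENABLED : OperationType.TABLE_SOURCE;
    }

    public static void main(String[] args) {
        // Nightly batch load of a flat file: DI triggers, flat data, no per-row reads, no pushdown.
        System.out.println(choose(false, false, false, false));
    }
}
```

For the nightly flat-file case every question is answered "no", so the sketch prints `TABLE_SOURCE`.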

Next we need to decide how much should be configurable. Is there just one kind of file, always with n columns and identical column names? And where do we specify the name of the file to read?

In this case, one file format corresponds to exactly one file. So we click on "import" in the datastore and provide a name, which is already the file name. We read the first line, determine the number of columns and name them COL1, COL2, etc.; all of them get the datatype VARCHAR(100).
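A minimal sketch of that import step, assuming the first line has already been decoded into a String and that fields are separated by a single delimiter character. The text does not name a delimiter, so the `';'` used below is purely hypothetical, as are the names `FileImporter` and `Column`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical import step: derive column metadata from the first (decoded) line.
 * The delimiter is an assumption; the original text does not specify one.
 */
final class FileImporter {
    static final int VARCHAR_LENGTH = 100;

    /** Minimal column description: generated name plus SQL-like type string. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + " " + type;
        }
    }

    static List<Column> columnsFromFirstLine(String firstLine, char delimiter) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
        String[] fields = firstLine.split(Pattern.quote(String.valueOf(delimiter)), -1);
        List<Column> columns = new ArrayList<>(fields.length);
        for (int i = 0; i < fields.length; i++) {
            // Names are positional (COL1, COL2, ...); every column is VARCHAR(100).
            columns.add(new Column("COL" + (i + 1), "VARCHAR(" + VARCHAR_LENGTH + ")"));
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(columnsFromFirstLine("AAf;fA;A", ';'));
    }
}
```

For the line `AAf;fA;A` this yields three columns, `COL1`, `COL2` and `COL3`, each `VARCHAR(100)`. Keeping trailing empty fields means a line ending in the delimiter still counts its last, empty column.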

Anything else? Yes! The directory where the file resides. And where do we specify it? Is it the same for all Adapter instances? Would we have to create a new Adapter instance just because the directory changed? Should it be specified inside the DataStore instead? Yes, that makes sense.
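One way to picture that split of configuration, sketched under the assumption that the DataStore holds the directory and the imported table name is the file name. `DatastoreConfig` and `resolveTableFile` are hypothetical names, not Adapter SDK types; the code only uses standard `java.nio.file` calls.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical datastore-level configuration: the directory lives here,
 * the file name comes from the imported table's name.
 */
final class DatastoreConfig {
    private final Path directory;

    DatastoreConfig(String directory) {
        this.directory = Paths.get(directory).normalize();
    }

    /** Resolves the file behind an imported table and rejects names that escape the directory. */
    Path resolveTableFile(String tableName) {
        Path file = directory.resolve(tableName).normalize();
        if (!file.startsWith(directory)) {
            throw new IllegalArgumentException(
                    "Table name points outside the datastore directory: " + tableName);
        }
        return file;
    }

    public static void main(String[] args) {
        DatastoreConfig nightly = new DatastoreConfig("/data/incoming");
        System.out.println(nightly.resolveTableFile("customers.dat"));
    }
}
```

With this split, moving the files to another directory means editing one DataStore setting; the Adapter instance stays the same, and two DataStores pointing at different directories could share it.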


