With lookup_ext() you can look up a row in a table or in a flat file. Sometimes this is not sufficient, though: you might want to look up the value of an aggregate, e.g. "select order_number, sum(revenue) from order_items group by order_number", or look up in the result of "select * from order join order_items on (...)". For the latter you might be able to chain two lookups, looking up in the first table and returning the join keys required for the second, but you can easily come up with a scenario where that does not help at all.
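To make the aggregate-lookup scenario concrete, here is a minimal sketch using Python and SQLite. The table and column names come from the example query above; the rows and the revenue figures are invented for illustration:

```python
import sqlite3

# Hypothetical order_items data; names taken from the aggregate query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_number INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1001, 50.0), (1001, 25.0), (1002, 10.0)],
)

# A plain single-row lookup cannot return sum(revenue); the dataset to look
# up in has to be the result of the aggregate query itself.
rows = conn.execute(
    "SELECT order_number, SUM(revenue) FROM order_items GROUP BY order_number"
).fetchall()
totals = dict(rows)  # order_number -> total revenue

print(totals[1001])  # -> 75.0
```

The point is that the lookup source is no longer a table but a query result, which is exactly what lookup_ext() on its own cannot provide.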
Join rather than lookup
Yes, we could use a query to build the dataset and join it rather than using a lookup, but as said before, a lookup has advantages over a join.
Create a physical lookup table (stage table) or a virtual one (database view)
Another option would be to create a stage table that holds the dataset. Then it is a regular table we can look up in. But another table just to pull the data together so we can look up in it? That does not sound right. A similar idea is to create a database view rather than a table, and actually, that might not be such a bad idea at all. It requires that you have the permission to create a view, though, and if data has to be joined from different sources you are screwed, too.
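The database-view idea can be sketched in the same SQLite setting as before, with a hypothetical view name order_totals. The view wraps the aggregate query, and a lookup against it is just a keyed select, exactly as against any table:

```python
import sqlite3

# Same invented order_items data as in the introduction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_number INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1001, 50.0), (1001, 25.0), (1002, 10.0)],
)

# The view hides the aggregation; order_totals is a name chosen for this sketch.
conn.execute(
    "CREATE VIEW order_totals AS "
    "SELECT order_number, SUM(revenue) AS total_revenue "
    "FROM order_items GROUP BY order_number"
)

# The kind of per-key select a lookup against the view boils down to:
total = conn.execute(
    "SELECT total_revenue FROM order_totals WHERE order_number = ?", (1001,)
).fetchone()[0]
print(total)  # -> 75.0
```

No extra data is materialized, which is why the view beats the stage table, provided you are allowed to create one and all sources live in the same database.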
Writing manual SQL code with all its downsides
The most direct method, you could call it the equivalent of the SQL transform or the sql() function, is the "Custom SQL" button at the bottom. Its use case is less to allow completely wild SQL statements than to override the SQL that DI generates. You still have to pick a lookup table, even if it is not used at all; the only reason it is required is so the engine knows which datastore to use and the GUI can populate the drop-down boxes with a list of available columns. The SQL you specify in the Custom SQL dialog only has to return the columns mentioned in the condition, output and order-by areas, and those columns do not even need to come from that table. As an example, you could choose target_ds.TARGET.CUSTOMER_MASTER as the lookup table, use the column CUSTOMER_ID as the condition and CUSTOMER_NAME_1 as the output column, while the custom SQL says "select order_number as CUSTOMER_ID, revenue as CUSTOMER_NAME_1 from ORDER". The datatypes should match, or at least be compatible, and it certainly is not a good technique either.

You have to understand that the idea of this feature was to slightly modify the select statement DI would normally create, e.g. "yes, I want to look up in the customer_master table, but in case the customer has no order yet, treat it as non-existent". If you do not use the function wizard to create the lookup_ext() call, you can actually specify any table of the datastore you want to select from and use completely different columns for the actual select, conditions, return columns etc. The engine can deal with that; only the GUI is limited. But that also means no New-Function-Call, you have to write the lookup_ext() directly in a mapping. Either way, a database view is a much better approach.
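The column-aliasing trick above can be simulated to show why it works at all: the declared condition and output columns (CUSTOMER_ID, CUSTOMER_NAME_1) only have to exist in the result of the custom SQL, not in the chosen lookup table. Note the wrapping select below is my assumption of what an engine could generate around the custom statement, not DI's actual SQL, and the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ORDER is a reserved word, hence the quoting; one made-up row.
conn.execute('CREATE TABLE "ORDER" (order_number INTEGER, revenue REAL)')
conn.execute('INSERT INTO "ORDER" VALUES (4711, 99.5)')

# The custom SQL from the example, aliasing order columns to the names the
# lookup was declared with.
custom_sql = (
    'SELECT order_number AS CUSTOMER_ID, revenue AS CUSTOMER_NAME_1 '
    'FROM "ORDER"'
)

# A hypothetical engine-generated lookup: it only cares that the aliased
# columns match the declared condition and output columns.
value = conn.execute(
    f"SELECT CUSTOMER_NAME_1 FROM ({custom_sql}) WHERE CUSTOMER_ID = ?",
    (4711,),
).fetchone()[0]
print(value)  # -> 99.5
```

This also makes the downside obvious: the declared names and datatypes are a facade over whatever the custom SQL really returns, which is exactly why it is fragile.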
Persistent Cache table
Another solution is a variation of the stage table: do not build the stage table in the database, use a persistent cache datastore instead. You simply create a new datastore of type "persistent cache" and build a dataflow doing whatever you want, finally loading the data into a table of this datastore. Now it is a regular table like any other and you can use it as a lookup table. The major advantage is that such a persistent cache table does not need to be read and cached first, it is the cache already. All the data is stored in binary in the pageable-cache format; we just need to activate the pageable-cache database on that file and the cache exists. At the beginning the cache miss ratio will be high, as all data is on disk and none in memory, but that changes quickly. Performance is really not bad. And this approach has all the advantages of the methods mentioned above: it gives you full flexibility, all transforms, all functions, everything can be used to build the persistent cache table. The lookup itself is simple, there is no need to write any SQL at all, and you have full impact/lineage information from source to target.
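Conceptually, the persistent cache works like the following sketch: one job builds the lookup dataset and writes it to a binary file, and a later job activates the cache by opening that file instead of re-reading and re-caching the source. The pickle file here merely stands in for DI's pageable-cache format, which is proprietary; the data is the same invented example as before:

```python
import os
import pickle
import sqlite3
import tempfile

# Source system with the invented order_items data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE order_items (order_number INTEGER, revenue REAL)")
src.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(1001, 50.0), (1001, 25.0), (1002, 10.0)],
)

# "Dataflow" that loads the persistent cache table: compute once, store binary.
totals = dict(src.execute(
    "SELECT order_number, SUM(revenue) FROM order_items GROUP BY order_number"
))
cache_file = os.path.join(tempfile.mkdtemp(), "order_totals.cache")
with open(cache_file, "wb") as f:
    pickle.dump(totals, f)

# Later job: activating the cache is just opening the file; the source
# database is not touched again.
with open(cache_file, "rb") as f:
    cache = pickle.load(f)
print(cache[1001])  # -> 75.0
```

The build step pays the cost once; every subsequent job skips the read-and-cache phase entirely, which is the whole appeal of the persistent cache datastore.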