Alteryx: How to perform dynamic incremental data extracts
Having used Alteryx with a major investment banking client, I was impressed by the number of features on offer. As well as quick data blending, transformation and loading activities, Alteryx performs data analysis and integrates with Tableau. In particular, I’ve found creating Tableau data extract files and publishing them to the Tableau server to be extremely fast and efficient, when compared to the many other ETL tools on the market.
One of my favourite features of Alteryx is the Dynamic Input tool. In this blog I cover how how to perform incremental data loading from a source system using this feature:
1. Get the last extract date from a Control table, which holds the previously extracted date
2. Send it as an input to the dynamic input tool, which filters the data from source based on the last extract date
3. Load data into a target table
4. Get the maximum extract date from processed data and update the control table with the latest extract date
A detailed description of each operation is discussed below.
- The control table holds the date of the last extract, which is passed as an input to the dynamic input as shown in Figure 1.
- Provide the required SQL query with a date filter within the Dynamic Input tool, as shown in Figure 2.
- Choose the modify SQL query option and then choose SQL: update WHERE clause option, as shown in Figure 3.
- Provide last extract date from the control table as a replacement field, as shown in Figure 4.
- The required data is extracted, processed and loaded into the target table. Get the latest date from the data processed and update the control table with this latest extract date as shown in Figure 1. Choose the update and insert' if new option' to update the control table, as shown in Figure 5.
By following the above steps, incremental data extracts from source systems can easily be performed using the Dynamic Input tool. The above example can be extended to cater for intraday data loading activities using datetime data type.
Updating the control table can also be performed as a part of the Post Create SQL Statement property of the Output tool. This makes sure that the control table gets updated only after successful completion of the data load. The downside of this is that the latest extract date needs to be derived again by scanning the whole table, which can be costly for huge volumes of data (by which I mean millions of rows).
Apart from performing incremental extracts, the Dynamic Input tool can also be used to import data from flat file sources, which have different naming conventions but the same structure.
I hope this gives an insight into the different use cases of the Dynamic Input tool. Please feel free to comment on the post with your experiences.