How to import multiple Excel sheets and files with different schemas in Alteryx
*Blog image by Jakub Szepietowski
Suppose you want to import multiple sheets from an Excel file that have the same structure. A common approach would be to use a Dynamic Input tool in Alteryx. In another scenario, imagine you want to import multiple Excel files with the same structure. A go-to solution is to use a Union tool or wildcard. However, these solutions only work well on one condition – the schema of all the sheets or files have to be identical. The sheet or file will be skipped if it has a different schema from the first sheet or file read by the Input tool. This prerequisite can be a bit tricky to meet as sometimes even though the structure of the sheets/files may look the same, they may have a different schema due to various reasons (e.g. empty cells that contain space).
A quick, simple workaround using a Batch Macro
So how can we ensure to batch import every Excel sheet and file in Alteryx regardless of their different schemas?
This can be done with a very simple Batch Macro.
For those of you who don't have the time to create a Batch Macro from scratch, you can simply download the packaged workflow I built (link at the end of this blog). Once you download it, follow the three steps below:
- Re-point the Input tool in the "Combine data with Batch import" workflow to the Excel file you would like to import
- Re-insert the Batch Macro (as it will show as missing when the directory changes) from the downloaded package and choose the right input fields for File name and Sheet name from the dropdown
- Run the workflow and check the result
For those who are keen to learn, read on. In the following step-by-step guide, I'm am using 'multiple sheets' scenario as an example, but importing multiple files would use the same principle.
Guide to successfully import multiple Excel sheets and files at once
Step 1. Create a Batch Macro to batch import multiple sheets
1.1 ADD TOOLS
To create a Batch Macro, start by creating a new workflow with an Input Data tool. Point it to the file you'd like to import and select one of the sheets. (You will see later that it actually doesn't matter what Excel file you point it to, as the file name and sheet name will be updated.) Select "Full Path" as the value for "Output File Name as Field". This is to ensure the sheet names are included in the output.
Then drag a Control Parameter tool onto the canvas and connect it to the Input tool with the lightning bolt (see Figure 2).
Notice here we want to update two fields in the Input tool – File name and Sheet name, so we need to add a second Control Parameter and connect it to the same Input tool. Name the two Control Parameter tools "File name" and "Sheet name", respectively, so that you can distinguish the input fields later. Now connect the output to a Macro Output.
(Now if you check the Workflow Type, you'll see it's automatically changed from Standard Workflow to Batch Macro.)
1.2 CONFIGURE INPUT
Click the Action tool connecting Control Parameter for File name (① in Figure 2). Select "Update Value (Default)" in the "Select an action type" dropdown. In the window shown of Figure 4 below, select File – value ="..." and then tick the box at the bottom that says "Replace a specific string:". Remove the sheet name in that string, which is everything from the pipe till the end of the string after the file name. It means only the file path will be updated by this Parameter Control.
Now click the Action tool connecting Control Parameter for Sheet name (② in Figure 2 above). Select "Update Value (Default)" in the dropdown. In the window shown on Figure 5 below, select File – value ="..." and then tick the box at the bottom that says "Replace a specific string:". Only keep the sheet name in that string, which is the text in between ' and $' after the pipe at the end of the file name (in this example the sheet name is Austria). It means only the Sheet name will be updated by this Parameter Control as the Batch Macro loops through the multiple sheets.
1.3 CONFIGURE OUTPUT
Now go to View -> Interface Designer -> Properties (left hand side panel). As illustrated in Figure 6, in "Output Mode", choose "Auto Configure by Name..." or "Auto Configure by Position..." to union output from each sheet by field names or field position at the end of the iterations. Once you've done this, it will not complain about different schema in different sheets or files. This is the key for combining tables with different schema.
Note: "Auto Configure by Name" requires the headers to be the same across all sheets/files. If the headers are different but the tables have the same structure across sheets/files (eg. Date is always in column A), use "Auto Configure by Position". In the batch macro, add a select tool to move FileName to the top to make sure the FileName column is always at the same position.
Save the Batch Macro and it's good to go!
Step 2. Use the Batch Macro in a standard workflow
The standard workflow I created has two parts – import & combine data, and extract the sheet names.
As shown in Figure 9 below, create a new workflow with an Input file pointing at the file you'd like to import. Choose "<List of Sheet Names>" for "Table or Query", and "Full Path" for "Output File Name as Field".
The "Sheet Names" column generated will be used as the input field for Sheet name in the Batch Macro. To get the input for File name, use Text to Columns (| as delimiter, split into 2 columns) to extract the file full path from FileName column. I then renamed the FileName1 column "File Name", which we will use as input for File name in the Batch Macro.
Next, connect the Batch Macro. From the dropdown lists select the correct input field for File name and Sheet name (see Figure 10).
The workflow after that is basically a standard Alteryx workflow to extract and add the sheet names back to the dataset in an extra column (Country in this case). Now you have a reusable Batch Macro and a repeatable workflow to import multiple Excel tables even if they have different schema.
In the example above, I've shown how to import multiple sheets with different schemas. To adjust the Batch Macro in order to import multiple files, simply remove one of the Parameter Control (see Figure 11). And in the Action configuration for the remaining Parameter Control, select "Update Input Data Tool" in the "Select an action type" dropdown.
In the standard workflow, use a Directory tool to obtain full paths for all files you want to import and use that as the Batch Macro input field (see Figure 12).
Note: this only works when you have only one sheet in every Excel file. If the Excel files have multiple sheets, you'll have to create full path that includes file path and sheet name in the format of File Path|'Sheet Name$'.
There you have it, a practical solution to import multiple Excel sheets and files regardless of different schemas all at once!
You can download the packaged workflow here
For more tips on using Alteryx, check out our blog and online resources.