Having trouble with data blending in Tableau? Fear not, Alteryx is here!
As part of a recent project, I received a packaged Tableau Workbook where multiple data sources were blended (click here to learn more about blending in Tableau). The author was trying to map organisations at the Postcode unit level, the most granular level of a UK postcode (click here to learn more about how UK postcodes work). Unfortunately, Tableau 9 is able to map only to the Postcode district level in the UK (like SE3 or EC2V). Armed with Tableau's blending capabilities, the author decided to blend a dataset containing latitude and longitude information for UK postcodes with his/her organisational dataset in Tableau. The result was a Tableau Workbook that took minutes to load when a filter was applied or a parameter was changed. As you can imagine, this massively compromised the user experience.
The solution to Tableau's data blending woes is to use the blending capabilities of Alteryx. The reasons will become self-evident as you read along.
Case in point
Imagine the following scenario: An analyst for a UK Car Dealership has been given the task of building a report (using Tableau's mapping capability!) showing whether individual dealerships have been meeting sales targets. To produce this report, two Microsoft (MS) Excel files have been provided:
1. Actual sales information for the dealerships (with postcodes)
2. Sales target information for the dealerships (with postcodes)
With the knowledge that Tableau 9 can map only to the Postcode district level in the UK, the analyst turns to 'Google' in search of a dataset containing latitude and longitude information for UK postcodes and soon stumbles upon a suitable CSV file. The caveat - it is too big for Excel so any sort of data manipulation using the Microsoft tool is off the cards. Two solutions to the predicament are described below (and you can choose the one you prefer).
Solution 1: Blending in Tableau
The analyst decides to connect Tableau directly to the three files (Excel files with sales information and CSV file with geographic info) and manipulate the data in the software. The following actions are performed:
1. Blending the actual sales data with the target sales data (on postcodes) and creation of a Calculated Field to determine whether the dealerships have met their sales target
2. Blending the actual sales data with the CSV file (on postcodes)
3. Building a drill-down report to visualise whether dealerships in a given Region and County have met their sales targets (as shown in Figure 1)
The Packaged Tableau Workbook can be downloaded here.
Solution 2: Blending in Alteryx
The analyst decides to blend the three data sources using Alteryx to create a Tableau Data Extract (.tde file) optimised for superior performance. The following actions are performed:
1. Connecting to the three data sources in Alteryx and blending them using the Join (on postcodes) and Union tools to generate a Tableau Data Extract (as shown in the workflow in Figure 2)
2. Connecting Tableau to the '.tde' file and creating a calculated field to determine whether individual dealerships have met their sales targets
3. Building an identical drill-drown report as described in Solution 1
The Alteryx workflow (along with the sample datasets) and packaged Tableau Workbook can be downloaded here.
Key stats of Blending in Tableau vs Alteryx
To understand which of the solutions would save time and lead to a better user experience, the following two metrics were recorded:
1. Time taken to blend data and build the dashboard (Figure 4)
2. Dashboard load time on changing Region and County of Car Dealerships (Figure 5)
Figure 4 illustrates that using Alteryx to blend data and then building the dashboard in Tableau led to a 25 % time saving (10 minutes). This gain was primarily achieved because the 30 second query time due whenever a filter is changed in 'Solution 1: Blending in Tableau' was avoided.
Figure 5 convincingly proves that the user experience is significantly better when data blending is done outside of Tableau. For the purposes of this blog, the load times for 'Solution 2: Blending in Alteryx' was calculated by allocating one second per click i.e. on Region and then County in the Filters section of the dashboard (shown in Figure 1).
Based on the above experiment, it should be clear that Alteryx should be used to prepare multiple data sources for Tableau because:
1. It removes the necessity to blend between data sources in Tableau which can have a significant performance implication
2. Allows you to have a visual and repeatable workflow for data blending that is easy to audit and tweak when the need arises
3. Create a Tableau Data Extract at the end of the blending process which is optimised for superior Tableau performance (which can be published to Tableau Server if need be)