Discover analytics solutions from Concentra
Concentra’s analytics and business intelligence teams turn information into insight to give you the edge from your data. Learn more.
In this post, I will go into more detail about how to clean up formatted, cross-tabulated and gap-filled Excel data. This can be easily done with Tableau 9.0’s excellent new data interpreter, pivot and data splitting tools. Let’s look at examples of how to use these tools to restructure the type of data we tend to see with clients.
At Concentra we work closely with health service providers to deliver analytics for providers and commissioners. Therefore, we are always integrating publically available demographic and outcomes data to enrich their solutions. To demonstrate how the new Tableau 9.0 data tools work, let’s use the following Patient experience data from hospitals:
If you were using an older version of Tableau, the primary issues with connecting to this sheet would have been:
- The data table does not begin with headers in the top row, meaning older version of Tableau would not be able to properly determine the table range
- There is a gap in the data located within column B. Tableau would have included this as a field, even though it does not provide any data
- The ‘Provider’ field has both the code and name; it would be better to have these as separate fields, especially when we are blending it with other data using Provider Code
- The patient experience scores are cross-tabulated by year, which would make it difficult to conduct a basic time series analysis
In older versions of Tableau, we would have had to help our clients and trainees restructure this sheet in Excel prior to connecting to the data. With Tableau 9.0, these common data preparation hurdles are a thing of the past. Let’s go through the steps.
Step 1: Interpret the data
First of all, there’s no need to ensure your data table headers start in cell A1 of a spreadsheet. If your data doesn’t look right to Tableau, it will warn you as shown below.
By turning on the Data Interpreter, Tableau will use an algorithm to search through your spreadsheet and identify the proper location of your table. In addition, it will automatically remove any gaps in your data. As shown below, Tableau has successfully identified the header row and removed the gap which was located in column B.
You can even click on “Review results” to see a marked-up version on your spreadsheet and how Tableau interpreted it.
Step 2: Split it up
The provider column contains both code and name, however we need the code in its own field in order to blend it with other data sets. Previously, in order to split out the ‘Provider’ column we would have had to create a relatively complicated calculated field using string functions. Now, using the split option within the data connection window, we can break up ‘Provider’ into ‘Provider Code’ and ‘Provider Name’ with a single click.
Simply select the column you want to split and choose “Split”. This will automatically detect the separated character, in this case a hyphen, and break the provide field up into two new fields. Note that you can also do a custom split if Tableau does not correctly choose the appropriate character to split on.
Step 3: Do a pivot
For years we’ve been using third party tools, such as the Tableau Add-in for Excel, to pivot data from a cross-tabulated view into the flat data tables that Tableau works best with. With Tableau 9.0 we can finally do all this data preparation in a single place. By selecting the fields we want to pivot, as shown below using the year fields, we can completely restructure the orientation of our data.
In this case we selected the year fields which each contain the Patient experience scores. Once pivoted, we have a date field and a value field. This improved structure will allow us to easily conduct time series analysis or use date as a filter.
Step 4: Make it business friendly
Lastly, the new metadata view allows us to easily see all of the fields and their types and to rename them as required. Below we’ve properly named the split and pivoted fields.