15 May 2013

Why SSIS always gets Excel data types wrong, and how to fix it!

Sandeep Chayanam

Working as a BI Developer with the Microsoft stack of tools, I often deal with multiple sources of data, one of which being Excel, a very common data source in Data Warehousing projects. As some of you may be aware, working with Excel in SSIS (SQL Server Integration Services) can have its issues, if you're aware of these, this post will help people in more detail and provide you with some possible solutions to save you some time.

This blog will also help answer questions about:

  • Why doesn’t SSIS pick up my data types correctly when using Excel as a source
  • Why are some of my columns NULL when the excel file contains data
  • How do I fix problems with my excel source
  • Excel SSIS driver issues

The main driver for the problems we experience whilst working with SSIS and Excel originate from the fact SSIS auto-determines the data types for when connecting to the Excel source. The SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows.

  • If the top 8 records contain equal number of numeric and character types – then the priority is numeric (column2 in Figures 1 & 2)
  • If the majority of top 8 records are numeric then it assigns the data type as numeric and all character values are read as NULLs (column3 in Figures 1 &2
  • If the majority of top 8 records are of character type then it assigns the data type as string and all numeric values are read as NULLs (column4 in Figures 1 & 2)
Concentra - Excel input data

Figure 1: Excel input data

 

Concentra - SSIS Excel Source Component Metadata

Figure 2: SSIS Excel Source Component Metadata

 

Concentra - Output preview window in SSIS Excel source component

Figure 3: Output Preview window in SSIS Excel Source Component

 

As you can see from the above screenshots, we receive NULLs in our destination table where we had data to load, this is not the desired output. So, how do we fix this?

By adding IMEX=1 (Import Export Mode) property to the connection string, excel reads the data correctly (see Figure 4).

Set the connection string property as below:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Folder1\Book1.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

Concentra - SSIS Excel Connection Manager Connection String Property

Figure 4: SSIS Excel Connection Manager Connection String Property

 

After adding this property, the excel driver reads the columns with mixed types as Unicode String data type (see Figure 5) and reads the data properly (see Figure 6).

Concentra - SSIS Excel Source Component Metadata with IMEX = 1

Figure 5: SSIS Excel Source Component MetadatA

 

 Concentra - Output Preview window in SSIS Excel Source Component with IMEX = 1

Figure 6: Output Preview window in SSIS Excel Source Component with IMEX = 1

 

Also in situations where the 1st 8 rows are NULLs in the excel file we need to edit the registry settings and set the TypeGuessRows property to 0 in order to let the excel driver to use all rows in the file to determine the data types properly (see Figure 7) by sampling all of the rows rather than the 1st 8. Range of values this property allows are from 0-16. So, we can have 1-16 rows, or all rows as permissible sample sizes.

The location of the key needing to changes is as follows:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Wow6432Node -> Microsoft -> Jet -> 4.0 ->Engines -> Excel -> TypeGuessRows has the value 0. (It contains 8 by default)

Concentra - Registry Editor – Excel Driver TypeGuessRows property Figure 7: Registry Editor – Excel Driver TypeGuessRows property

 

I hope this helps investigating any Excel driver issues you may be experiencing. Please feel free to comment on the post with your experiences.

Sandeep Chayanam

About the author

I am a Senior Business Intelligence Developer at Concentra with in-depth knowledge in Data Warehousing and Data Mining. I’ve been working on various Data Warehousing and Data Visualisation projects over 3.5 years in different sectors using Microsoft technologies, Tableau software & more recently, Alteryx. My current interests are in delivering quick data blending, data analytics & data visualisations using Alteryx & Tableau tools.