Skip to content

Why SSIS always gets Excel data types wrong, and how to fix it!

Published by Concentra

Discover analytics solutions from Concentra


Concentra’s analytics and business intelligence teams turn information into insight to give you the edge from your data. Learn more.

Working as a BI Developer with the Microsoft stack of tools, I often deal with multiple sources of data, one of which being Excel, a very common data source in Data Warehousing projects. As some of you may be aware, working with Excel in SSIS (SQL Server Integration Services) can have its issues, if you’re aware of these, this post will help people in more detail and provide you with some possible solutions to save you some time.

This blog will also help answer questions about:

The main driver for the problems we experience whilst working with SSIS and Excel originate from the fact SSIS auto-determines the data types for when connecting to the Excel source. The SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows.

Concentra - Excel input data

 

Figure 1: Excel input data

 

Concentra - SSIS Excel Source Component Metadata

 

Figure 2: SSIS Excel Source Component Metadata

 

Concentra - Output preview window in SSIS Excel source component

 

Figure 3: Output Preview window in SSIS Excel Source Component

 

As you can see from the above screenshots, we receive NULLs in our destination table where we had data to load, this is not the desired output. So, how do we fix this?

By adding IMEX=1 (Import Export Mode) property to the connection string, excel reads the data correctly (see Figure 4).

Set the connection string property as below:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:Folder1Book1.xls;Extended Properties=”EXCEL 8.0;HDR=YES;IMEX=1″;

Concentra - SSIS Excel Connection Manager Connection String Property

 

Figure 4: SSIS Excel Connection Manager Connection String Property

 

After adding this property, the excel driver reads the columns with mixed types as Unicode String data type (see Figure 5) and reads the data properly (see Figure 6).

Concentra - SSIS Excel Source Component Metadata with IMEX = 1

 

Figure 5: SSIS Excel Source Component MetadatA

 

 
Concentra - Output Preview window in SSIS Excel Source Component with IMEX = 1

 

Figure 6: Output Preview window in SSIS Excel Source Component with IMEX = 1

 

Also in situations where the 1st 8 rows are NULLs in the excel file we need to edit the registry settings and set the TypeGuessRows property to 0 in order to let the excel driver to use all rows in the file to determine the data types properly (see Figure 7) by sampling all of the rows rather than the 1st 8. Range of values this property allows are from 0-16. So, we can have 1-16 rows, or all rows as permissible sample sizes.

The location of the key needing to changes is as follows:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Wow6432Node -> Microsoft -> Jet -> 4.0 ->Engines -> Excel -> TypeGuessRows has the value 0. (It contains 8 by default)

Concentra - Registry Editor – Excel Driver TypeGuessRows property
Figure 7: Registry Editor – Excel Driver TypeGuessRows property

 

 

I hope this helps investigating any Excel driver issues you may be experiencing.

Discover analytics solutions from Concentra


Concentra’s analytics and business intelligence teams turn information into insight to give you the edge from your data. Learn more.