Skip to content

Excel data sources: no longer the scourge of SSIS Developers

Published by Concentra

Discover analytics solutions from Concentra


Concentra’s analytics and business intelligence teams turn information into insight to give you the edge from your data. Learn more.

To find out more about Concentra’s bespoke Analytics Services and Solutions see Services.

The Scenario

Anyone who has worked with Excel as a data source or destination in SSIS (SQL Server Integration Services) has probably found him/herself frustrated and fuming at how difficult it can be. Microsoft provided SSIS with a simple way to deal with the task of reading data from or writing data to Excel files. If the data you have to read from the spreadsheet is clearly formatted, then you are probably one of the few lucky people. However, the reality is never normally so straight forward.

The Business user

The fact is that Excel is widely used by our business users on a daily basis and the wide variety of ways to represent data in it has a significant impact on how the standard SSIS Excel data source and destination components will work. For example, how do your users represent negative amounts in Excel? Usually -13.8, or maybe (13.8) or perhaps as (£ 13,8). In most cases, you would expect the Excel data source task to read it as your standard representation for negative numbers, compatible with your regional settings, but in reality, the way the data is formatted in Excel does matter.

The Problem

One would expect a data source to return the data as you see it in your Excel sheet or at least how the data lies underneath, but with the standard components provided by Microsoft, this is often not the case.

The Excel data source task in SSIS sits over the old Jet 4.0 or the most recent ACE OLEDB 12.0 engines, depending on which versions of the Office drivers your environment has installed in it. These drivers have historically had problems in identifying column data types. Alternatively, you can enforce much stricter controls on the Excel data sources provided to you by the business. Seeing this attempted in numerous projects to date, I can guarantee that this is not an easy thing to do.

A 3rd party solution

Excel data sources in SSIS may not have to be so painful after all. Here at Concentra, we have successfully used 3rd-party tools to help us crack this Excel problem. One of the tools we have used is COZYROC.

COZYROC is a company that specialises in developing extended SSIS components, and they have developed one which solves many of the issues mentioned above. As an example, see several ways to read formatted cells:

1. Original values in EXCEL files

Concentra - Original values in Excel file
 

 

 

2. Reading the value with format

Concentra - Reading the value with a format

 

 
3. Reading the value underneath the format

Concentra - Reading the value underneath the format

 

 
4. Reading the formula underneath the values

Concentra - Reading the formula underneath the values

 

5. Reading the first 8 rows

Microsoft SSIS Excel driver reads the first 8 rows in order to decide the column data type, returning Null instead of the value if the type is incompatible, i.e. in the illustration text within a mainly numeric column returned as null. COZYROC reads all data disregarding its type.

Concentra - COZYROC reads all data disregarding its type
 

 

 

Microsoft vs COZYROC

Feature

Microsoft Driver

COZYROC

Connectivity
Partially supported
Dependent on Microsoft Office driver installed (e.g. Jet 4.0)
Available

Independent Connection Manager, equally compatible with 32 and 64 bits systems

Source Selection
Available
Select sheet, static identification of columns, selection of other areas through complex SELECT command
Available
Define source by selecting worksheet, named ranges, region row and column
Reading values
Partially supported
Issues with formatted cells, data types, unexpected results
Available
Read formatted value, raw value or formulas within the cells
Dynamic Manipulation
Available
Through SQL Command manipulation
Available
Dynamically define source, allows metadata-driven processing

 

Conclusion

The standard Excel driver provided by Microsoft works great with well-structured, simple spreadsheets. However, for more complex, heterogeneous data or obtain added functionality such as reading data as extracting the formulas underneath the in Excel or overcome data type issues you may find COZYROC a reliable component to manipulate your data.