29 May 2013

Excel data sources: no longer the scourge of SSIS Developers

Daniel Perianez

The Scenario

Anyone who has worked with Excel as a data source or destination in SSIS (SQL Server Integration Services) has probably found him/herself frustrated and fuming at how difficult it can be. Microsoft provided SSIS with a simple way to deal with the task of reading data from or writing data to Excel files. If the data you have to read from the spreadsheet is clearly formatted, then you are probably one of the few lucky people. However, the reality is never normally so straight forward.

The Business user

The fact is that Excel is widely used by our business users on a daily basis and the wide variety of ways to represent data in it has a significant impact on how the standard SSIS Excel data source and destination components will work. For example, how do your users represent negative amounts in Excel? Usually -13.8, or maybe (13.8) or perhaps as (£ 13,8). In most cases, you would expect the Excel data source task to read it as your standard representation for negative numbers, compatible with your regional settings, but in reality, the way the data is formatted in Excel does matter.

The Problem

One would expect a data source to return the data as you see it in your Excel sheet or at least how the data lies underneath, but with the standard components provided by Microsoft, this is often not the case.

The Excel data source task in SSIS sits over the old Jet 4.0 or the most recent ACE OLEDB 12.0 engines, depending on which versions of the Office drivers your environment has installed in it. These drivers have historically had problems in identifying column data types. Our previous blog gave details of how to fix this issue in each SSIS package that has an Excel data source. Alternatively, you can enforce much stricter controls on the Excel data sources provided to you by the business. Seeing this attempted in numerous projects to date, I can guarantee that this is not an easy thing to do.

A 3rd party solution

Excel data sources in SSIS may not have to be so painful after all. Here at Concentra, we have successfully used 3rd-party tools to help us crack this Excel problem. One of the tools we have used is COZYROC.

COZYROC is a company that specialises in developing extended SSIS components, and they have developed one which solves many of the issues mentioned above. As an example, see several ways to read formatted cells:

1. Original values in EXCEL files

Concentra - Original values in Excel file 

 

2. Reading the value with format

Concentra - Reading the value with a format

 
3. Reading the value underneath the format

Concentra - Reading the value underneath the format

 
4. Reading the formula underneath the values

Concentra - Reading the formula underneath the values


5. Reading the first 8 rows

Microsoft SSIS Excel driver reads the first 8 rows in order to decide the column data type, returning Null instead of the value if the type is incompatible, i.e. in the illustration text within a mainly numeric column returned as null. COZYROC reads all data disregarding its type.

Concentra - COZYROC reads all data disregarding its type 

 

 

Microsoft vs COZYROC

Feature

Microsoft Driver

COZYROC

ConnectivityPartially supportedDependent on Microsoft Office driver installed (e.g. Jet 4.0)

AvailableIndependent Connection Manager, equally compatible with 32 and 64 bits systems

Source SelectionAvailableSelect sheet, static identification of columns, selection of other areas through complex SELECT commandAvailableDefine source by selecting worksheet, named ranges, region row and column
Reading valuesPartially supportedIssues with formatted cells, data types, unexpected resultsAvailableRead formatted value, raw value or formulas within the cells
Dynamic ManipulationAvailableThrough SQL Command manipulationAvailableDynamically define source, allows metadata-driven processing

 

Conclusion

The standard Excel driver provided by Microsoft works great with well-structured, simple spreadsheets. However, for more complex, heterogeneous data or obtain added functionality such as reading data as extracting the formulas underneath the in Excel or overcome data type issues you may find COZYROC a reliable component to manipulate your data.

References

Daniel Perianez

About the author

Senior Business Intelligence developer at Concentra. With more than 10 years of experience developing software in a wide range of Microsoft technologies, my main focus is in data warehousing and, more recently, stream data processing applied to the field of Business Intelligence.

Subscribe to our newsletter

Get in touch
Get the edge

Go to webform

Subscribe to our newsletter

Get in touch
Get the edge

Go to webform