Excel data sources: no longer the scourge of SSIS Developers
To find out more about Concentra's bespoke Analytics Services and Solutions see Services.
Anyone who has worked with Excel as a data source or destination in SSIS (SQL Server Integration Services) has probably found him/herself frustrated and fuming at how difficult it can be. Microsoft provided SSIS with a simple way to deal with the task of reading data from or writing data to Excel files. If the data you have to read from the spreadsheet is clearly formatted, then you are probably one of the few lucky people. However, the reality is never normally so straight forward.
The Business user
The fact is that Excel is widely used by our business users on a daily basis and the wide variety of ways to represent data in it has a significant impact on how the standard SSIS Excel data source and destination components will work. For example, how do your users represent negative amounts in Excel? Usually -13.8, or maybe (13.8) or perhaps as (£ 13,8). In most cases, you would expect the Excel data source task to read it as your standard representation for negative numbers, compatible with your regional settings, but in reality, the way the data is formatted in Excel does matter.
One would expect a data source to return the data as you see it in your Excel sheet or at least how the data lies underneath, but with the standard components provided by Microsoft, this is often not the case.
The Excel data source task in SSIS sits over the old Jet 4.0 or the most recent ACE OLEDB 12.0 engines, depending on which versions of the Office drivers your environment has installed in it. These drivers have historically had problems in identifying column data types. Alternatively, you can enforce much stricter controls on the Excel data sources provided to you by the business. Seeing this attempted in numerous projects to date, I can guarantee that this is not an easy thing to do.
A 3rd party solution
Excel data sources in SSIS may not have to be so painful after all. Here at Concentra, we have successfully used 3rd-party tools to help us crack this Excel problem. One of the tools we have used is COZYROC.
COZYROC is a company that specialises in developing extended SSIS components, and they have developed one which solves many of the issues mentioned above. As an example, see several ways to read formatted cells:
3. Reading the value underneath the format
4. Reading the formula underneath the values
5. Reading the first 8 rows
Microsoft SSIS Excel driver reads the first 8 rows in order to decide the column data type, returning Null instead of the value if the type is incompatible, i.e. in the illustration text within a mainly numeric column returned as null. COZYROC reads all data disregarding its type.
Microsoft vs COZYROC
|Connectivity||Dependent on Microsoft Office driver installed (e.g. Jet 4.0)|
Independent Connection Manager, equally compatible with 32 and 64 bits systems
|Source Selection||Select sheet, static identification of columns, selection of other areas through complex SELECT command||Define source by selecting worksheet, named ranges, region row and column|
|Reading values||Issues with formatted cells, data types, unexpected results||Read formatted value, raw value or formulas within the cells|
|Dynamic Manipulation||Through SQL Command manipulation||Dynamically define source, allows metadata-driven processing|
The standard Excel driver provided by Microsoft works great with well-structured, simple spreadsheets. However, for more complex, heterogeneous data or obtain added functionality such as reading data as extracting the formulas underneath the in Excel or overcome data type issues you may find COZYROC a reliable component to manipulate your data.