We do a lot of ad-hoc data investigation and analysis around here, and are always on the lookout for tools that make our lives easier.
While raw SQL and a database remain a top weapon of choice for getting a sense of a new data set, it leaves a bit to be desired in the presentation department. Plus, a lot of people just aren’t comfortable with SQL and we often need to hand off our work to more businessy types that have more starch in their shirts than comp sci classes under their belts.
It’d be great to have something like Excel, but without the limitations. Excel’s fine for simple, small data sets but it:
- only works to a million rows
- isn’t great at selecting subsets of data
- isn’t good at stitching multiple tables together
Wouldn’t it be great to have something like Excel, but it serves more like the front end to multiple databases and DOESN’T cost a bajillion dollars like Tableau?
Well there IS such a beast, and the surprise is, it’s Excel itself! Excel 2010 has a free add-on called PowerPivot that addresses many of our longstanding issues with it as a business intelligence tool.
- Excel with PowerPivot maxes out after 100’s of millions of rows
- Excel with PowerPivot lets you combine multiple sources (database tables, web services, any feed!) in a single model (spreadsheet/pivot table)
But Does It, Really?
We didn’t buy the hype on the scalability, so we tried it for ourselves. Our preliminary test was a 1 million row table, imported from a CSV. PowerPivot made a nice fast pivot table over it, no problem. In addition, workbook on disk was 36MB whereas the raw CSV was 75MB, a 50% reduction in size. We were able to relate in a few other small tables to the pivot table as well with no loss in speed.
We were deeply impressed and will be shifting more of our ad-hoc analysis over to Excel.
The only downside? It’s only for Windows. No PowerPivot for Excel Mac yet. This test was run on OS X, within a 64 bit Windows 7 Parallels VM given 4GB RAM. Office itself was the 32 bit version.