Is Real-Time BI necessarily only for large projects?
- Published by Michael Harand
7 000 customers, 700 products, selling from coast to coast … could this be your business?
And are you also analysing all your sales KPIs in real time, and if so, can you be agile?
Yes, we can! Can we? In 3 days?
A food processing company asked us to prove it: to have over 12 million rows of data with 14 KPIs available for analysis in real time. The goal was to deliver a tool that enabled our client to make their point and push for innovation in the BI department. The problem is that you can’t really call it analysis if your ad-hoc queries take several minutes or even hours to return results.
The deliverable was a proof of concept, so we were basically free to choose the technology, but part of the deal was to use a standard server machine, not some unaffordable monster. We went for a virtual machine (2 cores, 8 GB RAM), mainly because it was ready to go. Chances are the machine in front of you is faster…
At agileDSS we believe in agile methodology. With an uncomfortably small time box of just 3 days (including kickoff, data import and presentation), we were under pressure right from the start. However, following agile principles, a project with such a tight timeline doesn’t differ much from any other: you simply go as far as possible within your given time box. Agile methodology is especially suitable when a project is complex and its outcome is not clear from the start. In other words, it’s the methodology of choice for a mandate like this.
File > Save. It’s all in your hands!
We started off with a couple of flat files that didn’t leave much room on a 2 GB USB stick. We chose a classic dimensional model onto which we wanted to plug Microsoft PowerPivot (on Excel 2010). All in all, our approach was fairly conservative: we’re not even talking about a column-based DBMS (though of course we couldn’t help giving it a test drive on SAP HANA later on!), and we were really curious to see whether we would end up with a 2 GB Excel file…
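To sketch what such a classic dimensional (star-schema) model looks like — the table and column names here are hypothetical, not the client’s actual schema — a central fact table of sales rows keys into small dimension tables, and analysis joins them back together, much as PowerPivot does through table relationships:

```python
import pandas as pd

# Hypothetical star schema: small dimension tables for customers and products.
dim_customer = pd.DataFrame({
    "customer_id": [1, 2],
    "customer_name": ["Acme Foods", "Bistro Bleu"],
    "region": ["East", "West"],
})
dim_product = pd.DataFrame({
    "product_id": [10, 20],
    "product_name": ["Soup", "Sauce"],
})
# The fact table holds one row per sale; in the project this had ~12M rows.
fact_sales = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "product_id": [10, 20, 10],
    "month": ["2012-01", "2012-02", "2012-01"],
    "sales_amount": [100.0, 150.0, 80.0],
})

# Analysis resolves the surrogate keys by joining facts to their dimensions.
flat = (fact_sales
        .merge(dim_customer, on="customer_id")
        .merge(dim_product, on="product_id"))
print(flat[["customer_name", "product_name", "month", "sales_amount"]])
```

Keeping the bulky descriptive attributes in the small dimension tables and only compact keys and measures in the 12-million-row fact table is what makes both the import and the later pivoting manageable.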
Thanks to the simple dimensional data model, importing the data into PowerPivot was extremely simple. Transferring the 12 million records over the network does take a bit of time, but 3 to 4 minutes on a 100 Mbps network should be enough. Add 2 more minutes to type in some calculated measures and you’re ready to go in literally 5 minutes.
So how does it feel to cut through a big ball of 12 million rows? Thrillingly unthrilling:
- Drag in the customer name field and ... here’s your list.
- Drag in the month as columns … here’s your crosstab, already far too big for a printout!
- Drag in the sales figures … there we go, a screen full of numbers!
- Zoom out by dropping the lowest customer granularity and exchanging it for a less detailed hierarchy level … done in a second.
- Insert a PivotChart and see the trends … there we go!
In short, it feels just like any other PivotTable; it’s just that it’s the whole company’s data you can conveniently drill down into, so you have to be careful what you select: a bar chart with 7 000 data series not only looks ugly, it also takes some time to render ;)
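In pandas terms — a rough stand-in for the PivotTable steps above, with hypothetical field names rather than the client’s — the drag-and-drop sequence corresponds to a couple of one-liners:

```python
import pandas as pd

# Hypothetical flattened sales data (customer, region hierarchy, month, sales).
flat = pd.DataFrame({
    "customer_name": ["Acme Foods", "Acme Foods", "Bistro Bleu"],
    "region": ["East", "East", "West"],
    "month": ["2012-01", "2012-02", "2012-01"],
    "sales_amount": [100.0, 150.0, 80.0],
})

# Customers as rows, months as columns, sales as values: the crosstab.
crosstab = flat.pivot_table(index="customer_name", columns="month",
                            values="sales_amount", aggfunc="sum")

# "Zoom out": drop the lowest customer granularity and pivot on a less
# detailed hierarchy level (region) instead.
by_region = flat.pivot_table(index="region", columns="month",
                             values="sales_amount", aggfunc="sum")
print(crosstab)
print(by_region)
```

The point of the exercise was that PowerPivot keeps exactly these operations interactive at 12 million rows on modest hardware.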
Yes … one more thing: File > Save … 41 MB! A 2 GB data warehouse in one 41 MB file! That’s agile!
Pros and Cons and Pros
Analysing data usually starts where classical reporting stops; most reports show the known and don’t help you discover new trends. Moreover, if you don’t have all your data available, you can’t really investigate the unknown. That doesn’t mean that pre-designed reports (and other, usually highly aggregated forms of reporting such as dashboards) are useless; they’re just probably not the right thing for an analyst. For analysts, there is no real alternative to having all the data, as flexible as possible.
However, this convenience comes at a cost. First of all, you will quickly have to deal with many versions of the truth. Second, data quality management and governance will become an issue: if the calculation of a metric changes, how do you ensure everyone out there sees the same numbers? Third, the file is small enough to fit on any device, and thus also to get lost anywhere you don’t want it to. And last but not least, there has to be a process for feeding the discoveries back into an enterprise-wide system, and if you’re not running only SQL Servers and SharePoints, there are limits.
As with any other tool, there are pros and cons, but the convenience of analysing all of your data, and the leverage that gives your analysis, is certainly a plus.
If we had more time...
In three days, you can’t do much optimization. And when the rows go into the millions, carefully choosing data types, indexes and constraints is just the start. Column-based storage and in-memory technologies would let us leverage compression at the backend level and lift performance dramatically without touching the basic data model. However, we proved that it doesn’t necessarily take months or years to get agile in analysing all of your data!