Do I Need a Data Warehouse?
Usually when we are talking with new clients we ask where the data is coming from that is going to be used in a planning or reporting solution. Invariably it comes from the client’s ERP and sometimes from their CRM or Payroll systems. We then need to determine if we can just load the data directly from the source or if we need to stage the data before using it in a BI or planning tool.
The key things we are looking for on that are:
- Do we need to merge data from multiple sources?
- What sort of transformations do we need to do to the data before loading into BI or planning?
- What impact could running reports have on the transaction system?
- Is the source data well enough structured to allow fast querying?
- Do we need to know a business structure as it was historically?
This is when we have data from multiple systems that needs to appear to the end user as it was coming from a single source. For example you might have a CRM with call and visit histories and an ERP with order histories. If I’m a sales manager I don’t really care that it comes from separate systems – what I want to see is how many calls and visits each member of my team made and what the impact on sales was from that interaction.
The problem might be that my ERP has one way of identifying a customer and the CRM another. Hopefully (but not always) the customer reference will be common, or at least held in one system to look up the other.
Either way, if I am running a report that is directly querying both systems, there needs to be some way of joining that data to produce my report. Then, even if there is, will it perform fast enough to execute the query on both systems, merge the data in real time and return the report. Maybe, but more than likely, probably not and if not, then chances are that I need to stage the data.
Most people who have got this far in reading something about data warehousing will have heard of the term ETL – or Extract, Transform and Load – or even ELT – Extract, Load and Transform. The Extract and Load parts are pretty self explanatory, however the Transform part often leaves people scratching their heads. Put simply it is like this. If one system has 0’s and 1’s for males and females and another has M’s and F’s, then you can’t just merge the data or you’ll end up with four genders. You will need to do some transformation of one set of data to standardise it. So that’s pretty easy for gender.
What about when you need to transform say distance in millimetres into regular bands of, say, 250mm each. Yes, you could do this in a query on the fly, but it’s going to slow down your report. Better practice might be to stage that and do the transformation in the staged data and then use it pre-calculated in the report.
Transaction System Impact
Most people don’t think of this until it is pointed out to them. If you have someone who can create a simple sales report, let’s say sales by customer for the last 12 months. Let’s say you have 1000 customers and 1000 products. That is potentially 12 million rows of data (1000 x 1000 x 12). Then we might want information about the customer – who they are, what channel or state they are in and then about products – the product group, business unit, division, the sales rep and team. Each of this information is usually stored in separate tables. so our simple sales report now has 12 million possible records and joins to another 10 tables.
That report will have an impact on the transaction system if allowed to run – slowing down all the customer service people entering orders or the accountants processing month or year end journals. Now I know we can put governors in place to stop that sort of query from every happening. But. You get the idea.
Source Data Structure
In the 12 million row example above we have assumed that all those joins we talked about actually work properly. So that when we have a product code on a transaction that there actually is a product with that code in the product table.
What if there is not? What if someone has deleted a product or customer but left the transactions there. Well, unless you allow for it, those transactions will be excluded from your report. Once again, we can manage that within the query, however it will again slow down the report.
Historic Business Structure
Do you need to know sales by customer for six months ago based on who the rep is now or do you need to know it based on who the rep was then? This is a classic situation where if you tried to do this using a traditional query you’d fail. Why? Well unless you are storing the rep structure as it was for each period historically, then the information just is not available in the source. The only way to resolve this is to have it stored in the datawarehouse using what is called a “slowly changing dimension” – slowly, because it changes at a period you define and it reflects the changes in the dimension over time.
Do We Always Recommend a Datawarehouse?
Hell no! We have many clients where we have the BI or Planning tool directly loading from the source working brilliantly. This is especially so when we have a cube that is updated overnight so the hit on the source system is removed. That being said, a cube – TM1 or PowerPlay or MSAS – is not a datawarehouse as it stores aggregated data and even if the cube was at a transactional level, will usually will not contain text at that level.
Need help with TM1?
We're here for you