I have just read a very intriguing paper called ‘A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database’, written by Hasso Plattner.
It's not a(nother) revolutionary new technical approach for Data Warehousing and Business Intelligence. It's a series of smaller innovations (mostly technical, and some even quite old) that together could lead to a paradigm shift [1] in the area of Data Warehousing.
The paper focuses on the transactional world, because that is where the disruption will originate. In short:
- Ever-increasing numbers of CPU cores
- Growth of main memory
- Column databases for transactions (!)
- Shared Nothing approach
- Solid State Disks (SSD)
- In-memory access to actual (current) data; historic data on slower devices (or not)
- Zero-update strategies in OLTP, recognizing the importance of history as well as the importance of parallelism (see the sketch after this list)
- Not in the paper, but I see data models for newly built OLTP systems increasingly resembling the data models of the hub in the data warehouse architecture.
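To make the zero-update idea a bit more tangible, here is a minimal sketch in Python (entirely my own illustration; the class and field names are made up and nothing below comes from the paper): every change is appended as a new row version in column-oriented storage, so nothing is ever updated in place and history is retained for free.

```python
import time

class InsertOnlyTable:
    """Toy zero-update store: every change is an append, never an overwrite.
    Columns are kept as separate lists (column orientation)."""

    def __init__(self, columns):
        # one list per column, plus bookkeeping for key and timestamp
        self.columns = {name: [] for name in columns}
        self.keys = []        # business key of each row version
        self.valid_from = []  # insertion timestamp of each row version

    def upsert(self, key, row):
        # an "update" is just a new version; old versions stay untouched
        self.keys.append(key)
        self.valid_from.append(time.time())
        for name in self.columns:
            self.columns[name].append(row[name])

    def current(self, key):
        # the most recently appended version is the current one
        for i in range(len(self.keys) - 1, -1, -1):
            if self.keys[i] == key:
                return {name: col[i] for name, col in self.columns.items()}
        return None

    def history(self, key):
        # full history comes for free: all versions of the key, in order
        return [
            (self.valid_from[i],
             {name: col[i] for name, col in self.columns.items()})
            for i in range(len(self.keys))
            if self.keys[i] == key
        ]

orders = InsertOnlyTable(["status", "amount"])
orders.upsert(4711, {"status": "open", "amount": 100})
orders.upsert(4711, {"status": "paid", "amount": 100})
print(orders.current(4711))       # {'status': 'paid', 'amount': 100}
print(len(orders.history(4711)))  # 2 versions retained
```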
Modern-day Data Warehouse and Business Intelligence architectures incorporate all of the above-mentioned technologies/methods (well, they should!), and such an architecture therefore compensates for the weaknesses that are intrinsic to OLTP when it comes to OLAP.
The paper acknowledges the above technologies/methods and uses them in an OLTP context. The amazing thing is that the OLTP system gets a lot faster and entails a lot less system maintenance (no indices, no aggregates, no materialized views or anything of the sort; huge compression factors; etc.).
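As an aside on those compression factors: once data is stored column-wise, the values within a column repeat heavily, so even a trivial scheme like dictionary encoding shrinks them dramatically. A toy illustration (Python, my own example, not from the paper):

```python
def dictionary_encode(column):
    """Replace each value by a small integer code into a dictionary.
    Works well on columns because values repeat heavily within a column."""
    dictionary, codes = {}, []
    for value in column:
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes

# a typical low-cardinality OLTP column: many rows, few distinct values
statuses = ["open", "paid", "paid", "shipped", "paid", "open"] * 100_000
dictionary, codes = dictionary_encode(statuses)
print(len(dictionary))  # 3 distinct values: each code fits in 2 bits
```

With only three distinct values, every code fits in two bits instead of a full string, and run-length or bit-packed encodings can compress the code stream even further.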
BUT, maybe more interestingly: what is the use of a data warehouse if the OLTP world adopts these technologies/methods? Well, the case for a data warehouse becomes thinner. At least for the data warehouse as we know it: a materialized store of (historic and current) data loaded from various sources.
Simply put: with the above mix of technologies and methods we are able to stop propagating data. In other words, we can just leave the data where it is initially created. The data warehouse will then focus on the metadata (which becomes hugely important!), the business rules part (although advances in this area are also big), the integration part and the fit-to-task part (making the data suitable for analytics, reporting, risk management, etc.). Oh, and data latency is non-existent.
Data architecture, truly independent of its task (whether transactional or informational). Will I live to see that?
I advise people to read the paper by Lyytinen as well [1]:
Architectural innovations stand out as creative acts of adapting and applying latent technologies or potential to previously unarticulated user needs (Abernathy and Clark 1985). They radically deviate from an established trajectory of performance improvement, or redefine what performance means in a given industry (Christensen and Bower 1996). They are radical (Zaltman et al. 1977) in that they significantly depart from existing alternatives and are shaped by novel, cognitive frames that need to be deployed to make sense of the innovation (Bijker 1987). Consequently, disruptive innovations are truly transformative (Abernathy and Clark 1985). To become widely adopted, disruptive architectural innovations demand provisioning of complementary assets in the form of additional innovations that make the original innovation useful over its diffusion trajectory (Abernathy and Clark 1985; Teece 1986). By doing so, disruptive innovations destroy existing competencies (Schumpeter 1934) and break down existing rules of competition.
I believe the above might apply to the data warehousing industry. Nowadays, the new technologies and methods mentioned are increasingly used in the Data Warehouse and Business Intelligence scene. When they hit the OLTP scene, they will radically change Data Warehousing and Business Intelligence as we know it.
How long will it take? Well, the last paragraph of the above quotation from Lyytinen might slow things down considerably:
To become widely adopted, disruptive architectural innovations demand provisioning of complementary assets in the form of additional innovations that make the original innovation useful over its diffusion trajectory (Abernathy and Clark 1985; Teece 1986). By doing so, disruptive innovations destroy existing competencies (Schumpeter 1934) and break down existing rules of competition.
SAP, Oracle and all the other vendors of OLTP applications will have their work cut out for them. But I know that these guys are working hard... just listen to the SAP folks at their last summit...
[1] The Disruptive Nature of Information Technology Innovations - Lyytinen & Rose, 2003, MIS Quarterly
Ronald,
The idea that DWH and operational systems converge isn't new.
While large-scale implementations are getting more feasible (small scale was never an issue), widespread adoption is still difficult due to complexity (esp. temporal constraints; we are talking OLTP, not DWH!).
My current advice is still KISS for operational systems, DV for history. It's easier for most programmers to understand than all-in-one (but all-in-one *is* more interesting).
Posted by: DM_Unseen | Saturday, December 12, 2009 at 02:26 AM
I did not say it was new... Disruptive innovations are typically not new. They just gain recognition at an ever-increasing speed and get adopted by industries (or combined with other technologies) you might not initially think of. This is how the Internet came about: hundreds of small innovative technologies that came together. Was it enough? No... it needed additional innovations and adoption by other businesses, and that took several years.
All of the techniques and methods I mentioned are already used in the data warehouse scene.
Just try to imagine the impact it might have if they were used in the OLTP scene. You gotta think out of the box here. With the sum of the above technologies and methods, the limitations of OLTP could be tackled. Reading the latest articles on these technologies, I even believe the complexity will be much lower.
I also did not say we live in this reality already. Certainly not, so I agree with your final recommendation. Absorption of these technologies in OLTP is still very thin.
Posted by: Ronald Damhof | Saturday, December 12, 2009 at 03:10 AM
Martijn,
Help me out here:
If I understand KISS correctly, the data is very loosely coupled to the data access and other layers. Correct?
What I tend to miss (but I have only just browsed the websites regarding KISS) is the infrastructure layer: storage, etc.
My post is aiming at an initial disruption in this layer. If KISS is a loosely coupled architecture (it surely looks that way), then I do not see major problems in adopting the above-mentioned technologies in KISS. Utilizing them, however, might impose (big) challenges: within the software design you wanna apply (for example) zero-update strategies and actual/historic partitioning, roughly as in the sketch below.
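A minimal sketch of what I mean by actual/historic partitioning (Python, purely my own illustration; the names are made up and nothing here comes from the paper or from KISS): the current version of each key lives in a hot, in-memory partition, and superseded versions are appended to a cold, historic partition that could sit on slower storage.

```python
class PartitionedStore:
    """Toy actual/historic partitioning: current versions stay 'hot',
    superseded versions move to a 'cold' append-only history."""

    def __init__(self):
        self.actual = {}    # key -> current row (would live in main memory)
        self.historic = []  # append-only list of old versions (slower device)

    def upsert(self, key, row):
        old = self.actual.get(key)
        if old is not None:
            # zero-update: the old version is never modified, only demoted
            self.historic.append((key, old))
        self.actual[key] = row

store = PartitionedStore()
store.upsert("cust-1", {"city": "Groningen"})
store.upsert("cust-1", {"city": "Amsterdam"})
print(store.actual["cust-1"])  # current state, served from memory
print(store.historic)          # [('cust-1', {'city': 'Groningen'})]
```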
What's your take?
Posted by: Ronald Damhof | Saturday, December 12, 2009 at 04:16 AM
Ronald,
There are several KISS architectures, but simplicity, loose coupling and standalone operation are key elements of most of them, and that is what I meant.
For the data layer this means it can maintain integrity without constantly relying on a services/app layer. For simple systems this is doable, but doing it for temporal systems isn't, because you need to implement the temporal logic. Especially temporal constraints are a real PITA (not the zero-update strategies, those are quite easy to do).
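To show why: even the simplest temporal constraint, "validity intervals of one key may not overlap", already forces a check against every other version of that key on each insert. A rough sketch (Python, my own toy example, not from any tool):

```python
from datetime import date

def violates_no_overlap(versions, new_from, new_to):
    """Simplest temporal constraint: validity intervals of the same key
    must not overlap. Every insert must be checked against all versions."""
    for valid_from, valid_to in versions:
        if new_from < valid_to and valid_from < new_to:
            return True  # overlap found
    return False

versions = [(date(2009, 1, 1), date(2009, 6, 1)),
            (date(2009, 6, 1), date(2009, 12, 1))]
# try to insert a version that overlaps the second interval
print(violates_no_overlap(versions, date(2009, 11, 1), date(2010, 1, 1)))  # True
```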
My remark is that for small systems all of this is achievable with current tools/hardware, so why don't we see it that much? Only for the big corps is the new dbms-iron making a big difference. Maybe this will only take off when the big boys start playing with these kinds of data architectures?
Posted by: DM_Unseen | Saturday, December 12, 2009 at 06:55 AM
Good post - thx.
Take a look at: http://www.dbms2.com/2009/12/11/ray-wang-on-sap/?utm_campaign=Feed%3A+MonashInformationServices+%28Monash+Research%29
Like I said in my post, SAP (and that is a big boy) is more than experimenting with it...
Posted by: Ronald Damhof | Saturday, December 12, 2009 at 06:58 AM