The point I am trying to make in this blog post is that the classic distinction between the operational data environment and the informational data environment (where the data warehouse resides) is fading, thank god! Data is a company-wide concern... but it takes guts to actually treat it that way.
To be able to make this point, I first need to make 5 important statements that clarify a bit more where I am coming from:
- I am going to elaborate a bit more on the Data Quadrant Model I wrote about some time ago. Remember quadrant I, the facts? It is the quadrant where data is factually registered, conforming to several non-functionals (temporal, historical, standardized, etc.).
- It is important to emphasize that the data models in quadrant I are driven by centrally managed logical data models (preferably derived from a conceptual or information model) and governed by means of Data Delivery Agreements. Physical models are preferably derived from the logical model as much as we possibly can. This is not trivial, but there are decades of science behind it, and more and more practitioners are actually doing it and (im)proving it.
- A data warehouse is an architectural construct and as such should be separated from its implementation and its technology. This is a distinction I rarely see being made, but it is vital for the point I am trying to make.
- Data warehouses are traditionally fed with data coming from various sources, either internal systems (batch, services, whatever) or external systems.
- In the Netherlands (my work area) we have evolved data warehouses over the last decade by making a very sharp distinction between facts (quadrant I) and context (quadrant II). This distinction is made on all levels: architecture, data modeling, technology, management, organization, people, processes, etc.
If we take a closer look at the fact part (first quadrant) of the data warehouse, something interesting is going on. Although it was originally set up to accommodate feeds from various sources, it has slowly evolved into an integrated data environment.
How so? Suppose I want to build a new order-intake system. We write use cases, do the sprints and just 'build me an app'. The data model is often something that grows 'organically' with the app... brrrr. Data is a company-wide concern, not an app concern.
Classically, the data warehouse guys only came into play once the app was finished and people needed some sophisticated analytics or just some management reporting. A feed was constructed and the data was delivered to the data warehouse. In my current work environment this has slowly and gradually changed, thanks to the Data Quadrant Model and some very smart co-workers. The app guys are still doing the use cases, but all the modeling is now done by the data modelers of quadrant I.
These data modelers first construct a logical model that the app guys can use as a stub to build all their 'app stuff' upon. Why this way? We do not want the app guys to be concerned with things like auditability, traceability, temporality, extensibility, standardization of data definitions and types, etc. Remember, in quadrant I we have a 'fundamentalist' opinion on how data is to be managed.
Next, the data modelers construct the physical model using the full force of temporal, fact-based modeling, and hide the complexity of that physical model by virtualizing/decoupling the data back to the original logical data model. The app is built, and users can now insert, update and delete data.
The data is decoupled from the apps; other apps might be interested in it too.
The data is centrally managed and governed, at least on the logical level.
The non-functionals regarding data are guarded fiercely.
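To make the virtualization idea a bit more tangible, here is a minimal sketch in Python with SQLite. It is my own illustration, not the actual implementation from my work environment: the table `order_fact`, the view `orders` and the `recorded_from`/`recorded_to` columns are hypothetical names, and a real temporal, fact-based physical model is far richer than a single table. The app only ever talks to the logical view; the triggers make sure nothing is overwritten or thrown away underneath.

```python
import sqlite3

NOW = "strftime('%Y-%m-%d %H:%M:%f', 'now')"

def build_store() -> sqlite3.Connection:
    """Create a hypothetical temporal physical table plus a logical view on top."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
    -- Physical model: every change is kept as a new row (assumed layout).
    CREATE TABLE order_fact (
        order_id      INTEGER NOT NULL,
        customer      TEXT    NOT NULL,
        amount        REAL    NOT NULL,
        recorded_from TEXT    NOT NULL DEFAULT ({NOW}),
        recorded_to   TEXT              -- NULL means "current version"
    );

    -- Logical model: what the app sees, without any temporal columns.
    CREATE VIEW orders AS
        SELECT order_id, customer, amount
        FROM order_fact
        WHERE recorded_to IS NULL;

    -- Inserts on the view become new fact rows.
    CREATE TRIGGER orders_insert INSTEAD OF INSERT ON orders
    BEGIN
        INSERT INTO order_fact (order_id, customer, amount)
        VALUES (NEW.order_id, NEW.customer, NEW.amount);
    END;

    -- Updates close the current row and register a new one.
    CREATE TRIGGER orders_update INSTEAD OF UPDATE ON orders
    BEGIN
        UPDATE order_fact SET recorded_to = {NOW}
        WHERE order_id = OLD.order_id AND recorded_to IS NULL;
        INSERT INTO order_fact (order_id, customer, amount)
        VALUES (NEW.order_id, NEW.customer, NEW.amount);
    END;

    -- Deletes only close the current row; history is kept.
    CREATE TRIGGER orders_delete INSTEAD OF DELETE ON orders
    BEGIN
        UPDATE order_fact SET recorded_to = {NOW}
        WHERE order_id = OLD.order_id AND recorded_to IS NULL;
    END;
    """)
    return conn

# The app works against the logical model only.
conn = build_store()
conn.execute("INSERT INTO orders (order_id, customer, amount) VALUES (1, 'ACME', 100.0)")
conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")
current = conn.execute("SELECT * FROM orders").fetchall()
history = conn.execute("SELECT COUNT(*) FROM order_fact").fetchone()[0]
```

In this sketch the app sees one order with the updated amount, while the physical table holds both versions. That is the decoupling in a nutshell: the app guys get plain insert, update and delete, and the auditability and temporality stay with the data modelers.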
Interestingly, we still make a data delivery agreement, in this case between the functional owner of the system and the governor of quadrant I. We do a handshake on the logical data model, the validations we do before the data is registered, and several other things we expect from each other regarding the data and its quality.
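As a rough illustration of what such a handshake could look like once you write it down, here is a small Python sketch. Everything in it is an assumption of mine: the `DeliveryAgreement` class, its fields and the example validation are hypothetical, and a real agreement covers much more than attribute types and a few rules.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]

@dataclass(frozen=True)
class DeliveryAgreement:
    """Hypothetical handshake between a system owner and the quadrant I governor."""
    owner: str
    governor: str
    logical_model: dict[str, type]  # agreed attributes and their types
    validations: dict[str, Callable[[Record], bool]] = field(default_factory=dict)

    def violations(self, record: Record) -> list[str]:
        """Return every way the record breaks the agreement (empty list = OK)."""
        problems: list[str] = []
        for name, expected in self.logical_model.items():
            if name not in record:
                problems.append(f"missing attribute: {name}")
            elif not isinstance(record[name], expected):
                problems.append(f"wrong type for attribute: {name}")
        for name in sorted(record.keys() - self.logical_model.keys()):
            problems.append(f"unknown attribute: {name}")
        # Only run the agreed validations on structurally sound records.
        if not problems:
            for rule, check in self.validations.items():
                if not check(record):
                    problems.append(f"failed validation: {rule}")
        return problems

agreement = DeliveryAgreement(
    owner="functional owner, order intake",
    governor="governor, quadrant I",
    logical_model={"order_id": int, "customer": str, "amount": float},
    validations={"amount is positive": lambda r: r["amount"] > 0},
)

ok = agreement.violations({"order_id": 1, "customer": "ACME", "amount": 100.0})
bad = agreement.violations({"order_id": 2, "customer": "ACME", "amount": -5.0})
```

Here `ok` comes back empty and `bad` reports the failed validation, so the record would be rejected before it is ever registered in quadrant I.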
The result: data that originally came from an operational environment and was fed to a data warehouse is now registered directly in the fact part (quadrant I) of the data warehouse. This part of the data warehouse has now faded into an operational environment, with the non-functionals we learned to implement in the first quadrant. The first-quadrant governors are now in charge of a company-wide integrated data environment.
What about the data warehouse? Quadrants II and IV (stuff like reporting, dashboarding, analytics, visualisation) are still fed from the first quadrant and are 100% demand oriented. Since we model and store all data in an integrated manner, adhering to important non-functionals, we are able to serve these quadrants (II and IV) cheaper, faster and with a (to some degree) guaranteed level of quality.
Two more things:
- I am not saying that data is stored in one location. Storage can, and probably will, be hugely federated within the company or outside the company (e.g. in the cloud). What I am saying is that it is crucial to govern the logical model AND the metadata in an integral and central manner.
- With the above in mind, I strongly urge senior management to take the next step as well. Organize the data competency: the people, governance (e.g. protection), architecture, design and implementation of data all need firm management support. Centralize it... please. Set up a centralized, shared data center, preferably managed directly by a true CIO (Chief Information Officer, not technology officer) or, even better, a Chief Data Officer, and give it the means in terms of expertise and budget.
If there is an organization out there with the pain, the guts and the patience to really go for it, to assign and mandate a CDO: call me, I am game.