I have written numerous times about efficient and effective ways of deploying data in organizations. How to architect, design, execute, govern and manage data in an organization is hard, and it requires discipline, stamina and courage from all those involved. The challenge grows rapidly as scale and/or complexity increases.
Many people are typically involved, and some common understanding is vital yet unfortunately often lacking. As a consultant it is a huge challenge to get management and execution on the same page, while respecting the differences in responsibilities and in the level of expertise regarding the field we are operating in.
An abstraction functions as a means of communication across the organization: an abstraction to which the majority can relate and that can be used to architect, design, execute, govern and manage the field that is at stake.
For data deployment I came up with the so-called ‘Data Quadrant model’ (DQM).
It starts with the basic assumption that data deployment begins with raw materials and ends up in some sort of product, and that getting from raw materials to end products requires logistics and manufacturing.
It starts with the basic assumption that reliability and flexibility are both wanted, but are mutually exclusive.
It starts with the basic notion that data needs to be separated into facts and context.
There is only a single version of the facts.
There are many truths out there.
Push systems are to be standardized as much as possible: only if we standardize can we automate. Keeping product-specific demand features out of push systems is vital. The tables turn completely, however, once we enter the pull-systems domain.
The push-pull point is, in architectural terms, a decoupling point separating fundamentally different concerns, but it is not the only one. At this high level of abstraction another decoupling is crucial: development style. Missing this one often results in pain, misery, demotivation, failed projects, dissatisfied users, budget overruns, an IT-centric focus, etc.
Basically I tend to distinguish between a systematic and an opportunistic development style.
This separation between a systematic and an opportunistic development style respects the professionalism of both the data scientists and the IT engineers. It respects the organizational ambitions as well as the more local ones.
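Purely as an illustrative sketch, the two decoupling points, push/pull and systematic/opportunistic, span the four quadrants. The encoding below is my own hypothetical shorthand, not part of the model itself:

```python
# Illustrative sketch: the two decoupling points of the Data Quadrant Model
# span four quadrants. The keys and labels are hypothetical shorthand.

# Axis 1: push (facts, source-driven) vs. pull (context, product-driven).
# Axis 2: systematic vs. opportunistic development style.
QUADRANTS = {
    ("push", "systematic"):    "I",    # standardized, automated intake
    ("pull", "systematic"):    "II",   # certified information products
    ("push", "opportunistic"): "III",  # ad-hoc or innovative sources
    ("pull", "opportunistic"): "IV",   # experimentation, the data lab
}

def quadrant(flow: str, style: str) -> str:
    """Locate a data deployment effort by its position on both axes."""
    return QUADRANTS[(flow, style)]

print(quadrant("pull", "opportunistic"))  # the data-lab quadrant, IV
```

Nothing in the model depends on this encoding; it only makes explicit that each quadrant is the intersection of one choice on each axis.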
Quadrant IV is especially worth some attention. Typical use cases for this quadrant:
1) Reducing uncertainty/getting requirements: ‘I do not know what I need, what I want’, ‘I do not know the quality of the data yet’. Experiment a lot, change fast, instant feedback.
2) Reducing lead times: going from quadrant I to II might take too long. There needs to be an environment where products can be made a.s.a.p.
3) Stimulating innovation: discover, experiment, throw away, test hypotheses, etc. An environment that is as friendly as possible to the creative mind.
In quadrant IV we are architecting some degree of chaos… which is fine.
Finally, it needs to be understood that there is no single way data flows through the quadrants. Four process variants are depicted in the small picture on the right, but more can be thought of. For example, the quadrants can be used to architect a phased/managed approach to migrating legacy data deployment systems (which I know you have!!).
The point is that there are several ways of manufacturing information products, and you might want to consider them all. Much too often I see only one way (or the highway), a USSR-Politburo-fundamentalism kind of deployment option.
I could write a book about the four quadrants and the operationalization of the decoupling points, but this is a blog post and I really need to stop now.
My point is that the four quadrants differ in almost every way in terms of architecture, design, execution, governance and (project) management:
- qI and qII need a more centralized governance style; qIII and qIV require a more decentralized one;
- qII might be more suitable for agile deployments, while qI might be more suitable for good old waterfall;
- In terms of data modeling, qI needs a more temporal style, while qII leans more towards dimensional modeling;
- Both qI and qII need to be aligned with Enterprise Data Models; qIII and qIV do not;
- In terms of tooling, qI needs more discovery/analytic functionality, while qII leans more towards reporting, dashboards, etc.;
- Different kinds of databases might be required for the quadrants, and even within the quadrants;
- Ownership of definitions in qI is more or less driven by the source, while ownership of definitions in qII is more or less driven by the product owner;
- qII publishes information products that are certified, trusted and centrally maintained; qIV publishes information products for which the creator bears all responsibility regarding trustworthiness, change management, etc.;
- In qI the data model and the data logistics need to be standardized and automated as much as possible; there is more freedom (based on the requirements) in qII, and in qIV how products are modeled is completely free;
- qI might be the domain of the IT department, or even a candidate for outsourcing eventually; the other quadrants will probably be situated close to the requirements in the organization;
- qIII is a suitable quadrant in which to situate sources that are hugely ad hoc or use new, innovative technologies;
- Every quadrant requires a specific education and competencies profile.
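As a hedged summary only, the contrasts in the list above can be collected into a small lookup table. The attribute names are my own shorthand, and `None` marks aspects the list leaves open for a quadrant:

```python
# Per-quadrant profile distilled from the bullet list above.
# Attribute names are my own shorthand; None means the post does not
# pin this aspect down for that quadrant.
PROFILE = {
    "I":   {"governance": "centralized",   "method": "waterfall",
            "modeling": "temporal",    "definition_owner": "source"},
    "II":  {"governance": "centralized",   "method": "agile",
            "modeling": "dimensional", "definition_owner": "product owner"},
    "III": {"governance": "decentralized", "method": None,
            "modeling": None,          "definition_owner": None},
    "IV":  {"governance": "decentralized", "method": None,
            "modeling": "free",        "definition_owner": "creator"},
}

def contrast(attribute: str) -> dict:
    """Compare one attribute across all four quadrants."""
    return {q: profile[attribute] for q, profile in PROFILE.items()}

print(contrast("governance"))
```

A table like this is a useful starting point for the governance discussion: it forces every quadrant to be positioned explicitly on every dimension, rather than assuming one deployment style fits all.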