A while ago I wrote a blog post about the Data Quadrant Model I developed. I use this model in my consultancy and speaking engagements, and I increasingly receive positive feedback from organisations that are applying it, which is great.
So far, quadrant III has received little mention, even though it is incredibly important. It is the quadrant of data sources that are not under governance, such as an ad hoc download from an open data provider, a list in Excel that you want to use, or a set of verification data you received on a CD. You might also use it to dump huge amounts of data that you want to explore or experiment with.
Quadrant III exists (implicitly) in every organisation. Just think of the vast number of Excel sheets and MS Access databases currently stored on your file servers.
Entropy in quadrant III is at its highest. There is data chaos. As Stephen Brobst (chief technology officer of Teradata) so eloquently put it when describing a data lake, slightly paraphrased: 'the only people who can use this data are the people who put it in'.
And yes, the popular 'data lake' is an artefact that is situated in quadrant III.
Governance, control, checks and balances in quadrant III are virtually non-existent. People using this data (the peeps in quadrant IV) need to discover the structure/schema, make inferences and be extremely careful in drawing conclusions. The field of statistics is important in this regard.
Let me be absolutely clear: I am not against a data lake. I am just against the notion that a data lake is viable as a quadrant I artefact (and that is how the industry is trying to sell it). As a quadrant III artefact, the notion of a data lake could be valuable. It is a cost-effective artefact that has the potential to drive innovation.
Although I described quadrant III as data chaos that is unmanaged, ungoverned, etc., it is possible to manage and govern the infrastructure. Current technology is promising in offering a managed and governed infrastructure on top of which one can innovate with data in an unmanaged and ungoverned manner.
The cool thing about this technology is that quadrant III is being made explicit, brought from the shadows into the light. In quadrant III an organisation might want to offer its users infrastructure-as-a-service.
Oh, one more thing: if you consciously architect/configure a quadrant III, you are engaged in innovation. If you do not, you are just messing about. ;-)
The big danger of quadrant III (and quadrant IV) is that promising discoveries are never productised (unfortunately, I see that a lot!). They remain high-potential ideas that never see the light of day (promotion to quadrants I and II). The high level of entropy increases this risk even further, since discoveries are by definition often tied to the brilliance of individuals. Governing mechanisms are needed to counter this risk; otherwise you are just playing with data... a waste, I'd say.
I have now discussed three quadrants (I, III and IV) in terms of various levels of order. I have one quadrant to go. Before I move on, let me again stress the following:
- None of the four quadrants is more desirable than any other; there are no implied value axes. All quadrants are needed in every organisation that wants to be more data-centric, data-driven, adaptable to change, innovative, etc.
- The Data Quadrant Model is NOT a blueprint implementation model. It can, however, be used as a guide to how to organise, how to manage, how to govern, what technology choices to make, what people to hire, etc.