For the last three days I attended the Gartner conference in London.
It has been a mixed experience. In terms of data strategy, organization and culture, I can honestly say that some sessions were really inspiring. I caught myself rethinking my own convictions and extending my scope to areas that matter as well, and that is always a good thing. I liked the keynote with its four central themes: Diversity, Trust, Complexity and (data) Literacy. These themes abstract, as they should, from technology and explicitly introduce the human factor and aspects of uncertainty. In data, I think Gartner kinda nailed it with these themes. I recognized them all in my daily work as a Data Architect.
The second keynote, by David Rowan, founding editor-in-chief of WIRED, was simply brilliant and scary at the same time. His central theme was that we stand at the tipping point of finally having the ethical discussion. What excellent timing, considering the Facebook privacy scandal.
I also enjoyed the sessions by Alan Duncan on the Chief Data Officer and by Nick Heudecker. I have followed the latter analyst for some time now, and he certainly didn't let me down: excellent, down-to-earth sessions on blockchain and data lakes.
I also liked the data science platform vendors: really cool stuff that lets you leverage open source interpreted languages (R, Python), solves a great deal of governance pain (versioning, deployment, collaboration, protection, etc.), gives you options as to where the processing is done and, if you desire, abstracts from the code with drag-and-drop interfaces.
Gartner is also very aware of the operationalization issue with data science: in their terms, going from mode 2 to mode 1; in our terms, going from quadrant III/IV to quadrant I/II (see figure). In my opinion this is the holy grail in data: leveraging data science brilliance for the whole organization and serving the bottom line.
To be honest, this holy grail is not solved, not by Gartner and not by me. On Twitter, some excellent discussions unfolded in this regard.
But now the not-so-good stuff. In terms of data architecture, Gartner is stuck. They confine data very narrowly, as in data-for-analytics. And therein lies a fundamental problem. We have to move on and view data holistically, from conception to retention, where data is used for many use cases, not only analytics, and is subject to many constraints (like privacy, transparency, etc.). Furthermore, most problems in data (meaning, (time) consistency, validation, integration, transparency, etc.) are created upstream, but Gartner seems to have given up and decided that we need to deal with them downstream. A first example is their logical data warehouse concept. Now, I have to be honest: I do not like this concept at all. I think it is deeply flawed. Why?
First, think of this logical data warehouse as a beautiful patio: green grass, a nice swimming pool, birds singing. Underground, however, right beneath the surface, the ground is highly contaminated. Would you want your children to play on this patio?
Second, Gartner seems to be stuck in metanarratives like the data lake, the data warehouse and even the data hub. The latter is completely vague to me; it sounds like a data-lake-warehouse wannabe. These metanarratives are still very much technologically bound and ambiguously defined, and calling them logical is just a façade. Let's abstract from these technically bounded concepts and talk about an organisational capability to systematically ingest, create, connect, validate, integrate and disseminate data, taking into account contextual aspects like transparency, protection, consistency, etc.
Third, Gartner seems to be oblivious of vertical data architecture. This is the architecture of data at rest, as opposed to horizontal data architecture, where data is in flow. Gartner seems hugely biased towards flow/logistics, getting data from A to B. A vertical data architecture would start (depending on the concerns you want to address) with natural language, translating it to ontologies, fact-based models and finally truly logical models that drive the technical models (often automated), persist data only once and virtualize the logical domain model. Such an architecture would result in logically consistent data, not a logical data warehouse! A small sketch of what such a chain could look like follows after these objections.
Fourth, the logical data warehouse concept seems to be very much associated with data virtualization technology, which makes it a lot less logical.
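To make that third point a bit more tangible, here is a minimal, purely illustrative sketch in Python of a vertical chain. Every name in it (Role, FactType, to_logical_schema) is hypothetical and not part of any real modeling tool or of Gartner's material; the only point is to show the direction of derivation: natural-language fact types drive the fact-based model, which in turn derives the logical model, instead of the tables coming first.

```python
# Illustrative sketch only: natural-language fact types drive a fact-based model,
# which in turn drives the logical (relational) model. All names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    object_type: str   # e.g. "Customer", "Order"
    identifier: str    # how the object type is referenced, e.g. "customer_id"


@dataclass(frozen=True)
class FactType:
    verbalization: str        # the natural-language reading; this is where the meaning lives
    roles: tuple[Role, ...]   # the object types playing a role in the fact


def to_logical_schema(fact_types: list[FactType]) -> dict[str, list[str]]:
    """Derive a simple logical model (table -> columns) from elementary fact types.

    In this toy version each fact type hangs its non-anchor roles as columns on the
    first role's entity; a real derivation would also carry uniqueness and
    mandatory-role constraints.
    """
    schema: dict[str, list[str]] = {}
    for ft in fact_types:
        anchor, *others = ft.roles
        columns = schema.setdefault(anchor.object_type, [anchor.identifier])
        for role in others:
            if role.identifier not in columns:
                columns.append(role.identifier)
    return schema


facts = [
    FactType("Customer places Order",
             (Role("Order", "order_id"), Role("Customer", "customer_id"))),
    FactType("Order was placed on Date",
             (Role("Order", "order_id"), Role("Date", "order_date"))),
]

print(to_logical_schema(facts))
# {'Order': ['order_id', 'customer_id', 'order_date']}
```

Even this toy version makes the design choice visible: the verbalizations are the single source of meaning, the schema is merely derived from them, and the data behind it can be persisted once. That is what I mean by logically consistent data.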
A second after-the-fact solution Gartner seems to be very fond of is Master Data Management (MDM). In my view MDM is a legacy problem and stems from a lack of holistic (vertical) data architecture. A solid vertical data architecture solves many master data problems at their core and prevents the constant remediation of symptoms.
A third after-the-fact solution is this infatuation with the availability of data. Let's get all the data! Why? Because we (technically) can… There seems to be no conscious trade-off between availability and consistency, two concerns that lie at the heart of data and where the business needs to weigh in, not the technology. Furthermore, with GDPR coming, it deeply violates the data minimisation principle.
Getting to the root of things: it is like modern medicine. We keep on curing the symptoms, with nasty side effects, but we never engage in prevention. It is vital, in my opinion, that a holistic data strategy and architecture is adopted! It is like Tuesday's brilliant closing keynote on sleep deprivation by Professor Matt Walker. If you sleep enough, you will age more healthily; the scientific research is overwhelming. If you don't, your chances of getting sick (cancer, heart disease or psychological disorders) increase fast! We are now hooked on hugely expensive medicines that cure or alleviate only the symptoms…
Why not try to sleep a bit longer…..