Last Wednesday I attended TDWI in München and gave my rinse-and-repeat talk on the data quadrants. Frankly, I am a bit done with these conferences. We have managed to create a whole industry that fights symptoms instead of the disease.
What do I mean? Let me generalize a bit. Let's assume we have an average public organization, consisting of dozens or even hundreds of administrative systems that need to execute the laws our democracy has ordained, combined with a huge influx of data coming in from other entities, public and private.
Let's face it: these systems have nearly all been developed from either an application-centric or a process-centric perspective; each needed to support some functional requirement or business process. Nothing wrong with that, true?
Well, almost.....
The data of these systems has become a byproduct, a slave of the application/process. In architectural terms: in designing the data part of administrative systems, the concerns of the application/process trump all other concerns, big time.
So, all the concerns that require a perspective (for our data nerds: a different reality/context) that supersedes the application/process suddenly became very hard to execute. Even a different perspective on the data within the boundaries of the application/process is a challenge. How did we decide to remedy that?
Let's extract the data, copy it, integrate it (often not possible, by the way), execute some logic (multiplying the same logic over and over again), copy it again and send it somewhere (where it gets copied again and the whole shebang starts over)... oh, and let's do that a thousand times... oh, crap, do we need to maintain this logistic nightmare? Oh crap, the data does not really represent the actual administrative truth, does it? Oh crap, the data is now proliferated (ungoverned) all over the place. Oh crap, we need to implement a change; what is the downstream impact? Oh crap, the data-consumption needs of strategic initiatives like data science, artificial intelligence and machine learning have deepened this logistic nightmare even further. Ah wait, we can buy technology and hire loads of consultants to help manage this horror.
Our data has fallen victim to our bias towards logistic solution designs, often rooted in a technological fetish. If we have a data problem: let's copy, integrate, transform and disseminate, and buy some nifty technology that can help us with that. It has become a conditioning...
The problem we face is deadly: organizations are sick, and instead of treating the illness we keep fighting the symptoms, and the symptoms of the symptoms. This is a yin-and-yang problem; not fighting the illness will make the patient sicker and sicker and will require ever more expensive medicines, while the patient of course degrades and can no longer perform her responsibilities.
And the industry (software, technology, research institutes, system integrators, big consultancy firms) is not helping either. An analogy can be made with the pharmaceutical industry: huge marketing budgets push the medicines for the symptoms, offering a quick fix ("we can help"), getting the patient hooked, and then raising the prices. Even worse, they invent medicines for the symptoms of their medicines. Prevention campaigns do not stand a chance.
Unfortunately, our higher education has decided to ride the waves of opportunism as well: the courses on rigor and relevance in the science of data and knowledge engineering are slowly making way for teaching tech-related 'fun' stuff.
The illness has to be cut out of the body. It will hurt, and it will require, above all, executive discipline to resist the quick fix and the marketing pressure from the industry:
1. Data-at-rest: separate the "Know" and the "Flow" in data. Design/model your domain in terms of data and rules first, and separate that from the actual execution (function, application, process). In government, the boundaries of these domains should be defined by the laws they need to execute.
2. Data-in-flow: stop copying data unless you have a technical reason for it. More popularly: from extract to connect. As a heuristic: copies can be thrown away without hurting processes.
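To make the two principles concrete, here is a minimal Python sketch of the idea. All names and the benefit rule itself are hypothetical, invented purely for illustration: the "Know" is the domain modelled once as data plus rules (ideally traceable to the law), and the "Flow" is any process that connects to that single authoritative source instead of copying the data and re-implementing the logic downstream.

```python
from dataclasses import dataclass

# "Know": the domain modelled as data plus rules, independent of any
# particular application or process. The rule below is a made-up example.

@dataclass(frozen=True)
class Citizen:
    birth_year: int
    income: int

def is_entitled_to_benefit(citizen: Citizen, current_year: int) -> bool:
    """One authoritative rule, defined once, derived from the law."""
    is_adult = current_year - citizen.birth_year >= 18
    return is_adult and citizen.income < 25_000

# "Flow": a process consumes the shared data and the shared rule.
# It connects to the source; it does not copy the data or clone the logic.

def handle_application(registry: dict[str, Citizen],
                       citizen_id: str, year: int) -> str:
    citizen = registry[citizen_id]  # connect, don't extract
    return "granted" if is_entitled_to_benefit(citizen, year) else "rejected"

registry = {"c1": Citizen(birth_year=1990, income=20_000)}
print(handle_application(registry, "c1", 2019))  # -> granted
```

The point of the sketch is the dependency direction: any number of processes can call `is_entitled_to_benefit`, and a change in the law is a change in exactly one place, with no downstream copies of data or logic to hunt down.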
My takeaway: addressing the fundamental issues in data requires us to move away from symptom-treatments like data warehouses, data lakes or any other technology-centric (and often technology-drenched) solution. Let's shift our attention to the time and place a data "fact" is born, our administrative sources; there lies our challenge.
I am not proposing something new or innovative. 'Common Ground', a strategic initiative in the Netherlands, research like 'Agile Legislation' (M.H.A.F. Lokin), and 'Regie op gegevens' ('control over data') from the Dutch government show the broader support.
Spot on Ronald... in order for this to succeed we will need 1) strong leadership (no, not management) that withstands the political storms, so often fuelled by headlines in the media. That same leadership should also commit to the long-term projects needed to realize the above. And 2) our country, a.k.a. its citizens and companies, should accept that this is 'open-heart' surgery to the extreme, and therefore accept that the government will fail at times; please give them the chance to learn from those failures.
I also believe that the above is a prerequisite to 'enable' the use of machine learning and (prescriptive) analytics in such a way that it integrates directly with the primary processes of an organisation - that is where the true added value is.
+1 like from me ;-)
Posted by: Roy Maassen | Friday, June 28, 2019 at 05:05 AM
That's the direction, and here is the path: a conceptual distinction between data (for facts), information (for data set into categories), and knowledge (for information tied to purposes).
https://caminao.blog/2019/03/04/focus-data-vs-information/
Posted by: Caminao | Friday, June 28, 2019 at 10:08 PM
Indeed, Ronald, I agree: we have to "move away from symptom-treatments like data warehouses, data lakes or any other technology-centric solution." We have to shift our attention to something else. Something deeper. Something even deeper than "the time and place a data 'fact' is born". We have to go to the depths of information: its immateriality.
The industry won’t help. They’re too much in love with their revenue model. They’d rather build a system to fight data inconsistency than really solve the issue. Higher education won’t help either. They depend too much on the idiosyncrasies of the industry.
We’re kind of stuck in our materialistic thinking. We’re so terribly blinded that we talk seriously about, for example, the Internet of Things, even when it’s all about information. And information is not ‘thingy’, not material. Information is... immaterial. But, without a moment’s thought, we ignore that and simply apply the whole bunch of laws, rules, etc. that hold for stuff to non-stuff, i.e. to information.
And that caused, and still causes, a lot of damage. Damage on which a golden revenue model is built. And anyone who threatens to harm or kill the goose with the golden eggs is, of course, met with fitting hostility.
What’s it all about with immaterial information? It’s all about meaning. Meaning attached to it by individuals, temporally as well as situationally. The meaning of information is directed by the context in which that information manifests itself and cannot be predefined. Modelling for the contextual and temporal meaning of information differs widely from our current absolutist way of modelling. “[T]here lies our challenge.”
Posted by: Jan van Til | Wednesday, July 03, 2019 at 01:04 AM