note: this blog post was first published on my B-eye-network blog.
It is only by means of good and respectful discussion that knowledge
and insight will evolve. This post should be regarded as such.
This post is my second reaction to the first article in a series of three written by Rick van der Lans, a highly respected thought leader in the field and publisher on the B-Eye-Network. The series is titled 'The Flaws of the Classic Data Warehouse Architecture'.
The first article deals with the flaws of the classic data warehouse architecture (CDWA). Rick identifies five flaws, which lead in articles two and three to a new architecture. This post addresses the second flaw.
- My reaction to flaw #1 can be read here.
Flaw 2 according to Rick
The CDWA stores a lot of redundant data. The more redundant the data, the less flexible the architecture is. We could simplify our data warehouse architectures considerably by getting rid of most of the redundant data. Hopefully, new database technology on the market, such as data warehouse appliances and column-based database technologies, will decrease the need to store so much redundant data. Rick commented on this flaw in his closing keynote at a BI event we had last week, stating basically that data warehouse professionals have done an extremely lousy job over the last decades in building these redundancy monsters. As in his article, he strengthened this argument with research done by Nigel Pendse claiming that the average BI application needs only a fraction of the stored (redundant) data.
My reaction to flaw 2
First of all, I agree that new technologies can limit the volume of redundant data considerably.
But to say that over the last decades data warehouse professionals did an extremely lousy job because of the huge redundancy they created in their data warehouses... well, that's just plain stupid, and to the people applauding this statement I would like to say: 'I bet you never actually built a data warehouse.'
BI populism... that's what it is.
As for the flexibility argument that more redundant data kills flexibility: it's a bit of a bs-argument, because flexibility is not affected by redundant data alone. If I had built my data warehouses over the last decades without redundant data, I would have ended up with hugely complex transformation rules and a big strain on processing capacity. Both issues would have killed flexibility big time, and I am leaving aside the degradation of performance, ease of use, maintainability and testability of the system. But I agree: I would not have had redundant data... I would not have had any quality of service either... but who cares.
BI populism... that's what it is.
But is the CDWA flawed by this redundancy problem? I do not think so at all. We would still need a data store of some kind (Rick seems to acknowledge that by advocating the use of appliances), and we would still have several layers after this data store, preparing the data for different purposes (reporting, mining, advanced analytics, data sharing with third parties, etc.). Take the data mart layer: will it disappear? I don't think so. The question is whether it needs to be materialized, and that's where new technology will be extremely valuable. It seems that Rick is equating 'architecture' with 'technical architecture' as a 1:1 relationship.
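To make the materialization question concrete, here is a minimal sketch (using SQLite in Python, with invented table and view names) of the same data mart layer built twice: once as a materialized table, which is a redundant copy of the data, and once as a virtual view over the central data store, which stores no redundant rows at all. The architecture keeps its data mart layer either way; only the technology underneath changes.

```python
import sqlite3

# Toy central data store: one fact table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EU", 100.0), ("EU", 50.0), ("US", 75.0)])

# Materialized data mart: an aggregated copy, i.e. redundant data
# that must be stored and refreshed.
con.execute("""CREATE TABLE mart_sales_by_region AS
               SELECT region, SUM(amount) AS total
               FROM sales GROUP BY region""")

# Virtual data mart: the same layer defined as a view, so no
# redundant rows are stored; the aggregation runs on demand.
con.execute("""CREATE VIEW v_sales_by_region AS
               SELECT region, SUM(amount) AS total
               FROM sales GROUP BY region""")

materialized = con.execute(
    "SELECT region, total FROM mart_sales_by_region ORDER BY region").fetchall()
virtual = con.execute(
    "SELECT region, total FROM v_sales_by_region ORDER BY region").fetchall()

# Both deliver the same data mart to the BI tools;
# only the materialized one pays the storage and refresh cost.
print(materialized == virtual)  # True
```

Whether the virtual variant is fast enough is exactly where appliances and column stores come in: they make it feasible to leave the mart unmaterialized without giving up quality of service.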
The hub-and-spoke architecture of the CDWA model is still extremely valid. Of course, technology within this architecture will evolve and will enable us to deliver an even better quality of service.