Lately I have been involved in many discussions, online as well as offline, regarding Data Vault architectures. It seems that three 'schools of thought' are developing in the Netherlands. Separating and recognizing these three schools is important.
For the folks not familiar with Data Vault – google it, read one of my articles (e.g. Download Damhof_DBM0508_ENG) or go to http://danlinstedt.com/.
- Raw school; this school of thought makes a distinction between a raw Data Vault and a "business" Data Vault. The raw part is a copy of the source where entities are basically split up into a Hub, Satellite or Link construct. The structure is derived (nearly 100%) from the source structure, and no integration is achieved yet. The rules for determining the target types are pretty fixed and can therefore be automated; of course, the associated loading scripts can be generated. This is a pretty straightforward structure transformation. One can argue whether or not this is some kind of persistent or historical staging environment. The "business" Data Vault (I do not like this terminology, but I use it here to distinguish it from its raw brother) is where the business rules are executed, where integration is achieved and where the result is stored. This part of the data logistics is typically a value transformation and not so much a structure transformation. This school advocates that making datamarts is again a structural transformation (which can be automated, since the business rules are already executed) to star or snowflake schemas.
- Classic school; this school of thought advocates a strategy where the Data Vault model is designed according to a mix of the conceptual data model of an organization and the structure of its business transaction systems. Going into the Data Vault there is a light integration of data on the main business entities of the organization (preferably based on the conceptual data model). Designing the Data Vault data model is a creative process; loading the model can be generated. Business rules are executed going into the datamarts/downstream to the end user. The latter is both a structural and a value transformation. This school of thought most closely resembles Dan Linstedt's original methodology.
- Mixed school; this school follows the classic school by designing a Data Vault model based on both the conceptual data model of the organization and the structure of its business transaction systems. It does, however, modularize the downstream business rules execution. The main drivers for modularizing are maintenance, re-usability and auditability. When business rules are considered to be generic, it might be wise to develop them only once and execute them many times. It might also be wise to audit the changes in the outcome of business rules over time. The staging out drafted in the figure below is modelled Data Vault style.
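The raw school's point is that its structure transformation is mechanical enough to automate. A minimal sketch of what such a classifier could look like, in Python; the table names, helper type and classification rules are illustrative assumptions for this post, not part of any Data Vault standard:

```python
# Sketch: derive raw Data Vault target constructs (Hub/Link/Satellite)
# from a source table's key structure. Rules are deliberately simplistic.

from dataclasses import dataclass

@dataclass
class SourceTable:
    name: str
    business_keys: list   # candidate business key columns
    foreign_keys: list    # references to other tables' business keys
    attributes: list      # descriptive, non-key columns

def derive_targets(table: SourceTable) -> dict:
    """Map one source table onto raw Data Vault constructs."""
    targets = {}
    if len(table.business_keys) == 1 and not table.foreign_keys:
        # standalone business key -> Hub, context attributes -> Satellite
        targets[f"hub_{table.name}"] = table.business_keys
        if table.attributes:
            targets[f"sat_{table.name}"] = table.attributes
    elif table.foreign_keys:
        # relationships/transactions -> Link, attributes -> Link Satellite
        targets[f"lnk_{table.name}"] = table.business_keys + table.foreign_keys
        if table.attributes:
            targets[f"sat_lnk_{table.name}"] = table.attributes
    return targets

customer = SourceTable("customer", ["customer_nr"], [], ["name", "city"])
order = SourceTable("order", ["order_nr"], ["customer_nr"], ["order_date", "amount"])

print(derive_targets(customer))
# {'hub_customer': ['customer_nr'], 'sat_customer': ['name', 'city']}
print(derive_targets(order))
# {'lnk_order': ['order_nr', 'customer_nr'], 'sat_lnk_order': ['order_date', 'amount']}
```

Because the rules are fixed and metadata-driven like this, both the target DDL and the loading scripts can be generated, which is exactly why the raw school lends itself to automation.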
Note the distinction between logical and technical architecture; this is especially important with regard to the datamarts. Datamarts, for example, can be cubes, virtual, pushed in memory or whatever.
In my next blog I will deliberate on the implications of these 'schools' regarding data logistics and the 'let's generate it all' lobby... ;-)
***update***
The term Raw Data Vault as used in this post should be translated to Staging Vault, in line with a blog post I wrote with Dan Linstedt regarding terminology.
Hi Ronald,
Nice blog. Good idea to compare those schools for implementing Data Vaults.
I'm wondering whether there shouldn't be a fourth (also mixed) school with a dimensional staging out.
Another point of discussion is whether we should model every source table in the form of a DV. DV is especially a good way of modelling (important) master data. That doesn't mean IMO that all the tables should be modelled as HUBS or LINKS with underlying SATS.
I am also looking forward to your deliberations on the 'let's generate it all' lobby.
Posted by: Rob_mol | Saturday, January 29, 2011 at 03:26 PM
Hey Rob,
thx for the heads up - I deliberately skipped the staging layer for dimensional output. Technology is racing ahead very fast at the moment and the datamarts architecture/strategy is influenced by it big-time (e.g. virtualisation, in-mem, massive processing power etc.). It deserves a separate blog post! Good point though.
Your second question is extremely valid! More and more I am convinced that the characteristics of the source data determine the way it should be propagated. For example the distinction between master data and transactions/events. Or the distinction between 'already historically staged data in the source' versus 'non-historically staged data in the source'... Again - separate blog post. This one I have had my eyes on for some time now. Again - good point.
Thx for the reply.
Posted by: Ronald Damhof | Sunday, January 30, 2011 at 03:22 AM
Hi Guys,
@Rob,
The answer is of course *no*
IMO the biggest friends of a DV modeler are the (historized) reference table and the transactional link, neither of which are primary DV constructs. The real art of DV modeling is not what you put in your DV (model) but what to leave out (kind of DV-ZEN ;)
Posted by: DM_Unseen | Monday, January 31, 2011 at 12:36 AM
DV-ZEN ; lol
Posted by: Ronald Damhof | Monday, January 31, 2011 at 12:48 AM
@Ronald,
Back to School:
We *Always* separate the design of our Data Vault and Business Data Vault, because they are (more or less) independent. I classify Data Vaults and Business Data Vaults separately.
Data Vault schools:
School 1: "Source system(s) super ODS"
Raw non-integrated Data Vault
School 1a: "Super ODS"
Raw integrated Data Vault.
This is what we are doing at the RU BTW
School 1b: "Classic DV"
"Business Model Oriented Data Vault". There is a gliding scale between schools 1a and 1b.
Business Data Vault Schools:
Business Rule Vault:
(your option 3) Puts business rules in a DV structure for staging out. This is what we are doing also.
(Full) Business Data Vault.
If there is *no* staging out BR to the data marts I call it a BusVault (Kimball BUS architecture in a Data Vault). If there is some staging out (BR) it's a Full Business Data Vault.
I can (and will) combine any and all to get to the right architecture for the organization (there is no "best" solution IMO, just some sub-optimal choices).
I do have to remark that I consider source or business model orientation a business decision and not a DV architectural one. For me this is not an argument for or against a DV architecture, but one for running the organization.
Posted by: DM_Unseen | Monday, January 31, 2011 at 01:00 AM
@Ronald,
Surprisingly ZEN-DV is grounded in formal science/model transformation theory. The best proof is the definition and usability of the "Hub Minimalization Rule".
Posted by: DM_Unseen | Monday, January 31, 2011 at 01:35 AM
Some issues I have:
First; you seem to separate between upstream architecture and downstream architecture. Which is fine, but it lacks coherency I believe. For example; the Business Rule Vault as well as the BusVault - as you call it - are hard to combine with school 1 as mentioned by you. Even combining it with school 1a could pose additional challenges.
Furthermore; I am not in favor of the term 'Business Vault'. Somehow it surfaced in the last years - it confuses discussions a lot in my opinion.
Second; the term ODS is tripping me up. It is a highly misused term and lots of peeps have a specific association with it. Your 'ODS' does not seem to be consistent with, for example, Inmon's ODS. Why not call it what it is: a copy of the source where history is maintained. A persistent/historical staging environment might be a better term.
Third; school 1a - raw integrated Data Vault. Just out of curiosity, what do you integrate if it's not the business keys?
Fourth; 'source or business orientation is a business decision'. I do not get this at all. Architectural choices like this one (which in my opinion is a design decision) are based on requirements, leading architectural principles and information architecture. The choice for source or business orientation should always be argued in line with these requirements, principles and information architecture. One can argue that ALL design decisions are business decisions, but I have never confronted my business with these detailed choices.
Last but not least; I think it is our job to think about decision trees; general guidelines for determining when to use what school of thought and what its implications are for business as well as IT.
And we just started that discussion – cool ain’t it ;-)
Posted by: Ronald Damhof | Monday, January 31, 2011 at 01:42 AM
Some feedback:
1. I agree they are not fully orthogonal, but that's because you want to put a 'split' somewhere even if, when you look at it conceptually, there is no *real* split (THERE IS NO SPOON!) *unless* you consider a metadata split.
I used Business Data Vault (or Business Vault) for anything Data Vault-like not directly connected to source systems, but YMMV; better words create better worlds :)
2. I know ODS is misused a lot, and I actually don't care a fig about those idiotic definitions ;)
A historized staging area (HSA) is usually in 5NF IMO, so that's why I do not use that name.
3. That's exactly what we integrate, but we have almost no business model/analysis to optimize our DV further towards a classical DV.
4. I think I understand your comment, but if the business thinks sources need to drive the DV it's *their* decision, not mine. Of course it will become an architectural principle etc. etc. I'd like to live in your world for once, because *my* customers/end users are confronting *me* with these detailed choices all the time :)
Posted by: DM_Unseen | Monday, January 31, 2011 at 02:03 AM
Some other thoughts:
I think it will help to differentiate between a conceptual model and a physical implementation.
The conceptual model consists IMO of three segments: source data, business model (in terms of keys and attributes of the business entities) and information model (relevant facts and dimensions). On this conceptual level we have translation and integration rules for mapping the source data onto the business model, and business rules for enriching business data into information with business value. To construct this conceptual model you start with the business value and work your way back to the source data. On this level you don't need any DV school. Maybe some science will help (for instance about business rules methodology and the development of ontologies and taxonomies). Conceptual modelling for BI looks to me like rather uncultivated territory. I am very interested in experiences.
The physical implementation of the conceptual model depends mainly on non-functional requirements (history, traceability, timeliness, adaptability, cost, etc.) and the possibilities of the available toolset. I think it's good to have an overview of the different ways (schools) to physically implement a data warehouse. Here we need good craftsmanship. The carpenter who knows when to use a hammer and when to use a screwdriver.
Posted by: Rob_mol | Monday, January 31, 2011 at 02:38 PM
@Rob,
Interesting comment. At the Radboud University we're thinking along the same lines. We use what we call an 'internal ontology cloud', which is in fact an adapted FCO-IM representation. This allows us to abstract from almost *anything* (source systems, business models, business rules etc.) and conceptually integrate all our models in one repository. Here we can tie all our models together and generate DV, star schemas, whatever-you-like schemas. The conceptual nature of FCO-IM, used together with elementary fact integration, should allow almost arbitrary models and concepts to relate and integrate. From there we can also generate mappings (note: mappings are not defined as such, but generated from the fact integration).
However, we're still on our first prototype, and a lot of (academic) research and development is still required to get all of this up and running.
Posted by: DM_Unseen | Tuesday, February 01, 2011 at 12:38 AM
@Rob; I like the way you think! It is in fact exactly what I am doing at most client sites (although time is limited - and expensive - to really take a deep dive) and it is also in line with the architectural framework of TOGAF or the EIA of IBM. For example, the above schools are a physical instantiation of one type of conceptual model. There are more.....
Another blog, or...
We should just write a book guys...;-)
Posted by: Ronald Damhof | Tuesday, February 01, 2011 at 11:23 AM
@Ronald- the term Business Data Vault was coined by Dan, so I guess we're stuck with it (although he also calls it EDW+, wonder where that's from ;-)
(Interested in your current thoughts on generation as well)
@Rob: I see two implementations of BI solutions with built-in conceptual BI modeling, so it's not completely virgin territory:
- Kalido: appears to have a rather good conceptual modeling approach, but I haven't tried it myself yet. With their underlying Generic Data Model instead of DV they have of course technically a quite different approach.
- BIReady also has a conceptual modeler, but less powerful. Its data model is closely related to "raw" data vault.
- (Quipu has no conceptual modeler as yet, but they plan to do so)
Posted by: Elwin Oost | Wednesday, February 02, 2011 at 04:59 AM
@Elwin,
Kalido, BIReady and Quipu with their business models are well known. The issue is that they are by themselves totally business driven, and not source driven nor source oriented. You have to do all that yourself.
If you like tooling then you want a tool that 'integrates' the business model and source models in a non-destructive and consistent way, and then creates/generates a 'business oriented' (source-auditable) Data Vault where possible, and an EDW+/Business (Rule) Data Vault where required.
Posted by: DM_Unseen | Wednesday, February 02, 2011 at 06:19 AM
@Elwin @DM_Unseen
Please take care not to confuse the conceptual layer with the physical implementation.
Tools like Kalido, BIReady and Quipu have IMO to do with the physical implementation.
Of course there are also (other) tools that can be helpful for developing and maintaining the models and rules in the conceptual layer.
Posted by: Rob_mol | Wednesday, February 02, 2011 at 02:56 PM
@Rob,
I know. However, I think that using FCO-IM (at the conceptual layer) can directly drive the physical layer as well (and it should, because you want minimal manual intervention while travelling and transforming between those layers).
Note that FCO-IM can also play a big role in understanding and executing DV transformations as well.
This makes it a *VERY* interesting (conceptual) modeling technique indeed.
Posted by: DM_Unseen | Thursday, February 03, 2011 at 12:16 AM
@DM_Unseen @Rob
I agree these solutions are no panacea, but to call them either strictly source or business-based is imho (mostly) not doing them justice.
Only Quipu is currently still distinctly source-oriented. They're planning to extend it with a business model layer, but they're not there yet.
Kalido and BIReady both have conceptual modelers completely separated from the source systems (though you can reverse engineer from source if required). I particularly like Kalido's conceptual modeler.
There are indeed many other good (/better) solutions for conceptual modeling, but imho these have yet to be integrated better in existing BI platforms (or sparkling new ones) to have more impact on the BI market.
Posted by: Elwin Oost | Thursday, February 03, 2011 at 03:27 AM
Reading the latest comments there seems to be a fluent transition towards two very interesting discussions:
- Bridging and differentiating between Concept, logic and technique
- The level of model-driven data solutions (the 'let's generate it all' lobby).
My 2 cents; both discussions show that - at least in the Netherlands - we are trying (and sometimes succeeding) to elevate the discussion to a higher abstraction and professional level.
Posted by: Ronald Damhof | Friday, February 04, 2011 at 03:11 AM
@Ronald,
You might be tempted to think that these 2 issues are somehow related ;)
Posted by: DM_Unseen | Saturday, February 05, 2011 at 05:27 AM
Hi Guys,
Please don't take this the wrong way...
I see Busn Data Vault, Staging Out, and EDW+ as the same thing... regardless of what you call it... Regarding coining of the term, it really doesn't matter to me..
You all know that what I strive to do is seek consistency, so I think: we should start a poll somewhere (perhaps on LinkedIn?) and take a vote on what to call the layer - let everyone decide...
I guess the other question here is, is there truly only one layer? or are there multiple layers?
One other thought from my side of the house, BDV/Staging Out/EDW+ or whatever you want to call it, is a logical name, for (mostly) a subset of tables coming from the Raw DV (the true EDW), where the data is processed through common business rules used by all data marts.
It doesn't have to be a complete replication of all the data, and it doesn't even have to be a separate model...
Remember the certification class? I talked (albeit briefly) about the scale-free architecture, where you can stack one DV model on top of another in a tree-like fashion, using Links to hook them together.
In other words, you can create separate "master data" hubs, and "business driven sats", and "business driven links" on a new "layer" of Data Vault, that is linked (if you want it to be) by higher grain of links back to the raw Data Vault at points it makes sense.
This concept can get messy to manage, so generally I do recommend a "separate" model, and storage area for this data set, hence: business Data Vault...
But I really don't care what you call it, so long as everyone agrees to call it the same thing, and define it with a standard definition.
This would be a great one for the consortium to deliberate on.
Cheers,
Dan L
Posted by: Dan Linstedt | Saturday, February 12, 2011 at 03:15 AM
Dan,
Good post - thx for the feedback. And I don't take it the wrong way. This is why we have blogs! I made a new blog post regarding the same subject. I have a strong need to discuss the terminology and meaning surrounding DV a bit more. So I am going to polarise a bit - I am evil....
And yes - the consortium (or platform) will have its purpose in these discussions. However, they are not a politburo. I think "we", being the DV community, need to come to some kind of agreement on terminology and its definitions.
It is a bit of a mess right now.....
regards - hope the snow was good!
Posted by: Ronald Damhof | Saturday, February 12, 2011 at 07:29 AM