Last week I wrote a post on the Raw Data Vault that drew some insightful comments. This post is a joint effort by Dan Linstedt and me on this subject.
In his book, published this week, Dan mentions a Raw Data Vault as well. We discussed this and concluded that the Raw Data Vault mentioned in Dan’s book is in fact the actual DV (integrated on the HUB’s, using business keys). He used the term ‘Raw’ to distinguish it from the Business Data Vault.
Let us be clear: the Raw Data Vault as described in my blog post “Data Vault Schools” is not the same Raw Data Vault as described in Dan’s book. In fact, the difference is fundamental with regard to the DV methodology as Dan intended it. This is in line with my blog post "the case against the Raw Data Vault".
We both agree that there is no way to generate everything, because identification of the business keys has to happen. We do however acknowledge the possibility that, if you can specify the business keys, there are options to generate the model.
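To make the point concrete, here is a minimal sketch in Python of what "generating the model once the business keys are specified" could look like. Everything in it is an assumption made for illustration: the `SourceTable` class, the `hub_ddl` function, the column names and the DDL layout are hypothetical and are not taken from Dan's book or from any DV tool. The only thing it is meant to show is that the business key is an input supplied by a person; the generator refuses to guess it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTable:
    """Hypothetical description of one source table."""
    name: str
    columns: tuple[str, ...]
    # Specified by a person who knows the business; never derived automatically.
    business_key: tuple[str, ...]


def hub_ddl(table: SourceTable) -> str:
    """Generate illustrative hub DDL from a table and its specified business key."""
    missing = [c for c in table.business_key if c not in table.columns]
    if not table.business_key or missing:
        # Without an identified business key there is nothing to integrate on.
        raise ValueError(f"business key for {table.name!r} is not (fully) specified")

    hub = f"hub_{table.name}"
    lines = [f"{hub}_sk INTEGER PRIMARY KEY"]
    lines += [f"{col} VARCHAR(100) NOT NULL" for col in table.business_key]
    lines += [
        "load_dts TIMESTAMP NOT NULL",
        "record_source VARCHAR(50) NOT NULL",
        f"UNIQUE ({', '.join(table.business_key)})",
    ]
    body = ",\n  ".join(lines)
    return f"CREATE TABLE {hub} (\n  {body}\n);"


customer = SourceTable(
    name="customer",
    columns=("id", "customer_no", "name"),
    business_key=("customer_no",),
)
print(hub_ddl(customer))
```

Note that the source's own `id` column plays no role in the hub: the generation step is mechanical, but the choice of `customer_no` is not.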
Ronald Damhof & Dan Linstedt
***update***
The first two comments on this post are valid, and we felt the need to be more precise about the terminology.
1) Raw Data Vault = A term that should no longer be used in DV methodology. If it is used in formal writings, communications, blogs or whatever, then it refers to a Data Vault (integrated on the HUB’s, using business keys, etc.) as defined by Dan Linstedt.
I will also update my other two posts to reflect this terminology.
Ronald, can you rename your version of the raw DV? Then when we talk about the raw DV we all mean the same thing: the actual Data Vault as Dan described it in his book. The raw DV that you (and many others) are talking about could, for example, be named Staging Data Vault?
Posted by: Basstiekema | Thursday, February 17, 2011 at 02:48 AM
I'm a bit confused now. There are two types of raw data vaults? In what way do they differ from each other? Please make a list of differences (and agreements) to clarify this! Please give examples!
The raw data vault defined by you is the one that is generated from the source (based on database keys), and the one of Linstedt is defined on the business keys (which is the craftsman's job)?
Posted by: Hennie de Nooijer | Thursday, February 17, 2011 at 04:23 AM
Yeah, we are aware of the confusion. We were already busy with the terminology. ***update posted***
Posted by: Ronald Damhof | Thursday, February 17, 2011 at 05:04 AM
Staging Vault is too suggestive IMO.
If we start discussing names we could have:
Stovepipe Data Vault
Source (generated/oriented) Data Vault
Technical Data Vault
Posted by: DM_Unseen | Thursday, February 17, 2011 at 06:42 AM
DATPROF will soon release a data vault demo containing Historical Staging, Staging DV and Raw DV. It will contain sample data and will run on Oracle and SQL Server. This demo will show the differences between the three approaches in all their aspects (modeling, generation and deployment).
Posted by: Harald Kikkers | Thursday, February 17, 2011 at 07:39 AM
We just call it the source data vault (sDV). It's a source model converted to a data vault model. Hence the name.
How could this not be part of the DV methodology? I am baffled by that statement. Of course, on its own, an sDV is pretty useless. It's what you do with your sDV next that gives it value. In our case: a business data vault (bDV). It's the complete solution that gives you value, not a singled-out component.
I think the only difference between schools/approaches is _where_ in the stream you have placed your different EDW functionalities (history tracking, business key integration, semantic conversion, etc.).
As long as you end up with a correct data vault _with business value_, I don't care how you get there (whether you generate an sDV or not, whether you integrate immediately or in a later stage).
Posted by: Johannesvdb | Friday, February 18, 2011 at 03:04 AM
@Johannes: The staging vault is a copy of the source; yes, it is DV modelled, but not according to DV methodology. Why are you baffled? Is this not a 'fact'?
I do not pass judgement - good or bad - on your solution. You seem to think that?
Wrong. Tbh if it's working for you - perfect! No problem. But DV methodology is not just a DV modelled data model, it is more than that. We need to differentiate this approach from DV methodology as Dan intended it. Simple.
In fact I am hugely curious about how you implemented your solution and would like to sit with you and watch it work!
Posted by: Ronald Damhof | Friday, February 18, 2011 at 02:41 PM
Hi Ronald and Dan,
Nice discussions about names and terms. Let's join :)
I think 'staging DV' is a bad combination of terms: staging is volatile and a DV is not. Maybe just stick to the term Johannes uses, 'source DV'. It's directly derived from the source interface. Maybe 'interface DV' is a better term, who knows?
@Ronald, as you stated in a comment in the 'case against the raw DV', you often end up using TK's in the hubs. So in real life a (elementary/fact?) DV consists of hubs with TK's and BK's.
I think the discussion is too black and white; it should be more colorful (or should I say there is no one version of the truth, just a couple of viewpoints?). I see the collection of sDV's (with TK's and BK's) as a starting point. It's an evolution: you start with the TK's and after some time (when business knowledge grows) you see more BK's popping up from your sources. The collection of sDV's CAN be coupled on BK's more and more, and it starts looking more like an (elementary/fact) DV.
Source systems are often badly modeled and users are creative, so we have to use business rules to further integrate. I like the definition of 'business DV' Dan uses in his comment on the DV schools post:
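A rough way to picture that coupling, as a sketch only: the Python below takes hub entries from several source DV's, each identified by a (source, technical key) pair, and groups them under a business key wherever one is known. The function name `integrate_on_business_key`, the sample sources and the lookup table are all hypothetical; where the TK-to-BK knowledge comes from is exactly the part that cannot be generated.

```python
from collections import defaultdict


def integrate_on_business_key(source_rows, bk_lookup):
    """Group (source, technical_key) pairs under their business key.

    source_rows: iterable of (source, technical_key) tuples from the sDV's.
    bk_lookup:   dict mapping (source, technical_key) to a business key,
                 filled in as business knowledge grows (an assumption here).
    Returns the integrated hub and the entries that are still TK-only.
    """
    hub = defaultdict(set)
    unresolved = []
    for source, tk in source_rows:
        bk = bk_lookup.get((source, tk))
        if bk is None:
            # No business key known yet: this entry stays source-bound.
            unresolved.append((source, tk))
        else:
            hub[bk].add((source, tk))
    return dict(hub), unresolved


rows = [("crm", 17), ("billing", 903), ("crm", 18)]
known_keys = {("crm", 17): "C-1001", ("billing", 903): "C-1001"}

hub, unresolved = integrate_on_business_key(rows, known_keys)
print(hub)
print(unresolved)
```

As more entries move from the unresolved list into the hub, the collection behaves more like a DV integrated on business keys; whether that evolution is realistic in practice is the point under discussion here.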
"a subset of tables coming from the Raw DV (the true EDW), where the data is processed through common business rules used by all data marts"
Dan also states that we can connect DV's, creating one big virtual DV (a nice hypergraph :)
But to keep it manageable we create logical layers. So we end up with the four layers (and smash in some new terms):
- the Interface (Staging/CDC) layer
- the Fact (Dan's Raw DV, the collection of (mostly connected) sDV's) layer
- the Common Business layer (Staging out, EDW+, bDV)
- the User layer (Data Marts)
Just a little contribution from my side. Maybe not completely 'DV methodology' proof, but that's not a problem ;)
Regards,
JJ.
Posted by: Delostilos | Friday, February 18, 2011 at 06:29 PM
Hey JJ - good to see you here.
I started this discussion to polarise - yes, be black and white in order to more clearly see boundaries and get some kind of logic in the mishmash of terminology.
Your post is nuanced and rightly so.
Whether a 100% source-driven model is named 'Source DV' or 'Staging DV', I honestly do not care much. But I saw the term 'Raw Data Vault' also mentioned in Dan's book, and it made me/us wanna clarify it, because the 'Raw DV' used in the NL and the 'Raw DV' (=True EDW, DV) used by Dan are not the same.
I still have extreme doubts about the usefulness of a 'Source DV'; I have not heard any good arguments for it over a persistent staging area or a DV (True EDW) as it was meant.
I also doubt the evolutionary character you're describing - from source DV's to a true EDW (=DV). I think it's an illusion. It would be nice if you could elaborate a bit more.
Finally - the origin of my worries (;-)) stems from the fact that I see certain practitioners/service providers/automated tooling selling generated DWH's and DV's like they sell cookies. In my opinion the customer is getting squat/zip/zero (the semantic gap is just as big as it was before), and it will hurt the DV community and constrain DV innovation in the coming years. This should not be confused with DV methodology.
A few weeks ago I met a 'consultant' saying he can generate any DV-DWH in 8 hours. In my opinion he generated a copy of a source, which has nothing to do with DV methodology.
I can make a copy even faster btw....
And btw - the concept of the bDV (EDW+, Staging Out), which I first coined with the Tax Authority service and which was inspired by Albert Heijn's Pallas project (I discussed it with Dan in 2007 over beers), is already being mismatched as well.
The bDV coming from the 'Source DV' guys is a different bDV from the one (EDW+, Staging Out) that Dan and I defined.
Whatever opinions we all might have (Clint Eastwood: opinions are like assholes, everybody's got one), I think we all agree that we need STANDARDS. For all I care we end up with several DV methodologies. I just want them out in the open, with more transparency and more discussion.
At this moment peeps/companies/products are all screaming that they support/generate Data Vaults... But do they really? Or is it some kind of 'fork'/mutation?
Again - thx for your input!
Ronald
Posted by: Ronald Damhof | Saturday, February 19, 2011 at 02:08 AM
@Ronald & JJ
An evolutionary DV is what I'm currently working on at the RU. This is mainly due to the fact that source system integration is ad hoc and incomplete.
I'm not against source-driven DV's, but I *AM* against unnecessary usage of TK's in a DV, and I will go to great lengths to avoid them. While my current approach is generatable, current DV generation tools are not sophisticated enough to handle this (JJ knows what I mean). Besides, solving TK issues is impossible in standard DV. You actually need to borrow transformations from Anchor Modeling to make this work in a generic and automatable fashion.
This state of affairs leaves correct handling of TK issues beyond the scope of current DV generators.
Posted by: DM_Unseen | Sunday, February 20, 2011 at 05:37 AM
I would like to point back to a blog post by Dan Linstedt (August 2010). Where does the "Staging Area" as described by Dan fit in the "Raw Data Vault" discussion?
http://danlinstedt.com/datavaultcat/data-vault-and-staging-area/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+DataVaultCoaching+%28Data+Vault+Coaching%29
Posted by: Joey Moelands | Monday, February 21, 2011 at 01:10 AM