
Thursday, February 17, 2011

Comments


Basstiekema

Ronald, can you rename what you call a raw DV? Then when we talk about a raw DV we all mean the same thing: the actual Data Vault as Dan also described in his book. Could the raw DV that you (and many others) are talking about be named, for example, Staging Data Vault?

Hennie de Nooijer

I'm a bit confused now. Are there two types of raw data vault? In what way do they differ from each other? Please make a list of differences (and similarities) to clarify this! Please give examples!

Is the raw data vault as you define it the one that is generated from the source (based on database keys), while Linstedt's is defined on the business keys (which is the craftsman's job)?

Ronald Damhof

Yeah, we are aware of the confusion. We were already busy with the terminology. ***update posted***

DM_Unseen

Staging Vault is too suggestive IMO.

If we start discussing names we could have:

Stovepipe Data Vault
Source (generated/oriented) Data Vault
Technical Data Vault

Harald Kikkers

DATPROF will soon release a data vault demo containing Historical Staging, Staging DV, Raw DV. It will contain sample data and will run on Oracle and SQL Server. This demo will show the differences between the three approaches in all its aspects (modeling, generation and deployment).

Johannesvdb

We just call it the source data vault (sDV). It's a source model converted to a data vault model. Hence the name.

How could this not be part of the DV methodology? I am baffled by that statement. Of course, on its own, an sDV is pretty useless. It's whatever you do with your sDV next that gives it value. In our case: a business data vault (bDV). It's the complete solution that gives you value, not a singled-out component.

I think the only difference between the schools/approaches is _where_ in the stream you place your different EDW functionalities (history tracking, business key integration, semantic conversion, etc.).

As long as you end up with a correct data vault _with business value_, I don't care how you get there (whether you generate an sDV or not, whether you integrate immediately or at a later stage).

Ronald Damhof

@Johannes: The staging vault is a copy of the source. Yes, it is DV modelled, but not according to DV methodology. Why are you baffled? Is this not a 'fact'?

I do not pass judgement - good or bad - on your solution. You seem to think that?

Wrong. Tbh if it's working for you - perfect! No problem. But DV methodology is not just a DV modelled data model, it is more than that. We need to differentiate this approach from DV methodology as Dan intended it. Simple.

In fact I am hugely curious about how you implemented your solution and would like to sit with you and watch it work!

Delostilos

Hi Ronald and Dan,

Nice discussions about names and terms. Let's join :)

I think 'staging DV' is a bad combination of terms, staging is volatile and a DV is not. Maybe just stick to the term Johannes uses, 'source DV'. It's directly derived from the source interface. Maybe 'interface DV' is a better term, who knows?

@Ronald, as you stated in a comment in the 'case against the raw DV', you often end up using TK's in the hubs. So in real life a (elementary/fact?) DV consists of hubs with TK's and BK's.

I think the discussion is too black and white; it should be more colorful (or should I say there is no one version of the truth, just a couple of viewpoints?). I see the collection of sDV's (with TK's and BK's) as a starting point. It's an evolution: you start with the TK's, and after some time (as business knowledge grows) you see more BK's popping up from your sources. The collection of sDV's CAN be coupled on BK's more and more, and it starts looking more like an (elementary/fact) DV.
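To make the TK/BK evolution described above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `SourceRow` type, the source names `crm` and `billing`, and the key values are not taken from any real system or DV tool. It only shows that hubs keyed on technical keys stay separate per source, while rows that share a business key collapse into one hub entry once that BK is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRow:
    source: str                  # hypothetical source system name
    technical_key: int           # database/surrogate key (TK) from that source
    business_key: Optional[str]  # business key (BK), None if not yet identified


def hub_keys(rows, use_business_key):
    """Return the set of hub keys for the given rows.

    With use_business_key=False every row is keyed on (source, TK),
    so nothing integrates across sources. With use_business_key=True,
    rows that have a BK are keyed on it; rows without one fall back to the TK.
    """
    keys = set()
    for row in rows:
        if use_business_key and row.business_key is not None:
            keys.add(("BK", row.business_key))
        else:
            keys.add(("TK", row.source, row.technical_key))
    return keys


rows = [
    SourceRow("crm", 17, "CUST-001"),
    SourceRow("billing", 902, "CUST-001"),  # same customer, different TK
    SourceRow("billing", 903, None),        # BK not discovered yet
]

tk_hub = hub_keys(rows, use_business_key=False)  # three separate hub entries
bk_hub = hub_keys(rows, use_business_key=True)   # the two CUST-001 rows merge
```

In this toy setup the TK-only hub keeps three entries and the mixed hub keeps two, which is the "coupled on BK's more and more" idea in miniature. Whether that evolution works out in practice is exactly what is being debated in this thread.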

Source systems are often crappily modeled and users are creative, so we have to use business rules to integrate further. I like the definition of 'business DV' Dan uses in his comment on the DV schools post:
"a subset of tables coming from the Raw DV (the true EDW), where the data is processed through common business rules used by all data marts"

Dan also states that we can connect DV's, creating one big virtual DV (a nice hypergraph :)
But to keep it manageable we create logical layers. So we end up with four layers (and smash in some new terms):
- the Interface (Staging/CDC) layer
- the Fact (Dan's Raw DV, the collection of (mostly connected) sDV's) layer
- the Common Business layer (Staging out, EDW+, bDV)
- the User layer (Data Marts)
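The four layers above can be written down as a simple ordered structure. This is only a sketch: the enum names are made up here, and the assumption that each layer feeds exactly the next one downstream is a simplification for illustration, not something the list itself states.

```python
from enum import IntEnum


class Layer(IntEnum):
    # Names are hypothetical; the comments repeat the terms from the list above.
    INTERFACE = 1        # Staging/CDC
    FACT = 2             # Dan's Raw DV, the collection of (mostly connected) sDV's
    COMMON_BUSINESS = 3  # Staging out, EDW+, bDV
    USER = 4             # Data Marts


def feeds(layer):
    """Return the next layer downstream, or None for the last layer.

    Assumes a strictly linear flow from one layer to the next.
    """
    return Layer(layer + 1) if layer < Layer.USER else None


downstream_of_fact = feeds(Layer.FACT)  # Layer.COMMON_BUSINESS
```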

Just a little contribution from my side. Maybe not completely 'DV methodology' proof, but that's not a problem ;)

Regards,
JJ.

Ronald Damhof

Hey JJ - good to see you here.

I started this discussion to polarise - yes, to be black and white in order to see the boundaries more clearly and get some kind of logic into the mishmash of terminology.

Your post is nuanced and rightly so.

Whether a 100% source-driven model is named 'Source DV' or 'Staging DV', I honestly do not care much. But I saw the term 'Raw Data Vault' also mentioned in Dan's book, and that made us want to clarify it, because the 'Raw DV' used in the NL and the 'Raw DV' (= true EDW, DV) used by Dan are not the same.

I still have extreme doubts about the usefulness of a 'Source DV'. I have not heard any good arguments for it as opposed to a persistent staging area or a DV (true EDW) as it was meant.

I also doubt the evolutionary character you're describing - from source DV's to a true EDW (= DV). I think it's an illusion. It would be nice if you could elaborate a bit more.

Finally - the origin of my worries (;-)) stems from the fact that I see certain practitioners, service providers and automated tooling selling generated DWH's and DV's like they sell cookies. In my opinion the customer is getting squat/zip/zero (the semantic gap is just as big as it was before); it will hurt the DV community and will constrain DV innovation in the coming years. This should not be confused with DV methodology.

A few weeks ago I met a 'consultant' saying he can generate any DV-DWH in 8 hours. In my opinion he generated a copy of a source, which has nothing to do with DV methodology.

I can make a copy even faster btw....

And btw - the concept of the bDV (EDW+, Staging Out), which I first coined at the Tax Authority service and which was inspired by Albert Heijn's Pallas project (I discussed it with Dan in 2007 over beers), is already mismatched as well.

The bDV coming from the 'Source DV' guys is a different bDV from the bDV (EDW+, Staging Out) as Dan and I defined it.

Whatever opinion we all might have (Clint Eastwood: opinions are like assholes, everybody's got one), I think we all agree that we need STANDARDS. For all I care we get to have several DV methodologies. I just want them out in the open, with more transparency and more discussion.

At this moment peeps/companies/products are all screaming that they support/generate Data Vaults... But do they really? Or is it some kind of 'fork'/mutation?

Again - thx for your input!

Ronald

DM_Unseen

@Ronald & JJ

An evolutionary DV is what I'm currently working on at the RU. This is mainly due to the fact that source system integration is ad hoc and incomplete.

I'm not against source-driven DV's, but I *AM* against unnecessary usage of TK's in a DV, and I will go to great lengths to avoid them. While my current approach is generatable, current DV generation tools are not sophisticated enough to handle this (JJ knows what I mean). Besides, solving TK issues is impossible in standard DV. You actually need to borrow transformations from Anchor Modeling to make this work in a generic and automatable fashion.

This state of affairs leaves correct handling of TK issues beyond the scope of current DV generators.

Joey Moelands

I would like to point back to a blog post by Dan Linstedt (August 2010). Where does the "Staging Area" as described by Dan fit in the "Raw Data Vault" discussion?

http://danlinstedt.com/datavaultcat/data-vault-and-staging-area/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+DataVaultCoaching+%28Data+Vault+Coaching%29


The comments to this entry are closed.