Thursday, February 17, 2011

Listed below are links to weblogs that reference Dan Linstedt & Ronald Damhof; lets be clear about the Raw Data Vault:

Comments

Basstiekema

Ronald, can you rename what you call a raw DV? Then when we talk about the raw DV we all mean the same thing: the actual Data Vault as Dan also described in his book. The raw DV you (and many others) are talking about could, for example, be named Staging Data Vault?

Hennie de Nooijer

I'm a bit confused now. There are two types of raw data vaults? In what way are they different from each other? Please make a list of differences (and agreements) to clarify this! Please give examples!

The raw data vault defined by you is the one that is generated from the source (based on database keys), and the one of Linstedt is defined on the business keys (which is the craftsman's job)?

Ronald Damhof

Yeah, we are aware of the confusion. We were already busy with the terminology. ***update posted***

DM_Unseen

Staging Vault is too suggestive IMO.

If we start discussing names we could have:

Stovepipe Data Vault
Source (generated/oriented) Data Vault
Technical Data Vault

Harald Kikkers

DATPROF will soon release a data vault demo containing Historical Staging, Staging DV, and Raw DV. It will contain sample data and will run on Oracle and SQL Server. This demo will show the differences between the three approaches in all their aspects (modeling, generation and deployment).

Johannesvdb

We just call it the source data vault (sDV). It's a source model converted to a data vault model. Hence the name.
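To make "a source model converted to a data vault model" concrete, here is a minimal sketch, assuming the simplest possible conversion: each source table becomes one hub keyed on the table's database key plus one satellite holding the remaining columns. All class, function, and table names are hypothetical and are not taken from any DV generator.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceTable:
    name: str
    primary_key: str              # database (technical) key column
    attributes: Tuple[str, ...]   # remaining descriptive columns


@dataclass(frozen=True)
class Hub:
    name: str
    key_column: str
    key_kind: str                 # "TK" (technical key) or "BK" (business key)


@dataclass(frozen=True)
class Satellite:
    name: str
    hub: str
    columns: Tuple[str, ...]


def generate_source_dv(table: SourceTable) -> Tuple[Hub, Satellite]:
    """Mechanically map one source table to a hub and a satellite.

    Assumption: the hub key is simply the source primary key, so no
    business key analysis or cross-source integration happens here.
    """
    hub = Hub(name=f"hub_{table.name}", key_column=table.primary_key, key_kind="TK")
    sat = Satellite(name=f"sat_{table.name}", hub=hub.name, columns=table.attributes)
    return hub, sat


# Hypothetical source table, for illustration only.
customer = SourceTable("customer", "customer_id", ("name", "vat_number"))
hub, sat = generate_source_dv(customer)
```

Nothing in this mapping needs knowledge of the business, which is why it can be generated, and also why it integrates nothing by itself.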

How could this not be part of the DV methodology? I am baffled by that statement. Of course, on its own, an sDV is pretty useless. It's whatever you do with your sDV next that gives it value. In our case: a business data vault (bDV). It's the complete solution that gives you value, not a singled-out component.

I think the only difference between schools/approaches is _where_ in the stream you have placed your different EDW functionalities (history tracking, business key integration, semantic conversion, etc.).

As long as you end up with a correct data vault _with business value_, I don't care how you get there (whether you generate an sDV or not, whether you integrate immediately or in a later stage).

Ronald Damhof

@Johannes; The staging vault is a copy of the source, yes, DV modelled, but not built according to DV methodology. Why are you baffled? Is this not a 'fact'?

I do not pass judgement - good or bad - on your solution. You seem to think that?

Wrong. Tbh if it's working for you - perfect! No problem. But DV methodology is not just a DV modelled data model, it is more than that. We need to differentiate this approach from DV methodology as Dan intended it. Simple.

In fact I am hugely curious about how you implemented your solution and would like to sit with you and watch it work!

Delostilos

Hi Ronald and Dan,

Nice discussions about names and terms. Let's join :)

I think 'staging DV' is a bad combination of terms: staging is volatile and a DV is not. Maybe just stick to the term Johannes uses, 'source DV'. It's directly derived from the source interface. Maybe 'interface DV' is a better term, who knows?

@Ronald, as you stated in a comment in the 'case against the raw DV', you often end up using TK's in the hubs. So in real life a (elementary/fact?) DV consists of hubs with TK's and BK's.

I think the discussion is too black and white, it should be more colorful (or should I say there is no one version of the truth, just a couple of viewpoints?). I see the collection of sDV's (with TK's and BK's) as a starting point. It's an evolution: you start with the TK's and after some time (when business knowledge grows) you see more BK's popping up from your sources. The collection of sDV's CAN be coupled on BK's more and more, and it starts looking more like an (elementary/fact) DV.

Source systems are often crappily modeled and users are creative, so we have to use business rules to further integrate. I like the definition of 'business DV' Dan uses in his comment on the DV schools post:
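The evolution described above can be sketched roughly as follows, assuming each sDV hub row is identified by a (source, TK) pair and that business keys are discovered over time and recorded in a lookup. The function name, source names, and sample keys are all hypothetical.

```python
from typing import Dict, List, Tuple

HubRow = Tuple[str, str]   # (source system, technical key)
HubKey = Tuple[str, str]   # ("BK", business key) or ("TK", "source:tk")


def integrate_hubs(rows: List[HubRow],
                   bk_lookup: Dict[HubRow, str]) -> Dict[HubKey, List[HubRow]]:
    """Group source hub rows on a business key where one is known.

    Rows without a known business key stay isolated under their
    source-qualified technical key, so nothing is lost while the
    business knowledge is still growing.
    """
    integrated: Dict[HubKey, List[HubRow]] = {}
    for source, tk in rows:
        bk = bk_lookup.get((source, tk))
        key = ("BK", bk) if bk is not None else ("TK", f"{source}:{tk}")
        integrated.setdefault(key, []).append((source, tk))
    return integrated


# Hypothetical sample: two sources, one customer already matched on a BK.
rows = [("crm", "17"), ("billing", "A-9"), ("billing", "A-10")]
bk_lookup = {("crm", "17"): "NL001", ("billing", "A-9"): "NL001"}
result = integrate_hubs(rows, bk_lookup)
```

As more entries are added to `bk_lookup`, more rows collapse onto shared business keys; whether that process ever converges on a true EDW is exactly the point under discussion.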
"a subset of tables coming from the Raw DV (the true EDW), where the data is processed through common business rules used by all data marts"

Dan also states that we can connect DV's, creating one big virtual DV (a nice hypergraph :)
But to keep it manageable we create logical layers. So we end up with the four layers (and smash in some new terms):
- the Interface (Staging/CDC) layer
- the Fact (Dan's Raw DV, the collection of (mostly connected) sDV's) layer
- the Common Business layer (Staging out, EDW+, bDV)
- the User layer (Data Marts)

Just a little contribution from my side. Maybe not completely 'DV methodology' proof, but that's not a problem ;)

Regards,
JJ.

Ronald Damhof

Hey JJ - good to see you here.

I started this discussion to polarise - yes, be black and white in order to more clearly see boundaries and get some kind of logic in the mishmash of terminology.

Your post is nuanced and rightly so.

Naming a 100% source driven model 'SourceDv' 'Staging Dv' ...I honestly do not care about it much. But, I saw the term 'Raw Data Vault' also mentioned in Dan's book and it made me/us wanna clarify it, because the 'Raw DV' used in the NL and the 'Raw DV' (=True EDW, DV) used by Dan, are not the same.

I still have extreme doubts about the usefulness of a 'Source DV'; I have not heard any good arguments for it as opposed to a persistent staging area or a DV (True EDW) as it was meant.

I also doubt the evolutionary character you're describing - from source DV's to a true EDW (=DV). I think it's an illusion. Would be nice if you could elaborate a bit more.

Finally - the origin of my worries (;-)) stems from the fact that I see certain practitioners/Service Providers/automated tooling selling generated DWH's and DV's like they sell cookies. In my opinion the customer is getting squat/zip/zero (the semantic gap is just as big as it was before), it will hurt the DV community and will constrain DV innovation in the coming years. This should not be confused with DV methodology.

A few weeks ago I met a 'consultant' saying he can generate any DV-DWH in 8 hours. In my opinion he generated a copy of a source, nothing to do with DV methodology.

I can make a copy even faster btw....

And btw - the concept of the bDV (EDW+, Staging Out), which I first coined at the Tax Authority and which was inspired by Albert Heijn's Pallas project (I discussed it with Dan in 2007 over beers), is already mismatched as well.

The bDV coming from the 'Source DV' guys is a different bDV (EDW+, Staging Out) from the one Dan and I defined.

Whatever opinion we all might have (Clint Eastwood; Opinions are like assholes, everybody got 1), I think we all agree that we need STANDARDS. For all I care we get to have several DV methodologies. I just want them out in the open, more transparency and more discussion.

At this moment peeps/companies/products are all screaming they support/generate Data Vaults.... But do they really? Or is it some kind of 'fork'/mutation?

Again - thx for your input!

Ronald

DM_Unseen

@Ronald & JJ

An evolutionary DV is what I'm currently working on at the RU. This is mainly due to the fact that source system integration is ad hoc and incomplete.

I'm not against source-driven DV's, but I *AM* against unnecessary usage of TK's in a DV, and I will go to great lengths to avoid them. While my current approach is generatable, current DV generation tools are not sophisticated enough to handle this (JJ knows what I mean). Besides, solving TK issues is impossible in standard DV. You actually need to borrow transformations from Anchor Modeling to make this work in a generic and automatable fashion.

This state of affairs leaves correct handling of TK issues beyond the scope of current DV generators.

Joey Moelands

I would like to point back to a blog post by Dan Linstedt (August 2010). How can we place the "Staging Area" as described by Dan in the "Raw Data Vault" discussion?

http://danlinstedt.com/datavaultcat/data-vault-and-staging-area/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+DataVaultCoaching+%28Data+Vault+Coaching%29
