
Monday, October 27, 2008


Comments


Stephan Deblois

Maybe the first step in building a Data Vault could be to bring the source tables themselves into the Data Vault. Yes, they are not integrated yet, but there are some benefits to bringing them into the DV early. In real life, source systems do not always send correct information, and receiving corrected past records can be managed gracefully if the source tables are already stored in the DV with its full history tracking. You can then load the integrated parts of the DV from the current view of the source DV tables. I don't see any problem with merging source and integrated data into one DV, but you could also build two Data Vaults, a "Staging Data Vault" and an "Integrated Data Vault". A benefit of a one-DV architecture would be to give the business access to a layer of "grey" data... not scrubbed yet, not integrated yet, but still useful for some reporting scenarios.
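To make the idea concrete, here is a minimal Python sketch of a history-tracked source table with a "current view" on top. Everything in it is an assumption for illustration: the names `SourceRow`, `append_if_changed` and `current_view` are hypothetical, and an in-memory list stands in for a real insert-only table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRow:
    """One version of a source record (hypothetical structure)."""
    business_key: str
    load_date: date
    payload: dict


def append_if_changed(history, business_key, load_date, payload):
    """Insert-only load: append a new version only if the payload changed."""
    versions = [r for r in history if r.business_key == business_key]
    latest = max(versions, key=lambda r: r.load_date, default=None)
    if latest is None or latest.payload != payload:
        history.append(SourceRow(business_key, load_date, dict(payload)))
        return True
    return False


def current_view(history):
    """Return the most recent version per business key."""
    latest = {}
    for row in history:
        seen = latest.get(row.business_key)
        if seen is None or row.load_date >= seen.load_date:
            latest[row.business_key] = row
    return latest


history = []
append_if_changed(history, "C1", date(2008, 10, 1), {"city": "Utrecht"})
# An unchanged record is not stored again.
append_if_changed(history, "C1", date(2008, 10, 2), {"city": "Utrecht"})
# A corrected record from the source becomes a new version; the old one stays.
append_if_changed(history, "C1", date(2008, 10, 3), {"city": "Amsterdam"})
```

The integrated part of the DV would then be loaded from `current_view(history)`, while the earlier, incorrect version remains available for auditing.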

Ronald

Stephan, the problem I mentioned with source extraction is typically a scale problem. It's a freakin' lot of data to move when you need to get (delta) data from a source that is just huge in terms of volume.

Putting source tables next to the DV would result in huge databases... why? In my opinion it's not necessary, and not a valid strategy, to put source tables in your DV. I really have questions about the extensibility of the DV model when you do such a thing. And how do you maintain a simple, yet standard and very performant load architecture?

Back to the topic... even if you put source tables in the Data Vault: how do you copy/extract/move the data from the source when it's huge, and how do you do delta comparison when the volume is huge? That's the nasty stuff we gotta deal with in the extraction of high-volume sources. Those problems are not solved by moving the data into the DV.
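For reference, one common way to approach the delta comparison is to compare per-row hashes against the hashes kept from the previous extract. This is not something proposed in this thread, just a hedged Python sketch under assumptions: `row_hash` and `detect_delta` are hypothetical names, and rows are plain dictionaries. Note that it still has to read every source row, so it reduces the comparison cost but not the extraction volume itself.

```python
import hashlib
import json


def row_hash(row):
    """Hash a row's content in a stable, key-order-independent way."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_delta(previous_hashes, current_rows, key_field):
    """Compare the current extract with the hashes of the previous one.

    Returns (inserted keys, updated keys, deleted keys, new hash map).
    Only keys and digests of the previous extract are needed, not full rows.
    """
    inserts, updates = [], []
    new_hashes = {}
    for row in current_rows:
        key = row[key_field]
        digest = row_hash(row)
        new_hashes[key] = digest
        old = previous_hashes.get(key)
        if old is None:
            inserts.append(key)
        elif old != digest:
            updates.append(key)
    deletes = [k for k in previous_hashes if k not in new_hashes]
    return inserts, updates, deletes, new_hashes


day1 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
day2 = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]

_, _, _, stored = detect_delta({}, day1, "id")
inserts, updates, deletes, stored = detect_delta(stored, day2, "id")
```

After the second run, `inserts`, `updates` and `deletes` hold the keys that are new, changed and missing compared with the first extract.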

Stephan Deblois

Yes, you are right. The volume of source data makes a big difference in the strategy chosen. Currently, I am dealing with low-volume but very complex data sources. The sources are also all external (business partners), so we have little control over the content. Fixing sources and keeping track of the changes is part of our daily routine... the source DV makes a lot of sense in our case. As usual, it depends :-)


  • View Ronald Damhof's profile on LinkedIn
