Lately I have been involved in many discussions, online as well as offline, regarding Data Vault architectures. It seems that three 'schools of thought' are developing in the Netherlands, and recognizing and separating these three schools is important.
For the folks not familiar with Data Vault: google it, read one of my articles (e.g. Damhof_DBM0508_ENG) or go to http://danlinstedt.com/ .
- Raw school; this school of thought makes a distinction between a raw Data Vault and a "business" Data Vault. The raw part is a copy of the source where entities are basically split up into a Hub, Satellite or Link construct. The structure is derived (nearly 100%) from the source structure and no integration is achieved yet. The rules for determining the target types are pretty fixed and can therefore be automated; the associated loading scripts can of course be generated (a minimal sketch follows this list). This is a pretty straightforward structure transformation. One can argue whether or not this is some kind of persistent or historical staging environment. The "business" Data Vault (I do not like this terminology, but I use it here to distinguish it from its raw brother) is where the business rules are executed, where integration is achieved and where the result is stored. This part of the data logistics is typically a value transformation and not so much a structure transformation. This school advocates that making datamarts is again a structural transformation (which can be automated, since the business rules are already executed) to stars or snowflakes.
- Classic school; this school of thought advocates a strategy where the Data Vault model is designed according to a mix of the conceptual data model of an organization and the structure of its business transaction systems. Going into the Data Vault there is a light integration of data on the main business entities of the organization (preferably based on the conceptual data model). Designing the Data Vault data model is a creative process; loading the model can be generated. Business rules are executed going into the datamarts/downstream to the end user. The latter is both a structural and a value transformation. This school of thought resembles Dan Linstedt's original methodology the closest.
- Mixed school; this school follows the classic school in designing a Data Vault model based on both the conceptual data model of the organization and the structure of its business transaction systems. It does, however, modularize the downstream business rules execution. The main drivers for modularizing are maintenance, re-usability and auditability. When business rules are considered generic, it might be wise to develop them only once and execute them many times. It might also be wise to audit the changes in the outcome of business rules over time (see the second sketch after this list). The staging out drafted in the figure below is modelled Data Vault style.
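To make the Raw school's generation claim a bit more concrete, here is a minimal sketch of how Hub and Satellite DDL could be derived from source table metadata. Everything in it is a hypothetical illustration (the `source_tables` metadata, the column layout, the generated table names); a real generator would also handle Links, multi-column business keys and whatever loading standard you follow.

```python
# Minimal sketch: derive Raw Data Vault DDL from source metadata.
# Hypothetical metadata: table name -> (business key column, other columns).
source_tables = {
    "customer": ("customer_nr", ["name", "address", "city"]),
    "product":  ("product_nr",  ["description", "price"]),
}

def generate_raw_vault_ddl(tables):
    """Emit one Hub and one Satellite per source table.

    The target-type rules are fixed (business key -> Hub,
    descriptive attributes -> Satellite), so generation is a pure
    structure transformation - no business rules are executed here.
    """
    ddl = []
    for table, (bkey, attrs) in tables.items():
        ddl.append(
            f"CREATE TABLE hub_{table} (\n"
            f"  {table}_hkey CHAR(32) PRIMARY KEY,\n"
            f"  {bkey} VARCHAR(100) NOT NULL,\n"
            f"  load_dts TIMESTAMP NOT NULL,\n"
            f"  record_source VARCHAR(50) NOT NULL\n);"
        )
        cols = ",\n".join(f"  {a} VARCHAR(100)" for a in attrs)
        ddl.append(
            f"CREATE TABLE sat_{table} (\n"
            f"  {table}_hkey CHAR(32) NOT NULL,\n"
            f"  load_dts TIMESTAMP NOT NULL,\n"
            f"{cols},\n"
            f"  PRIMARY KEY ({table}_hkey, load_dts)\n);"
        )
    return "\n\n".join(ddl)

if __name__ == "__main__":
    print(generate_raw_vault_ddl(source_tables))
```

Because the transformation is purely structural, the same few fixed rules cover every source table, which is exactly why this step lends itself to automation.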
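And a second sketch for the Mixed school's point about modular, auditable business rules: store each rule outcome with a load timestamp, Data Vault style, appending a row only when the outcome changes, so the history stays auditable. The rule itself and all names here are made-up examples, not part of any standard.

```python
from datetime import datetime, timezone

def credit_limit_rule(revenue: float, payment_delays: int) -> float:
    """Hypothetical generic business rule: developed once, executed many times."""
    return max(0.0, revenue * 0.1 - payment_delays * 500.0)

# Satellite-style store for rule outcomes, keyed by hub key plus load timestamp.
# Appending instead of updating keeps the full outcome history auditable.
rule_outcome_sat: list[dict] = []

def apply_and_audit(customer_hkey: str, revenue: float, payment_delays: int) -> None:
    outcome = credit_limit_rule(revenue, payment_delays)
    # Only record a new row when the outcome actually changed.
    last = next((r for r in reversed(rule_outcome_sat)
                 if r["customer_hkey"] == customer_hkey), None)
    if last is None or last["credit_limit"] != outcome:
        rule_outcome_sat.append({
            "customer_hkey": customer_hkey,
            "load_dts": datetime.now(timezone.utc),
            "credit_limit": outcome,
        })

apply_and_audit("c001", revenue=100_000, payment_delays=0)  # first outcome
apply_and_audit("c001", revenue=100_000, payment_delays=3)  # changed outcome, new row
print(rule_outcome_sat)
```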
Note the separation between logical and technical architecture; this is especially important with regard to the datamarts. Datamarts, for example, can be cubes, virtual, pushed into memory or whatever.
In my next blog I will deliberate on the implications of these 'schools' regarding data logistics and the 'let's generate it all' lobby... ;-)
***update***
The term Raw Data Vault as used in this post should be read as Staging Vault, in accordance with a blog post I wrote with Dan Linstedt regarding terminology.