
Thursday, May 14, 2009

"Where's the Process?"

Attended an MDM user conference today.

As the MDM Hub was described, I got the visual of an Operational Data Store on "biz rule steroids". Thought to myself, "Aren't those biz rules about biz process? Where's the Process Governance role, the Process Steward role, Process Quality? Where's the definition of valid processes?" The message came out several times that MDM is functionally intense, process intense, a paradigm-shifting experience. This is all reminiscent of a few years back at TDWI, when the buzz about BPM and BAM whirled about. Lots and lots of implication and rhetoric about things that sound like "process", but we continue to talk and teach Data Governance, Data Steward, Data Quality, Data Integration, Data Models, and Data Profiling. We have even gone so far as to coin "Data as a Service". Oh my! Here I always viewed a SERVICE (notice the "verb" feel to the word "Service") as a logical unit of work: an encapsulation of process and data into a bundled logical unit of work (a little itty bitty OBJECT).
My soap box on this topic (for the last 8 years) is that data is, by its very essence, simply a by-product of business process execution. Without the execution of process, there really is no need for acquiring data and there will certainly be no creation of data. Isn't the optimization of business process the Holy Grail of optimized business performance? Granted, given today's technology, data IS our best available empirical means to objectively measure the performance of enterprise process. I do get it. Data is important. We're just missing the boat by paying lip service to process.
The business doesn't do data. Business does process. And we wonder why the business looks at us as if we have a third eye when we tell them they need to own Data Governance and Data Stewardship. MDM is tough to sell to the business (just like Data Quality has been tough to sell). Wonder why that is? Maybe because we keep talking about data and the business keeps thinking "that's an IT thing or a data entry clerk thing!"
We need more enterprise process focus in our teachings. We need more emphasis on the disciplines for driving out the integrated business process models and mapping data to those valid process model objects. I pine for the good old days when Sarson taught us the importance of structured analysis and design at Northern Illinois University and when Martin published the Information Engineering trilogy showing us the importance of an integrated business model (process & data interacting).

Sunday, June 8, 2008

Ever Wonder Why?

I've been doing this data warehouse thing for a few days now, and I have learned quite a bit along the way. Most of the more important learning came from the school of hard knocks. One of my favorite Data Warehouse Architecture lessons is around key management strategy. The great debate is: should we use surrogate keys or natural keys? Data Architects are an interesting breed. We will debate this topic with the kind of passion typically reserved for topics of significant social impact. Ever Wonder Why?

This is a simple decision. Ask the following questions to understand the characteristics of your environment.

Question: Can the source applications 100% guarantee there will never be a change in their key management practices? If you cannot get a resounding yes, you DO NOT want to use the source application primary keys as your data warehouse architecture keys.
Question: Do you have only one source application that will be feeding data into the subject areas of your data warehouse? If you cannot get a resounding yes, you DO NOT want to use the source application primary keys as your data warehouse architecture keys.
I get a lot of rhetoric around the value and promise of using "natural keys". I ask you, what is "natural" about the database management system keys used by any of your source applications? There is nothing natural about an order number that is generated from an Order Management system. It's a number, a code. Sometimes it is cooked up by an application developer as a means to try to ensure each order posted to the database will have a unique index address.
I also get heated opinions around assigning a surrogate key when the source application is already assigning surrogate keys (think Oracle Applications). If my concern is well-behaved keys, well, it's already a surrogate, so why put a surrogate on top of a surrogate? Good point, but (behold the underlying truth) what should we do when the single instance of the source application simply cannot handle the workload of the entire enterprise and the source application team decides to clone the application for each major region? Same code base, different physical instances, each using its own Oracle sequences for IDs. The same ID value can now show up in more than one instance and point to a completely different record in each.
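To make the cloned-instance problem concrete, here is a minimal sketch of a warehouse-side key map. It is an illustration only, not a description of any particular product: the class name `SurrogateKeyMap`, the instance labels `ERP_NA` and `ERP_EMEA`, and the in-memory dictionary are all assumptions made for the example. A real implementation would most likely keep this mapping in a database table.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical key map: one warehouse key per (source instance, source key) pair."""

    def __init__(self):
        self._next_key = count(1)  # warehouse-owned counter, independent of any source
        self._keys = {}            # (source_system, source_key) -> surrogate key

    def key_for(self, source_system, source_key):
        """Return the existing surrogate key for the pair, or assign a new one."""
        pair = (source_system, source_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


key_map = SurrogateKeyMap()

# Two regional clones of the same application both hand out ID 1001.
na_key = key_map.key_for("ERP_NA", 1001)
emea_key = key_map.key_for("ERP_EMEA", 1001)

# Asking again for a pair we have already seen returns the same warehouse key.
na_again = key_map.key_for("ERP_NA", 1001)

print(na_key, emea_key, na_again)
```

The point of the sketch is that the warehouse key is driven by the source instance plus the source ID, so two clones that both issue 1001 never collide downstream.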
What about source data that is truly an event: once posted it will never change, it has a decent unique ID, and it arrives in extremely high volumes? Now this is a set of characteristics that makes a case for a different PK management strategy reference pattern. I guess you can't ALWAYS say always or never.

Yes, surrogate key management requires some engineering. It is a valid and necessary component of your overall Data Warehouse Architecture. No, not every table will need a surrogate key in ALL cases. There are exceptional reference patterns that will exist. Make sure you allow for them in your overall Data Warehouse Architecture.
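The questions and the exception above can be strung together as a rough checklist. The sketch below is one possible reading, not a definitive rule set: the `SourceProfile` fields, the function name `recommend_key_strategy`, and the returned labels are all invented for illustration, and checking the event exception first is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hypothetical answers gathered about the sources feeding one subject area."""
    keys_guaranteed_stable: bool      # question 1: key practices will never change
    single_source_application: bool   # question 2: only one application feeds the area
    immutable_high_volume_events: bool = False  # never updated, decent unique ID, huge volume


def recommend_key_strategy(profile: SourceProfile) -> str:
    """Return a coarse recommendation label for the given source profile."""
    # Assumption: the event exception is reviewed before the two questions.
    if profile.immutable_high_volume_events:
        return "review exception pattern"
    # Only a resounding yes to both questions leaves source keys on the table.
    if profile.keys_guaranteed_stable and profile.single_source_application:
        return "source keys are an option"
    return "use surrogate keys"


cloned_erp = SourceProfile(keys_guaranteed_stable=True, single_source_application=False)
event_feed = SourceProfile(True, True, immutable_high_volume_events=True)

print(recommend_key_strategy(cloned_erp))
print(recommend_key_strategy(event_feed))
```

Anything short of two clear yes answers lands on surrogate keys, and the event case is only flagged for a closer look rather than decided automatically.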