Custom DataStage Operator for 2 temporal Dimensions
Add-On for IBM DataStage to automatically create 2 temporal dimensions on data streams
What is "Historisation"?
While systems with 2 temporal dimensions are still uncommon, the trend is towards system architects designing applications, databases and data warehouses with 2 time dimensions. Although bi-temporal systems are straightforward in theory, implementation and subsequent maintenance can be awkward at best and ensuring data consistency a major factor.
A single time dimension is often used in databases and data warehouses. This is usually implemented with two columns representing the start date and the end date; i.e. a customer address was valid from 2001-01-01 to 2004-04-12 and then a new address was valid from 2004-04-13 until 2008-01-01 and so on. This one-dimensional model, also referred to in data warehousing as a “Slowly changing dimension, type II” is simple to understand and, in most cases, offer sufficient information on the single time axis. This date range represents what the values are for any point in time.
By adding one more time dimension, it is possible to also determine when these values were changed or modified in the database. This is now bi-temporal or 2-dimensional. Given the address change above there is no information as when that change was made, if it weren’t entered into the system until 2012-01-01 then any reports run in 2011 that used the address in any way would suddenly report a different value for past events than the same report run in 2012. At the heart of 2-dimensional data storage lies the ability to ensure that every database query returns the same result regardless of when it is made.
By adding 2 temporal dimensions the state of a database can be recreated for any point in time. Any state of a record can be retrieved using a SELECT that contains 2 points in time:
– The technical point in time, i.e. a “snapshot” of the physical database contents.
– The effective point in time for the data that is to be queried.
While the query side is straightforward, and the database only needs some additional columns; ensuring that changes are historised correctly involve several computational steps and this site offers a DataStage operator called the “Hist-Op” which allows a flexible implementation of a 2-dimensional model to be efficiently used with DataStage.