CWM: A Model-based Architecture For Data Warehouse Interchange
By Doug Tolbert
A presentation given at WESAS 2000 at UC Irvine, Calif., on May 8.
The diversity of operational data sources and target data warehouse engines has made the construction and maintenance of data warehouses challenging. Source and target data engines may differ not only semantically (i.e., their core data models) but also infrastructurally (i.e., the operational details of how data is extracted and imported).
The absence of common, sharable descriptions for the structure of both data sources and target data warehouse engines has meant that data warehousing tool vendors must address the interchange of data in a pairwise fashion (Figure 1).
The combinatorics of the situation aggravate the already substantial semantic and infrastructure problems faced by tools that must cross data engine boundaries. They have also limited the deployment and use of data warehouses in organizations with diverse assortments of source and target data engines. Even where data warehouses have been successfully deployed, keeping them synchronized with operational data sources remains a labor-intensive, pairwise process.
Furthermore, the flip side of this situation (i.e., support for "drilldown" discovery of the operational origin of warehouse data) remains a pairwise metadata analysis problem.
Fortunately, recent support for model-based software architectures in multi-vendor industry organizations offers hope that the pairwise combinatorics of these data warehousing tool scenarios can be reduced to something closer to the "hub" configuration in Figure 1, a substantial savings even when the number of operational data sources and warehouse targets is as low as 3.
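The savings can be illustrated with a small count. The sketch below assumes that, in the pairwise configuration, every source engine needs its own bridge to every target engine, and that in the hub configuration each engine needs a single connection to the shared store. Figure 1 is not reproduced here, so these counting rules and the function names are illustrative assumptions rather than figures taken from the presentation.

```python
def pairwise_links(sources: int, targets: int) -> int:
    """Assumed pairwise model: one bridge per (source, target) pair."""
    return sources * targets


def hub_links(sources: int, targets: int) -> int:
    """Assumed hub model: each engine connects once to the shared store."""
    return sources + targets


if __name__ == "__main__":
    # Compare both configurations for a few equal-sized installations.
    for n in (3, 5, 10):
        print(f"{n} sources, {n} targets: "
              f"pairwise={pairwise_links(n, n)}, hub={hub_links(n, n)}")
```

Under these assumptions, three sources and three targets need nine pairwise bridges but only six hub connections, and the gap widens quickly as engines are added.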
In such configurations, data warehouse deployment and maintenance tools interface with a shared store containing metadata about the structure of various operational data sources and target data warehouses as well as descriptions of the transformations required to move data between them. Suitably designed tools can then use the metadata and transformations to orchestrate the extraction of data from operational sources and its transformation into forms appropriate for import into target data warehouse stores. In addition, the metadata and transformations in the common model store can be traversed in the opposite direction (i.e., from target to source) to support drilldown discovery of the origins of warehouse data.
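The two directions of traversal described above can be sketched with a toy in-memory store. This is a minimal illustration, not the CWM metamodel or any vendor's repository API: the `Transformation` and `ModelStore` classes, their methods, and the element names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """Hypothetical record of one source-to-target mapping."""
    source: str  # element the data is read from
    target: str  # element the data is written to
    rule: str    # free-text description of the mapping


class ModelStore:
    """Toy shared store indexing transformations in both directions."""

    def __init__(self) -> None:
        self._by_source = defaultdict(list)
        self._by_target = defaultdict(list)

    def register(self, t: Transformation) -> None:
        self._by_source[t.source].append(t)
        self._by_target[t.target].append(t)

    def downstream(self, element: str) -> list:
        """Forward direction: elements fed directly by `element`."""
        return [t.target for t in self._by_source.get(element, [])]

    def origins(self, element: str) -> set:
        """Drilldown: follow transformations backward to elements
        that have no incoming transformation of their own."""
        found, seen, stack = set(), set(), [element]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            incoming = self._by_target.get(current, [])
            if not incoming and current != element:
                found.add(current)
            stack.extend(t.source for t in incoming)
        return found


# Hypothetical element names, used only for illustration.
store = ModelStore()
store.register(Transformation("orders_db.amount", "staging.amount", "copy"))
store.register(Transformation("staging.amount", "warehouse.revenue", "sum by month"))
store.register(Transformation("crm.discount", "warehouse.revenue", "subtract"))
```

With this data, `store.origins("warehouse.revenue")` walks back through the staging element and returns the two operational elements, while `store.downstream("orders_db.amount")` follows the forward direction used when planning an extraction.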
Besides reducing the number of semantic connections with which data warehousing tools must contend, such model-based architectures modernize data warehouse applications by making them more compliant with a "component/connector" architecture in which the tools are the components and the shared model store serves as the connector.
The Common Warehouse Metamodel (CWM) specification recently adopted by the Object Management Group (OMG) is an important milestone on the way to fully model-based architectures supporting data warehouse interchange. The CWM is based on OMG's Meta Object Facility (MOF) specification and employs OMG's XML Metadata Interchange (XMI) specification to interchange CWM-resident metadata between MOF-compliant repositories.
The CWM metamodel is described in OMG's Unified Modeling Language (UML) and can be thought of as an extension specializing UML for data warehousing applications. To emphasize the importance of multi-vendor interchange, the CWM was developed as a single submission by eight co-submitting vendor and user-community companies (including IBM, NCR, Oracle, Hyperion, UBS, and Unisys) and supported by seven other companies (including HP, Sun, Hitachi, and John Deere). The CWM specification is available from the OMG's web site at http://www.omg.org.
The CWM metamodel is organized into 18 packages arranged in four layers on a UML base (Figure 2). CWM breaks new architectural ground by defining its sub-metamodels as individual packages. Because CWM uses modeling techniques that minimize the number of dependencies between its packages, tool integrators can select only those metamodel services they need while avoiding problems common to large, monolithic metamodels. The CWM co-submitters believe that similar modeling architectures can be leveraged to reduce the complexity of other monolithic models (such as UML itself).
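The benefit of keeping inter-package dependencies small can be sketched as a dependency closure: a tool that wants one package only has to take on that package plus whatever it transitively depends on. The package names and the dependency table below are placeholders invented for illustration; they are not the actual CWM package dependency graph.

```python
# Hypothetical dependency table: package -> packages it depends on.
# These names and edges are illustrative only, not taken from the CWM spec.
DEPENDS_ON = {
    "Core": [],
    "DataTypes": ["Core"],
    "Relational": ["DataTypes"],
    "Transformation": ["Core"],
    "Olap": ["Transformation"],
}


def required_packages(wanted, depends_on=DEPENDS_ON):
    """Return the wanted packages plus everything they transitively need."""
    needed = set()
    stack = list(wanted)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(depends_on.get(pkg, ()))
    return needed
```

In this made-up graph, a tool that only handles relational sources would call `required_packages({"Relational"})` and pull in three of the five packages, leaving the rest out of its implementation.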
The four layers of the CWM collect together different sorts of metamodel packages:
UML is the modeling foundation on which CWM is built. Wherever possible, the CWM co-submitters have directly reused existing UML classes and associations rather than creating CWM-specific versions of them. This choice both reduces the number of new CWM classes and associations and leverages the existing skills of UML-knowledgeable modelers.
For example, the Object-Oriented package in the Resource layer is really just a reuse of existing UML classes that are already sufficient for describing object-oriented data sources (such as object-oriented DBMSs or application systems built in object-oriented languages like Java).
Although pains were taken to make the CWM sufficient to support many data warehouse interchange scenarios, the co-submitters realize that no general metamodel will provide all of the support necessary for the diversity of scenarios that will be encountered in active information processing installations.
To accommodate this fact, the CWM co-submitters have planned for extensions in two ways:
The formal definition of the CWM metamodel is contained in a set of MOF-DTD based XML streams, one for each of the CWM packages, and corresponding CORBA IDL and XMI DTD files generated directly from the CWM specification itself. These files are available on OMG's web site at http://www.omg.org.