This project will apply the Unified Modeling Language (UML) to the visual definition of data transformation rules that direct the execution of data migration from one or more source information repositories to a target information repository. It will result in a UML profile optimized for defining data transformation and migration among repositories. I believe that a visual approach to specifying and maintaining the rules of data movement between the source and target repositories will decrease the time required to define these rules, enable less technical individuals to adopt the approach, and provide a motivation to reuse these models to accelerate future migration and consolidation efforts.

Problem Statement and Background

My role in this project includes project planning and task management, and I will serve as the primary researcher and developer of the project’s deliverables. My technical background includes certification by IBM as an OOAD designer in the Unified Modeling Language and nearly two decades of work as a software engineer. I have recently been involved in the migration of several custom knowledge data repositories to an installation of IBM Rational Asset Manager.

This project will use a constructive ontology and epistemology to create a new solution in the problem space of the project. This is the most appropriate research ontology and epistemology because there is little precedent available in exactly this area of research. Visual modeling of program specifications has been studied in other problem domains and continues to be an area of interest. This particular problem space is unique, relatively untouched, and in an area of considerable interest to me. One possible constraint of the project is that shortcomings in the UML metamodel rules may prevent the extension and definition of an effective rules-based data transformation and migration language. A second possible constraint is the identification of one or more source repositories as candidates for moving to a new system. To address the second constraint, one or more simulated repositories may need to be created.

This study is relevant to software engineering practitioners, information technology professionals, database administrators and enterprise architects who wish to consolidate data repositories to a single instance. Unified Modeling Language (UML) is primarily used today in information technology to visually specify requirements, architectures and designs of systems, to verify and create test scenarios, and to perform code generation. The UML metamodel was designed to make the language extensible, with the ability to support profiles that allow the language to be customized to support specific problem domains. Researchers and practitioners are finding innovative uses for UML as a visual specification language. Zulkernine, Graves, Umair and Khan (2007) recently published their results in using UML to visually specify rules for a network intrusion detection system. Devos and Steegmans (2005) also published their results in using Unified Modeling Language in tandem with Object Constraint Language to specify business process rules with validation and error checking.

This project will contribute to at least two fields of information technology: visual modeling languages, and information consolidation and management. This project will make a unique contribution to the subject area of domain-specific visual languages for the definition of rules. Additionally, a successful outcome from this project will contribute to knowledge about reducing the complexity of consolidating repositories in order to lower operating costs and modernize data access systems. An alternative to the approach taken by this project would be a federated solution to data consolidation. A federated solution would continue to maintain multiple data repositories and connect their operations via programming interfaces so that clients could access them and combine their data to create the appearance of a unified source.

The project area of focus was motivated by my desire to create a visual system for complete migration of a source repository of technical data, such as a technical support knowledge base, to a new product called Rational Asset Manager. My overall goal was to drive the entire migration visually using a single model specification. This specification would visually specify the rules for migrating and transforming data from one system to another as well as visually select the technical mechanisms used to communicate with each information repository, such as SQL databases, web services, XML translation, etc. In addition, I wanted to generate some executable code from the models that would carry out some or all of the movement of data between repositories. In scaling this broad problem area down, I decided to focus on using the model as a specification that would be read by an existing program to carry out the instructions in the model. This program already exists, but does not yet know how to read models. Finally, in narrowing to a specific part of the visual specification, I decided to focus on the aspect of the model that locates data in one system, potentially re-maps or transforms it, and places it into the target system. The resulting initial research focus would take the form of a UML profile that could be used to specify this aspect of the solution, together with an extension of the existing migration program so that it can use the model to perform its work.
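The locate, transform, and place aspect described above can be illustrated with a minimal Python sketch. It is not the proposed UML profile or the existing migration program; the names MappingRule, apply_rules, STATUS_MAP, and all field names are hypothetical and exist only to show the kind of rule a visual model would need to express.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class MappingRule:
    """One hypothetical rule: read a source field, transform it, write a target field."""
    source_field: str
    target_field: str
    transform: Callable[[Any], Any] = lambda value: value  # identity by default


def apply_rules(record: Dict[str, Any], rules: List[MappingRule]) -> Dict[str, Any]:
    """Build a target record by applying each rule to one source record."""
    target: Dict[str, Any] = {}
    for rule in rules:
        if rule.source_field in record:  # skip fields the source does not provide
            target[rule.target_field] = rule.transform(record[rule.source_field])
    return target


# Hypothetical value re-mapping between a source and a target vocabulary.
STATUS_MAP = {"open": "Draft", "closed": "Approved"}

RULES = [
    MappingRule("title", "name", str.strip),                                  # clean up
    MappingRule("status", "state", lambda s: STATUS_MAP.get(s, "Unknown")),   # re-map
    MappingRule("body", "description"),                                       # copy as-is
]

if __name__ == "__main__":
    source = {"title": "  Reset a password ", "status": "closed", "body": "Steps..."}
    print(apply_rules(source, RULES))
```

In this sketch each rule is plain data plus an optional function, which is roughly the information a stereotyped model element would have to carry for the migration program to act on it.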

Project Approach and Methodology

This project will use a design science methodology to iteratively create, test, and refine the deliverables of the project’s outcome. The design science methodology defines five process steps in achieving the outcome of a research project: awareness of problem, suggestion, development, evaluation, and conclusion.

This project is currently in the awareness-of-problem phase. The inputs to this phase have been my experiences in working within the problem space for the last several years and the secondary research into the problem area performed thus far. I have encountered shortcomings in automation to help accelerate solutions in this problem space. At the same time, I have observed closely related problems overcome using visual and declarative technologies. Additional secondary research is being conducted to understand the body of knowledge associated with this area of visual modeling. The output of this phase is this proposal for a project to develop a visual language to help accelerate solutions in this problem space. Significant elements of the proposal include the overall vision of the project, the risks of the project, tools and resources required to carry out the project, and the initial schedule to complete the project.

Following an accepted proposal, the next phase of this methodology is the suggestion phase, which involves a detailed analysis and design of the proposed solution. During the suggestion phase, several project artifacts will be created and updated with new information. Updated artifacts include the project risks and a refined schedule for completion of the project. New artifacts produced at this phase include early UML and migration tool prototypes to explore various technical alternatives, detailed test and validation plans, and most importantly the design plans for the following phase of the project. A significant activity performed at this phase is the acquisition and readiness of the project resources, such as physical labs, input test data from candidate repositories, access to networked systems to acquire the test data, and installation of hardware and software tools.

The development phase of the project uses the design plans established in the suggestion phase to focus on construction of the first iteration of the solution. Experiences during this phase also drive refinements to the project schedule, detailed test and validation plans, risks, and the design plan of the solution. The deliverable of this phase is the first generation of the UML profile and extensions to the existing migration tool to support parsing and using models created with the profile. The test specification models are used to move a larger portion of the candidate source repositories to the target repository. After conclusion of this phase, the project may return to an earlier phase to refine plans or project scope based on what is learned during the development of the solution. If acceptable progress is demonstrated at the conclusion of this phase, the project will continue to the evaluation phase.

The evaluation phase focuses most of its effort on formal testing and validation of the solution produced in the development phase. The evaluation of the work against the thesis includes working with specific individuals to determine if this is indeed an approach that will save time and simplify the specification of data migration and transformation rules. Documentation of the testing outcome and comparison to the anticipated outcome may cause the project to return to an earlier phase to adjust scope or expectations. If it is decided the project has met its goals, or the goals are not achievable by the project’s approach, the effort will conclude.

The conclusion of this project will involve final documentation of the outcome and packaging of all the project’s artifacts for future research studies. The project’s artifact package will be placed in a public location for others to review and use.

As mentioned above, this project will require several physical resources and cooperation from technical experts. The study will require access to two or more legacy data repositories as sources for information. The source repositories should ideally use different underlying database technologies and implement different information schemas to test variations of the proposed modeling language as it is developed and tested. Access to the technical administrators of the source repositories will be necessary to understand the repositories’ schemas and to obtain read-only access to, or a copy of, their information. Preferably, the repositories will be accessed read-only over a network, or their information will be copied to a computing system directly available to the research project. The study will require at least one server system running IBM’s Rational Asset Manager. This system will act as the target data repository. Data transformed from the source repositories will migrate into Rational Asset Manager, driven by a migration application that uses the visual specifications as direction. The study will also require a single workstation with IBM Rational Software Architect for development of the visual modeling language and extension of the existing migration programs to read the visual models and perform the migration work from the source to target repositories.

Determining the project’s success requires measuring the time saved in building a migration solution with visual specifications compared to building one without them. The migration problems need to be varied, from simple one-to-one mappings from a single source repository to a single target repository, to more complex migration scenarios, such as consolidating multiple source repositories into a single target repository and re-mapping values from the source to the target. The reusability of previous solutions will also be measured. This aspect of the project’s outcome will quantify how easily a specification model can be reused from a previous solution.
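One way these two measurements could be recorded is sketched below in Python. The metric definitions are assumptions made for illustration, not measures fixed by this proposal: relative_savings is the fraction of build time saved, and reuse_ratio is the share of model elements carried over unchanged from an earlier solution. The Trial type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """Hypothetical record of one migration scenario built both ways."""
    scenario: str
    hours_without_model: float  # time to build the solution without visual specifications
    hours_with_model: float     # time to build the same solution with visual specifications


def relative_savings(trial: Trial) -> float:
    """Fraction of build time saved by using the visual specification (assumed metric)."""
    if trial.hours_without_model <= 0:
        raise ValueError("baseline time must be positive")
    return (trial.hours_without_model - trial.hours_with_model) / trial.hours_without_model


def reuse_ratio(reused_elements: int, total_elements: int) -> float:
    """Share of model elements reused unchanged from a previous solution (assumed metric)."""
    if total_elements <= 0:
        raise ValueError("model must contain at least one element")
    if not 0 <= reused_elements <= total_elements:
        raise ValueError("reused_elements must be between 0 and total_elements")
    return reused_elements / total_elements


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not project results.
    example = Trial("one-to-one mapping", hours_without_model=10.0, hours_with_model=6.0)
    print(f"{example.scenario}: {relative_savings(example):.0%} saved")
    print(f"reuse: {reuse_ratio(3, 4):.0%}")
```

A negative value from relative_savings would indicate that the visual approach took longer for that scenario, which is an outcome the evaluation should be able to record.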

Definition of the End Product of Project

This project will produce several artifacts during its life and at its conclusion. Most importantly, a UML profile will be developed that can be imported into Rational Software Architect or Rational Software Modeler. The profile will include usage documentation and example models that demonstrate various types of rules that may be specified in a visual model and how that model is read and executed by the migration program. The migration program will be a reference implementation, built on an existing tool, that can read a model configured with the UML profile and generate events on which extension points can act.
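The idea of a program that reads a model and raises events for extension points can be sketched as follows. This is a simplified Python illustration under stated assumptions: the XML layout, the stereotype names Locate, Transform, and Place, and the ExtensionRegistry class are all hypothetical. A real model exported from a UML tool would be considerably richer, and the existing migration tool's actual extension mechanism is not described here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]

# Hypothetical, heavily simplified stand-in for an exported model file.
MODEL_XML = """
<model>
  <rule stereotype="Locate" source="kb.articles" field="title"/>
  <rule stereotype="Transform" field="title" operation="trim"/>
  <rule stereotype="Place" target="asset" field="name"/>
</model>
"""


class ExtensionRegistry:
    """Maps an event name to the handlers (extension points) registered for it."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def fire(self, event: str, attributes: Dict[str, str]) -> int:
        """Call every handler for the event; return how many were called."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(attributes)
        return len(handlers)


def run_model(xml_text: str, registry: ExtensionRegistry) -> None:
    """Walk the rules in document order and fire one event per rule."""
    root = ET.fromstring(xml_text.strip())
    for element in root.iter("rule"):
        attributes = dict(element.attrib)
        event = attributes.pop("stereotype", "Unknown")  # stereotype selects the event
        registry.fire(event, attributes)


if __name__ == "__main__":
    registry = ExtensionRegistry()
    for name in ("Locate", "Transform", "Place"):
        registry.register(name, lambda attrs, name=name: print(name, attrs))
    run_model(MODEL_XML, registry)
```

Keeping the reader separate from the handlers mirrors the intent stated above: the program only interprets the model, while the work against each repository is left to whatever is registered at the extension points.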

In addition to technical deliverables, all project planning and process artifacts, such as the project plan, design plan, risks and mitigation notes, test criteria and test result data will be made available. The project will conclude with the development of at least one article or paper for submission to a research journal to document this project’s challenges and achievements, and an annotated bibliography of secondary research related to the project will be provided.

If successful, this project will simplify part of the process of developing a migration solution without requiring the existing tool to be recreated. The project will add a new component to the migration tool, and consumers of the tool can choose whether to use this new component. An assumption made in this research project is that the UML profile developed as a deliverable will be an approachable alternative for less experienced IT professionals and software engineers. Validating this assumption will be a challenge for the project.

References

Devos, F., Steegmans, E. (2005). Specifying business rules in object-oriented analysis. Softw. Syst. Model. 4:297–309. DOI 10.1007/s10270-004-0064-z.

Zulkernine, M., Graves, M., Umair, M., Khan, A. (2007). Integrating software specifications into intrusion detection. Int. J. Inf. Secur. 6:345–357. DOI 10.1007/s10207-007-0023-0.