January 2007
Sarka Svoboda from PIC Solutions provides some insights

This article discusses the important factors that should be considered when building a data warehouse.

Building a data warehouse
In building a data warehouse there are four key stages that should be followed: planning, development, implementation and review. The design of a data warehouse is an iterative process and results are often achieved in short bursts with only a fraction of the overall project completed in any one effort. This is typically because the warehouse requirements change continually to meet dynamic business needs.

This iterative process does not mean the scope of the warehouse is limited, rather, the organisation should build a complete project plan but only focus on a small subject area in which to initially implement the data warehouse. The project plan will become the vision and methodology for the next several phases in the growth of the warehouse. The first iteration should be the longest to complete with further iterations taking shorter time periods.

Planning Requirements gathering
In order to build a data warehouse the design team must have a clear understanding as to the business objectives. This would include performance requirements derived from the strategies, goals, objectives and the direction of the business. The objectives can be determined by conducting a number of JRP (Joint Requirements Planning) sessions involving all key personnel ranging from accountants, to executives, to IT professionals. Conducting JRP sessions will provide a better understanding as to needs of various business units, rather than a description as to what is already available. The information obtained should be formalised in Business and IT Requirements documentation.

When interviewing, certain basics should be followed:
Questions must be relevant to the audience or business area being canvassed
The inter-group sharing of information and findings facilitates trust, in turn allowing the acquisition of additional information
The interview script prepared should allow for statistical sampling of useful data
All potential information sources should be investigated and followed up
Answers should be summarised and validity confirmed once the interviewing process is complete

Nothing beats expert advice
Having an experienced data warehouse consultant available to provide input and advice during the planning process can be extremely valuable. This is due to the fact that many organisations lack the specialised in-house data warehouse skills to guide them through the multitude of critical decisions to be made during the course of planning. The data warehouse consultant can also be used for the duration of the project and could provide help in a number of areas.

These include:
Architecture
Application/access methods
Data management
Operational organisation

Matching the business requirements to the data
By using the Business and IT Requirements documentation, the warehouse planners compare the needs of the business to the data currently available. Based on key requirements, the planning team then develops a prototype. This prototype model not only allows relevant stakeholders to get a 'feel' for what the data warehouse will deliver, it also provides valuable insight as to any potential data problems, as well as giving management a sense of what is to follow. Importantly, this process encourages 'buy-in' from key personnel as it involves them in the concept and illustrates the type of future results to be expected.
 
Designing a data warehouse model
Due to the inherent complexities and far reaching business implications, establishing the correct direction for the data warehouse is critical to longer-term success. Once all the requirements are known, it is possible to identify the technology and architectural design elements needed for the data warehouse to be workable.

Decisions need to be made regarding: 
The variety of analytical requirements
Database design and sizing of the data warehouse
Interface design for the extraction, transformation and data load process
System metrics for the number of users, volume and types of information requests
Data warehouse location
Sources of data, data integrity, business rules and data relationships
User access requirements

The technical architecture and database design information can be represented in a simple diagram. The benefit of a pictorial diagram is that it assists staff to better understanding how the pieces fit together, as well as explaining their role in the project and the intended warehouse applications. The data model reflects the real objects (customer, order, product, etc.) and their relationships to each other. In order to represent these correctly in the business data model, business rules must be applied. It is critical that the design information be stored in a central repository for future retrieval and maintenance.

Physical data modelling
A data model is produced so that the actual construction of the data warehouse can begin. The model is based on the information collected during the 'requirements gathering' stage, which has been translated, into a physical database design.

Development
Data transformation
Data transformation is the process of developing utilities that will allow the loading of data into the data warehouse. These utility programs will locate, extract, condition, scrub and load the data onto the target platform. It is crucial to include quality control procedures in order to guarantee and maintain data integrity. 
Client server application development

Business users must have the ability to interact with the data in a quick, flexible and easy-to-use manner. Client server application development defines the tools and programmed applications that access and transform the data into a 'user-friendly' format via graphical user interfaces. The user should be able to access both pre defined and ad hoc analysis of the data. User training manuals and documentation are produced at this stage.

Implementation
Design review and sign-off
Conducting a design review and sign-off ensures smooth implementation. A complete and comprehensive analysis should be conducted regarding the organisation's current environment, data, technology, support and infrastructure. The review and sign-off process is final confirmation that the business is in agreement with the warehouse. Sign-off also reinforces the 'buy-in' of key stakeholders and warehouse users.
 
Data warehouse administration
Warehouse administration should be managed according to set procedures.
 
Including:

Back up and data archiving
Restoration and data recovery
Emergency operational procedures
User administration
Security maintenance and auditing
Usage reporting
Error analysis
Service level goals
Monitoring tools
Creation of new users and databases
Security controlling through various user levels and passwords
Verification that jobs run according to schedule
Tracking of service level agreements. For example, system 'up time'
Monitoring utilisation of space on the databases

Review

Once the data warehouse has been built there will be an on-going review process to ensure that the system continues to run smoothly and provide valuable data. This process would include the addition of new applications and tools to access and interpret warehoused data.

Integration

Integration procedures are needed for the installation and running of new applications or tools to be used with the data warehouse. If the new application requires additional processing, it is likely that new jobs will have to be integrated into the overnight routines. There may also be scheduling issues where the application requires certain processing data at particular times. System hardware may need to be evaluated to ensure that there is sufficient system resource to run the new jobs.

Post implementation review

Post implementation review takes place to asses whether or not the requirements originally developed during in the 'information gathering' stage are being successfully answered by the data warehouse. If the requirements are not being met further development may take place.

Data warehouse tuning

When performance problems are experienced a detailed analysis must be conducted to identify the source of the problem. The analysis should include a review of the networks, applications, users, structure and utilisation of the database.

As an example, a single query run by one set of users that takes hours to run may create overall performance problems for all users. The data warehouse consultant can be asked to review the queries and make suggestions for improved performance. Balancing the workload of the data warehouse server is one option, achieved by changing the table or query structure, or through scheduling large jobs to run over quiet periods.
 
Data warehouse audit
The data warehouse is audited by checking the current state of the data warehouse against best practice and recommending changes in order to maximise the business value of the warehouse. The audit would also identify opportunities for improvement.

Summary
There are many different approaches to building a successful data warehouse. Using an iterative approach minimises the risk of failure as it allows learning from past mistakes. Unlike the traditional 'waterfall' development approach with set start and end dates, the iterative process does not require the associated large investments of time and money. The iterative approach concentrates on small deliverables, making changes in direction easier to achieve whilst also ensuring that the basic warehouse design is appropriate. Future iterations should also be conducted in such a way that they continue to work towards a total warehouse solution for the organization.

© Banker Middle East 2007