Consider the revenue data with an additional dimension namely property type. In this case, the data represents the total revenue generated by the sale of each type of property, by city, and by time (quarters). Again, this data can fit into a four-filed table. However, the data fits more naturally into a three dimensional cube. The cube represents data as cells in an array by associating the total revenue with the dimensions property type, city and time.
OLAP database servers use multi-dimensional structures to store data and relationships between data. Multi-dimensional structures are best visualised as cubes of data and cubes within cubes of data. Each side of a cube is a dimension. Multi-dimensional are a compact and easy-to-understand way of visualising and manipulating data elements than have many inter-relationships. (d) The term slicing and dicing, or pivoting, refers to the ability to look at data from different viewpoints. Give an application of this in terms of the above data.
(New version: Discuss how could be represented in a multi dimension format?) For example, one slice of the revenue data may display all revenue generated per type of property within cities. Another slice may display all revenue generated by branch office within each city. Slicing and dicing is often performed along a time axis in order to analyse trends and find patterns. (e) Explain the relationship of the OLAP cube and the structure of equivalent star schema.
Question 2 (a) Bill Inmon, has defined a data warehouse as: “A subject oriented, integrated time variant and non-volatile collection of data in support of management’s decision making process.” (Case study for an insurance company) decide to build a data warehouse. Discuss the various aspects of this definition in relation to the data held within insurance company. Nowadays, organisations are focusing on ways to use operational data to support decision -making, as a means of regaining competitive advantage. Organisations need to turn their archives of data into a source of knowledge, so that a single integrated / consolidated view of the organisation’s data is presented to the user. A data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.
In this definition by Inmon (1993), the data is: Subject Oriented Warehouse is organized around major subjects of the enterprise (e.g. customers, products, sales) rather than major application areas (e.g. customer invoicing, stock control, product sales). This is reflected in the need to store decision-support data rather than application-oriented data. Integrated The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent. The integrated data source must be made consistent to present a unified view of the data to the users.
Time Variant Data in the warehouse is only accurate and valid at some point in time or over some time interval. Time-variance is also shown in the extended time that data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots. Non Volatile Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement. The database continually absorbs this new data, incrementally integrating it with the previous data.
The ultimate goal of data warehousing is to integrate enterprise-wide corporate data into a single repository from which users can easily run queries, produce reports, and perform analysis. In summary, data warehouse is data management and data analysis technology. (b) The role of the load manager Load Manager The load manager (also called the fronted component) performs all the operations associated with the extraction and loading of data into the warehouse. The data may be extracted directly from the data sources or more commonly from the operational data store. The operations performed by the load manager may include simple transformation of the data for entry into the warehouse.
Size and complexity will vary between data warehouses and may be constructed using a combination of vendor data loading tools and custom-built programs. (c) Small table for phone calls. Describe calculations, e.g. the duration of them. It is the job of the load manager to ensure that all data is of the same type, in order to maintain integrity within the data warehouse. It may be that different operational sources hold the same data in different formats, as in the examples above. This data needs to be standardised.
(d) List of the total duration cost.Calculations and transformations It may be necessary to transform data or carry out calculations on the data in order to store it in a particular way in the data warehouse. For example a mobile phone company may store data about its customers and the calls they make in the following format: However rather than having the start and end time, they may hold the data as duration of call, therefore a calculation will have to be carried out, using the start time and end time. i.e. (end time – start time=duration). The data may also be summarised in different ways, for example all calls made on a certain day may be summarised into a daily summary, or all calls made by a particular caller ID. In this way the data can be represented in different ways depending on the need of the decision maker.