OLAP Techniques for Approximation and Mining Query Answering

201 OLAP Techniques for Approximation and Mining Query Answering Murtadaha M. Hamd Waleed K. Hassan College of Computer University of Anbar Received on: 20/10/2010 Accepted on: 10/11/2010 ABSTRACT  Data warehouse (DW) systems have become a key component of the corporate information system architecture, in which they play a crucial role in building business decision support systems by collecting and consolidating data from a variety of internal and external sources. The content of a DW is analyzed by the On-Line Analytical Processing (OLAP) applications for the purpose of discovering trends, patterns of behavior, and anomalies as well as finding hidden dependencies between data . Through this research, theoretical concepts which are related with OLAP and Data Warehouse are studied in detail. In this work the SALES Data Warehouse (SALESDW) , procedures are implemented and algorithms are developed like ODBC (Open DataBase Connectivity) for different sources, and Data cleansing to carry out the consistency of Data Warehouse (DW). The aim of the research are design a prototype of a SALES Data Warehouse (SALESDW) by adding and implementing the essential concepts , and implementing the OLAP techniques ( tools) on SALESDW.


Introduction:
OLAP uses a Snapshot of a database taken at one point in time and then puts the data into a dimensional model. The purpose of this model is to run queries that deal with aggregations of data rather than individual transactions. [1] Data warehousing and On-Line Analytical Processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional On-Line Transaction Processing (OLTP) applications. [2] OLAP tools allow users to make ad hoc queries or generate canned queries against the warehouse database. The OLAP category has been divided further into the multidimensional OLAP (MOLAP) and relational OLAP (ROLAP) markets. [3] 2 .Literature Survey: ➢ Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao. Proposed in 2001, "Efficient OLAP Operations in Spatial Data Warehouses ". They deal with the problem of providing OLAP operations in spatial data warehouses. Such warehouses should support spatial dimensions, i.e. allow the user to execute aggregation queries in groups, based on the position of objects in space. Although there exists well-known pre-aggregation techniques for non-spatial warehouses, which aim to speed up such queries. They presented an example of a traffic supervision system; other applications include decision support systems for cellular networks, weather forecasting, etc. [4] ➢ Yufei Tao, Xiaokui Xiao proposed in 2008 "Efficient Temporal Counting with Bounded Error ". Temporal aggregation is an important operator for two reasons. First, aggregates are the direct target of analysis in a large number of applications of temporal databases. Second, the numbers of objects qualifying various range predicates are essential inputs to many sophisticated data mining tasks, such as association rule mining, decision tree learning, etc. motivated by the fact that precise aggregation demands expensive space or query overhead. They propose a novel technique for efficiently computing approximate results with good quality guarantees. [5]

Data Warehouse:
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. In addition to a relational database, a data warehouse environment can include an Extraction, Transportation, Transformation, and Loading (ETL) solution, Online Analytical Processing (OLAP) and Data Mining capabilities, Client Analysis Tools , and other applications that manage the process of gathering data and delivering it to business users . A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon: [6]  Subject Oriented. [6] Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a data warehouse that concentrates on sales. Using this data warehouse, you can answer questions such as "Who was our best customer for this item last year?" .  Integrated. [6] Integration is the most important. Data is fed from multiple disparate sources into the data warehouse. As the data is fed , it is converted, reformatted, resequenced, summarized, and so forth. The result is that data once it resides in the data warehouse has a single physical corporate image . Figure -1 illustrates the integration that occurs when data passes from the application-oriented operational environment to the data warehouse.

Figure -1The issue of integration
 Nonvolatile. [6] Nonvolatile means that, once entered into the data warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.  Time Variant . [6] A data warehouse's focus on change over time is what is meant by the term time variant. In order to discover trends in business, analysts need large amounts of data.

4.OLAP :
Data warehouses and OLAP are necessary elements of Decision Support Systems (DSSs). They enable business decision makers to creatively approach, analyze and understand business problems. While data warehouses are built to store very large amounts of integrated data used to assist the decision-making process, the concept of OLAP, which is first formulated in 1993 by [7] to enable business decision makers to work with data warehouses, supports dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data. Two of the most important ways to pursue high performance and usability of Data Cube computation are: [8] a) Speeding Up , b) Reducing Storage Space.

OLAP Guidelines: [1, 10]
Multidimensionality is at the core of a number of OLAP systems (database and front-end tools) available today. Dr. E. F. Codd, the "father" of the relational model, has formulated a list of guidelines and requirements as the basis for selecting OLAP systems: 1-Basic Features (Multidimensional Conceptual View, Intuitive Data Manipulation , Accessibility , Batch Extraction versus Interpretive Extraction , OLAP Analysis Models , Client / server architecture , Client / server architecture , Transparency , Multi-user support ) .

2-
Special Features (Treatment of Nonnormalized Data , Store OLAP Results ,

ROLAP (Relational OLAP):
This type uses relational databases (RDMS) to store the data, sometimes by using a star schema or snowflake schema. ROLAP tools present sophisticated SQL and navigational methods on top of traditional relational databases. Relational OLAP tools present scalable, manageable technologies for very large data. ROLAP databases can easily handle dimensions with high cardinality. [9] Figure -2 shows the architecture of the ROLAP model. What you see is three-tier architecture.  It is limited by SQL functionalities because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs .

MOLAP (Multidimensional OLAP):
In the MOLAP, data is extracted from the data warehouse and aggregated into a data structure, commonly referred to as a cube, for analysis. [11]Uses a specialized data store with preaggregated summaries to store the data. The MOLAP data store is built specifically to handle multidimensional queries and offers fast, efficient, and manageable access to multidimensional data. [9] Figure -

HOLAP (Hybrid OLAP):
It bridges the technology gap between ROLAP and MOLAP, enabling you to use both multidimensional data stores (MDDB) and RDBMS data stores. [9] Proposed SALESDW Design

Essential files structures for the (SALESDW):
First of all, the tables must be created according to the proposed system (i.e. SALES Data Warehouse) (SALESDW) which is used for storing the system information. Table  -1 explains the essential files structure for the SALESDW Data Warehouse.

System Design:
To start the system , go to start > all programs > oracle developer 6.0 > Form builder as in figure 4 :

Figure 4
The project will be loaded to the form builder application, to run the project , we should connect to oracle Database as following (see figure 5):

Figure 5
Then the project is connected to oracle Database and the main interface is comes up and is ready to execute queries, as showed in figure 6   The most important two Schemas (Star Schema and Snowflake Schema in addition to Fact Constellation) as theoretical, these Schemas are implemented practically on SALESDW to determine the response time for the all Queries as shown in table-2, in addition to the structure for all Queries using SQL in Visual Basic 6.0. Figures 10 show the implementation of snowflake schema with some of cods the implemented using SQL in Visual Basic 6.0 and with number of records (250000, 500000, and 1000000).  We obtained the following notices: 1. Snowflake schema will have opportunity more than star schema when we deal with a huge amount of records number. 2. Whenever the number of records decreases, this makes us convert to use the star schema.

Conclusions:
OLAP in the current systems allow analysts to improve the decision-making process by consulting and analyzing aggregated data. Data Cube computation, which has to handle massive data, is crucial for high performance in OLAP applications. Client / server architecture grant and allow the users to share data easily and to be able to use any front-end tool. The utility of representing data in the multidimensional space is that it is more natural to view certain features of the data in this way. The efficiency of OLAP in the current work is to conduct data analysis easily and rapidly has been recognized. OLAP allows for real-time access to pre-aggregated measures along important business dimensions. Applying the mining or approximating approach supports the OLAP tools with future predicable information .

Recommendations for Future Works:
Expanding the current idea of OLAP to include real and large companies of Sales. Dealing with the statistical analysis aspect more broadly to include the concept of prediction and expert or intelligent system, which increases the efficiency of the OLAP tools used in inducing information or knowledge that leads to making suitable decisions. Using the distributed DW concepts for this work to improve the OLAP efficiency.