Data warehouses that don't compromise™

 

   
 
 Value-Based Storage
 
   
 
Incremental Queries  
 
 Associative Queries  
 
 Data Mining Compared  










Home Technology Value-Based Storage

Value-Based Storage (VBS) Unleashes Fast Ad Hoc Query Performance

The core of illuminate's correlation database (CDBMS) is its unique value-based storage (VBS) structure. Unlike record- or column-oriented data storage structures, in the VBS model each record is deconstructed during loading to create a unique value, which is stored only once. The extremely compact and fast database automatically creates its own data-generated schema on the fly while loading, which indexes every correlation and provides context to the data values for analysis and reporting.

As a result of the VBS approach, a CDBMS is optimally structured for exceptional ad hoc query performance.

How VBS Works

The VBS structure in the iLuminate database engine consists of three physical sets of objects that are stored and managed:

  • a data dictionary (metadata);
  • an indexing and linking data set (additional metadata); and
  • the actual data values that comprise the stored information.

Unlike record-based structures (which store data in tables, conceptually similar to Excel files) or column-based structures (which store data in columns, achieving compression by automatically removing consecutive duplicate and null values), the VBS architecture stores each unique value only once. As with the column-oriented approach, separate indexing and linking information are stored.

In the VBS structure, unique values are stored together in value pools based on type: all integers in one pool, characters in another, etc. There is no need to build complex ordering and linking sets in the data store, as is necessary in column-oriented systems.

In addition to typical data values, the data value store contains a special type of data for storing correlations between tables. This functions similarly to foreign keys in an RDBMS, but with a CDBMS, the correlation is known by the dictionary and is stored as a data value. This makes navigation between logical tables completely automatic and provides extremely fast ad hoc query performance.

The data dictionary contains typical metadata plus additional statistical data about the tables, columns and occurrences of values in the logical schema. The index and linking storage contain all of the information required to reconstitute the stored unique values into columns, records, and tables. The dictionary, index and link storage in a CDBMS retain the link between the original records, regardless of any changes in data values.

With the VBS structure used in a correlation DBMS, there are no design decisions to be made, and evolving business requirements never force changes to the physical structure. All values are always stored in value pools and never need to be reorganized. In addition, since there is only one location to search when selecting records, all searches always gain the speed advantage of indexing.

Why VBS is Ideal for Data Warehouses and Data Marts

The VBS-based correlation database is a radically new data warehouse platform that delivers fast query performance without any design compromises. Query response time in a CDBMS is constant regardless of complexity (since there is no query optimizer to be overloaded) or database size. Because its data-generated schema includes all possible correlations, there is no need to restructure the database to accommodate new types of queries, and new data can be added without redesign.

Unique Ad Hoc Query Capabilities

VBS provides the flexibility to design and execute queries that would be very difficult—if not impossible—to model in SQL. Unlike RDBMS (whether software only or data warehouse appliances) and column-oriented databases, a CDBMS provides the ability to perform two unique types of queries. An Associative Query—a search for an unqualified value—is a simple and fast query that's not possible with other structures.

Another query type this structure enables is the Incremental Query—a series of progressively fine-grained searches wherein each new ad hoc query refines the results and is based on responses to previous queries. These queries are completely flexible and can drill down, drill up and drill across.

No More "Queries from Hell"

VBS eliminates the "query from hell" that can plague RDBMS data warehouses—those that cause full-table scans and consume all available resources, precluding access by all other users until processing is complete. This removes a major burden from the DBA, who is ultimately responsible for database and query performance. Consequently, even relatively unskilled users can create their own discovery queries without impacting other users.

VBS Advantages Over Record-Based and Column-Oriented Databases

Although record-based relational databases are (for the moment) the most common choice for data warehouse implementations, their structure is far from ideal. Record-base relational data warehouse systems are difficult to design; extremely inefficient in their use of disk space and I/O; challenging to maintain; and require designers to compromise between optimizing query performance and maximizing query flexibility.

Data warehouse appliances were developed as one means to address these shortcomings. Data warehouse appliances are relational databases (RDBMSs) delivered on custom configured hardware. They have the same strengths and weaknesses of the RDBMS, but the hardware/software bundle is optimized to deliver increased query speed at a lower total cost. Data warehouse appliances still need all of the planning, design, and management inherent with record-based relational databases, and query flexibility is still limited by SQL data access.

Column-oriented databases also demand upfront user requirements definition and database logical and physical design like RDBMS data warehouses. They also force their own compromise between optimizing for new record insertion versus data selection and retrieval. However, they use less disk space and are more efficient in their I/O demands than RDBMS data warehouses.

VBS overcomes the limitations of both alternatives, providing optimal query performance and flexibility with no performance/flexibility compromises. Using VBS, a correlation database is inherently compact and efficient, and new data can be added at any time without restructuring. The following table summarizes the differences between these three approaches to data storage.