Value-Based Storage (VBS) Unleashes Fast Ad Hoc Query Performance
The core of illuminate's correlation database (CDBMS) is its unique value-based
storage (VBS) structure. Unlike record- or column-oriented data storage structures,
in the VBS model each record is deconstructed during loading to create a unique
value, which is stored only once. The extremely compact and fast database automatically
creates its own data-generated schema on the fly while loading, which indexes every
correlation and provides context to the data values for analysis and reporting.
As a result of the VBS approach, a CDBMS is optimally structured for exceptional ad hoc query performance.
The VBS structure in the iLuminate database
engine consists of three physical sets of objects that are stored and managed:
- a data dictionary (metadata);
- an indexing and linking data set (additional metadata); and
- the actual data values that comprise the stored information.
Unlike record-based structures (which store data in tables, conceptually similar
to Excel files) or column-based structures (which store data in columns, achieving
compression by automatically removing consecutive duplicate and null values), the
VBS architecture stores each unique value only once. As with the column-oriented
approach, separate indexing and linking information are stored.
In the VBS structure, unique values are stored together in value pools based on
type: all integers in one pool, characters in another, etc. There is no need to
build complex ordering and linking sets in the data store, as is necessary in column-oriented
systems.
In addition to typical data values, the data value store contains a special type
of data for storing correlations between tables. This functions similarly to foreign
keys in an RDBMS, but with a CDBMS, the correlation is known by the dictionary and
is stored as a data value. This makes navigation between logical tables completely
automatic and provides extremely fast ad hoc query performance.
The data dictionary contains typical metadata plus additional statistical data about
the tables, columns and occurrences of values in the logical schema. The index and
linking storage contain all of the information required to reconstitute the stored
unique values into columns, records, and tables. The dictionary, index and link
storage in a CDBMS retain the link between the original records, regardless of any
changes in data values.
With the VBS structure used in a correlation DBMS, there are no design decisions to
be made, and evolving business requirements never force changes to the physical
structure. All values are always stored in value pools and never need to be reorganized.
In addition, since there is only one location to search when selecting records,
all searches always gain the speed advantage of indexing.
The VBS-based correlation database is a radically new data warehouse platform that
delivers fast query performance without any design compromises. Query response time in
a CDBMS is constant regardless of complexity (since there is no query optimizer
to be overloaded) or database size. Because its data-generated schema includes all
possible correlations, there is no need to restructure the database to accommodate
new types of queries, and new data can be added without redesign.
Unique Ad Hoc Query Capabilities
VBS provides the flexibility to design and execute queries that would be very difficult—if
not impossible—to model in SQL. Unlike RDBMS (whether software only or data
warehouse appliances) and column-oriented databases, a CDBMS provides the ability
to perform two unique types of queries. An
Associative Query—a search for an unqualified value—is a simple
and fast query that's not possible with other structures.
Another query type this structure enables is the
Incremental Query—a series of progressively fine-grained searches
wherein each new ad hoc query refines the results and is based on responses to previous
queries. These queries are completely flexible and can drill down, drill up and
drill across.
No More "Queries from Hell"
VBS eliminates the "query from hell" that can plague RDBMS data warehouses—those
that cause full-table scans and consume all available resources, precluding access
by all other users until processing is complete. This removes a major burden from
the DBA, who is ultimately responsible for database and query performance. Consequently, even
relatively unskilled users can create their own discovery queries without impacting
other users.
Although record-based relational databases are (for the moment) the most common
choice for data warehouse implementations, their structure is far from ideal. Record-base
relational data warehouse systems are difficult to design; extremely inefficient
in their use of disk space and I/O; challenging to maintain; and require designers
to compromise between optimizing query performance and maximizing query flexibility.
Data warehouse appliances were developed as one means to address these shortcomings.
Data warehouse appliances are relational databases (RDBMSs) delivered on custom
configured hardware. They have the same strengths and weaknesses of the RDBMS, but
the hardware/software bundle is optimized to deliver increased query speed at a
lower total cost. Data warehouse appliances still need all of the planning, design,
and management inherent with record-based relational databases, and query flexibility
is still limited by SQL data access.
Column-oriented databases also demand upfront user requirements definition and database
logical and physical design like RDBMS data warehouses. They also force their own
compromise between optimizing for new record insertion versus data selection and
retrieval. However, they use less disk space and are more efficient in their I/O
demands than RDBMS data warehouses.
VBS overcomes the limitations of both alternatives, providing optimal query performance
and flexibility with no performance/flexibility compromises. Using VBS, a correlation
database is inherently compact and efficient, and new data can be added at any time
without restructuring. The following table summarizes the differences between these
three approaches to data storage.
|