ECOS: Evolutionary Column-Oriented Storage - Advances in Databases


ECOS is autonomic: it exploits the evolution path to evolve the storage structures automatically, i.e., our approach to self-tuning is online.

Consider the L_ORDERKEY column of the LINEITEM table shown in Table 1. Suppose that we, as database designers, design this table. According to our application design, we select the L_ORDERKEY column as part of the primary key. As already discussed in Section 3, we have to customize each column as either ordered read-optimized or unordered write-optimized; we therefore customize the L_ORDERKEY column as ordered read-optimized. At initial design time, we design according to domain knowledge, experience, and predictions. As designers, we find it difficult to predict how much this column will grow and how long it will take to reach that size. When we customize the column as ordered read-optimized, it is internally initialized as a sorted array. For the L_ORDERKEY column, the three initial rows of the sample evolution path in Table 7 are relevant.
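To make the starting point concrete, the following is a minimal sketch of an ordered read-optimized column backed by a sorted array. It assumes a key-value interface matching the Get(), Put(), and Delete() operations named in the text, with the L_ORDERKEY value as the key and, hypothetically, a row id as the value; the class name `SortedArrayColumn` and its methods are illustrative, not the ECOS implementation.

```python
import bisect
from typing import Any, List, Optional, Tuple


class SortedArrayColumn:
    """Hypothetical ordered read-optimized column stored as a sorted array.

    Keys are kept sorted, so point and range lookups use binary search,
    while inserts and deletes pay an O(n) shift of the parallel arrays.
    """

    def __init__(self) -> None:
        self._keys: List[Any] = []
        self._values: List[Any] = []

    def _find(self, key: Any) -> int:
        # Index of `key` if present, otherwise -1.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return -1

    def put(self, key: Any, value: Any) -> None:
        # Overwrite an existing key, or insert at the position that keeps order.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key: Any) -> Optional[Any]:
        i = self._find(key)
        return self._values[i] if i >= 0 else None

    def delete(self, key: Any) -> bool:
        i = self._find(key)
        if i < 0:
            return False
        del self._keys[i]
        del self._values[i]
        return True

    def range_get(self, lo: Any, hi: Any) -> List[Tuple[Any, Any]]:
        # Inclusive range [lo, hi], returned in key order.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return list(zip(self._keys[start:end], self._values[start:end]))
```

For example, after inserting the keys 3, 1, and 2, `range_get(1, 2)` returns `[(1, 1), (2, 2)]` when the row ids are 1 and 2. The sorted layout favors reads, which is why it suits a column customized as ordered read-optimized.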

As mentioned in Section 3, ECOS limits the storage capacity of each storage structure. Therefore, the initial sorted array has a data storage capacity limit, for example 4KB. As long as the data stays within the 4KB limit, the sorted array remains the storage structure for the L_ORDERKEY column, and we gather heredity information for the column, such as the number of Get() calls, Put() calls, Delete() calls, range Get() calls (for range queries), and Get() calls over all records (for scan queries). Which heredity information should be gathered may vary from one implementation to another. Here, we simplify the discussion by assuming that a system can use heredity information to identify whether the workload is read-intensive or write-intensive, and whether data access is ordered (range queries) or unordered (point or scan queries).
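The heredity counters and the two workload classifications described above can be sketched as follows. This is only an illustration under stated assumptions: the `HeredityInfo` class, the rule that "read-intensive" means reads outnumber writes, the rule that "ordered" means range Get() calls outnumber point and scan Get() calls, and the `limit_reached` helper are all hypothetical; only the counted operations and the 4KB example limit come from the text.

```python
from dataclasses import dataclass

# The 4KB example capacity from the text; real limits are implementation-specific.
CAPACITY_BYTES = 4 * 1024


@dataclass
class HeredityInfo:
    """Hypothetical per-column access counters gathered while a structure is live."""
    gets: int = 0        # point Get()
    puts: int = 0        # Put()
    deletes: int = 0     # Delete()
    range_gets: int = 0  # range Get() for range queries
    scans: int = 0       # Get() over all records for scan queries

    def is_read_intensive(self) -> bool:
        # Assumption: compare raw read and write counts.
        reads = self.gets + self.range_gets + self.scans
        writes = self.puts + self.deletes
        return reads > writes

    def is_ordered_access(self) -> bool:
        # Assumption: access is "ordered" when range queries dominate
        # point and scan queries (which count as unordered access).
        return self.range_gets > self.gets + self.scans


def limit_reached(used_bytes: int, capacity: int = CAPACITY_BYTES) -> bool:
    # True once the structure's storage limit is consumed.
    return used_bytes >= capacity
```

With `HeredityInfo(gets=90, puts=10)`, the column is classified as read-intensive with unordered access, which is the scenario the following paragraph uses for L_ORDERKEY.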

The moment the storage limit of the sorted array is consumed, an event is raised for notification. This event triggers all three initial mutation rules of Table 7, and heredity-based selection then identifies which one of them to execute. Suppose that for the L_ORDERKEY column the workload is read-intensive and the data access is unordered; this scenario executes the first mutation rule of Table 7, which evolves the existing sorted array into a sorted list. The sorted list is now the storage structure, and, following the design principle of ECOS, it is also constrained by a storage limit. As long as the L_ORDERKEY column data is within the storage limit of the sorted list, heredity information is gathered and used for the next evolution.
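The event-driven step above, where the limit event triggers the candidate mutation rules and heredity-based selection picks one, might be sketched like this. Only the rule stated in the text (read-intensive, unordered access: sorted array evolves into sorted list) is encoded; the other two initial rules of Table 7 are deliberately not reproduced, so `on_limit_reached` returns `None` for workloads they would cover. `MutationRule`, `RULES`, and `on_limit_reached` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MutationRule:
    source: str           # structure being evolved, e.g. "sorted array"
    target: str           # structure it evolves into
    read_intensive: bool  # heredity condition on the workload type
    ordered: bool         # heredity condition on the access pattern


# Only the rule described in the text; the remaining Table 7 rules are omitted.
RULES: List[MutationRule] = [
    MutationRule("sorted array", "sorted list", read_intensive=True, ordered=False),
]


def on_limit_reached(current: str, read_intensive: bool, ordered: bool,
                     rules: List[MutationRule] = RULES) -> Optional[str]:
    # The limit event triggers every rule whose source is the current structure.
    triggered = [r for r in rules if r.source == current]
    # Heredity-based selection keeps the rule matching the observed workload.
    for rule in triggered:
        if rule.read_intensive == read_intensive and rule.ordered == ordered:
            return rule.target
    return None  # no matching rule in this partial table
```

For the L_ORDERKEY scenario, `on_limit_reached("sorted array", read_intensive=True, ordered=False)` yields `"sorted list"`. After the mutation, fresh heredity counters would be gathered for the sorted list until its own limit is reached.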

Table 1 shows that only half of the LINEITEM columns, i.e., the eight out of sixteen with high data growth, evolve during the first evolution. The remaining columns can be stored within an array (either a heap array or a sorted array). Furthermore, only half of the columns that evolved during the first evolution, i.e., four out of eight (L_ORDERKEY, L_COMMENT, L_EXTENDEDPRICE, and L_PARTKEY), evolve again during the second evolution. The final state of the table presented in Table 1 shows that each column uses the appropriate storage structure (assumed here for the sake of explanation) according to its stored data size and observed workload. We can add more parameters for evolution decisions, but we only used limited parameters

