* Block size can be configured with the string_block_size and array_block_size parameters. The default block size is 120 bytes, with 8 bytes of overhead per block.
The main store files have a fixed or uniform record size (14 bytes for nodes, 33 bytes for relationships, and so on). Besides playing an important part in enabling fast lookups and traversals, the fixed length makes calculations about how much space and memory to allocate for your graph a little easier to reason about and plan for, as mentioned in section 11.1.2.
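For example, a rough capacity estimate falls out of simple multiplication. The sketch below assumes a hypothetical graph of 10 million nodes and 50 million relationships and uses the record sizes quoted above; it ignores the property, string, and array stores, so it's an illustration rather than a full sizing exercise.

    // Back-of-the-envelope store sizing -- illustrative only.
    public class StoreSizeEstimate {
        public static void main(String[] args) {
            long nodes = 10_000_000L;          // hypothetical node count
            long relationships = 50_000_000L;  // hypothetical relationship count

            long nodeStoreBytes = nodes * 14;          // 14 bytes per node record
            long relStoreBytes  = relationships * 33;  // 33 bytes per relationship record

            System.out.printf("node store:         ~%d MB%n", nodeStoreBytes / 1_000_000);
            System.out.printf("relationship store: ~%d MB%n", relStoreBytes / 1_000_000);
        }
    }

With these assumed counts, the node store comes out at roughly 140 MB and the relationship store at roughly 1.65 GB.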
The node and relationship store files simply store pointers to other nodes, relationships, and property records, and thus fit neatly into fixed record sizes. Properties, on the other hand, are slightly harder to deal with because the actual data that they represent can be of variable length. Strings and arrays, in particular, will have variable-length data in them, and for this reason they're treated specially by Neo4j, which stores this dynamic type of data in one or more string or array property blocks. For more information, refer to the Neo4j Manual (http://docs.neo4j.org/chunked/stable/configuration-caches.html). The details are scattered throughout subsections of the manual, currently sections 22.6, 22.9, and 22.10.
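As a sketch, the string_block_size and array_block_size parameters mentioned in the note above are typically set in conf/neo4j.properties in the Neo4j versions this text describes; they only take effect when the store is first created, and the values below are illustrative rather than recommendations.

    # conf/neo4j.properties -- illustrative values only.
    # Block sizes are applied when the store is created and cannot be
    # changed for an existing store.
    string_block_size=120
    array_block_size=120

Picking a block size close to the typical length of your string and array values keeps most properties in a single block, avoiding chained reads across multiple dynamic records.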
How do fixed-length records improve performance?
The use of fixed-length records means that lookups based on node or relationship IDs don't require any searching through the store file itself. Rather, given a node or relationship ID, the starting point for where the data is stored within the file can be computed directly. All node records within the node store are 14 bytes in length (at the time of writing). IDs for nodes and relationships are numerical and correlate directly to their location within a store file: node ID 0 is the first record in the node store file, node ID 1 the second, and so on.
If you wanted to look up the data associated with node ID 1000, you'd be able to calculate that this data starts 14,000 bytes into the node store file (14 bytes for each record × node ID 1000). Computing the starting location for the data is much cheaper (O(1) in big O notation) than having to perform a search, which in a typical implementation could cost O(log n). If big O notation scares you, fear not; all you really need to understand here is that it's generally much faster to compute a start point than it is to search for it. When a lot of data is involved, this can often translate into significant performance gains.
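To make that arithmetic concrete, here is a minimal sketch of the ID-to-offset calculation; this is not Neo4j's actual implementation: the record size is the 14 bytes quoted above, and the store file name and raw-byte read are assumptions for illustration.

    import java.io.RandomAccessFile;

    public class FixedRecordLookup {
        private static final int NODE_RECORD_SIZE = 14;  // record size quoted in the text

        // O(1): a single multiplication, no searching through the file.
        static long offsetForNodeId(long nodeId) {
            return nodeId * NODE_RECORD_SIZE;
        }

        public static void main(String[] args) throws Exception {
            long nodeId = 1000;
            long offset = offsetForNodeId(nodeId);  // 14,000 bytes into the store file
            System.out.println("node " + nodeId + " starts at byte " + offset);

            // Hypothetical raw read of the record bytes (file name assumed).
            try (RandomAccessFile store = new RandomAccessFile("neostore.nodestore.db", "r")) {
                store.seek(offset);
                byte[] record = new byte[NODE_RECORD_SIZE];
                store.readFully(record);
            }
        }
    }

A search-based layout would instead have to scan or binary-search an index structure to find the record; here the seek position is known before the file is even opened.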