* Block size can be configured with the string_block_size and array_block_size parameters. The default block size is 120 bytes, with 8 bytes of overhead per block.
The main store files have a fixed or uniform record size (14 bytes for nodes, 33 bytes for relationships, and so on). Besides playing an important part in enabling fast lookups and traversals, the fixed length makes calculations about how much space and memory to allocate for your graph a little easier to reason about and plan for, as mentioned in section 11.1.2.
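For example, a rough capacity estimate falls out of simple multiplication. The sketch below assumes a hypothetical graph of 10 million nodes and 50 million relationships and uses the record sizes quoted above; it ignores the property, string, and array stores, so it's an illustration rather than a full sizing exercise.

    // Back-of-the-envelope store sizing -- illustrative only.
    public class StoreSizeEstimate {
        public static void main(String[] args) {
            long nodes = 10_000_000L;          // hypothetical node count
            long relationships = 50_000_000L;  // hypothetical relationship count

            long nodeStoreBytes = nodes * 14;          // 14 bytes per node record
            long relStoreBytes  = relationships * 33;  // 33 bytes per relationship record

            System.out.printf("node store:         ~%d MB%n", nodeStoreBytes / 1_000_000);
            System.out.printf("relationship store: ~%d MB%n", relStoreBytes / 1_000_000);
        }
    }

With these assumed counts, the node store comes out at roughly 140 MB and the relationship store at roughly 1.65 GB.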
The node and relationship store files simply store pointers to other nodes, relationships, and property records, and thus fit neatly into fixed record sizes. Properties, on the other hand, are slightly harder to deal with because the actual data that they represent can be of variable length. Strings and arrays, in particular, will have variable-length data in them, and for this reason they're treated specially by Neo4j, which stores this dynamic type of data in one or more string or array property blocks. For more information, refer to the Neo4j Manual (http://docs.neo4j.org/chunked/stable/configuration-caches.html). The details are scattered throughout subsections of the manual, currently sections 22.6, 22.9, and 22.10.
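As a sketch, the string_block_size and array_block_size parameters mentioned in the note above are typically set in conf/neo4j.properties in the Neo4j versions this text describes; they only take effect when the store is first created, and the values below are illustrative rather than recommendations.

    # conf/neo4j.properties -- illustrative values only.
    # Block sizes are applied when the store is created and cannot be
    # changed for an existing store.
    string_block_size=120
    array_block_size=120

Picking a block size close to the typical length of your string and array values keeps most properties in a single block, avoiding chained reads across multiple dynamic records.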
How do fixed-length records improve performance?
The use of fixed-length records means that lookups based on node or relationship IDs don't require any searching through the store file itself. Rather, given a node or relationship ID, the starting point for where the data is stored within the file can be computed directly. All node records within the node store are 14 bytes in length (at the time of writing). IDs for nodes and relationships are numerical and correlate directly to their location within a store file: node ID 0 is the first record in the node store file, node ID 1 the second, and so on.
If you wanted to look up the data associated with node ID 1000, you'd be able to calculate that this data starts 14,000 bytes into the node store file (14 bytes for each record × node ID 1000). Computing the starting location for the data is much cheaper (O(1) in big O notation) than having to perform a search, which in a typical implementation could cost O(log n). If big O notation scares you, fear not; all you really need to understand here is that it's generally much faster to compute a start point than it is to search for it. When a lot of data is involved, this can often translate into significant performance gains.
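To make that arithmetic concrete, here is a minimal sketch of the ID-to-offset calculation; this is not Neo4j's actual implementation: the record size is the 14 bytes quoted above, and the store file name and raw-byte read are assumptions for illustration.

    import java.io.RandomAccessFile;

    public class FixedRecordLookup {
        private static final int NODE_RECORD_SIZE = 14;  // record size quoted in the text

        // O(1): a single multiplication, no searching through the file.
        static long offsetForNodeId(long nodeId) {
            return nodeId * NODE_RECORD_SIZE;
        }

        public static void main(String[] args) throws Exception {
            long nodeId = 1000;
            long offset = offsetForNodeId(nodeId);  // 14,000 bytes into the store file
            System.out.println("node " + nodeId + " starts at byte " + offset);

            // Hypothetical raw read of the record bytes (file name assumed).
            try (RandomAccessFile store = new RandomAccessFile("neostore.nodestore.db", "r")) {
                store.seek(offset);
                byte[] record = new byte[NODE_RECORD_SIZE];
                store.readFully(record);
            }
        }
    }

A search-based layout would instead have to scan or binary-search an index structure to find the record; here the seek position is known before the file is even opened.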