We are talking about four discrete properties of data that require special tools,
processes, and procedures to handle:
• Increased volumes (on the order of petabytes and beyond)
• Increased availability/accessibility of data (closer to real time)
• Increased variety of formats (different types of data)
• Increased messiness (noisy data)
This amounts to a paradigm shift: we now have the technology to bring all of this
together and analyze it.
Multi-structured data
In this section, we will discuss various data formats in the context of Big Data. Data
is categorized into three main data formats/types:
Structured: Data that is represented in a strict format is called structured
data; data stored in a relational database is the typical example. Structured
data is organized in semantic chunks called entities. Entities are grouped, and
relations can be defined between them. Each entity has fixed features called
attributes, and each attribute has a fixed data type, a predefined length,
constraints, default value definitions, and so on. One important characteristic
of structured data is that all entities of the same group have the same
attributes, with the same format and length, in the same order. Relational
database management systems can hold this kind of data.
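The characteristics above can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in sqlite3 module and a hypothetical customer table; the table name, its columns, and the sample rows are not from the text. SQLite does not enforce declared column lengths, so the length limit is written here as a CHECK constraint.

```python
import sqlite3

# Hypothetical "customer" entity; names and values are illustrative only.
conn = sqlite3.connect(":memory:")

# Each attribute has a declared type; name carries a NOT NULL constraint and a
# length limit (expressed as a CHECK), and country has a default value.
conn.execute(
    """
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL CHECK (length(name) <= 50),
        country TEXT NOT NULL DEFAULT 'unknown'
    )
    """
)

# Every row has the same attributes in the same order.
conn.execute("INSERT INTO customer (id, name, country) VALUES (1, 'Ada', 'UK')")
conn.execute("INSERT INTO customer (id, name) VALUES (2, 'Linus')")  # default applies

rows = conn.execute("SELECT id, name, country FROM customer ORDER BY id").fetchall()

# A row that violates a constraint is rejected by the database.
try:
    conn.execute("INSERT INTO customer (id, name) VALUES (3, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows)
print("invalid row rejected:", rejected)
```

The second insert omits country and still yields a complete row, because the schema, not the application, decides what every entity in the group looks like.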
Semi-structured: For some applications, data is collected in an ad hoc manner,
and how it will be stored or processed is not known at that stage. Although the
data has a structure, it does not always comply with the structure that the
application expects. Here, different entities can have different structures,
none of which is defined in advance. This kind of data is called
semi-structured. Examples include scientific data and bibliographic data. Graph
data structures can hold this kind of data. Some characteristics of
semi-structured data are listed as follows: