Public web page data —Publicly accessible pages are full of information that organizations
can use to be more competitive. They contain news stories, RSS feeds,
new product information, product reviews, and blog postings. Not all of the
information is authentic. There are millions of pages of fake product reviews
created by competitors or third parties paid to disparage other sites. Finding
out which product reviews are valid is a topic for careful analysis.
Remote sensor data —Small, low-power sensors can now track almost any aspect of
our world. Devices installed on vehicles track location, speed, acceleration, and
fuel consumption, and tell your insurance company about your driving habits.
Road sensors can warn about traffic jams in real time and suggest alternate
routes. You can even track the moisture in your garden, lawn, and indoor plants
to suggest a watering plan for your home.
Event log data —Computer systems create logs of read-only events from web page
hits (also called clickstreams), email messages sent, or login attempts. Each of
these events can help organizations understand who's using what resources and
when systems may not be performing according to specification. Event log data
can be fed into operational intelligence tools to send alerts to users when key
indicators fall out of acceptable ranges.
Mobile phone data —Every time users move to new locations, applications can
track these events. You can see when your friends are around you or when customers
walk through your retail store. Although there are privacy issues
involved in accessing this data, it's forming a new type of event stream that can
be used in innovative ways to give companies a competitive advantage.
Social media data —Social networks such as Twitter, Facebook, and LinkedIn provide
a continuous real-time data feed that can be used to see relationships and
trends. Each site creates data feeds that you can use to look at trends in customer
mood or get feedback on your own products as well as competitors' products.
Game data —Games that run on PCs, video game consoles, and mobile devices
have back-end datasets that need to scale quickly. These games store and share
high scores for all users as well as game data for each player. Game site back
ends must be able to scale by orders of magnitude if viral marketing campaigns
catch on with their users.
Open linked data —In chapter 4 we looked at how organizations can publish public
datasets that can be ingested by your systems. Not only is this data large, but
it may require complex tools to reconcile, remove duplication, and find invalid
items.
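The event log use case above, where alerts fire when a key indicator falls out of an acceptable range, can be sketched in a few lines. This is a minimal illustration rather than the design of any particular operational intelligence tool: the Event record, the find_alerts function, the per-minute bucketing, and the threshold value are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One read-only log event (hypothetical shape, for illustration only)."""
    minute: int  # minute bucket in which the event occurred
    kind: str    # e.g. "page_hit", "email_sent", "login_failed"


def find_alerts(events, kind, max_per_minute):
    """Return sorted (minute, count) pairs where `kind` exceeded the limit."""
    # Count events of the requested kind in each minute bucket.
    counts = Counter(e.minute for e in events if e.kind == kind)
    # Keep only the buckets whose count is above the acceptable range.
    return sorted((m, c) for m, c in counts.items() if c > max_per_minute)


# A tiny, made-up event log.
events = [
    Event(0, "page_hit"),
    Event(0, "login_failed"),
    Event(1, "login_failed"),
    Event(1, "login_failed"),
    Event(1, "login_failed"),
    Event(2, "page_hit"),
]

alerts = find_alerts(events, kind="login_failed", max_per_minute=2)
print(alerts)  # [(1, 3)]
```

Because events are read-only and each minute bucket is counted independently, the same counting step could be split across many machines and the partial counts merged afterward.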
When looking at these use cases, you see that some problems can be described as
independent parallel transforms since the output from one transform isn't used as an
input to another. This includes problems like image and signal processing. Their
focus is on efficient and reliable data transformation at scale. These use cases
don't need the query or transaction support provided by many NoSQL systems. They