tabular - Much of the conventional data published on the web or used to drive web-based
interfaces is held in traditional SQL databases. Furthermore, even new forms of web data are often
tabular: for example, Facebook offers developers a data API that is essentially a set of tables, HTML5
and Google Gears allow web applications to store local information in SQL, and Freebase⁵
allows end users to contribute content in the form of tables.
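To make "tabular" concrete, here is a minimal sketch of web-style records held in a SQL table and queried with SQL. It uses Python's built-in sqlite3 module as a stand-in for the browser-side SQL stores mentioned above; it does not reproduce the HTML5 or Gears APIs, and the `posts` table, its columns and the function `top_posts` are hypothetical.

```python
import sqlite3

def top_posts(rows, min_likes):
    """Load hypothetical (author, title, likes) rows into an in-memory SQL
    table and return the titles with at least min_likes likes, most-liked first."""
    conn = sqlite3.connect(":memory:")  # throwaway database; nothing is written to disk
    try:
        conn.execute("CREATE TABLE posts (author TEXT, title TEXT, likes INTEGER)")
        conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT title FROM posts WHERE likes >= ? ORDER BY likes DESC",
            (min_likes,),
        )
        return [title for (title,) in cur.fetchall()]
    finally:
        conn.close()

if __name__ == "__main__":
    sample = [("alan", "Web data", 12), ("sam", "Tagging", 30), ("jo", "Links", 5)]
    print(top_posts(sample, 10))  # ['Tagging', 'Web data']
```

The point is less the query itself than the shape of the data: fixed columns and many rows, which is what makes such sources easy to join, filter and visualise.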
text - Of course, when we think of the web, the primary image is HTML and pages of text
(and pictures). In addition, blogs and microblogs (such as Twitter or Facebook status updates) are
primarily or solely text-based.
rich media - Many Web 2.0 sites, such as Flickr and YouTube, are built around user-generated
media, from photographs (Flickr) to video (YouTube).
links - The essence of the web is its links. Importantly, links add structure to the web, which
can be used simply to surf or as a source of data to be mined; mining that structure was the source of Google's success.
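The best-known published example of mining link structure is PageRank, which scores a page by the scores of the pages linking to it. The sketch below is a simplified textbook power iteration, not Google's production ranking; the damping factor 0.85 is the commonly cited default, and the function name and graph format are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages that appear
    only as link targets are included automatically.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # every page receives a small baseline ("random jump") share
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # pass this page's rank in equal parts along its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page with no out-links: spread its rank over all pages
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

if __name__ == "__main__":
    web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores = pagerank(web)
    print(max(scores, key=scores.get))  # c
```

Page `c` ranks highest because it is the only page with two incoming links; the ranks always sum to 1, so they can be read as the probability of a random surfer being on each page.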
tags - Whereas older portals used hierarchical structures, end-user sites such as del.icio.us
and Flickr are based on tagging. In addition to being less formal than a hierarchy, tagging allows serendipitous
connections between similarly tagged material and the emergence of community vocabularies known
as “folksonomies.”
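Both effects described above can be computed directly from tag data. This minimal sketch assumes items are stored as a hypothetical mapping from item name to its list of tags: `related_items` finds the serendipitous connections (pairs of items that share a tag), and `vocabulary` counts how many items use each tag as a crude view of an emerging folksonomy. Neither function corresponds to any particular site's API.

```python
from collections import Counter
from itertools import combinations

def related_items(tagged):
    """Map each pair of items that share at least one tag to the shared tags."""
    links = {}
    # sort so pairs come out in a stable (alphabetical) order
    for (a, tags_a), (b, tags_b) in combinations(sorted(tagged.items()), 2):
        shared = set(tags_a) & set(tags_b)
        if shared:
            links[(a, b)] = sorted(shared)
    return links

def vocabulary(tagged, top=3):
    """Return the `top` most common tags as (tag, number_of_items) pairs."""
    counts = Counter(tag for tags in tagged.values() for tag in set(tags))
    return counts.most_common(top)

if __name__ == "__main__":
    tagged = {
        "photo1": ["sunset", "beach"],
        "photo2": ["beach", "surf"],
        "bookmark1": ["python", "web"],
        "bookmark2": ["web", "sunset"],
    }
    print(related_items(tagged))
    print(vocabulary(tagged))
```

Note that the photo and bookmark collections become connected through the shared tag "sunset" even though nobody designed a category linking them, which is the kind of serendipity a fixed hierarchy tends to prevent.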
XML, RDF and semantic-web data - Various forms of structured data can often be found
either in standalone documents or embedded in HTML. These formats are designed to be machine
readable, encoding information or meta-data, but they are textual so that they are, in principle,
human viewable and editable. These are described later in Section 4.1.4.
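The dual nature of these formats, readable by people yet regular enough for programs, can be seen with a small XML record. This sketch uses Python's standard `xml.etree.ElementTree`; the `<person>` element, its children and the example.org URL are hypothetical and do not follow any particular schema or vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: plain text a person could open and edit,
# but with explicit structure a program can extract reliably.
DOC = """<person id="p1">
  <name>Alan</name>
  <homepage>http://www.example.org/alan</homepage>
</person>"""

def person_fields(xml_text):
    """Return the id attribute and the text of each child element of a <person> record."""
    root = ET.fromstring(xml_text)
    fields = {"id": root.get("id")}  # None if the attribute is absent
    for child in root:
        fields[child.tag] = (child.text or "").strip()
    return fields

if __name__ == "__main__":
    print(person_fields(DOC))
```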
micro-formats - Micro-formats offer a form of 'nearly for free' semantics using lightweight
additional mark-up in a human-readable web page. For example, Figure 4.2 shows a fragment of the
source of Alan's LinkedIn profile page. Note the <span> tag with class “given-name”, which lets a web
crawler or a browser with a suitable plug-in work out what “Alan” means.
Figure 4.2: LinkedIn using the vCard microformat.
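A crawler or plug-in of the kind described above only has to look for the agreed class names. The following sketch uses Python's standard `html.parser` to collect the text of elements whose class list includes "given-name" or "family-name". The sample markup is hypothetical (it is not the LinkedIn source in Figure 4.2), and the parser assumes reasonably well-formed HTML in which every non-void element is closed.

```python
from html.parser import HTMLParser

class HCardNameParser(HTMLParser):
    """Collect text inside elements whose class list includes a wanted name property."""

    WANTED = {"given-name", "family-name"}
    # Void elements never get a closing tag, so they must not be pushed on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: the wanted class it carries, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(next((c for c in classes if c in self.WANTED), None))

    def handle_endtag(self, tag):
        if tag not in self.VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost enclosing element with a wanted class.
        for prop in reversed(self._stack):
            if prop:
                self.fields[prop] = self.fields.get(prop, "") + text
                break

def extract_names(html_text):
    """Return a dict such as {'given-name': 'Alan'} for the wanted properties found."""
    parser = HCardNameParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields

# Hypothetical fragment in the style of a vCard-based microformat.
SAMPLE = ('<div class="vcard"><span class="fn">'
          '<span class="given-name">Alan</span></span><br>'
          '<span class="note">hypothetical</span></div>')

if __name__ == "__main__":
    print(extract_names(SAMPLE))  # {'given-name': 'Alan'}
```

The page still renders normally for human readers; the class attributes are simply ignored by the browser unless a plug-in like this one looks for them, which is why the semantics come "nearly for free."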
5 http://www.freebase.com