SQL fundamentals (Hibernate)

 

A table, with its rows and columns, is a familiar sight to anyone who has worked with an SQL database. Sometimes you’ll see tables referred to as relations, rows as tuples, and columns as attributes. This is the language of the relational data model, the mathematical model that SQL databases (imperfectly) implement.

The relational model allows you to define data structures and constraints that guarantee the integrity of your data (for example, by disallowing values that don’t accord with your business rules). The relational model also defines the relational operations of restriction, projection, Cartesian product, and relational join [Codd, 1970]. These operations let you do useful things with your data, such as summarizing or navigating it.

Each of the operations produces a new table from a given table or combination of tables. SQL is a language for expressing these operations in your application (therefore called a data language) and for defining the base tables on which the operations are performed.

You write SQL data definition language (DDL) statements to create and manage the tables. We say that DDL defines the database schema. Statements such as CREATE TABLE, ALTER TABLE, and CREATE SEQUENCE belong to DDL.

You write SQL data manipulation language (DML) statements to work with your data at runtime. Let’s describe these DML operations in the context of some tables of the CaveatEmptor application.

In CaveatEmptor, you naturally have entities like item, user, and bid. We assume that the SQL database schema for this application includes an ITEM table and a BID table, as shown in figure A.1. The datatypes, tables, and constraints for this schema are created with SQL DDL (CREATE and ALTER operations).

Insertion is the operation of creating a new table from an old table by adding a row. SQL databases perform this operation in place, so the new row is added to the existing table:

Example tables with example data

Figure A.1 Example tables with example data

The real power of SQL lies in querying data. A single query may perform many relational operations on several tables. Let’s look at the basic operations.

An SQL update modifies an existing row:

tmp1DC452_thumb

A deletion removes a row:

tmp1DC453_thumb

Restriction is the operation of choosing rows of a table that match a particular criterion. In SQL, this criterion is the expression that occurs in the where clause:

tmp1DC454_thumb

Along with these basic operations, relational databases define operations for aggregating rows (GROUP BY) and ordering rows (ORDER BY):

tmp1DC455_thumb

Projection is the operation of choosing columns of a table and eliminating duplicate rows from the result. In SQL, the columns to be included are listed in the select clause. You can eliminate duplicate rows by specifying the distinct keyword:

tmp1DC456_thumb

A Cartesian product (also called cross join) produces a new table consisting of all possible combinations of rows of two existing tables. In SQL, you express a Cartesian product by listing tables in the from clause:

tmp1DC457_thumb

A relational join produces a new table by combining the rows of two tables. For each pair of rows for which a join condition is true, the new table contains a row with all field values from both joined rows. In ANSI SQL, the join clause specifies a table join; the join condition follows the on keyword. For example, to retrieve all items that have bids, you join the ITEM and the BID table on their common ITEM_ ID attribute:

tmp1DC458_thumb

A join is equivalent to a Cartesian product followed by a restriction. So, joins are often instead expressed in theta style, with a product in the from clause and the join condition in the where clause. This SQL theta-style join is equivalent to the previous ANSI-style join:

tmp1DC459_thumb

SQL was called a structured query language in reference to a feature called subse-lects. Because each relational operation produces a new table from an existing table or tables, an SQL query may operate on the result table of a previous query. SQL lets you express this using a single query, by nesting the first query inside the second:

tmp1DC460_thumb

The result of this query is equivalent to the previous one.

A subselect may appear anywhere in an SQL statement; the case of a subselect in the where clause is the most interesting:

tmp1DC461_thumb

This query returns the largest bids in the database. Where clause subselects are often combined with quantification. The following query is equivalent:

tmp1DC462_thumb

An SQL restriction criterion is expressed in a sophisticated expression language that supports mathematical expressions, function calls, string matching, and perhaps even more sophisticated features such as full-text search:

tmp1DC463_thumb

Next post:

Previous post: