[DB] Advanced SQL

10 minute read

Accessing SQL From a Programming Language

A database programmer must have access to a general purpose programming language for at least two reasons:

Not all queries can be expressed in SQL.
SQL does not provide the full expressive power of a general-purpose language.
Non-declarative actions cannot be done from within SQL.
Non-delcatarive actions include printing a report, interacting with a user, and sending the results of a query to a GUI.

There are two approaches to accessing SQL from a general purpose programming language:

Dynamic SQL: SQL statements that are constructed at runtime; for example, the application may allow users to enter their own queries
- A general purpose program can connect to and communicate with a database server using a collection of functions.
  - Enabling you to build SQL statements dynamically at runtime
  - Being able to create more general purpose, flexible applications by using dynamic SQL because the full text of a SQL statement may be unknown at compilation
  - e.g. Dynamic SQL statement in Python
Embedded SQL: SQL statements in an application that do not change at runtime and, therefore, can be hard-coded into the application
- Like dynamic SQL, it provides a means by which program can interact with a database server. However, under embedded SQL:
  - The SQL statements are identified at compile time, and are translated into function calls.
  - At runtime, these function calls connect to the database using an API that provides dynamic SQL facilities (but may be specific to the database being used).
  - e.g. Oracle SQL

Triggers

A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
- e.g. whenever a tuple is inserted into the takes relation, updates the tot_creds attribute of tuple in the student relation.
The two requirements of a trigger design are:
- Specify the conditions under which the trigger is to be executed.
- Specify the actions to be taken when the trigger executes.

Triggering Events and Actions in SQL

Triggering event can be an insert, a deletion, or an update.
Triggers on updates can be restricted to specific attributes.
- e.g. AFTER UPDATE OF takes ON grade
Values of attributes before and after an update can be referenced.
- REFERENCING OLD ROW AS: For deletions and updates
- REFERENCING NEW ROW AS: For insertions and updates
BEGIN ATOMIC ... END can serve to collect multiple SQL statements into a single compound statement.

Triggers can be activated before and after an event.

AFTER UPDATE OF

BEFORE UPDATE OF: Serves as an extra constraint that can prevent invalid updates, inserts, or deletes

e.g., suppose the value of an inserted grade is blank, presumably to indicate the absence of a grade. We can define a trigger that replaces the value with NULL

CREATE TRIGGER setnull_trigger BEFORE UPDATE OF takes
  REFERENCING NEW ROW AS nrow
  FOR EACH ROW
    WHEN (nrow.grade = ' ')
    BEGIN ATOMIC
        SET nrow.grade = NULL;
    END;

Example

When the grade attribute is updated for a tuple in the takes relation, keep the tot_cred attribute value of student tuples up-to-date:

-- When (Trigger Conditions)
CREATE TRIGGER credits_earned AFTER UPDATE OF takes ON (grade)
  -- What (Trigger Actions)
  REFERENCING NEW ROW AS nrow
  REFERENCING OLD ROW AS orow
  FOR EACH ROW
    WHEN    (nrow.grade <> 'F' AND nrow.grade IS NOT NULL)
    AND     (orow.grade = 'F'  OR  orow.grade IS NULL)
    BEGIN ATOMIC
        UPDATE  student
        SET     tot_cred = tot_cred +
        (
            SELECT  credits
            FROM    course
            WHERE   course.course_id = nrow.course_id
        )
        WHERE   student.id = nrow.id;
    END;

Using triggers to maintain referential integrity

CREATE TRIGGER timeslot_check1 AFTER INSERT ON section 
  REFERENCING NEW ROW AS nrow 
  FOR EACH ROW
    WHEN (nrow.time slot_id NOT IN (
          SELECT time_slot_id 
          FROM time_slot)) /* time_slot_id not present in time_slot */
    BEGIN
      rollback 
    END;
  
CREATE TRIGGER timeslot_check2 AFTER DELETE ON timeslot 
  REFERENCING OLD ROW AS orow 
  FOR EACH ROW
    WHEN (orow.time_slot_id NOT IN (
          SELECT time_slot_id 
          FROM time_slot)/*last tuple for time_slot_id deleted from time_slot */
        AND orow.time_slot_id IN (
          SELECT time_slot_id
          FROM section)) /* and time slot id still referenced from section*/
    BEGIN
      rollback 
    END;

Place a reorder when inventory falls below a specified minimum

CREATE TRIGGER reorder AFTER UPDATE OF level ON inventory 
  REFERENCING OLD ROW AS orow, NEW ROW AS nrow
  FOR EACH ROW
    WHEN nrow.level <= (SELECT level
                        FROM minlevel
                        WHERE minlevel.item = orow.item)
    AND orow.level > (SELECT level
                      FROM minlevel
                      WHERE minlevel.item = orow.item)
    BEGIN ATOMIC
      INSERT INTO orders
        (SELECT item, amount 
        FROM reorder
        WHERE reorder.item = orow.item);
    END;

Statement level triggers

Instead of executing a separate action for each affected row, a single action can be executed for all rows affected by a transaction.
- Instead of FOR EACH ROW, use FOR EACH STATEMENT.
- Instead of REFERENCING [OLD/NEW] ROW AS ..., use REFERENCING [OLD/NEW] TABLE AS ... to refer to temporary tables (called transition tables) containing all the affected rows.
Statement level triggers can be more efficient when dealing with SQL statements that update a large number of rows.

When Not To Use Triggers

Earlier, triggers were used for tasks such as
- Maintaining summary data (e.g., total salary of each department).
- Replicating databases by recording changes to special relations (called change or delta relations), and having a separate process that applies the changes over to a replica
However, there are better ways of doing these nowadays.
- Databases today provide built-in materialized view facilities to maintain summary data.
- Databases provide built-in support for replication.
- Encapsulation facilities can be used instead of triggers in many cases. Define methods to update fields, which directly carry out actions as part of the update methods instead of through a trigger indirectly.

Risks of Triggers

There are some risks for using triggers. For example,

Risk of unintended execution of triggers. For example, when
- Loading data from a backup copy, or Replicating updates at a remote site
  - In this case, the triggered action has already been executed and reflected in the backup or replica in the past $\Rightarrow$ redundant
  - Thus triggers would have to be disabled initially and enabled when the backup site takes over processing from the primary system.
Trigger error at runtime might lead to the failure of critical transactions that set off the trigger.
Cascading execution of triggers.
- e.g., an insert trigger on a relation causes another insert on the same relation
- might lead to an infinite chain of triggering

Recursive Queries

`WITH RECURSIVE` clause

Recursive views make it possible to write queries that cannot be written without recursion or iteration.
- Without recursion, a non-recursive non-iterative program can perform only a fixed number of joins of relation with itself
  - This program can give only a fixed number of levels of prerequisites
  - Given a fixed non-recursive query, we can construct a database with a greater number of levels of prerequisites on which the query will not work
- Alternatively, we may write a procedure to iterate as many times as required
  - See the procedure findAllPrereqs in the textbook
Any recursive view must be defined as the union of two subqueries:
- base query: nonrecursive
- recursive query: uses the recursive view.
The meaning of a recursive view is best understood as follows:
1. compute the base query and add all the resultant tuples to the recursively defined view relation (initially empty)
2. compute the recursive query using the current contents of the view relation, and add all the resulting tuples back to the view relation
3. keep repeating the above step until no new tuples are added to the view relation.
4. at some point, no tuple is added to the view after recursion. This final result is called the fixed point of the recursive view definition.

Example. Find all (direct or indirect) prerequisite subjects of each course

WITH RECURSIVE rec_prereq(course_id, prereq_id) AS
(
    (
        SELECT  course_id,
                prereq_id
        FROM    prereq
    )
    UNION
    (
        SELECT  rec_prereq.course_id,
                prereq.prereq_id
        FROM    rec_prereq,
                prereq
        WHERE   rec_prereq.prereq_id = prereq.course_id
    )
)
SELECT  course_id
FROM    rec_prereq;

The example view rec_prereq is called the transitive closure of the prereq relation, meaning that it is a relation that contains all pairs (cid, pre) such that pre is a direct or indirect prerequisite of cid.

Recursive views are required to be monotonic.
- For each step of the recursion, the view must contain all of the tuples it contained in the previous step, plus possibly some more.
- Recursive queries may not use any of the following construct:
  - Aggregation on the recursive view
  - NOT EXISTS on a subquery that uses the recursive view.
  - Set difference (EXCEPT) whose RHS uses the recursive view.

Advanced Aggregation Features

The aggregation support in SQL is quite powerful and handles most common tasks with ease.

Ranking

Finding the position of a value in a larger set is a common operation. In SQL, ranking is done by RANK() OVER in conjunction with an ORDER BY specification.

Examples

Find the rank of each student

SELECT  ID,
        RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
FROM    student_grades;
/* student_grades(ID, GPA) is a relation 
* giving the GPA of each student */

Here, an extra ORDER BY clause can return the results in sorted order.

SELECT  ID,
        RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
FROM    student_grades
ORDER BY s_rank ASC;

A basic issue with ranking is how to deal with the case of mutiple tuples that are the same values on the ordering attribute(s).
- Naturally, the RANK() function leaves gaps.
  Meaning that if the highest GPA is shared by two students, both would get rank 1, and the next rank would be 3.
- There is a DENSE_RANK() function which does not leave gaps.

Ranking with partitions

Ranking can be done within partitions of data, using PARTITION BY.

Examples

Find the rank of students within each department

SELECT  ID,
        dept_name,
        RANK() OVER 
        (
            PARTITION BY dept_name
            ORDER BY GPA DESC
        ) AS dept_rank
FROM    dept_grades
ORDER BY dept_name, dept_rank;

Multiple rank expressions can be used within a single SELECT statement.
When ranking (possibly with partitioning) occurs along with a GROUP BY clause, the GROUP BY clause is applied first, and partitioning and ranking are done on the results of GROUP BY, allowing aggregate values to be used for ranking.

Several other functions can be used in place of RANK
- PERCENT_RANK: Gives the rank of the tuple as a fraction
- CUME_DIST: Cumulative distribution
- ROW_NUMBER: Sorts the rows and gives each row a unique number corresponding to its position
  - Non-deterministic in presence of duplicates
- NULLS FIRST, NULLS LAST
- NTILE(n): Takes the tuples in each partition in the specified order, and divides them into n buckets with equal numbers of tuples

Example

Find for each student the quartile they belong to

SELECT  ID,
        NTILE(4) OVER (ORDER BY GPA DESC) AS quartile
FROM    student_grades;

Find the rank of each student

  SELECT  ID,
          RANK() OVER (ORDER BY (GPA) DESC NULLS LAST) AS s_rank
  FROM    student_grades;
  /* student_grades(ID, GPA) is a relation 
  * giving the GPA of each student */

Windowing

Window queries compute an aggregate function over ranges of tuples. They are used to smooth out random variables.
Unlike partitions, windows may overlap, in which case a tuple may contribute to more than one window.

SQL provides a windowing feature to support such queries.

ROWS BETWEEN n_1 PRECEDING AND n_2 FOLLOWING

Example

Moving average: given sales values for each date, calculate for each date the average of the sales on that day, the previous day, and the next day

SELECT  date,
        SUM(value) OVER
        (
            ORDER BY date
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        )
FROM    sales;

the average of total number of credits taken by students in each 3 year.

SELECT  year
        AVG (num_credits) OVER
        (
            ORDER BY year
            ROWS 3 PRECEDING
        )
FROM    tot_credits;

the average of total number of credits taken by students in all prior years

SELECT  year
        AVG (num_credits) OVER
        (
            ORDER BY year
            ROWS UNBOUNDED PRECEDING
        )
FROM    tot_credits;

Windowing with partitions

SQL supports windowing within partitions.

Example

Find total balance of each account after each transaction on the account

SELECT  account_number,
        date_time,
        SUM(value) OVER
        (
            PARTITION BY account_number
            ORDER BY date_time
            ROWS UNBOUNDED PRECEDING
        ) AS balance
FROM    transaction
ORDER BY account_number, date_time;

UNBOUNDED: The number of preceding / following rows are unbounded
CURRENT ROW: Specifies the current row
- e.g., ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
RANGE BETWEEN ...: Cover all tuples with a particular value rather than covering a specific number of tuples

Reference

[1] Silberschatz, Abraham, Henry F. Korth, and Shashank Sudarshan. “Database system concepts.” (2011).

Share on

Twitter Facebook LinkedIn

Youngdo Lee

[DB] Advanced SQL

Accessing SQL From a Programming Language

Triggers

Triggering Events and Actions in SQL

Statement level triggers

When Not To Use Triggers

Risks of Triggers

Recursive Queries

`WITH RECURSIVE` clause

Advanced Aggregation Features

Ranking

Ranking with partitions

Windowing

Windowing with partitions

Reference

Share on

Leave a comment

Youngdo Lee

Accessing SQL From a Programming Language

Triggers

Triggering Events and Actions in SQL

Statement level triggers

When Not To Use Triggers

Risks of Triggers

Recursive Queries

WITH RECURSIVE clause

Advanced Aggregation Features

Ranking

Ranking with partitions

Other ranking related features

Windowing

Windowing with partitions

Other windowing related features

Reference

Share on

Leave a comment

`WITH RECURSIVE` clause