SQL joins, union, intersection

We frequently need to combine data from several database tables and queries to extract information we want. The operators UNION, INTERSECT, EXCEPT and various types of joins are tools that help us accomplish such tasks.

In the examples below we will be using the SQLite database courses_data.sqlite. It can be downloaded here:

To execute SQL code in Jupyter Notebook we will use the ipython-sql notebook extension. Assuming that this extension is already installed (see SQL Basics for installation instructions), the code below will load it:

[1]:
%load_ext sql

Next, we connect to the courses_data.sqlite database:

[2]:
%%sql
sqlite:///courses_data.sqlite

This database consists of a three tables: students, mth141, and eng101. The students table contains information (names, id numbers, majors) of some college students:

[3]:
%%sql
SELECT *
FROM students
LIMIT 5;
 * sqlite:///courses_data.sqlite
Done.
[3]:
student_id first_name last_name major
196 Rosina Hovey biology
165 Stanton Mcgaugh biology
123 Jonnie Hillock mathematics
136 Chu Flax mathematics
122 Rosalinda Trippe biology

The tables mth141 and eng101 contain student grades from two courses: MTH 141 and ENG 101, respectively. Each grade is associated to an id number of a student:

[4]:
%%sql
SELECT *
FROM mth141
ORDER BY student_id;
 * sqlite:///courses_data.sqlite
Done.
[4]:
student_id grade
109 C
122 A
127 A
140 F
145 C
157 F
188 C
[5]:
%%sql
SELECT *
FROM eng101
ORDER BY student_id;
 * sqlite:///courses_data.sqlite
Done.
[5]:
student_id grade
122 A
123 C
140 D
165 F
188 A
189 A

INTERSECT

Lets say that we want to know id numbers of students who took both courses, MTH 141 and ENG 101. We can accomplish this using the INTERSECT operator. Given two queries, it returns records which are selected by both of them:

[6]:
%%sql
SELECT student_id
FROM mth141
INTERSECT
SELECT student_id
FROM eng101;
 * sqlite:///courses_data.sqlite
Done.
[6]:
student_id
122
140
188

UNION

The UNION operator combines two queries, returning records selected by either one of them. Identical records selected by both queries are returned only once.

As an example, we can use UNION to get a list of id numbers of students who took either MTH 141 or ENG 101, or both:

[7]:
%%sql
SELECT student_id
FROM mth141
UNION
SELECT student_id
FROM eng101;
 * sqlite:///courses_data.sqlite
Done.
[7]:
student_id
109
122
123
127
140
145
157
165
188
189

EXCEPT

Given two queries, the EXCEPT operator returns these records selected by the first query, which are not selected by the second query.

Below we use it to get a list of id numbers of students who took MTH 141 but not ENG 101:

[8]:
%%sql
SELECT student_id
FROM mth141
EXCEPT
SELECT student_id
FROM eng101;
 * sqlite:///courses_data.sqlite
Done.
[8]:
student_id
109
127
145
157

Note

The operators INTERSECT, UNION and EXCEPT can be used to combine only queries which return the same number of columns.

Joins

Let say that we want to combine records from the mth141 and eng101 tables so that we get student id numbers in one column, MTH 141 grades in the second column, and ENG 101 grades in the last column. This can be done in a few different ways depending on what precisely we want to accomplish:

Case 1. We can get records of only these students who took both courses and have a grade for both MTH 141 and ENG 101.

Case 2. We can get records of all students who took MTH 141. The column with ENG 101 grades will contain either a grade, for students who took ENG 101, or NULL otherwise.

Case 3. We can get records of all students who took ENG 101, and populate the MTH 141 column with either grades or NULL values depending on whether a student took MTH 141 or not.

Case 4. We can get records of students who took either course. The MTH 141 and ENG 101 columns will be populated with grades wherever they exist, and with NULL values elsewhere.

SQL lets us implement each of these scenarios using different types of table joins: the inner join in case 1, the left and right joins in cases 2 and 3, and the full outer join in case 4.

Inner join

The INNER JOIN clause is used as follows:

SELECT column1, column2, ...
FROM
    left_table INNER JOIN right_table
    ON condition;

This clause returns combinations of every record of left_table with every record of right_table for which the specified condition is satisfied. Records of either table which do not have a match are omitted.

For example, the query below returns combined records from tables mth141 and eng101 which have the same student id number:

[9]:
%%sql
SELECT *
FROM
    mth141 INNER JOIN eng101
    ON mth141.student_id = eng101.student_id;
 * sqlite:///courses_data.sqlite
Done.
[9]:
student_id grade student_id_1 grade_1
122 A 122 A
140 F 140 D
188 C 188 A

Notice that to identify which column comes from which table we need to prefix each column name with the name of a table. E.g. mth141.student_id denotes column student_id in table mth141.

The above query returns student_id column twice since this column appears in both tables. Also, names of columns with grades are not informative. We can fix this by specifying which columns we want to retrieve and using aliases:

[10]:
%%sql
SELECT
    mth141.student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    mth141 INNER JOIN eng101
    ON mth141.student_id = eng101.student_id;
 * sqlite:///courses_data.sqlite
Done.
[10]:
student_id mth141_grade eng101_grade
122 A A
140 F D
188 C A

The condition in INNER JOIN (and other types of joins) does not need to test just equality of column values. For example, below we return records of students who took both courses, and received a different grade in each course:

[12]:
%%sql
SELECT
    mth141.student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    mth141 INNER JOIN eng101
    ON mth141.student_id = eng101.student_id
       AND mth141.grade != eng101.grade;
 * sqlite:///courses_data.sqlite
Done.
[12]:
student_id mth141_grade eng101_grade
140 F D
188 C A

Left join

The LEFT JOIN clause is used as follows:

SELECT column1, column2, ...
FROM
   left_table LEFT JOIN right_table
   ON condition;

It combines every record of left_table with every record of right_table for which the given condition is satisfied. Records in the left_table that not have a match in right_table are combined with NULL values.

For example, we can use LEFT JOIN to list MTH 141 students, and include their grades in ENG 101 if they exist:

[13]:
%%sql
SELECT
    mth141.student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    mth141 LEFT JOIN eng101
    ON mth141.student_id = eng101.student_id;
 * sqlite:///courses_data.sqlite
Done.
[13]:
student_id mth141_grade eng101_grade
122 A A
109 C None
127 A None
140 F D
145 C None
157 F None
188 C A

Right join

RIGHT JOIN is similar to LEFT JOIN, but with roles of the right_table and left_table reversed: if a record of right_table does not have a matching record in the left_table, then it gets matched with NULL values.

While RIGHT JOIN is a standard part of SQL, it is not implemented by SQLite. This is a minor inconvenience since we can get the same results by switching tables and using LEFT JOIN. Below we do this to get grades of ENG 101 students, supplements by MTH 141 grades if they exist:

[14]:
%%sql
SELECT
    eng101.student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    eng101 LEFT JOIN mth141
    ON mth141.student_id = eng101.student_id;
 * sqlite:///courses_data.sqlite
Done.
[14]:
student_id mth141_grade eng101_grade
165 None F
123 None C
122 A A
189 None A
140 F D
188 C A

Full outer join

FULL OUTER JOIN is a combination of LEFT JOIN and RIGHT JOIN. It matches all records of left_table with all records of right_table based on a specified condition. If a record of either of the two tables does not have a match in the other table, it is matched with NULL values.

Again, while FULL OUTER JOIN is a standard SQL feature, SQLite does not implement it. However, we can get the same results using LEFT JOIN and UNION. For example, below we compute the full outer join of the mth141 and eng101 tables for records with the same student id number:

[18]:
%%sql
SELECT
    mth141.student_id AS student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    mth141 LEFT JOIN eng101
    ON mth141.student_id = eng101.student_id
UNION
SELECT
    eng101.student_id AS student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    eng101
    LEFT JOIN mth141
    ON mth141.student_id = eng101.student_id;
 * sqlite:///courses_data.sqlite
Done.
[18]:
student_id mth141_grade eng101_grade
109 C None
122 A A
123 None C
127 A None
140 F D
145 C None
157 F None
165 None F
188 C A
189 None A

Multiple joins

We can combine more that two tables by iterating the join operation. For example, below we use INNER JOIN twice to get grades of students who took both MTH 141 and ENG 101 combined with student names and majors:

[16]:
%%sql
SELECT
    students.*,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    students INNER JOIN mth141 INNER JOIN eng101
    ON students.student_id = mth141.student_id
       AND students.student_id = eng101.student_id;
 * sqlite:///courses_data.sqlite
Done.
[16]:
student_id first_name last_name major mth141_grade eng101_grade
122 Rosalinda Trippe biology A A
140 Ludivina Hoskin computer science F D
188 Randall Lindbloom mathematics C A

Here is a more complex example. We retrieve grades of students who took either of the two courses and combine them with student names and majors. Notice that the first FROM is followed by a join of a query getting all grades and the students table. To identify columns of the query getting grades, we give it a name combined_grades:

[19]:
%%sql
SELECT
    students.*,
    combined_grades.mth141_grade,
    combined_grades.eng101_grade
FROM
    (SELECT
         mth141.student_id AS student_id,
         mth141.grade AS mth141_grade,
         eng101.grade AS eng101_grade
     FROM
         mth141 LEFT JOIN eng101
         ON mth141.student_id = eng101.student_id
     UNION
     SELECT
         eng101.student_id AS student_id,
         mth141.grade AS mth141_grade,
         eng101.grade AS eng101_grade
     FROM
         eng101 LEFT JOIN mth141
         ON mth141.student_id = eng101.student_id
     ) AS combined_grades
     INNER JOIN students
     ON combined_grades.student_id = students.student_id;
 * sqlite:///courses_data.sqlite
Done.
[19]:
student_id first_name last_name major mth141_grade eng101_grade
165 Stanton Mcgaugh biology None F
123 Jonnie Hillock mathematics None C
122 Rosalinda Trippe biology A A
109 Shawnna Coate mathematics C None
189 Marcia Hinkel mathematics None A
127 Adelina Brummitt biology A None
140 Ludivina Hoskin computer science F D
145 Ilona Rabe computer science C None
157 Wilbert Haskell computer science F None
188 Randall Lindbloom mathematics C A

Views

As the last example shows queries can get long and complicated if they involve multiple joins, unions etc. Views provide a tool for splitting such queries into simpler pieces. A view is a virtual table with results of a given query. This virtual table can be then used to construct other queries. For example, here we construct a view combined_grades listing all MTH 141 and ENG 101 grades:

[15]:
%%sql
CREATE VIEW combined_grades
AS
SELECT
    mth141.student_id AS student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    mth141 LEFT JOIN eng101
    ON mth141.student_id = eng101.student_id
UNION
SELECT
    eng101.student_id AS student_id,
    mth141.grade AS mth141_grade,
    eng101.grade AS eng101_grade
FROM
    eng101
    LEFT JOIN mth141
    ON mth141.student_id = eng101.student_id;


SELECT *
FROM combined_grades
 * sqlite:///courses_data.sqlite
Done.
Done.
[15]:
student_id mth141_grade eng101_grade
109 C None
122 A A
123 None C
127 A None
140 F D
145 C None
157 F None
165 None F
188 C A
189 None A

We can now obtain records of students who took either of these two courses, listing their names and majors as follows:

[16]:
%%sql
SELECT
    students.*,
    combined_grades.mth141_grade,
    combined_grades.eng101_grade
FROM
    combined_grades INNER JOIN students
    ON combined_grades.student_id = students.student_id;
 * sqlite:///courses_data.sqlite
Done.
[16]:
student_id first_name last_name major mth141_grade eng101_grade
165 Stanton Mcgaugh biology None F
123 Jonnie Hillock mathematics None C
122 Rosalinda Trippe biology A A
109 Shawnna Coate mathematics C None
189 Marcia Hinkel mathematics None A
127 Adelina Brummitt biology A None
140 Ludivina Hoskin computer science F D
145 Ilona Rabe computer science C None
157 Wilbert Haskell computer science F None
188 Randall Lindbloom mathematics C A

A view is automatically saved in the database so it can be used during subsequent database connections. In order to delete a view we can use the DROP VIEW statement:

[17]:
%%sql
DROP VIEW combined_grades
 * sqlite:///courses_data.sqlite
Done.
[17]:
[]

We can also create temporary views, which exist only during the current database connection and will not be saved:

[28]:
%%sql
CREATE TEMP VIEW mth141_students
AS
SELECT students.*, mth141.grade AS mth141_grade
FROM
    mth141 INNER JOIN students
    ON
    mth141.student_id = students.student_id;


SELECT *
FROM mth141_students;
 * sqlite:///courses_data.sqlite
Done.
Done.
[28]:
student_id first_name last_name major mth141_grade
122 Rosalinda Trippe biology A
109 Shawnna Coate mathematics C
127 Adelina Brummitt biology A
140 Ludivina Hoskin computer science F
145 Ilona Rabe computer science C
157 Wilbert Haskell computer science F
188 Randall Lindbloom mathematics C