SQL basics¶

SQL is a programming language for working with data stored in relational databases. There are many flavors of SQL used by various database systems (MySQL, Postgres, Oracle, SQL Server etc.), but they share the same basic syntax and functionality. Here we show how to use SQL while working with databases created using SQLite software.

SQL in Jupyter Notebook¶

In order to execute SQL code in Jupyter Notebook, we will use the ipython-sql extension of the notebook. The following command, executed in a notebook, will install this extension:

[1]:

%pip install ipython-sql

Next, we need to load the extension:

[2]:

 %load_ext sql

From now on, when the first line of a code cell is %%sql, the rest of the cell will be evaluated as an SQL (and not Python) code.

Creating tables¶

To access data stored in a database, we need to connect to the database first. Typically this requires connecting to a remote database server, but SQLite is special in this respect - its databases are saved in files stored locally on a computer. The command below connects to a database stored in a file tiny.sqlite. If such file does not exist, it will be automatically created:

[3]:

%%sql
sqlite:///tiny.sqlite

[3]:

'Connected: @tiny.sqlite'

The newly created database is empty. The code below creates in it a table named courses with three columns named course_id, course_name, and credits:

[ ]:

%%sql
CREATE TABLE courses (
    course_id TEXT,
    course_name TEXT,
    credits INT);

      
    

The code

course_id TEXT,
course_name TEXT,
credits INT

specifies that the columns course_id and course_name will be used to store text, and the credits column will contain integer values. While many database systems enforce such data type specifications, SQLite treats them merely as suggestions - e.g. it will let us store text values in the credits column without any problems.

SQL does not provide a standard way of listing tables existing in a database. However, every SQLite database contains a special table sqlite_master which records all database components. The code below retrieves the whole content of sqlite_master. It shows that the courses table now exists in the database:

[5]:

%%sql
SELECT *
FROM sqlite_master;

 * sqlite:///tiny.sqlite
Done.

[5]:

type	name	tbl_name	rootpage	sql
table	courses	courses	2	CREATE TABLE courses ( course_id TEXT, course_name TEXT, credits INT)

Deleting tables¶

In order to delete a table my_table from a database we use the statement

DROP TABLE  my_table;

This will generate an error if there is no table with the specified name:

[6]:

%%sql
DROP TABLE  my_table;

 * sqlite:///tiny.sqlite
(sqlite3.OperationalError) no such table: my_table
[SQL: DROP TABLE  my_table;]
(Background on this error at: http://sqlalche.me/e/e3q8)

This error can be avoided by adding a check if the table exists prior to deleting it:

[ ]:

%%sql
DROP TABLE IF EXISTS my_table;

Creating table records¶

The statement

SELECT *
FROM my_table

retrieves all content of my_table. Applied to the courses table, it will give just column headers, since at the moment this table is empty:

[8]:

%%sql
SELECT *
FROM courses;

 * sqlite:///tiny.sqlite
Done.

[8]:

course_id	course_name	credits

Records can be added to a table using the syntax

INSERT INTO my_table (column1, column2 ,..)
VALUES (value1, value2 ,...);

This adds a new record to my_table with value1 placed in column1 etc.:

[9]:

%%sql
INSERT INTO courses (course_id, course_name, credits)
VALUES ('MTH 141','Calculus I', 4);

 * sqlite:///tiny.sqlite
1 rows affected.

[9]:

[]

We can verify that a record has been added to the courses table:

[10]:

%%sql
SELECT *
FROM courses;

 * sqlite:///tiny.sqlite
Done.

[10]:

course_id	course_name	credits
MTH 141	Calculus I	4

Several records can be added to a table at once:

[11]:

%%sql
INSERT INTO courses (course_id, course_name, credits)
VALUES
  ('MTH 142', 'Calculus II', 4),
  ('MTH 448', 'Data Oriented Computing', 3),
  ('MTH 462', 'Topics in Analysis', 2);

SELECT *
FROM courses;

      
    

 * sqlite:///tiny.sqlite
3 rows affected.
Done.

[11]:

course_id	course_name	credits
MTH 141	Calculus I	4
MTH 142	Calculus II	4
MTH 448	Data Oriented Computing	3
MTH 462	Topics in Analysis	2

Modifying records¶

Existing records can be modified using the statement

UPDATE my_table
SET
    column_1 = new_value_1,
    column_2 = new_value_2
WHERE
    condition;

The condition given after WHERE specifies that only the records for which this condition holds true are to be changed. For example, the code below changes the number of credits of the MTH 141 course to 6:

[12]:

%%sql
UPDATE courses
SET
    credits = 6
WHERE
    course_id = 'MTH 141';

SELECT *
FROM courses;

      
    

 * sqlite:///tiny.sqlite
1 rows affected.
Done.

[12]:

course_id	course_name	credits
MTH 141	Calculus I	6
MTH 142	Calculus II	4
MTH 448	Data Oriented Computing	3
MTH 462	Topics in Analysis	2

Deleting records¶

The statement

DELETE FROM my_table
WHERE
    condition;

deletes from my_table all records for which the specified condition is satisfied. For example, we can use it to delete all courses with less than 4 credits from the courses table:

[13]:

%%sql
DELETE FROM courses
WHERE
    credits < 4;

SELECT *
FROM courses;

      
    

 * sqlite:///tiny.sqlite
2 rows affected.
Done.

[13]:

course_id	course_name	credits
MTH 141	Calculus I	6
MTH 142	Calculus II	4

Note

Here are a few comments on the general SQL syntax:

Indentations and newlines in SQL code are ignored, but they can be used to increase code readability.
While it is customary to capitalize keywords in SQL code, this is not necessary. select * from courses is equivalent to SELECT * FROM courses.
The semicolon ; marks the end of an SQL statement and separates each statement from the next one. If we are executing only one statement, its ending semicolon can be omitted.
In general, names of tables, columns, and other database objects can be used without quotes. The exception from this rule are names which contain special characters (e.g. spaces) or coincide with SQL keywords. Such names must be enclosed in “double quotes”. Good practice is to avoid names which require this treatment.
All other strings in SQL code (data in table records etc.) must be enclosed in ‘single quotes’.