SQL basics¶
SQL is a programming language for working with data stored in relational databases. There are many flavors of SQL used by various database systems (MySQL, Postgres, Oracle, SQL Server etc.), but they share the same basic syntax and functionality. Here we show how to use SQL while working with databases created using SQLite software.
SQL in Jupyter Notebook¶
In order to execute SQL code in Jupyter Notebook, we will use the ipython-sql
extension of the notebook. The following command, executed in a notebook, will install this extension:
Next, we need to load the extension:
From now on, when the first line of a code cell is %%sql
, the rest of the cell will be evaluated as an SQL (and not Python) code.
Creating tables¶
To access data stored in a database, we need to connect to the database first. Typically this requires connecting to a remote database server, but SQLite is special in this respect - its databases are saved in files stored locally on a computer. The command below connects to a database stored in a file tiny.sqlite
. If such file does not exist, it will be automatically created:
The newly created database is empty. The code below creates in it a table named courses
with three columns named course_id
, course_name
, and credits
:
The code
specifies that the columns course_id
and course_name
will be used to store text, and the credits
column will contain integer values. While many database systems enforce such data type specifications, SQLite treats them merely as suggestions - e.g. it will let us store text values in the credits
column without any problems.
SQL does not provide a standard way of listing tables existing in a database. However, every SQLite database contains a special table sqlite_master
which records all database components. The code below retrieves the whole content of sqlite_master
. It shows that the courses
table now exists in the database:
Deleting tables¶
In order to delete a table my_table
from a database we use the statement
This will generate an error if there is no table with the specified name:
This error can be avoided by adding a check if the table exists prior to deleting it:
Creating table records¶
The statement
retrieves all content of my_table
. Applied to the courses
table, it will give just column headers, since at the moment this table is empty:
Records can be added to a table using the syntax
This adds a new record to my_table
with value1
placed in column1
etc.:
We can verify that a record has been added to the courses
table:
Several records can be added to a table at once:
Modifying records¶
Existing records can be modified using the statement
The condition given after WHERE
specifies that only the records for which this condition holds true are to be changed. For example, the code below changes the number of credits of the MTH 141 course to 6:
Deleting records¶
The statement
deletes from my_table
all records for which the specified condition is satisfied. For example, we can use it to delete all courses with less than 4 credits from the courses
table:
Note
Here are a few comments on the general SQL syntax:
Indentations and newlines in SQL code are ignored, but they can be used to increase code readability.
While it is customary to capitalize keywords in SQL code, this is not necessary.
select * from courses
is equivalent toSELECT * FROM courses
.The semicolon
;
marks the end of an SQL statement and separates each statement from the next one. If we are executing only one statement, its ending semicolon can be omitted.In general, names of tables, columns, and other database objects can be used without quotes. The exception from this rule are names which contain special characters (e.g. spaces) or coincide with SQL keywords. Such names must be enclosed in “double quotes”. Good practice is to avoid names which require this treatment.
All other strings in SQL code (data in table records etc.) must be enclosed in ‘single quotes’.