Let’s do some magic! (with SQL and Python)

How to run SQL queries in Jupyter Notebooks, Python's interactive environment

Francisco Herrera González
5 min readAug 21, 2021

The world is continually changing, and technologies are evolving faster every day. Nevertheless, there are still “old” technologies and computing languages in use today, sharing their place with the most cutting-edge tools. I assume that, if you’re reading this, you at least know what the “Structured Query Language” is, but did you know that we have been using this domain-specific language for around 50 years? That’s half a century!

Anyway, in this article I will not go deep into what SQL is used for and how it works. If you are curious about this language, I link you to this page, where you can find detailed information about its origins and uses. Let’s just recall that SQL was designed specifically for querying and managing data held in relational databases.

These queries can be used to communicate with and manipulate those databases, and that’s what we are going to do: we will run SQL statements against a relational database; but, as you may have guessed, we are going to do this by using Jupyter Notebooks as the interface and some “magic functions”.

Why would we be interested in doing something like that? It’s simple: by running SQL in Jupyter Notebooks we combine a powerful data-analysis tool with the simplicity of the SQL language and the performance of its engine. In other words, we can use SQL to query the data and Jupyter’s technology to operate on that data.

Jupyter uses a “%” sign before a command to indicate that it is a “magic function”. These “magics” are a group of predefined functions built into the IPython kernel that allow us to execute special commands.
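For instance, you can list every magic available in your kernel, or time a Python expression with a familiar line magic (a minimal illustration, nothing SQL-specific yet):

```python
# List all line and cell magics available in the current kernel
%lsmagic

# A familiar line magic: time a single Python expression
%timeit sum(range(1000))
```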

First of all, let’s start by adding the SQL extension for Python: we use the “load_ext” magic to load the ipython-sql extension:
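A minimal sketch of that step (assuming the ipython-sql package is available; if it is not, you can install it from the notebook itself):

```python
# Install the extension if it is not already available
!pip install ipython-sql

# Load the SQL magic into the current notebook session
%load_ext sql
```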

Fine! Our environment is ready to operate with magic SQL!

Now we must identify the database we are going to connect to. In my case, I’m going to work with IBM’s Db2 database, but this also works with SQLite, MySQL, Oracle, PostgreSQL, etc. Feel free to use the database you are most comfortable with.

In the following list you will find a connection-string example for each database; you will find your user and password in the URI field of your database credentials (a connection sketch follows the list):

-IBM DB2 Cloud Database Connection Example:

  • ibm_db_sa://my-username:my-password@dashdb-txn-sbox-yp-dal09-03.services.dal.bluemix.net:50000/BLUDB

-MySQL Database Connection Example:

  • mysql://user:pwd@localhost/database_name

-Oracle Database Connection Example:

  • oracle://user:pwd@127.0.0.1:1962/database_name

-PostgreSQL Database Connection Examples:

  • postgres://name:pwd@localhost:6259/database_name
  • postgres://localhost:6259/database_name
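Once you have picked your URI, you pass it to the %sql magic to open the connection. A sketch with placeholder credentials (substitute the URI that matches your own database; the Db2 variant additionally requires the ibm_db_sa driver to be installed):

```python
# Connect to the database; replace the placeholders with your own credentials
%sql ibm_db_sa://my-username:my-password@my-hostname:50000/BLUDB
```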

Now, connected to our database through the ibm_db_sa driver (or the one matching your database), we are ready to start using SQL and adding data. We can invoke the SQL magic in two ways:

· The cell magic: two percent signs (%%) at the top of the cell turn the whole code cell into a SQL script.

· The line magic: one percent sign (%) at the beginning of a line turns only that line of the code cell into a SQL statement.
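A quick sketch of both forms, using a hypothetical INVENTORY table (we will actually create it in the next step):

```python
# Line magic: one percent sign makes only this line a SQL statement
%sql SELECT COUNT(*) FROM INVENTORY;
```

```python
%%sql
-- Cell magic: the double percent sign at the top makes the whole cell a SQL script
SELECT * FROM INVENTORY;
```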

Let’s use the cell magic and SQL to create and fill a new table. For example, let’s create a sales inventory for a fictional little greengrocer. We add the double percent sign at the top of the cell, then we start writing SQL:
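A sketch of what that cell could look like; the table name, columns, and sample rows are my own illustrative choices, not the original screenshot:

```python
%%sql
-- A small sales inventory for our fictional greengrocer
CREATE TABLE INVENTORY (
    ITEM  VARCHAR(20) NOT NULL,
    TYPE  VARCHAR(10) NOT NULL,  -- 'fruit' or 'vegetable'
    PRICE DECIMAL(5, 2),         -- unit price in EUR
    SOLD  INTEGER                -- units sold
);

INSERT INTO INVENTORY VALUES
    ('Apple',   'fruit',     0.50, 30),
    ('Banana',  'fruit',     0.30, 45),
    ('Orange',  'fruit',     0.45, 20),
    ('Tomato',  'vegetable', 0.80, 25),
    ('Carrot',  'vegetable', 0.25, 0),
    ('Lettuce', 'vegetable', 1.10, 12);
```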

Now that we have the inventory table in the database, let’s try some basic SQL queries. For example, let’s select only the vegetables that have been sold:
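Continuing with the INVENTORY sketch above, that query could look like this:

```python
%%sql
-- Vegetables that have actually been sold
SELECT ITEM, PRICE, SOLD
FROM INVENTORY
WHERE TYPE = 'vegetable' AND SOLD > 0;
```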

We can also order the table by the item’s price in descending order:
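For a one-liner like this, the line magic is enough:

```python
%sql SELECT * FROM INVENTORY ORDER BY PRICE DESC;
```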

You can also check which product is the cheapest and which is the most expensive by applying the MIN()/MAX() functions:
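A sketch against the same table:

```python
%%sql
-- Cheapest and most expensive products in the inventory
SELECT MIN(PRICE) AS CHEAPEST, MAX(PRICE) AS MOST_EXPENSIVE
FROM INVENTORY;
```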

In this way we can create and query our tables with Jupyter Notebooks and SQL, but this work would not be complete if I did not show you the real potential of combining SQL querying with the data science tools available in our environment.

Now that we have our table, we can put it into a DataFrame and do our typical data analysis. For example, let’s create a frame with all the sold fruits in order to delve into the details. First, we store the query result in a variable using an ordinary IPython assignment, and then we convert it to a pandas DataFrame with the result object’s .DataFrame() method:
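A minimal sketch, assuming the INVENTORY table from the earlier example:

```python
# Store the query result in an ordinary Python variable...
fruits = %sql SELECT ITEM, PRICE, SOLD FROM INVENTORY WHERE TYPE = 'fruit';

# ...and convert it to a pandas DataFrame with the result object's .DataFrame() method
fruits_df = fruits.DataFrame()
fruits_df.head()
```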

Great! Let’s operate on it a bit; for example, what is the total value in EUR earned for each fruit?
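One way to answer that, assuming the column names come back uppercase as in the sketch above:

```python
# Revenue in EUR per fruit: unit price times units sold
# (cast PRICE to float in case the driver returns Decimal objects)
fruits_df['REVENUE'] = fruits_df['PRICE'].astype(float) * fruits_df['SOLD']
fruits_df[['ITEM', 'REVENUE']]
```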

How much has our little greengrocer earned by selling fruit?
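Summing the revenue column gives the answer:

```python
# Total earned from all fruit sales
total = fruits_df['REVENUE'].sum()
print(f"Total fruit revenue: {total:.2f} EUR")
```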

DataFrame objects are more versatile than SQL query result objects. For example, after the conversion we can easily make a bar chart of the revenue for each sold fruit. We are going to use the Seaborn library:
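A sketch of that chart, reusing the fruits_df frame and its REVENUE column from above:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Bar chart of the revenue for each sold fruit
sns.barplot(data=fruits_df, x='ITEM', y='REVENUE')
plt.ylabel('Revenue (EUR)')
plt.title('Revenue per fruit')
plt.show()
```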

Once you have the DataFrame of the appropriate query, it’s up to your imagination and your skills to do the rest!

Now you know one of the many ways to mix SQL querying of your relational database with Jupyter’s technology and the Python language in order to apply data-analysis “magic” to your own stored tables.

Thanks for reading! I hope this helps you! Check out my other articles and my GitHub to find more interesting exercises.

See you in the next article!
