In this post, Lesson 007, we’ll learn about SQL select statements in more detail, and I’ll provide examples of select statements to query data from my_first_database. This is the first of 2 posts focused on SQL select statements. Since data analysis and management are best learned by doing, the objective of this and future posts is to get readers hands on experience using PostgreSQL and R.
Lesson 7 Video
Check out the accompanying video on the Elfinwood Data Science YouTube channel here: Elfinwood Data Science Blog Lesson 7 Video.
This lesson builds on earlier posts so, if this is your first time reading this blog and you’d like to follow along on your computer, then please go back and start from the post Learning Data Science and work through the lessons up to this point. For this post we’ll refer to files in the Lesson 7 folder in the learning-data-science repository. If you have not already forked the learning-data-science repository, and cloned the fork to your computer, please do so now following the instructions provided in the post Learning Data Science. If you have previously forked and cloned the repository, but don’t have the files for this lesson, then follow the instructions here to update your fork with the latest files.
If you are not familiar with version control using Git and GitHub, I recommend going back and working through the hands on tutorials in my series on version control starting with “Version Control: An Overview.”
A SQL select statement is quite possibly the most fundamental and most commonly used of the SQL statements. Select statements are used to query data from database tables, filter the results, and control how the results appear, for instance, by ordering them by a particular column. Select statements can also be used for data analysis, for instance creating a new column by performing math on one more columns or including an aggregate function and a GROUP BY clause to perform data summaries (e.g., calculating average and standard deviation).
Select statements are used to query data from database tables, filter the results, and control how the results appear, for instance, by ordering them by a particular column.Elfinwood Data Science
A basic select statement starts with the SELECT command followed by a list of columns, followed by a FROM clause, followed by a table expression. For instance, the following select statement queries 3 columns from table1:
SELECT column1, column2, column3 FROM table1
If you wanted to filter the results by one or more columns, and return the results in a specific order, WHERE and ORDER BY clauses would be included. For instance, the following select statement will select the same 3 columns as above but only return those rows where column 3 is greater than or equal to 100 and the results will be sorted in ascending order by column3.
SELECT column1, column2, column3 FROM table1 WHERE column3 >= 100 ORDER BY column3 ASC;
For the remainder of this post I’ll provide some examples of select statements to query data from my_first_database. For your reference, an annotated listing of these select statements is available in the Lesson_007 folder in the learning-data-science GitHub repository.
Selecting All Columns
A quick way to select all columns in a table is to use an asterisk instead of the list of column references. For instance, the following will return all of the columns and all 759 rows from the public.sample table.
SELECT * FROM public.sample
The above code will return the same as the following:
SELECT unique_id, study_value, sample_id, sample_type_code, latitude_dd, longitude_dd, location_accuracy_m, horz_datum_srid, elevation_m, sample_note FROM public.sample;
Selecting a Subset of Columns
To return a specific subset of columns from a table, list the columns of interest in the order in which you want them to be returned. For instance, the following will return three columns, study_value, sample_id, and elevation_m, and all rows from the public.sample table.
SELECT study_value, sample_id, elevation_m FROM public.sample
Filtering Rows using WHERE
The WHERE clause is used to filter rows in SELECT statements. The WHERE clause is placed after the FROM clause and is followed by one or more boolean expressions combined using AND or OR operators. Boolean expressions are written using comparison functions or operators and evaluate to true or false.
For example, the following select statement returns all columns and only those rows from the public.site table where the soil_restrict_layer_code column is equal to permafrost.
SELECT * FROM public.site WHERE soil_restrict_layer_code = 'permafrost'
Note the use of single quotes around the value permafrost. This is standard SQL syntax when referring to values in columns with text or character varying data types.
Here’s another example using the greater than or equal to operator, the following select statement will return only those rows in the public.sample table where the expression elevation_m >= 1000 is true (i.e., plots with an elevation_m greater than or equal to 1000 meters).
SELECT study_value, sample_id, elevation_m FROM public.sample WHERE elevation_m >= 1000;
There’s much more to filtering rows in SQL using the WHERE clause. I’ll cover filtering rows using WHERE in more detail in a later post.
Sorting the Results
When querying a database table using a select statement often times it’s useful to return the results in a specific order. The ORDER BY clause is used to sort the results of a select statement by one or more columns. ORDER BY is placed after the WHERE clause and body of which is a sort expression that consists a comma separated list of one or more columns, each followed by an optional sort direction key word. The sort direction key words are ASC (the default), for ascending order (smallest to largest), and DESC, for descending order (largest to smallest).
For example, the following select statement will order the results in ascending order (i.e., smallest to largest) by the elevation_m field (Figure 1).
SELECT study_value, sample_id, elevation_m FROM public.sample WHERE elevation_m >= 1000 ORDER BY elevation_m ASC;
The default in SQL is to sort the results in ascending order regardless if the ASC is, included in the sort expression so the above ORDER BY clause could be written more simply as:
ORDER BY elevation_m
However, to keep your code clean and readable I recommend always including a sort direction keyword in select statements.
To order by more than one column, the columns are included in a comma separated list after ORDER BY, and each each column can be given its sort order. Note that the order in which the columns are listed after order determines the priority each column is given in the sorted results. For example,
SELECT study_value, sample_id, physiography_value, geomorph_value,depth_to_restrict_layer_cm FROM public.site WHERE physiography_value = 'l' ORDER BY geomorph_value ASC, depth_to_restrict_layer_cm DESC;
The select statement above will order the results first by the geomorph_value column in ascending order, followed by the depth_to_restrict_layer_cm column in descending order.
PostgreSQL. 2021. Documentation – PostgreSQL 14. Available online: https://www.postgresql.org/docs/current/functions-comparison.html (Accessed 2022-01-01)
As a supplement to this post, I encourage you to review the PostgreSQL documentation on select statements and check out the W3 Schools select statements write up and online SQL interpreter here: https://www.w3schools.com/sql/sql_select.asp.
Next Time on Elfinwood Data Science Blog
In this post, the first of 2 posts focused SQL select statements, we learned about the basic structure of SQL select statements, including the SELECT command, and the FROM, WHERE, and ORDER BY clauses. We also looked at a number of examples of select statements to query data from a single table my_first_database. In the next post I’ll continue the lesson on SQL SELECT statements and I’ll cover LIMIT, OFFSET, mathematical operations, creating columns on the fly, data type casting, and column aliases.
If you like this post then please consider subscribing to this blog (see below) or following me on social media. If you’d like to follow this blog, but you don’t want to follow by subscribing through WordPress then a couple of options exist. You can “Like” the Elfinwood Data Science Blog Facebook Page or send me your email using the form on the Contacts Page and I’ll add you to the blog’s email list.
Follow My Blog
Copyright © 2021, Aaron Wells