Table of Contents
- Module Introduction
- Basics of Filtering with SQL
- Advanced Filtering:
IN,OR, andNOT - Using Wildcards in SQL
- Sorting with
ORDER BY - Math operations
- Aggregate Functions
- Grouping Data with SQL
- Putting it all together
This module introduces a number of operators and clauses to query data:
WHEREBETWEENINORNOTLIKEORDER BYGROUP BY
It also introduces wildcards, and math operators which can be used to aggregate data:
AVERAGECOUNTMAXMIN
- allows one to be specific about what they want
- reduces the number of records you receive
- increases query performance
- reduces strain on the client application
The WHERE clause filters queries by criteria you define.
SELECT [fields]
FROM [table]
WHERE [criteria]These criteria are determined through using a number of operators:
=- equal<>- not equal>- greater than<- less than>=- greater than or equal<=- less than or equalBETWEEN- between an inclusive rangeIS NULL- is a null value
To use BETWEEN we need to use it in conjunction with AND:
SELECT *
FROM my_table
WHERE size BETWEEN 5 AND 10;The IN operator specifies a range of conditions, delimited by commas, and
within parentheses.
-- select only suppliers with the provided ids
SELECT *
FROM suppliers
WHERE supplierId IN (9, 10, 15);To assert against string values, surround the strings in single quotes.
As with the disjunction operator in many programming languages, SQL will ignore
the second condition in an OR statement if the first condition is true.
SELECT *
FROM pets
WHERE name = 'sammy' OR 'hammy';IN works similarly to OR, but there are benefits to using IN:
- one can provide a long list of options
INexecutes faster thanOR- the condition used in
INare not subject to a specific order INclauses can container anotherSELECTstatement to make subqueries
NOT is used to exclude rows based on conditions:
SELECT *
FROM employees
WHERE NOT city = 'London' AND NOT city = 'Seattle';A wildcard is a special character used to match parts of a value, such as when you know only a portion of a value.
Search patterns are made from literal text, a wildcard character, or a combination of both.
Wildcard queries use the LIKE operator. LIKE is actually a predicate, but is
often referred to as an operator.
Wildcards can only be used for string data.
%text- matches anything ending withtexttext%- matches anything beginning withtext%text%- matches anything containingtexta%b- matches anything that is preceded byaand ends withb
Wildcards will not match NULL values.
SELECT DISTINCT(area)
FROM cities
WHERE name LIKE '%ville';Many RDMSs support _ as a wildcard. It functions in the same way that % has
been outlined here.
- wildcard queries take longer to run
- better to use other operators if possible
- statements with wildcards at the end of patterns take longer to run than patterns with wildcards at the beginning
- placement of wildcads is imporant
Using SELECT will return data in the same order that it was captured in the
database.
Furthermore, the order of data can change when it is updated or deleted.
Without specifying the order of data, its order cannot be assumed.
ORDER BY:
- takes the name of one or more columns
- columns are delimited by commas
- data can be sorted by columns that are not retrieved in the
SELECTstatement - must always be the last statement in a
SELECTstatement
SELECT *
FROM my_table
WHERE [criteria]
ORDER BY col1, col2;One can specify the direction to sort data:
DESC- descendingASC- ascending
The direction keywords apply only to the column name they precede.
...
ORDER BY DESC name;Though not readable, and prone to error, one can sort by column number:
...
ORDER BY 2, 3;Columns numbers are 1-indexed.
Available operators:
+-*/
SELECT
id
,unit_price
,unit_sold
,unit_price * units_sold AS total_sales
FROM my_table;SELECT
id
,unit_price
,units_sold
,unit_weight
,(unit_price * units_sold)/unit_weight AS weight_ratio
FROM my_table;Aggregate functios allow one to summarise data.
The aggregate functions available are:
AVGCOUNTMINMAXSUM
DISTINCT may be used with some of the aggregate functions to include only
values that are distinct.
Aggregate functions are useful for:
- summarising data
- finding the highest and lowest values
- finding the total number of rows
- finding averages
Aggregate functions are syntactic sugar for the existing math operators.
Rows containing NULL values are ignore by the AVG function.
SELECT
AVG(price) AS avg_price
FROM products;COUNT can be used to count rows, including those containing NULL values, or
individual columns, excluding those containing NULL values.
# count all rows, including those with NULL values
SELECT COUNT(*) AS customers
FROM customers;# count customers that have emails
SELECT COUNT(email) AS customers_with_email
FROM customers;NULL values are ignored by both MAX and MIN.
SELECT MAX(price) AS max_price
FROM products;SELECT MIN(size) AS smallest_size
FROM shoes;# get a range of shoe sizes
SELECT
MAX(size) AS shoe_size_upper
, MIN(size) as show_size_lower
FROM shoes;# simple query
SELECT SUM(price) as total_price
FROM products;# less simple query
SELECT SUM(price * units_sold) total_sales
FROM products;If DISTINCT is not defined, ALL is assumed.
DISTINCT can't be used in conjunction with COUNT(*).
DISTINCt doesn't make sense to use with MIN or MAX since they only return
a single value.
SELECT COUNT(DISTINCT price) AS distinct_prices
FROM products;Grouping data allows one to summarise subsets of data.
Data is grouped using two statements:
GROUP BYHAVING
Using these statements we can aggregate data by a particular value, e.g. grouping shoes by shoe size.
When simply aggregating total number of customers in the aggregating module, we didn't need to do anything specific to get a result.
If, however, another column is specified in the SELECT statement, we'd need to
indicate how we'd like the number of customers to be counted. This is where
GROUP BY is useful:
# this will error - we need to specify _how_ to count customer ids when queried
with region
SELECT
region
,COUNT(customer_id) AS total_customers
FROM customers;
# so we indicate that we want to count customers grouped by region
SELECT
region
,COUNT(customer_id) AS total_customers
FROM customers
GROUP BY region;- can contain multiple columns, delimited by commas
- every column in the
SELECTstatement must be present in aGROUP BYclause, with the exception of the aggregated values - NULLs can be grouped together if your
GROUP BYcolumn contains NULLs
WHEREfilters on rows, not groupsWHEREthen needs to be defined beforeGROUP BY- rows eliminated by a
WHEREclause will not be included in a group HAVINGis the equivalent toWHERE, but for groupsHAVINGmust come afterORDER BY
# count the number of orders, by customer, where there are 2 or more orders for
# each customer
SELECT
customer_id
.COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 2;As with other queries, GROUP BY will not sort results. It's good practice to
use ORDER BY with GROUP BY to organise the results of queries.
SELECT
supplier_id
,COUNT(*) AS num_products
FROM products
WHERE price >= 4
GROUP BY supplier_id
HAVING COUNT(*) >= 2
ORDER BY name ASC;[video](Putting it All Together)
Filtering is useful because:
- you're narrowing down your results
- it increases query and application performance
- makes understanding data easier by:
- finding specific values
- finding a range of values
- finding blank values
Order of key SQL clauses:
# always required
SELECT [columns]
# required if selecting table from a table
FROM [table]
WHERE [criteria]
# required if calculating aggregates by a group
GROUP BY [columns]
HAVING [criteria]
ORDER BY [columns [ASC|DESC]];