This quick start guide is designed to get the Python developer up and running with TimescaleDB as their database. In this tutorial, you’ll learn how to:
- Connect to TimescaleDB
- Create a relational table
- Generate a Hypertable
- Insert a batch of rows into your Timescale database
- Execute a query on your Timescale database
To complete this tutorial, you will need a cursory knowledge of the Structured Query Language (SQL). The tutorial will walk you through each SQL command, but it will be helpful if you've seen SQL before.
To start, install TimescaleDB. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial.
You will also need:
psycopg2
library. See here for installation instructions- An existing Python Virtual Environment. To set one up, follow this tutorial
Add the following import statements to the top of your python script:
#import psycopg2
Locate your TimescaleDB credentials in order to compose a connection string for psycopg2
to use in order to connect to your TimescaleDB instance.
You’ll need the following credentials:
- password
- username
- host URL
- port
- database name
Next compose your connection string variable, as a libpq connection string, using the following format:
CONNECTION = "postgres://username:password@host:port/dbname"
If you’re using a hosted version of TimescaleDB, or generally require an SSL connection, use this version instead:
CONNECTION = "postgres://username:password@host:port/dbname?sslmode=require"
Alternatively you can specify each parameter in the connection string as follows
CONNECTION = "dbname =tsdb user=tsdbadmin password=secret host=host.com port=5432 sslmode=require"
:WARNING: The above method of composing a connection string is for test or development purposes only, for production purposes be sure to make sensitive details like your password, hostname, and port number environment variables.
We’ll use the psycopg2 connect function to create a new database session
In your main
function, add the following lines:
def main():
…
with psycopg2.connect(CONNECTION) as conn:
#Call the function that needs the database connection
func_1(conn)
Alternatively, you can create a connection object as follows and the pass that object around as needed, like opening a cursor to perform database operations:
def main():
conn = psycopg2.connect(CONNECTION)
insert_data(conn)
cur = conn.cursor()
Congratulations, you’ve successfully connected to TimescaleDB using Python!
First, compose a string which contains the SQL state that you would use to create
a relational table. In the example below, we create a table called sensors
, with
columns id
, type
and location
:
query_create_sensors_table = "CREATE TABLE sensors (id SERIAL PRIMARY KEY, type VARCHAR(50), location VARCHAR(50));"
Next, we execute our CREATE TABLE
statement by opening a cursor, executing the
query from Step 1 and committing the query we executed in order to make changes we
made to the database persistent. Afterward, we close the cursor we opened to clean
up:
cur = conn.cursor()
#see definition in Step 1
cur.execute(query_create_sensors_table)
conn.commit()
cur.close()
Congratulations, you’ve successfully created a relational table in TimescaleDB using Python!
In TimescaleDB, the primary point of interaction with your data is a [hypertable][hypertable], the abstraction of a single continuous table across all space and time intervals, such that one can query it via standard SQL.
Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, selecting data, etc. can (and should) all be executed on the hypertable.
A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value.
First, we create a variable which houses our CREATE TABLE
SQL statement for our
hypertable. Notice how the hypertable has the compulsory time column:
#create sensor data hypertable
query_create_sensordata_table = """CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER,
temperature DOUBLE PRECISION,
cpu DOUBLE PRECISION,
FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);"""
Next we formulate the SELECT
statement to convert the table we created in Step 1
into a hypertable. Note that we must specify the table name which we wish to convert
to a hypertable and its time column name as the two arguments, as mandated by the
create_hypertable
docs:
query_create_sensordata_hypertable = "SELECT create_hypertable('sensor_data', 'time');"
Now we bring it all together by opening a cursor with our connection, executing our statement from step 1, then executing our statement from step 2 and committing our changes and closing the cursor:
cur = conn.cursor()
cur.execute(query_create_sensordata_table)
cur.execute(query_create_sensordata_hypertable)
#commit changes to the database to make changes persistent
conn.commit()
cur.close()
Congratulations, you’ve successfully created a hypertable in your Timescale database using Python!
Here’s a typical pattern you’d use to insert some data into a table. In the
example below, we insert the relational data in the array sensors
, into the
relational table named sensors
.
First, we open a cursor with our connection to the database, then using prepared
statements formulate our INSERT
SQL statement and then we execute that statement,
sensors = [('a','floor'),('a', 'ceiling'), ('b','floor'), ('b', 'ceiling')]
cur = conn.cursor()
for sensor in sensors:
try:
cur.execute("INSERT INTO sensors (type, location) VALUES (%s, %s);",
(sensor[0], sensor[1]))
except (Exception, psycopg2.Error) as error:
print(error.pgerror)
conn.commit()
A cleaner way to pass variables to the cur.execute
function is below, where we
separate the formulation of our SQL statement, SQL
, with the data being passed
with it into the prepared statement, data
:
SQL = "INSERT INTO sensors (type, location) VALUES (%s, %s);"
for sensor in sensors:
try:
data = (sensor[0], sensor[1])
cur.execute(SQL, data)
except (Exception, psycopg2.Error) as error:
print(error.pgerror)
conn.commit()
Congratulations, you’ve successfully inserted data into TimescaleDB using Python!
While using psycopg2
by itself may be sufficient for you to insert rows into your
hypertable, if you need quicker performance, you can use
pgcopy. To do this, install pgcopy
using pip3
or the like and
then add this line to your list of import
statements:
from pgcopy import CopyManager
Here’s some sample code which shows how to insert data into Timescale using pgcopy
,
using the example of sensor data from four sensors:
#insert using pgcopy
def fast_insert(conn):
cur = conn.cursor()
#for sensors with ids 1-4
for id in range(1,4,1):
data = (id, )
#create random data
simulate_query = """SELECT generate_series(now() - interval '24 hour', now(), interval '5 minute') AS time,
%s as sensor_id,
random()*100 AS temperature,
random() AS cpu
"""
cur.execute(simulate_query, data)
values = cur.fetchall()
#define columns names of the table you're inserting into
cols = ('time', 'sensor_id', 'temperature', 'cpu')
#create copy manager with the target table and insert!
mgr = CopyManager(conn, 'sensor_data', cols)
mgr.copy(values)
#commit after all sensor data is inserted
#could also commit after each sensor insert is done
conn.commit()
#check if it worked
cur.execute("SELECT * FROM sensor_data LIMIT 5;")
print(cur.fetchall())
cur.close()
First we generate random sensor data - you would replace this step with funneling in your real data from your data pipeline.
Then we define the column names of the table we want to insert data into. In this
case, we’re using the sensor_data
hypertable that we created in the
"Generate a Hypertable" section above. This hypertable consists of the columns
named time
, sensor_id
, temperature
and cpu
. We define these column names
in a tuple of strings called cols
.
Lastly we create an instance of the pgcopy
CopyManager, mgr
, and pass our
connection variable, hypertable name, and tuple of column names. Then we use
the copy
function of the CopyManager to insert the data into the database
performantly using pgcopy
and then commit when we’re done. There is also sample
code to check if the insert worked.
Congratulations, you’ve successfully performantly inserted data into TimescaleDB
using Python and the pgcopy
library!
First, define the SQL query you’d like to run on the database. The example below
is a simple SELECT
statement from our Hello Timescale tutorial.
query = "SELECT * FROM rates;"
Next we’ll open a cursor from our existing database connection, conn
,
and then execute the query we defined in Step 1:
cur = conn.cursor()
query = "SELECT * FROM rates;"
cur.execute(query)
To access all the resulting rows returned by your query, we’ll use
one pyscopg2
’s results retrieval methods,
such as fetchall()
or fetchmany()
. In the example below, we’re simply
printing the results of our query, row by row. Note the the result of fetchall()
is a list of tuples, so you can handle them accordingly:
cur = conn.cursor()
query = "SELECT * FROM rates;"
cur.execute(query)
for i in cur.fetchall():
print(i)
cur.close()
For more complex queries than a simple SELECT *
, we can use prepared statements
to ensure our queries are executed safely against the database. We write our
query using placeholders as shown in the sample code below. For more on how to
properly use placeholders in psycopg2, see the basic module usage document.
#query with placeholders
cur = conn.cursor()
query = """
SELECT time_bucket('5 minutes', time) AS five_min, avg(cpu)
FROM sensor_data
JOIN sensors ON sensors.id = sensor_data.sensor_id
WHERE sensors.location = %s AND sensors.type = %s
GROUP BY five_min
ORDER BY five_min DESC;
"""
data = (location, sensor_type)
cur.execute(query, data)
results = cur.fetchall()
Congratulations, you’ve successfully executed a query on TimescaleDB using Python! For more information on how to execute more complex queries, see the psycopg2 documentation
Now that you’re able to connect, read, and write to a TimescaleDB instance from your Python application, and generate the scaffolding necessary to build a new application from an existing TimescaleDB instance, be sure to check out these advanced TimescaleDB tutorials: