Load testing Postgres using `pgbench` utility️
A lot of times during testing my postrges db I have come across situations and asked questions like
- How do I generate lots of data to test my postgres db instance ?
- How do I test my db against a lot of concurrent users ?
- How do I test each user with more than 1 transaction being performed at a given time?
And that is where I researched a bit to look for a tool or utility that can help me with my requirements,
turns out there is an existing utility shipped with postgres called pgbench
. I will discuss below the use case
provide a quick guide on how to use this utility.
Use-Cases
This blog post covers pgbench
a postgres utility which you can use to
benchmark your PostgresDB. It has a lot of use cases and I would like
to mention below a small guide on using the utility.
But before jumping right into the commands, first lets understand what pgbench
is. pgbench
is a postgres extension binary that is shipped with postgres-client
.
pgbench
is a loose TPC-B type of benchmarking program. A TPC-B type of benchmark is one that
measures the performance of a database system in a transactional processing workload, you can
read about it more here [https://www.tpc.org/tpcb/]
By default pgbench
runs a sequence of same commands involving seven SELECT
, UPDATE
, and INSERT
commands per transaction.
Optionally you can also supply your transactional scripts to execute.
Usage
If you run pgbench
in default it needs to initialize some additional
tables in your database to perform load test, the tables being -
pgbench_accounts
, pgbench_branches
, pgbench_history
, and pgbench_tellers
.
Assuming you are running a local postgres instance to initialize pgbench
you can use the following command to initialize without options
```
pgbench -i -h localhost -p 5432 -U postgres -d mydb
```
this will generate the following new tables in the public schema
public | pgbench_accounts | table | postgres
public | pgbench_branches | table | postgres
public | pgbench_history | table | postgres
public | pgbench_tellers | table | postgres
In default settings the above tables have been generated with a scaling factor of 1 which means the number of rows will be:
```
table # of rows
---------------------------------
pgbench_branches 1
pgbench_tellers 10
pgbench_accounts 100000
pgbench_history 0
```
You can customize the scaling factor to increase the number of rows in the tables using -S
flag.
And each transaction performed as discussed above looks like
```
- BEGIN;
- UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
- SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
- UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
- UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
- INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
- END;
```
For each client database session simulation you can add more concurrent clients by using the flag -c
and providing a value
for ex : -c 100
which means you want to load test with 100 concurrent clients.
To distribute the load and utilize threads you can use the -j
flag which refers to jobs/threads.
Using the -t
flag you can specify the number of transactions you want each client to execute default is 10
.
Now combining the options we discussed we would test the postgres DB
pgbench -h localhost -p 5432 -U postgres -d mydb -c 100 -t 50 -j 4
Overall, this pgbench
command will simulate a workload where 50 clients each perform 50 transactions on the mydb
database using 4 threads for parallel execution.
Output and observation
```
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 4
number of transactions per client: 50
number of transactions actually processed: 2500/2500
latency average = 73.682 ms
initial connection time = 231.074 ms
tps = 678.588102 (without initial connection time)
```
If the all transactions are processed then the number of client * number of transaction per client
should be equal.
Latency average tells the average of latency of each transaction that was processed.
TPS is the transaction per second, the greater the number is the more transactions it can handle.
Additional resources
This is a very basic guide on using the utility you can additionally use a lot of options/flags to better suit your individual use case if you follow - https://www.postgresql.org/docs/current/pgbench.html
Initially reading more about the utility I came across another blog that you can read and follow to know more about
pgbench
and usage - https://kmoppel.github.io/2022-12-23-generating-lots-of-test-data-with-postgres-fast-and-faster/