My Blogs & Articles

Load testing Postgres using the `pgbench` utility

While testing my Postgres DB, I have often come across situations where I asked questions like:

  • How do I generate lots of data to test my Postgres DB instance?
  • How do I test my DB against a lot of concurrent users?
  • How do I test each user performing more than one transaction at a time?

That is where I researched a bit to look for a tool or utility that could help with these requirements. It turns out there is an existing utility shipped with Postgres called pgbench. Below, I discuss the use cases and provide a quick guide on how to use this utility.

Use-Cases

This blog post covers pgbench, a Postgres utility you can use to benchmark your Postgres DB. It has a lot of use cases, and below is a small guide on using the utility.

But before jumping right into the commands, let's first understand what pgbench is. pgbench is a benchmarking binary shipped with postgres-client. It runs a loose TPC-B style benchmark: a TPC-B benchmark is one that measures the performance of a database system under a transaction-processing workload. You can read more about it here [https://www.tpc.org/tpcb/].

By default, pgbench repeatedly runs the same sequence of commands, involving five SELECT, UPDATE, and INSERT commands per transaction (wrapped in a BEGIN/END block). Optionally, you can also supply your own transaction scripts to execute, as sketched below.
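As a quick sketch of the custom-script option: the table my_table, its columns, and the file name my_script.sql below are made up for illustration; only the -f flag and the \set meta-command are pgbench features.

```
# write a simple custom transaction script (hypothetical table and columns)
cat > my_script.sql <<'EOF'
\set rid random(1, 100000)
SELECT * FROM my_table WHERE id = :rid;
UPDATE my_table SET value = value + 1 WHERE id = :rid;
EOF

# run the custom script instead of the built-in TPC-B-like one
pgbench -h localhost -p 5432 -U postgres -f my_script.sql -c 10 -t 100 mydb
```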

Usage

To run pgbench in its default mode, you first need to initialize some additional tables in your database that it uses to perform the load test: pgbench_accounts, pgbench_branches, pgbench_history, and pgbench_tellers.

Assuming you are running a local Postgres instance, you can initialize pgbench with the following command (no extra options):

```
pgbench -i -h localhost -p 5432 -U postgres -d mydb
```

This will generate the following new tables in the public schema:

 public | pgbench_accounts      | table | postgres
 public | pgbench_branches      | table | postgres
 public | pgbench_history       | table | postgres
 public | pgbench_tellers       | table | postgres
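
If you want to double-check that the tables were created, one quick way (assuming you have psql access to the same database) is to list them by name pattern:

```
psql -h localhost -p 5432 -U postgres -d mydb -c '\dt pgbench_*'
```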

With default settings, the above tables are generated with a scaling factor of 1, which means the number of rows will be:

```
table                   # of rows
---------------------------------
pgbench_branches        1
pgbench_tellers         10
pgbench_accounts        100000
pgbench_history         0
```

You can customize the scaling factor to increase the number of rows in the tables using the -s flag during initialization.
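
For example, re-initializing with an arbitrary scaling factor of 50 multiplies the row counts above by 50, giving roughly 5,000,000 rows in pgbench_accounts:

```
pgbench -i -s 50 -h localhost -p 5432 -U postgres -d mydb
```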

Each transaction in the default script, as discussed above, looks like this:

```
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
```
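
If you are curious how this compares with the exact script your installed version runs, recent pgbench releases can list and print their built-in scripts (the built-in name tpcb-like is the one current versions ship with):

```
# list the available built-in scripts
pgbench -b list

# print the code of the default TPC-B-like script
pgbench --show-script=tpcb-like
```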

To simulate client database sessions, you can add more concurrent clients using the -c flag and providing a value, for example -c 100, which means you want to load test with 100 concurrent clients.

To distribute the load across multiple worker threads, you can use the -j flag, which sets the number of jobs/threads.

Using the -t flag, you can specify the number of transactions you want each client to execute; the default is 10.
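
Alternatively, if you would rather run the test for a fixed duration than a fixed number of transactions, the -T flag takes a duration in seconds; a sketch of a 60-second run (the duration here is an arbitrary example) could look like:

```
pgbench -h localhost -p 5432 -U postgres -d mydb -c 50 -j 4 -T 60
```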

Now, combining the options we discussed, we can test the Postgres DB:

pgbench -h localhost -p 5432 -U postgres -d mydb -c 50 -t 50 -j 4

Overall, this pgbench command will simulate a workload where 50 clients each perform 50 transactions on the mydb database using 4 threads for parallel execution.

Output and observation

```
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 4
number of transactions per client: 50
number of transactions actually processed: 2500/2500
latency average = 73.682 ms
initial connection time = 231.074 ms
tps = 678.588102 (without initial connection time)
```

If all transactions are processed successfully, the number of transactions actually processed should equal the number of clients multiplied by the number of transactions per client (here, 50 * 50 = 2500).

Latency average is the average latency of each transaction that was processed.

TPS is transactions per second; the higher this number, the more transactions the database can handle.
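
As a rough sanity check, these two numbers are consistent with each other: with 50 concurrent clients each spending about 73.7 ms per transaction, the expected throughput is roughly 50 / 0.0737 ≈ 678 transactions per second, which matches the reported tps.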

Additional resources

This is a very basic guide to using the utility. There are many additional options/flags you can use to better suit your individual use case; see https://www.postgresql.org/docs/current/pgbench.html

While initially reading more about the utility, I came across another blog that you can follow to learn more about pgbench and its usage - https://kmoppel.github.io/2022-12-23-generating-lots-of-test-data-with-postgres-fast-and-faster/