In today's dynamic digital landscape, building resilient and scalable architectures is paramount. Particularly, managing a highly available PostgreSQL database cluster, orchestrated by Pgpool and running on Kubernetes, presents a robust solution. This infrastructure entails multiple PostgreSQL pods configured for high availability and data replication, complemented by Pgpool for load balancing and automatic failover. Leveraging Kubernetes StatefulSets, this setup ensures the seamless operation of critical workloads. The PostgreSQL cluster employs Streaming Replication, both Synchronous and Asynchronous, to maintain data consistency across replicas. In this guide, we'll navigate through the intricacies of deploying and managing such a setup, highlighting the pivotal role of Kubernetes in achieving resilience, scalability, and sustainability in database management.
Prerequisites - kubectl installed and configured to point at your cluster.
Deploying the infrastructure on a Kubernetes cluster on DigitalOcean -
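As a rough sketch, the deployment boils down to applying the manifests for the StatefulSet, the Pgpool deployment, and the services. The filenames below are hypothetical (assuming the manifests live in a manifests/ directory); adjust them to your repository layout:

```shell
# Create a dedicated namespace for the database components
kubectl create namespace database

# Apply the manifests (hypothetical filenames)
kubectl apply -n database -f manifests/postgres-secret.yaml       # credentials ($PASSWORD)
kubectl apply -n database -f manifests/postgres-statefulset.yaml  # postgres-sts StatefulSet
kubectl apply -n database -f manifests/pgpool-deployment.yaml     # Pgpool deployment
kubectl apply -n database -f manifests/pgpool-svc.yaml            # pgpool-svc ClusterIP service
kubectl apply -n database -f manifests/pgpool-svc-nodeport.yaml   # NodePort service (port 32000)

# Wait until the PostgreSQL pods are up, then inspect them
kubectl rollout status statefulset/postgres-sts -n database
kubectl get pods -n database
```

These commands assume a live cluster, so run them only once your kubeconfig points at the DigitalOcean cluster.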
You are now inside the PostgreSQL client container. You can connect to the cluster using the following command - PGPASSWORD=$PASSWORD psql -h pgpool-svc -p 5432 -U postgres -d postgres
PGPASSWORD=$PASSWORD psql -h <IP ADDR> -p <NODEPORT ADDR> -U postgres
Remember to replace <IP ADDR> with the public IP address of one of your nodes and <NODEPORT ADDR> with the NodePort of the pgpool-svc-nodeport service. In this case <IP ADDR> = 34.138.59.54 and <NODEPORT ADDR> = 32000. The password for this infra setup is WbrTpN3g7q. You are ready for testing!
CREATE DATABASE testdb;
\c testdb;
CREATE TABLE test_table (id SERIAL PRIMARY KEY, name VARCHAR(50));
INSERT INTO test_table (name) VALUES ('Data 1'), ('Data 2'), ('Data 3');
SELECT * FROM test_table;
PGPASSWORD=${PASSWORD} psql -h <standby-node-ip> -p <standby-node-port> -U postgres
Replace <standby-node-ip> and <standby-node-port> with the appropriate values for your setup.
Example - PGPASSWORD=${PASSWORD} psql -h 10.244.0.193 -p 5432 -U postgres
\c testdb;
SELECT * FROM test_table;
PGPASSWORD=${PASSWORD} psql -h 10.244.0.193 -p 5432 -U postgres -c "SELECT datname FROM pg_database;"
This should list the databases available on the standby node. If testdb is present in the list, the database has been replicated.
PGPASSWORD=${PASSWORD} psql -h 10.244.0.193 -p 5432 -U postgres -d testdb
The test_table data can be checked using - SELECT * FROM test_table;
PGPASSWORD=${PASSWORD} psql -h pgpool-svc -p 5432 -U postgres -c "SELECT * FROM pg_stat_replication;"
Look for the entries with state = streaming in the result and note their client_addr and client_port values; these are the standby nodes. The number of entries you should see equals the number of Postgres replicas minus one (2 in this case).
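This check can also be scripted. The sketch below assumes the same pgpool-svc endpoint and $PASSWORD variable used above, and that the StatefulSet runs three Postgres replicas:

```shell
# Count rows in pg_stat_replication that are actively streaming.
# -t (tuples only) and -A (unaligned) strip headers and formatting.
STREAMING=$(PGPASSWORD=${PASSWORD} psql -h pgpool-svc -p 5432 -U postgres -tA \
  -c "SELECT count(*) FROM pg_stat_replication WHERE state = 'streaming';")

REPLICAS=3   # total Postgres replicas in the StatefulSet (assumption)
if [ "$STREAMING" -eq $((REPLICAS - 1)) ]; then
  echo "All standbys are streaming from the primary."
else
  echo "Expected $((REPLICAS - 1)) streaming standbys, found $STREAMING."
fi
```

Because it queries the live cluster, this only works from a pod (or host) that can reach pgpool-svc.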
To simulate a primary failure, delete the current primary pod rather than scaling the StatefulSet to zero (scaling to zero would terminate every replica, leaving nothing to promote): kubectl delete pod postgres-sts-0 -n database
Wait for a few moments to allow the cluster to detect the primary node failure and promote a new primary.
PGPASSWORD=${PASSWORD} psql -h pgpool-svc -p 5432 -U postgres
\c testdb;
SELECT * FROM test_table;
The data in the test_table should still be available on the new primary node, indicating a successful failover.
kubectl delete pod postgres-sts-0 -n database
kubectl get pods -n database
One of the standby pods should now take over as the new primary, and the PostgreSQL cluster should continue to function with the new primary pod.
The output of kubectl get pods -n database should show a freshly recreated postgres-sts-0 pod with a much younger AGE than the other pods (just born): the StatefulSet recreates the deleted pod, which rejoins the cluster as a standby, while one of the original standby pods has been promoted to primary.
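To confirm which pod is now the primary, you can query pg_is_in_recovery() on each pod. This is a sketch assuming pod names follow the postgres-sts ordinals and $PASSWORD holds the cluster password:

```shell
# pg_is_in_recovery() returns 'f' on the primary and 't' on a standby.
for i in 0 1 2; do
  role=$(kubectl exec -n database postgres-sts-$i -- \
    env PGPASSWORD=${PASSWORD} psql -U postgres -tA -c "SELECT pg_is_in_recovery();")
  echo "postgres-sts-$i: $([ "$role" = "f" ] && echo primary || echo standby)"
done
```

Exactly one pod should report primary after the failover completes.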
Sample logs of follower pod when the primary pod goes down:
PGPASSWORD=${PASSWORD} psql -h pgpool-svc -p 5432 -U postgres -c "SELECT * FROM pg_stat_replication;"
Pgpool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: load balancing and limiting the number of client connections.
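These two roles map directly to settings in pgpool.conf. The fragment below is illustrative only (assuming Pgpool-II; the values are examples, not the ones used by this deployment):

```shell
# pgpool.conf (illustrative excerpt)
load_balance_mode = on    # distribute read queries across the standby nodes
num_init_children = 32    # max concurrent client connections Pgpool will accept
max_pool = 4              # cached backend connections per Pgpool child process
```

Clients beyond num_init_children are queued rather than passed through, which is how Pgpool shields the backends from connection spikes.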
RepMgr is an open-source tool for PostgreSQL (installed alongside it, not part of the core distribution) which serves two purposes: replication and failover.
Replication: it simplifies setting up and monitoring streaming replication between the primary and the standby nodes.
Failover: it can handle failovers in the cluster, i.e. it can promote a read-only standby to the primary server when required.
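For reference, a minimal repmgr node configuration might look like the following. This is an illustrative sketch; the hostnames, paths, and user are assumptions, not values taken from this deployment:

```shell
# repmgr.conf (illustrative excerpt)
node_id = 1
node_name = 'postgres-sts-0'
conninfo = 'host=postgres-sts-0.postgres-svc user=repmgr dbname=repmgr'
data_directory = '/var/lib/postgresql/data'
failover = 'automatic'    # repmgrd promotes a standby when the primary fails
promote_command = 'repmgr standby promote -f /etc/repmgr.conf'
follow_command = 'repmgr standby follow -f /etc/repmgr.conf'
```

With failover set to automatic, the repmgrd daemon on each standby watches the primary and runs promote_command on the chosen standby when the primary becomes unreachable.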
References