22.08.2025 00:25

Building a PostgreSQL HA Cluster with Ubuntu VMs

In today’s digital landscape, data is the lifeblood of nearly every business. From e-commerce platforms and financial services to healthcare systems and enterprise SaaS applications, the reliability of your database infrastructure directly impacts user experience, revenue, and operational continuity. That’s why high availability (HA) is no longer a luxury—it’s a necessity.

PostgreSQL, one of the most powerful and widely adopted open-source relational databases, offers robust features for scalability, performance, and data integrity. However, out-of-the-box PostgreSQL does not provide built-in high availability. To achieve true resilience, you need a carefully orchestrated cluster architecture that can handle failovers, load balancing, and real-time monitoring.

This blog post introduces the PostgreSQL-HA-Cluster-VM project—a fully automated, Docker-free solution for deploying a PostgreSQL high availability cluster on Ubuntu virtual machines. Designed for DevOps engineers, system administrators, and database architects, this project provides:

Seamless replication between master and replica nodes
Intelligent failover using Pgpool-II and keepalived
Real-time observability with Prometheus and Grafana
A single setup script for full automation
Full transparency and control over every component

Licensed under the MIT License, this project is free to use, modify, and distribute. Whether you're building a production-grade database cluster or experimenting in a lab environment, this project gives you the tools to deploy a resilient PostgreSQL infrastructure with confidence.

Project Philosophy: No Docker, Full Transparency

Why I Avoided Docker

Most modern HA setups rely on Docker or Kubernetes. While these tools are powerful, they abstract away too much for my taste, especially when you're trying to learn or debug. I wanted:

Full control over every configuration file
No abstraction layers between the OS and PostgreSQL
Real-world networking with static IPs and Netplan
Modular architecture with clear separation of roles
Educational value for anyone who wants to understand HA deeply
Transparency: I wanted to see every config file and log
Real networking: Docker’s virtual networks hide real-world behavior
Systemd integration: Managing services natively is more realistic
Automation: A single script (setup.sh) handles full deployment

This project is not just a deployment tool—it’s a learning platform.

Architecture Overview

The cluster consists of six Ubuntu Server 22.04 LTS virtual machines, each with a distinct role:

Node Name	Role	IP	Description
`pg-master`	`Primary PostgreSQL node`	`10.0.2.101`	`Handles writes and WAL archiving`
`pg-replica1`	`Standby node`	`10.0.2.102`	`Receives streaming replication`
`pg-replica2`	`Standby node`	`10.0.2.103`	`Adds redundancy`
`pgpool-node`	`Connection manager`	`10.0.2.104`	`Manages load balancing and failover`
`pgha-node`	`Watchdog controller`	`10.0.2.105`	`Oversees VIP and Pgpool health`
`monitoring-node`	`Prometheus + Grafana`	`10.0.2.106`	`Provides PostgreSQL monitoring and Grafana dashboards`

1. PostgreSQL Master & Replicas

Master Node: pg-master
Replica Nodes: pg-replica1, pg-replica2
Replication: Streaming replication, with standbys seeded via pg_basebackup
Health Checks: Monitored using healthcheck.sh
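
Before a standby can be seeded with pg_basebackup, the primary has to accept replication connections. The sketch below prints the kind of settings that usually requires. It assumes PostgreSQL 14 (the version packaged with Ubuntu 22.04), a hypothetical replication role named `replicator`, and the 10.0.2.0/24 subnet from the table above; the values the project's own scripts use may differ.

```shell
#!/usr/bin/env bash
# Sketch: print primary-side settings a streaming-replication standby needs.
# Assumptions (not taken from the project): role "replicator", subnet 10.0.2.0/24.

# Lines for postgresql.conf on the primary.
primary_settings() {
  cat <<'EOF'
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
EOF
}

# pg_hba.conf line that lets the replication role connect from the subnet.
replication_hba_line() {
  local role="$1" subnet="$2"
  printf 'host    replication    %s    %s    scram-sha-256\n' "$role" "$subnet"
}

primary_settings
replication_hba_line replicator 10.0.2.0/24
```

PostgreSQL has to be restarted after changing `wal_level` or `max_wal_senders`, while a reload is enough for pg_hba.conf changes.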

2. Pgpool-II

Acts as a connection pooler and load balancer
Provides automatic failover and query routing
Uses Virtual IP (VIP) for seamless client access
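
To make the load-balancing part concrete, here is a minimal sketch that generates the backend entries Pgpool-II expects in pgpool.conf for the three PostgreSQL nodes. Port 5432 and equal weights are assumptions; the project's actual pgpool.conf may set more parameters than shown here.

```shell
#!/usr/bin/env bash
# Sketch: generate pgpool.conf backend entries, one numbered group per node.
# Assumptions: PostgreSQL listens on 5432 and all nodes get the same weight.

pgpool_backends() {
  local i=0 host
  for host in "$@"; do
    printf "backend_hostname%d = '%s'\n" "$i" "$host"
    printf "backend_port%d = 5432\n" "$i"
    printf "backend_weight%d = 1\n" "$i"
    printf "backend_flag%d = 'ALLOW_TO_FAILOVER'\n" "$i"
    i=$((i + 1))
  done
  # Spread read queries across the backends.
  printf "load_balance_mode = on\n"
}

pgpool_backends 10.0.2.101 10.0.2.102 10.0.2.103
```

Node 0 is the master here, so the node IDs reported by Pgpool-II line up with the order of the table above.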

3. PGHA Controller

Built with keepalived and watchdog.conf
Manages VIP failover between Pgpool nodes
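
For the keepalived side, a configuration along these lines is typical. This is only a sketch: the interface name `enp0s3`, the VIP `10.0.2.110/24`, the router ID, and the health-check command are all hypothetical, and it does not cover the Pgpool watchdog settings in watchdog.conf.

```shell
#!/usr/bin/env bash
# Sketch: render a keepalived.conf that holds a VIP while pgpool is running.
# Assumptions: interface, VIP, virtual_router_id and the pgrep check are
# illustrative values, not the project's.

render_keepalived() {
  local iface="$1" vip="$2" priority="$3"
  cat <<EOF
vrrp_script chk_pgpool {
    script "/usr/bin/pgrep -x pgpool"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_PGPOOL {
    state BACKUP
    interface ${iface}
    virtual_router_id 51
    priority ${priority}
    advert_int 1
    virtual_ipaddress {
        ${vip}
    }
    track_script {
        chk_pgpool
    }
}
EOF
}

render_keepalived enp0s3 10.0.2.110/24 100
```

Giving each node a different priority decides which one holds the VIP when both are healthy.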

4. Monitoring Stack

Prometheus collects metrics via postgres_exporter
Grafana visualizes metrics with prebuilt dashboards
Tracks replication lag, active connections, CPU/RAM usage, and more
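
As an illustration of how Prometheus finds the exporters, the sketch below renders a scrape configuration for the three PostgreSQL nodes. It assumes postgres_exporter runs on each database node on its default port 9187; the job name is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: render a Prometheus scrape_configs section for postgres_exporter.
# Assumptions: one exporter per database node, default port 9187.

render_scrape_config() {
  local port="$1" host
  shift
  printf 'scrape_configs:\n'
  printf '  - job_name: postgres\n'
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for host in "$@"; do
    printf "          - '%s:%s'\n" "$host" "$port"
  done
}

render_scrape_config 9187 10.0.2.101 10.0.2.102 10.0.2.103
```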

Installation and Automation

The installation is fully scripted, making it a great example of PostgreSQL cluster automation. The process includes:

Static IP configuration for all nodes
PostgreSQL installation and configuration
PostgreSQL streaming replication configuration
Pgpool-II setup for connection pooling and failover
PGHA deployment for VIP failover
Monitoring stack with Prometheus and Grafana for PostgreSQL metrics
Healthcheck scripts for replication lag alerts

Because every step is scripted, the deployment is repeatable and can be integrated into CI/CD pipelines for infrastructure provisioning.

Download Setup Script

wget https://raw.githubusercontent.com/koray-karaman/PostgreSQL-HA-Cluster-VM/main/setup.sh -O setup.sh
chmod +x setup.sh

Run Setup Script

./setup.sh

Configure Networking

sudo cp configs/network-setup.yaml /etc/netplan/00-installer-config.yaml
sudo netplan apply
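
For reference, a static-IP Netplan file for one node could look like the output of the sketch below. The interface name `enp0s3` and the gateway `10.0.2.1` are assumptions; the contents of the project's configs/network-setup.yaml may differ.

```shell
#!/usr/bin/env bash
# Sketch: render a static-IP Netplan configuration for one node.
# Assumptions: interface name and gateway are illustrative.

render_netplan() {
  local iface="$1" address="$2" gateway="$3"
  cat <<EOF
network:
  version: 2
  ethernets:
    ${iface}:
      dhcp4: false
      addresses:
        - ${address}
      routes:
        - to: default
          via: ${gateway}
EOF
}

render_netplan enp0s3 10.0.2.101/24 10.0.2.1
```

When working over SSH, `sudo netplan try` is a safer alternative to `netplan apply`, since it rolls the change back unless you confirm it.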

Monitoring & Alerts

Grafana dashboards track the following metrics:

postgres_up
postgres_replication_lag_seconds
postgres_replication_lag_alert
postgres_in_recovery

You can customize alert thresholds using the LAG_THRESHOLD variable.
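
A threshold check of this kind can be sketched as follows. The sketch assumes LAG_THRESHOLD is expressed in seconds (consistent with the metric name above) and uses a default of 30, which is a placeholder rather than the project's value.

```shell
#!/usr/bin/env bash
# Sketch: compare standby replay lag with LAG_THRESHOLD.
# Assumptions: threshold is in seconds; the default of 30 is illustrative.

LAG_THRESHOLD="${LAG_THRESHOLD:-30}"

# SQL for a standby: whole seconds since the last replayed transaction.
LAG_SQL="SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0)::int"

# Succeeds when the lag is within the threshold, fails otherwise.
lag_within_threshold() {
  local lag="$1" threshold="${2:-$LAG_THRESHOLD}"
  [ "$lag" -le "$threshold" ]
}

# Usage on a standby (needs psql and a running server):
#   lag="$(sudo -u postgres psql -At -c "$LAG_SQL")"
#   lag_within_threshold "$lag" || echo "lag ${lag}s exceeds ${LAG_THRESHOLD}s"
```

Note that this measure also grows while the primary is idle, because no new transactions arrive to be replayed, so a high value on a quiet system is not necessarily a problem.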

Replication and Failover Scenarios

PostgreSQL’s streaming replication is used to keep replicas in sync with the master. In case of failure, Pgpool and watchdog coordinate to promote a replica and reassign the virtual IP.

Planned Failover

Stop PostgreSQL on master
Promote replica using pg_ctl promote
Update Pgpool configuration
Restart Pgpool
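
The steps above can be sketched as a command plan. The function below only prints the commands so they can be reviewed before anything is run. It assumes PostgreSQL 14 with Ubuntu's default paths, the service names `postgresql` and `pgpool2`, and SSH access to each node; updating the Pgpool configuration is left as a manual step because it depends on your pgpool.conf.

```shell
#!/usr/bin/env bash
# Sketch: print the commands for a planned failover; nothing is executed.
# Assumptions: PostgreSQL 14 paths from Ubuntu packaging, SSH access to nodes.

PG_BIN=/usr/lib/postgresql/14/bin
PG_DATA=/var/lib/postgresql/14/main

planned_failover_plan() {
  local old_master="$1" new_master="$2" pgpool="$3"
  echo "ssh ${old_master} sudo systemctl stop postgresql"
  echo "ssh ${new_master} sudo -u postgres ${PG_BIN}/pg_ctl promote -D ${PG_DATA}"
  # Edit pgpool.conf by hand between these two steps if backend roles changed.
  echo "ssh ${pgpool} sudo systemctl restart pgpool2"
}

planned_failover_plan 10.0.2.101 10.0.2.102 10.0.2.104
```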

Unplanned Failover

Pgpool detects failure
Watchdog promotes a replica
VIP is reassigned
Clients reconnect automatically

This setup is designed to keep downtime minimal and to recover automatically.
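
In a typical Pgpool-II setup, the promotion itself is carried out by a script named in the `failover_command` parameter, which Pgpool-II calls with placeholders such as `%d` (failed node ID), `%P` (old primary node ID), and `%H` (new main node hostname). Whether this project wires it up exactly this way is not stated, so treat the following as an illustrative sketch. The script path and the SSH-based promotion are assumptions, and the function prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of the decision a Pgpool-II failover script makes.
# Hypothetical pgpool.conf entry:
#   failover_command = '/etc/pgpool2/failover.sh %d %P %H'
# Assumptions: script path, SSH as user postgres, PostgreSQL 14 paths.

failover_action() {
  local failed_id="$1" old_primary_id="$2" new_main_host="$3"
  if [ "$failed_id" != "$old_primary_id" ]; then
    # A standby failed: the primary is still fine, so nothing is promoted.
    echo "skip"
    return 0
  fi
  echo "ssh postgres@${new_main_host} /usr/lib/postgresql/14/bin/pg_ctl promote -D /var/lib/postgresql/14/main"
}

failover_action 0 0 10.0.2.102
```

Comparing the failed node ID with the old primary ID avoids promoting a replica when only a standby has gone down.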

Recovery

Reinitialize old master as replica
Use pg_basebackup for sync
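
A rough sketch of that recovery sequence, to be run on the old master once a new primary is serving traffic. It prints the commands rather than executing them. The data directory path matches Ubuntu's PostgreSQL 14 packaging, and the `replicator` role is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the commands that turn the old master into a standby.
# Assumptions: Ubuntu PostgreSQL 14 data directory, role "replicator".

rejoin_plan() {
  local new_primary="$1" datadir="$2" role="$3"
  echo "sudo systemctl stop postgresql"
  # Keep the old data directory until the new standby is verified.
  echo "sudo -u postgres mv ${datadir} ${datadir}.old"
  # -R writes standby.signal and primary_conninfo; -X stream copies WAL too.
  echo "sudo -u postgres pg_basebackup -h ${new_primary} -U ${role} -D ${datadir} -R -X stream -P"
  echo "sudo systemctl start postgresql"
}

rejoin_plan 10.0.2.102 /var/lib/postgresql/14/main replicator
```

Once the node is back, it should appear as a new row in `pg_stat_replication` on the current primary.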

Post-Deployment Checklist

Can all nodes ping each other?
Are PostgreSQL services running?
Is replication visible in pg_stat_replication?
Is Pgpool VIP correctly assigned?
Are Prometheus and Grafana dashboards active?
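
Most of this checklist can be automated. Below is a minimal sketch that runs each check and reports PASS or FAIL. It assumes the default ports for Prometheus (9090) and Grafana (3000) and is meant to run on a PostgreSQL node; the VIP check is omitted because it depends on the address you chose.

```shell
#!/usr/bin/env bash
# Sketch: run named checks and report PASS/FAIL for each one.
# Assumptions: default Prometheus/Grafana ports, run on a PostgreSQL node.

failures=0

# check NAME COMMAND...: run COMMAND quietly and report the result.
check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    failures=$((failures + 1))
  fi
}

run_checklist() {
  local ip
  for ip in 10.0.2.101 10.0.2.102 10.0.2.103 10.0.2.104 10.0.2.105 10.0.2.106; do
    check "ping $ip" ping -c 1 -W 2 "$ip"
  done
  check "postgresql service" systemctl is-active --quiet postgresql
  check "replication visible" sh -c \
    '[ "$(sudo -u postgres psql -At -c "SELECT count(*) FROM pg_stat_replication")" -ge 1 ]'
  check "prometheus healthy" curl -fsS http://10.0.2.106:9090/-/healthy
  check "grafana healthy" curl -fsS http://10.0.2.106:3000/api/health
  echo "failed checks: $failures"
}

# Usage: call run_checklist on a cluster node.
```

The replication check only makes sense on the master, since `pg_stat_replication` is empty on a standby without cascading replicas.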

Best Practices

Version control your configuration files
Monitor replication lag regularly
Test failover scenarios in staging
Backup VIP configuration
Use verify.sh for health checks

Health Checks and Cron Jobs

A custom script healthcheck.sh runs periodic checks on the cluster:

Verifies replication status
Checks Pgpool node health
Logs anomalies

You can schedule it with cron:

(crontab -l 2>/dev/null; echo "* * * * * /opt/pg-ha/healthcheck.sh") | crontab -

This ensures continuous monitoring even outside Grafana.
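
As an example of the Pgpool node check, the sketch below filters the output of Pgpool-II's `SHOW pool_nodes` command and reports any backend that is not up. It is not the project's healthcheck.sh. It assumes psql's unaligned output format, in which the hostname is the second field and the status the fourth, and it treats `waiting` as healthy.

```shell
#!/usr/bin/env bash
# Sketch: report Pgpool-II backends whose status is neither "up" nor "waiting".
# Input on stdin: rows of "SHOW pool_nodes" in psql's unaligned format (-At).

down_nodes() {
  awk -F'|' '$4 != "up" && $4 != "waiting" { print $2 " is " $4 }'
}

# Usage (needs psql and a running Pgpool-II on its default port 9999):
#   psql -h 10.0.2.104 -p 9999 -U postgres -At -c "SHOW pool_nodes" \
#     | down_nodes | logger -t pg-ha-healthcheck
```

Piping the result to `logger` sends anomalies to the system log, where they survive even if the monitoring node is down.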

Troubleshooting Tips

Issue	Solution
`VIP not moving`	`Check watchdog.conf and Pgpool connectivity`
`Replication lag too high`	`Investigate disk I/O and network latency`
`Prometheus not scraping`	`Verify targets and firewall rules`
`Grafana dashboard empty`	`Ensure exporters are running`

Lessons Learned and Honest Reflections

This project taught me a lot—but it wasn’t all smooth sailing. Here are some hard-earned lessons:

1. Pgpool Is Powerful but Fragile

Pgpool-II is a beast. It handles connection pooling, load balancing, and failover—but its configuration is notoriously complex. A single misconfigured line in pgpool.conf can break the entire cluster. I spent hours debugging watchdog behavior and VIP transitions.

2. VIP Failover Isn’t Always Instant

While Pgpool and watchdog handle VIP reassignment, it’s not always seamless. In some cases, the VIP lingers on the failed node due to network delays or misconfigured ARP tables. This can cause brief outages.
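
One way to see where the VIP actually lives is to check the address list on each candidate node. A small sketch follows, with `10.0.2.110` standing in as a hypothetical VIP.

```shell
#!/usr/bin/env bash
# Sketch: detect whether this node currently holds the VIP.
# Reads the output of `ip -o -4 addr show` on stdin; the 4th field is
# the address in CIDR form (for example 10.0.2.104/24).

holds_vip() {
  local vip="$1"
  awk -v vip="$vip" '
    { split($4, parts, "/"); if (parts[1] == vip) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (10.0.2.110 is a hypothetical VIP):
#   ip -o -4 addr show | holds_vip 10.0.2.110 && echo "VIP is on this node"
```

If two nodes report the VIP at the same time, or the failed node still reports it after failover, that points to the lingering-VIP case described above.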

3. Monitoring Needs More Alerts

Prometheus and Grafana provide great visibility, but alerting is minimal. I plan to add Slack and email notifications for critical events like failover or replication lag spikes.

4. No Backup System Yet

This cluster handles availability—but not backups. That’s a major gap. I’m exploring tools like pgBackRest and barman to integrate automated backups and point-in-time recovery.

5. Resource Usage Is High

Running six VMs locally is resource-intensive. This setup is better suited for cloud environments or powerful physical servers. On a typical laptop, you’ll hit CPU and RAM limits quickly.

MIT License

This project is released under the MIT License:

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction.

You are free to use, modify, and distribute this project with proper attribution.

Contribute

Want to improve or extend the project?

Fork the repo and submit a pull request
Open issues for bugs or feature requests
Star the project to support its visibility

GitHub Repository: PostgreSQL-HA-Cluster-VM (https://github.com/koray-karaman/PostgreSQL-HA-Cluster-VM)

Conclusion

Building a high availability PostgreSQL cluster may sound complex—but with the right tools and architecture, it becomes a repeatable, scalable, and maintainable process. The PostgreSQL HA Cluster VM project simplifies this journey by offering a complete, open-source solution that runs on Ubuntu virtual machines without relying on containerization or external orchestration platforms.

By combining PostgreSQL replication, Pgpool-II failover logic, VIP management via keepalived, and robust monitoring with Prometheus and Grafana, this project delivers a production-ready HA setup that can be deployed in minutes. Whether you're managing mission-critical applications or preparing for disaster recovery scenarios, this architecture ensures your data remains available, consistent, and protected.

And because it's licensed under the MIT License, you're free to adapt it to your own infrastructure, contribute improvements, or integrate it into larger automation pipelines. The project is designed to be transparent, modular, and extensible—perfect for teams that value control and clarity.

If you're ready to take your PostgreSQL deployment to the next level, explore the GitHub repository, try the setup script, and start building a truly resilient database cluster today.

Koray Karaman
