Building a PostgreSQL HA Cluster with Ubuntu VMs
In today’s digital landscape, data is the lifeblood of nearly every business. From e-commerce platforms and financial services to healthcare systems and enterprise SaaS applications, the reliability of your database infrastructure directly impacts user experience, revenue, and operational continuity. That’s why high availability (HA) is no longer a luxury—it’s a necessity.
PostgreSQL, one of the most powerful and widely adopted open-source relational databases, offers robust features for scalability, performance, and data integrity. However, while PostgreSQL ships with streaming replication, it does not provide automatic failover, connection routing, or monitoring out of the box. To achieve true resilience, you need a carefully orchestrated cluster architecture that can handle failovers, load balancing, and real-time monitoring.
This blog post introduces the PostgreSQL-HA-Cluster-VM project—a fully automated, Docker-free solution for deploying a PostgreSQL high availability cluster on Ubuntu virtual machines. Designed for DevOps engineers, system administrators, and database architects, this project provides:
- Seamless replication between master and replica nodes
- Intelligent failover using Pgpool-II and keepalived
- Real-time observability with Prometheus and Grafana
- A single setup script for full automation
- Full transparency and control over every component
Licensed under the MIT License, this project is free to use, modify, and distribute. Whether you're building a production-grade database cluster or experimenting in a lab environment, this project gives you the tools to deploy a resilient PostgreSQL infrastructure with confidence.
Project Philosophy: No Docker, Full Transparency
Why I Avoided Docker
Most modern HA setups rely on Docker or Kubernetes. These tools are powerful, but they abstract away too much for my taste, especially when you’re trying to learn or debug. I wanted:
- Full control over every configuration file
- No abstraction layers between the OS and PostgreSQL
- Real-world networking with static IPs and Netplan
- Modular architecture with clear separation of roles
- Educational value for anyone who wants to understand HA deeply
- Transparency: I wanted to see every config file and log
- Real networking: Docker’s virtual networks hide real-world behavior
- Systemd integration: Managing services natively is more realistic
- Automation: A single script (`setup.sh`) handles full deployment
This project is not just a deployment tool—it’s a learning platform.
Architecture Overview
The cluster consists of six Ubuntu Server 22.04 LTS virtual machines, each with a distinct role:
| Node Name | Role | IP | Description |
|---|---|---|---|
| `pg-master` | Primary PostgreSQL node | 10.0.2.101 | Handles writes and WAL archiving |
| `pg-replica1` | Standby node | 10.0.2.102 | Receives streaming replication |
| `pg-replica2` | Standby node | 10.0.2.103 | Adds redundancy |
| `pgpool-node` | Connection manager | 10.0.2.104 | Manages load balancing and failover |
| `pgha-node` | Watchdog controller | 10.0.2.105 | Oversees VIP and Pgpool health |
| `monitoring-node` | Prometheus + Grafana | 10.0.2.106 | Enables PostgreSQL monitoring with Grafana dashboards |
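For name resolution between the nodes, the table above could be turned into `/etc/hosts` entries. The following is a minimal sketch that assumes the hostnames and IPs from the table; the helper name `render_hosts` is hypothetical and not part of the project.

```shell
#!/bin/sh
# Hypothetical helper (not from the project): print /etc/hosts entries
# for the six nodes listed in the architecture table.
render_hosts() {
    cat <<'EOF'
10.0.2.101 pg-master
10.0.2.102 pg-replica1
10.0.2.103 pg-replica2
10.0.2.104 pgpool-node
10.0.2.105 pgha-node
10.0.2.106 monitoring-node
EOF
}

# Print the entries for review. If they look right, they could be appended with:
#   render_hosts | sudo tee -a /etc/hosts
render_hosts
```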
1. PostgreSQL Master & Replicas
- Master Node: `pg-master`
- Replica Nodes: `pg-replica1`, `pg-replica2`
- Replication: Replicas are seeded with `pg_basebackup` and kept in sync through streaming replication
- Health Checks: Monitored using `healthcheck.sh`
2. Pgpool-II
- Acts as a connection pooler and load balancer
- Provides automatic failover and query routing
- Uses Virtual IP (VIP) for seamless client access
3. PGHA Controller
- Built with `keepalived` and `watchdog.conf`
- Manages VIP failover between Pgpool nodes
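To illustrate how keepalived can hold a virtual IP, a VRRP instance might look like the fragment printed below. This is a sketch, not the project's actual configuration: the interface name `enp0s3`, the router ID `51`, the priority, and the VIP `10.0.2.100` are all assumptions, and `render_keepalived` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal keepalived VRRP instance.
# Arguments (all assumed values, adjust to your network):
#   $1 interface, $2 VIP in CIDR form, $3 priority (higher wins the election)
render_keepalived() {
    cat <<EOF
vrrp_instance VI_1 {
    state BACKUP
    interface $1
    virtual_router_id 51
    priority $3
    advert_int 1
    virtual_ipaddress {
        $2
    }
}
EOF
}

render_keepalived enp0s3 10.0.2.100/24 100
```

Starting every node in `BACKUP` state and letting the priority decide the holder avoids hard-coding which machine owns the VIP.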
4. Monitoring Stack
- Prometheus collects metrics via `postgres_exporter`
- Grafana visualizes metrics with prebuilt dashboards
- Tracks replication lag, active connections, CPU/RAM usage, and more
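A Prometheus scrape job for the three database nodes could look like the output of the sketch below. It rests on assumptions: port `9187` is the `postgres_exporter` default, the job name is illustrative, and the project's actual `prometheus.yml` may differ. `render_scrape_config` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print a scrape_configs fragment for prometheus.yml.
# Each argument is a node IP; 9187 is the postgres_exporter default port.
render_scrape_config() {
    echo "scrape_configs:"
    echo "  - job_name: postgres"
    echo "    static_configs:"
    echo "      - targets:"
    for ip in "$@"; do
        echo "          - '$ip:9187'"
    done
}

render_scrape_config 10.0.2.101 10.0.2.102 10.0.2.103
```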
Installation and Automation
The installation is fully scripted, making it a great example of PostgreSQL cluster automation. The process includes:
- Static IP configuration for all nodes
- PostgreSQL installation and configuration
- PostgreSQL streaming replication configuration
- Pgpool-II setup for connection pooling and failover
- PGHA deployment for VIP failover
- Monitoring stack with Prometheus PostgreSQL metrics
- Healthcheck scripts for replication lag alerts
Because every step is scripted, the deployment is repeatable and can be integrated into CI/CD pipelines for infrastructure provisioning.
Download Setup Script
wget https://raw.githubusercontent.com/koray-karaman/PostgreSQL-HA-Cluster-VM/main/setup.sh -O setup.sh
chmod +x setup.sh
Run Setup Script
./setup.sh
Configure Networking
sudo cp configs/network-setup.yaml /etc/netplan/00-installer-config.yaml
sudo netplan apply
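The contents of `configs/network-setup.yaml` are not reproduced in this post. For orientation, a static-IP Netplan file for `pg-master` might resemble the output below. The interface name `enp0s3`, the gateway `10.0.2.1`, and the DNS server are assumptions rather than values from the project, and `render_netplan` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print a static-IP Netplan config.
#   $1 interface, $2 address in CIDR form, $3 default gateway
render_netplan() {
    cat <<EOF
network:
  version: 2
  ethernets:
    $1:
      dhcp4: false
      addresses: [$2]
      routes:
        - to: default
          via: $3
      nameservers:
        addresses: [1.1.1.1]
EOF
}

render_netplan enp0s3 10.0.2.101/24 10.0.2.1
```

When working over SSH, `sudo netplan try` is a safer first step than `netplan apply`, since it rolls the change back unless you confirm it.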
Monitoring & Alerts
Grafana dashboards include:
- `postgres_up`
- `postgres_replication_lag_seconds`
- `postgres_replication_lag_alert`
- `postgres_in_recovery`
You can customize alert thresholds using the LAG_THRESHOLD variable.
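How `LAG_THRESHOLD` is consumed is not shown in this post. One plausible shape, assuming the value is a number of seconds, is a comparison like the one below. The function name, the default of 10, and the SQL in the comment are illustrative and not taken from the project.

```shell
#!/bin/sh
# Threshold in seconds; falls back to 10 if unset (assumed default).
LAG_THRESHOLD="${LAG_THRESHOLD:-10}"

# Return 0 (true) when the lag in seconds exceeds the threshold.
# awk handles fractional values, which plain sh arithmetic cannot.
lag_exceeds_threshold() {
    awk -v lag="$1" -v max="$2" 'BEGIN { exit !(lag > max) }'
}

# On a replica, lag could be measured with something like:
#   psql -Atc "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0)"
# A fixed sample value stands in for that query result here.
sample_lag="12.5"
if lag_exceeds_threshold "$sample_lag" "$LAG_THRESHOLD"; then
    echo "ALERT: replication lag ${sample_lag}s exceeds ${LAG_THRESHOLD}s"
fi
```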
Replication and Failover Scenarios
PostgreSQL’s streaming replication is used to keep replicas in sync with the master. In case of failure, Pgpool and watchdog coordinate to promote a replica and reassign the virtual IP.
Planned Failover
- Stop PostgreSQL on master
- Promote replica using `pg_ctl promote`
- Update Pgpool configuration
- Restart Pgpool
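The four steps above might be scripted roughly as follows. This is a dry-run sketch, not the project's procedure: it assumes PostgreSQL 14 (the Ubuntu 22.04 default) managed through the Debian cluster tools, SSH access between nodes, and the `pgpool2` service name. With `DRY_RUN=1`, the default here, it only prints the commands it would run.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# $1 = current primary, $2 = replica to promote, $3 = Pgpool node
planned_failover() {
    old_primary="$1"; new_primary="$2"; pgpool="$3"
    run ssh "$old_primary" sudo systemctl stop postgresql       # 1. stop the master
    run ssh "$new_primary" sudo pg_ctlcluster 14 main promote   # 2. promote the replica
    # 3. Updating pgpool.conf is site-specific and left as a manual step.
    run ssh "$pgpool" sudo systemctl restart pgpool2            # 4. restart Pgpool
}

planned_failover 10.0.2.101 10.0.2.102 10.0.2.104
```

`pg_ctlcluster` is the Debian/Ubuntu wrapper around `pg_ctl`; on those systems `pg_ctl` itself is usually not on the default `PATH`.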
Unplanned Failover
- Pgpool detects failure
- Watchdog promotes a replica
- VIP is reassigned
- Clients reconnect automatically
This setup is designed to keep downtime minimal and to recover automatically.
Recovery
- Reinitialize old master as replica
- Use `pg_basebackup` for sync
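Rejoining the old master as a standby could be sketched as below. The assumptions are the Ubuntu 22.04 default data directory for PostgreSQL 14, a replication role named `replicator` (hypothetical), and credentials supplied via `.pgpass` or trust authentication. It runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands only
PGDATA_DIR="/var/lib/postgresql/14/main"    # assumed Ubuntu 22.04 default path

# Print the command, and execute it only when DRY_RUN is not 1.
run() { echo "+ $*"; [ "$DRY_RUN" = "1" ] || "$@"; }

# Rebuild this node as a standby of the given primary.
#   $1 = current primary IP, $2 = replication role (hypothetical name)
rejoin_as_replica() {
    run sudo systemctl stop postgresql
    # Keep the old data directory until the new standby is confirmed healthy.
    run sudo mv "$PGDATA_DIR" "$PGDATA_DIR.old"
    # -R writes standby.signal and primary_conninfo; -X stream copies WAL during the backup.
    run sudo -u postgres pg_basebackup -h "$1" -U "$2" -D "$PGDATA_DIR" -X stream -R -P
    run sudo systemctl start postgresql
}

rejoin_as_replica 10.0.2.102 replicator
```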
Post-Deployment Checklist
- Can all nodes ping each other?
- Are PostgreSQL services running?
- Is replication visible in `pg_stat_replication`?
- Is Pgpool VIP correctly assigned?
- Are Prometheus and Grafana dashboards active?
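The first item on the checklist can be automated with a small loop, sketched here. The node IPs come from the architecture table. The `PROBE` override is an assumption added so the logic can be exercised without a network, and `check_nodes` is a hypothetical helper rather than part of the project.

```shell
#!/bin/sh
NODES="10.0.2.101 10.0.2.102 10.0.2.103 10.0.2.104 10.0.2.105 10.0.2.106"
# Command used to probe a host; overridable for testing. -W 1 = 1s timeout (Linux ping).
PROBE="${PROBE:-ping -c 1 -W 1}"

# Print one OK/FAIL line per node and return non-zero if any node is unreachable.
check_nodes() {
    failed=0
    for ip in $NODES; do
        if $PROBE "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}

# Further checks from the list above, to run by hand on the relevant node:
#   systemctl is-active postgresql
#   sudo -u postgres psql -c "SELECT client_addr, state FROM pg_stat_replication;"
check_nodes || echo "Some nodes are unreachable"
```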
Best Practices
- Version control your configuration files
- Monitor replication lag regularly
- Test failover scenarios in staging
- Backup VIP configuration
- Use `verify.sh` for health checks
Health Checks and Cron Jobs
A custom script healthcheck.sh runs periodic checks on the cluster:
- Verifies replication status
- Checks Pgpool node health
- Logs anomalies
You can schedule it with cron:
(crontab -l 2>/dev/null; echo "* * * * * /opt/pg-ha/healthcheck.sh") | crontab -
This ensures continuous monitoring even outside Grafana.
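One caveat: the one-liner above appends a new entry every time it runs, so repeating it would schedule the job twice. An idempotent variant could look like the sketch below; `ensure_cron_line` is a hypothetical helper, not part of the project.

```shell
#!/bin/sh
CRON_LINE='* * * * * /opt/pg-ha/healthcheck.sh'

# Read a crontab on stdin and print it with CRON_LINE appended only if absent.
# grep -F matches the line literally (no regex); -x requires a whole-line match.
ensure_cron_line() {
    current="$(cat)"
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    printf '%s\n' "$current" | grep -Fxq -- "$CRON_LINE" || printf '%s\n' "$CRON_LINE"
}

# To install:  crontab -l 2>/dev/null | ensure_cron_line | crontab -
# Demonstration on a sample crontab with one unrelated entry:
printf '0 3 * * * /usr/bin/true\n' | ensure_cron_line
```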
Troubleshooting Tips
| Issue | Solution |
|---|---|
| VIP not moving | Check `watchdog.conf` and Pgpool connectivity |
| Replication lag too high | Investigate disk I/O and network latency |
| Prometheus not scraping | Verify targets and firewall rules |
| Grafana dashboard empty | Ensure exporters are running |
Lessons Learned and Honest Reflections
This project taught me a lot—but it wasn’t all smooth sailing. Here are some hard-earned lessons:
1. Pgpool Is Powerful but Fragile
Pgpool-II is a beast. It handles connection pooling, load balancing, and failover—but its configuration is notoriously complex. A single misconfigured line in pgpool.conf can break the entire cluster. I spent hours debugging watchdog behavior and VIP transitions.
2. VIP Failover Isn’t Always Instant
While Pgpool and watchdog handle VIP reassignment, it’s not always seamless. In some cases, the VIP lingers on the failed node due to network delays or misconfigured ARP tables. This can cause brief outages.
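To see whether the VIP is lingering, it helps to check which node currently holds it. The sketch below parses `ip -o -4 addr show` output. The VIP `10.0.2.100` and the interface `enp0s3` are assumptions, since this post does not state them, and `holds_vip` is a hypothetical helper.

```shell
#!/bin/sh
# Return 0 if the given address appears in `ip -o -4 addr show` output read from stdin.
# Matching on "inet <addr>/" ensures 10.0.2.10 does not match 10.0.2.100.
holds_vip() {
    grep -Fq -- "inet $1/"
}

# Sample line in the format printed by `ip -o -4 addr show` (illustrative data).
sample='2: enp0s3    inet 10.0.2.100/24 brd 10.0.2.255 scope global secondary enp0s3'
if printf '%s\n' "$sample" | holds_vip 10.0.2.100; then
    echo "VIP is assigned here"
fi

# Live usage on a node:  ip -o -4 addr show | holds_vip 10.0.2.100
# If stale ARP entries are suspected, a gratuitous ARP from the new holder may help
# (flags as in iputils-arping; other arping builds differ):
#   sudo arping -U -I enp0s3 -c 3 10.0.2.100
```

If both nodes report the VIP at the same time, the old holder has not released it, which points at the watchdog or keepalived configuration rather than at ARP caching.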
3. Monitoring Needs More Alerts
Prometheus and Grafana provide great visibility, but alerting is minimal. I plan to add Slack and email notifications for critical events like failover or replication lag spikes.
4. No Backup System Yet
This cluster handles availability—but not backups. That’s a major gap. I’m exploring tools like pgBackRest and barman to integrate automated backups and point-in-time recovery.
5. Resource Usage Is High
Running six VMs locally is resource-intensive. This setup is better suited for cloud environments or powerful physical servers. On a typical laptop, you’ll hit CPU and RAM limits quickly.
MIT License
This project is released under the MIT License:
MIT License
Copyright (c) 2025 Koray Karaman
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction.
You are free to use, modify, and distribute this project with proper attribution.
Contribute
Want to improve or extend the project?
- Fork the repo and submit a pull request
- Open issues for bugs or feature requests
- Star the project to support its visibility
GitHub Repository: PostgreSQL-HA-Cluster-VM
Conclusion
Building a high availability PostgreSQL cluster may sound complex—but with the right tools and architecture, it becomes a repeatable, scalable, and maintainable process. The PostgreSQL HA Cluster VM project simplifies this journey by offering a complete, open-source solution that runs on Ubuntu virtual machines without relying on containerization or external orchestration platforms.
By combining PostgreSQL replication, Pgpool-II failover logic, VIP management via keepalived, and monitoring with Prometheus and Grafana, this project delivers an HA setup that can be deployed in minutes. Whether you're managing mission-critical applications or preparing for disaster recovery scenarios, this architecture is designed to keep your data available and consistent. Because backups are not yet included, pair it with a backup tool before relying on it in production.
And because it's licensed under the MIT License, you're free to adapt it to your own infrastructure, contribute improvements, or integrate it into larger automation pipelines. The project is designed to be transparent, modular, and extensible—perfect for teams that value control and clarity.
If you're ready to take your PostgreSQL deployment to the next level, explore the GitHub repository, try the setup script, and start building a truly resilient database cluster today.
Koray Karaman