How to Set Up Apache DolphinScheduler with PostgreSQL and Zookeeper on Linux



This content originally appeared on HackerNoon and was authored by William Guo

This article walks you step by step through starting Apache DolphinScheduler with an external PostgreSQL database and an external Zookeeper registry. Whether you’re a beginner or an experienced developer, you can follow these steps to complete the installation and configuration in a Linux/Unix environment. Beyond the standard installation steps, we also share some cluster deployment tips to help you scale out your system.

Of course, if you run into problems such as database connection failures, Zookeeper connection errors, or services that fail to start, don’t worry: this tutorial includes detailed troubleshooting steps to help you resolve them quickly.

System Requirements

  • Operating System: Linux/Unix (CentOS 7+ or Ubuntu 16.04+ recommended)
  • Java Environment: JDK 1.8+
  • Database: PostgreSQL 9.6+
  • Distributed Coordination Service: Zookeeper 3.4.6+
  • Memory: At least 4GB recommended
  • Disk Space: At least 10GB recommended
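To check the Java requirement from a script instead of by eye, the sketch below extracts the major version from `java -version` output. `java_major_version` is a hypothetical helper, not part of DolphinScheduler, and it assumes the usual `version "..."` line printed by Oracle JDK and OpenJDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DolphinScheduler): print the Java major version
# from the text printed by `java -version`.
# Legacy scheme "1.8.0_291" -> 8; newer scheme "11.0.2" -> 11.
java_major_version() {
  local ver
  # Take the quoted version string, e.g. openjdk version "1.8.0_292"
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$ver" ] || return 1
  case $ver in
    1.*) ver=${ver#1.} ;;          # legacy scheme: drop the leading "1."
  esac
  printf '%s\n' "${ver%%[!0-9]*}"  # keep only the leading digits
}

# Usage: java -version writes to stderr, hence 2>&1
# major=$(java_major_version "$(java -version 2>&1)")
# [ "${major:-0}" -ge 8 ] || echo "JDK 1.8+ is required" >&2
```

Handling the legacy `1.x` scheme separately is what lets the same comparison (`-ge 8`) work for both JDK 8 and later releases.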

Preparations

  1. Install and Configure PostgreSQL
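# Install PostgreSQL (CentOS example)
# Note: the CentOS 7 base repository ships PostgreSQL 9.2, older than the 9.6+ required above;
# on CentOS 7, install a newer release from the PostgreSQL (PGDG) yum repository instead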
sudo yum install -y postgresql-server postgresql-contrib

# Initialize the database
sudo postgresql-setup initdb

# Start the service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Create DolphinScheduler database and user
sudo -u postgres psql -c "CREATE USER dolphinscheduler WITH PASSWORD 'yourpassword';"
sudo -u postgres psql -c "CREATE DATABASE dolphinscheduler OWNER dolphinscheduler;"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE dolphinscheduler TO dolphinscheduler;"

# Modify pg_hba.conf
sudo vi /var/lib/pgsql/data/pg_hba.conf
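# Add or modify the following line. 0.0.0.0/0 accepts password logins from any IPv4 address,
# so narrow it to your DolphinScheduler hosts' subnet in production. Rules are matched top
# to bottom: if DolphinScheduler connects via localhost, also change the default ident
# entries for 127.0.0.1/32 and ::1/128 to md5.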
host    all             all             0.0.0.0/0               md5

# Modify postgresql.conf
sudo vi /var/lib/pgsql/data/postgresql.conf
# Change listen_addresses to:
listen_addresses = '*'

# Restart PostgreSQL
sudo systemctl restart postgresql
  2. Install and Configure Zookeeper
# Download Zookeeper (older releases such as 3.7.1 are served from archive.apache.org)
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -xzf apache-zookeeper-3.7.1-bin.tar.gz
mv apache-zookeeper-3.7.1-bin /opt/zookeeper

# Configure Zookeeper
cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
# Set data directory and server configuration (if clustered)
dataDir=/opt/zookeeper/data
# No need to change server settings for standalone mode

# Create data directory
mkdir /opt/zookeeper/data

# Start Zookeeper
/opt/zookeeper/bin/zkServer.sh start
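The `vi` edit of `zoo.cfg` can also be scripted. The sketch below prints a `zoo.cfg` for either a standalone server or an ensemble. The function name is hypothetical; the timing values and client port match the defaults in `zoo_sample.cfg`, and 2888/3888 are ZooKeeper's conventional follower-to-leader and leader-election ports. In an ensemble, each node additionally needs a `myid` file in `dataDir` containing its own server number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a zoo.cfg to stdout.
# Usage: write_zoo_cfg DATA_DIR [HOST1 HOST2 ...]
#   no hosts -> standalone configuration
#   N hosts  -> ensemble with server.1 .. server.N entries
write_zoo_cfg() {
  local data_dir=$1 id=1 host
  shift
  cat <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=${data_dir}
clientPort=2181
EOF
  for host in "$@"; do
    # 2888 = follower-to-leader port, 3888 = leader-election port
    echo "server.${id}=${host}:2888:3888"
    id=$((id + 1))
  done
}

# Usage:
# write_zoo_cfg /opt/zookeeper/data > /opt/zookeeper/conf/zoo.cfg              # standalone
# write_zoo_cfg /opt/zookeeper/data zk1 zk2 zk3 > /opt/zookeeper/conf/zoo.cfg  # ensemble
# echo 1 > /opt/zookeeper/data/myid   # on zk1; write 2 on zk2 and 3 on zk3
```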

Install and Configure DolphinScheduler 3.1.9

  1. Download and Extract
wget https://archive.apache.org/dist/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz
tar -xzf apache-dolphinscheduler-3.1.9-bin.tar.gz
mv apache-dolphinscheduler-3.1.9-bin /opt/dolphinscheduler
  2. Modify Configuration Files

In DolphinScheduler 3.1.x, the Java, database, and registry settings for all services are read from bin/env/dolphinscheduler_env.sh:

vi /opt/dolphinscheduler/bin/env/dolphinscheduler_env.sh

Make the following changes (adjust JAVA_HOME to your own JDK path):

# Java environment
export JAVA_HOME=/usr/java/jdk1.8.0_291
export PATH=$JAVA_HOME/bin:$PATH

# Database config
export DATABASE=postgresql
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:postgresql://your-postgresql-server:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=yourpassword

# Zookeeper config
export REGISTRY_TYPE=zookeeper
export REGISTRY_ZOOKEEPER_CONNECT_STRING=your-zookeeper-server:2181
  3. Initialize the Database
bash /opt/dolphinscheduler/tools/bin/upgrade-schema.sh
  4. Start Services

Start Master Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start master-server

Start Worker Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start worker-server

Start API Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start api-server

Start Alert Server

/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start alert-server
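Because all four services are driven by the same script, a small loop can start, stop, or check them in one go. This is a sketch: `ds_all` and the `DRY_RUN` switch are hypothetical, `DS_HOME` is assumed to be the `/opt/dolphinscheduler` directory used above, and it relies only on the `start`, `stop`, and `status` actions of `dolphinscheduler-daemon.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one dolphinscheduler-daemon.sh action for all four services.
# DS_HOME is assumed to be the install directory used in this guide.
# Set DRY_RUN=1 to print the commands instead of executing them.
DS_HOME=${DS_HOME:-/opt/dolphinscheduler}

ds_all() {
  local action=$1 svc rc=0
  case $action in
    start|stop|status) ;;
    *) echo "usage: ds_all start|stop|status" >&2; return 2 ;;
  esac
  for svc in master-server worker-server api-server alert-server; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$DS_HOME/bin/dolphinscheduler-daemon.sh $action $svc"
    else
      # Keep going if one service fails, but report the failure at the end
      "$DS_HOME/bin/dolphinscheduler-daemon.sh" "$action" "$svc" || rc=1
    fi
  done
  return "$rc"
}

# Usage:
# ds_all start
# ds_all status
```

Continuing past a failed service (instead of stopping at the first error) means one `ds_all status` call reports on every service.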

Verify Installation

  1. Check process status:
ps -ef | grep dolphinscheduler
  2. Access the Web UI:
  • Default port: 12345
  • Access URL: http://your-server-ip:12345/dolphinscheduler/ui
  • Default username/password: admin/dolphinscheduler123
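The API server can take a while to come up after `start`, so opening the UI immediately may fail. The hypothetical helper below polls a TCP port until it accepts connections or a timeout expires. It uses bash's built-in `/dev/tcp` pseudo-device and the coreutils `timeout` command, so it assumes bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 once the port accepts a connection, 1 if the timeout expires first.
wait_for_port() {
  local host=$1 port=$2 limit=${3:-60} waited=0
  while [ "$waited" -lt "$limit" ]; do
    # /dev/tcp is handled by bash itself; `timeout 1` bounds each connection attempt
    if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage: give the API server up to two minutes
# wait_for_port your-server-ip 12345 120 && echo "API server is listening"
```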

Cluster Deployment Guide

Cluster Mode Deployment Steps

If you want to deploy in cluster mode, follow these steps:

  1. Deploy Worker Servers on Multiple Nodes

Node Requirements

  • Deploy Worker Servers on at least 3 nodes (if Zookeeper runs on the same nodes, use an odd number of them so the Zookeeper ensemble can keep a majority quorum)
  • Each node must have the same package version
  • Recommended server specs:
    - CPU: 4 cores or more
    - Memory: 8GB or more
    - Disk: 100GB+ (adjust based on data volume)

Example Deployment Plan

  • Node 1 (Primary): Master Server + Worker Server
    - IP: 192.168.1.101
    - Role: Master + Worker
  • Node 2 (Worker): Worker Server
    - IP: 192.168.1.102
    - Role: Worker
  • Node 3 (Worker): Worker Server
    - IP: 192.168.1.103
    - Role: Worker

Installation Notes

  1. Run the same installation script on all nodes
  2. Ensure the installation paths are consistent across nodes
  3. Verify network connectivity between nodes (use ping/telnet)
  4. Point every node at the same Zookeeper cluster (see Detailed Configuration Steps below)

Detailed Configuration Steps

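  1. Edit the registry setting on all nodes (in DolphinScheduler 3.1.x it lives in bin/env/dolphinscheduler_env.sh)
  • File path: /opt/dolphinscheduler/bin/env/dolphinscheduler_env.sh
  2. Set REGISTRY_ZOOKEEPER_CONNECT_STRING to your Zookeeper cluster addresses
  3. Example format:
export REGISTRY_ZOOKEEPER_CONNECT_STRING=zk1:2181,zk2:2181,zk3:2181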
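With a longer node list, the connection string can be generated rather than typed by hand. A minimal sketch: `zk_connect_string` is a hypothetical helper, and it assumes every Zookeeper server listens on the default client port 2181 unless you pass an explicit `host:port` pair.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join Zookeeper hosts into a "host:port,host:port,..." string.
# Hosts given without a port get the default client port 2181.
zk_connect_string() {
  local out="" h
  for h in "$@"; do
    case $h in
      *:*) ;;              # port already specified
      *) h="$h:2181" ;;
    esac
    out="${out:+$out,}$h"  # add a comma only after the first entry
  done
  printf '%s\n' "$out"
}

# Usage: zk_connect_string zk1 zk2 zk3   # prints zk1:2181,zk2:2181,zk3:2181
```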

Configuration Verification

  1. Use zkCli.sh to verify Zookeeper config
./zkCli.sh -server zk1:2181
  2. Check node registration (DolphinScheduler registers under the dolphinscheduler namespace by default):
ls /dolphinscheduler/nodes
  2. Time Synchronization Configuration

Detailed Time Sync Plan

All nodes must keep their clocks in sync (within 1 second of drift). Recommended steps:

NTP Setup

  1. Install NTP:
yum install -y ntp
  2. Sync once with an NTP server (Aliyun example; run this before starting ntpd):
ntpdate ntp.aliyun.com
  3. Set auto-sync:
# Enable at startup
systemctl enable ntpd
# Start service
systemctl start ntpd
  4. Verify sync:
ntpq -p
date
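`ntpq -p` marks the peer currently used for synchronization with `*` and reports its offset in milliseconds in the ninth column. The sketch below turns the 1-second requirement into a pass/fail check you can run on every node. `ntp_offset_ok` is a hypothetical helper and assumes the standard `ntpq -p` column layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ntpq -p` output on stdin and check the offset (ms) of the
# selected peer (the line starting with '*') against a limit in milliseconds.
# Returns 0 if within the limit, 1 if over it, 2 if no peer is selected yet.
ntp_offset_ok() {
  local max_ms=${1:-1000}
  awk -v max="$max_ms" '
    /^\*/ { found = 1; off = $9 + 0; if (off < 0) off = -off }
    END {
      if (!found) exit 2
      if (off <= max + 0) exit 0
      exit 1
    }'
}

# Usage: ntpq -p | ntp_offset_ok 1000 && echo "clock within 1 s"
```

For the stricter finance-style target mentioned below, the same check works with a limit of 100.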

Alternative Time Sync Option

If the external NTP server is inaccessible, set up an internal time server:

  1. Designate one server as the time source
  2. Sync all other nodes with that server
  3. Example (run on each of the other nodes, assuming 192.168.1.100 is the time source):
ntpdate 192.168.1.100
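With an internal time source, it is worth confirming that the nodes actually agree with each other. A rough sketch, assuming passwordless SSH from the node running it: collect each node's clock with `date +%s` and compute the spread. The helper name is hypothetical, and because `date +%s` has one-second resolution and the SSH calls run one after another, this is a sanity check for the 1-second target rather than a precise measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "host epoch_seconds" lines on stdin and print the
# largest difference between any two clocks, in seconds.
clock_spread() {
  awk 'NR == 1 { min = max = $2 }
       { if ($2 < min) min = $2; if ($2 > max) max = $2 }
       END { if (NR == 0) exit 1; print max - min }'
}

# Usage (assumes passwordless SSH; IPs from the example deployment plan):
# for h in 192.168.1.101 192.168.1.102 192.168.1.103; do
#   echo "$h $(ssh "$h" date +%s)"
# done | clock_spread
```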

Time Sync Notes

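  • If you do not run ntpd, set up a crontab job for periodic sync instead (ntpdate exits with an error while ntpd holds the NTP port, so use one mechanism or the other):
*/5 * * * * /usr/sbin/ntpdate ntp.aliyun.com >/dev/null 2>&1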
  • For systems sensitive to time (e.g., finance), maintain <100ms drift

Common Troubleshooting

Database Connection Issues

  1. PostgreSQL Remote Access Config
  • Check pg_hba.conf file and ensure it includes:
host    all             all             0.0.0.0/0               md5
  • Restart PostgreSQL after changes
  2. Credential Verification
  • Test connection with psql:
psql -h [host] -U [username] -d [database]
  • Ensure password is correct
  3. Firewall Check
  • Check if port 5432 is open:
firewall-cmd --list-all
  • Open the port if needed:
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
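To be sure `psql` tests the same endpoint DolphinScheduler uses, you can split the JDBC URL from the DolphinScheduler database configuration into host, port, and database. A sketch with a hypothetical helper: it handles only the `jdbc:postgresql://host[:port]/database[?params]` form and falls back to PostgreSQL's default port 5432.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a PostgreSQL JDBC URL into "host port database".
parse_pg_jdbc_url() {
  local url=$1 rest hostport host port db
  rest=${url#jdbc:postgresql://}
  [ "$rest" != "$url" ] || return 1            # not a PostgreSQL JDBC URL
  case $rest in */*) ;; *) return 1 ;; esac    # a database name is required
  hostport=${rest%%/*}
  db=${rest#*/}
  db=${db%%\?*}                                # drop ?key=value parameters
  host=${hostport%%:*}
  port=5432
  case $hostport in *:*) port=${hostport#*:} ;; esac
  printf '%s %s %s\n' "$host" "$port" "$db"
}

# Usage: feed the parts to psql (it prompts for the password)
# set -- $(parse_pg_jdbc_url "jdbc:postgresql://your-postgresql-server:5432/dolphinscheduler")
# psql -h "$1" -p "$2" -U dolphinscheduler -d "$3" -c 'SELECT 1;'
```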

Zookeeper Connection Issues

  1. Basic Connection Test
  • Use telnet:
telnet your-zookeeper-server 2181
  • Should show: “Connected to your-zookeeper-server”
  2. Log Check
  • View Zookeeper logs (for the tarball install above, zkServer.sh writes them under /opt/zookeeper/logs):
tail -f /opt/zookeeper/logs/zookeeper-*.out
  • Common issues:
    - Insufficient disk space
    - Low memory allocation
    - Improper cluster config
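Beyond a raw TCP test, `zkServer.sh status` reports whether each server is running as `standalone`, `leader`, or `follower`; a healthy ensemble has exactly one leader. The sketch below pulls the mode out of that output. The helper name is hypothetical, and it assumes the `Mode: ...` line that `zkServer.sh status` prints when the server is reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the server mode from `zkServer.sh status` output on stdin.
# Prints nothing and returns 1 if no "Mode:" line is present (e.g. the server is down).
zk_mode() {
  awk -F': *' '/^Mode:/ { print $2; found = 1 }
               END { if (!found) exit 1 }'
}

# Usage on each Zookeeper node (status messages go partly to stderr, hence 2>&1):
# /opt/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode
```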

Service Startup Issues

  1. Log Analysis
  • Check main log file:
tail -n 100 /opt/dolphinscheduler/logs/dolphinscheduler-api.log
  • Check other component logs:
/opt/dolphinscheduler/logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-api-server.log
├── dolphinscheduler-master-server.log
└── dolphinscheduler-worker-server.log
  2. Java Environment Check
  • Verify Java version:
java -version
  • Requirement: JDK 1.8+
  • Check JAVA_HOME:
echo $JAVA_HOME
  • Check memory settings:
# JDK 8 syntax; on JDK 9+ use: jhsdb jmap --heap --pid <pid>
jmap -heap <pid>
  3. Port Conflict Check
  • Check port usage:
netstat -tunlp | grep [port]
  • Default ports:
    - Master Server: 5678
    - Worker Server: 1234
    - API Server: 12345
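To see at a glance which of the default ports are already taken, the sketch below filters `ss -tln` output (`ss` comes with iproute2 and is the modern replacement for `netstat`) for the ports listed above. The helper name is hypothetical, and it assumes the local address is in the fourth column, as in standard `ss -tln` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -tln` output on stdin and print which of the given
# ports already have a listener (each port printed once, even if bound on IPv4 and IPv6).
# Usage: ss -tln | ports_in_use 5678 1234 12345
ports_in_use() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) wanted[w[i]] = 1 }
    NR > 1 {
      k = split($4, parts, ":")      # e.g. "0.0.0.0:12345", "*:1234", "[::]:5678"
      port = parts[k]
      if ((port in wanted) && !(port in seen)) { seen[port] = 1; print port }
    }'
}
```

Any port it prints is held by another process and has to be freed before the corresponding DolphinScheduler service can start.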
