This content originally appeared on HackerNoon and was authored by William Guo
This article walks through starting Apache DolphinScheduler 3.1.9 with an external PostgreSQL database and an external Zookeeper in a Linux/Unix environment. Beyond the standard installation and configuration steps, it shares cluster deployment tips for scaling out.
It also includes troubleshooting steps for the most common problems: database connections, Zookeeper connections, and service startup.
System Requirements
- Operating System: Linux/Unix (CentOS 7+ or Ubuntu 16.04+ recommended)
- Java Environment: JDK 1.8+
- Database: PostgreSQL 9.6+
- Distributed Coordination Service: Zookeeper 3.4.6+
- Memory: At least 4GB recommended
- Disk Space: At least 10GB recommended
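The minimums above can be checked before anything is installed. The following is a minimal sketch rather than part of the original steps: the helper names `version_ge` and `mem_total_mb` are made up for illustration, and it assumes GNU `sort -V` and a Linux-style `/proc/meminfo`.

```shell
# Hypothetical preflight helpers; the names are illustrative only.

# version_ge A B: succeeds when version A is at least version B.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# mem_total_mb [FILE]: prints total memory in MB from a meminfo-style file
# (defaults to /proc/meminfo).
mem_total_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example checks (uncomment to run on the target host):
# version_ge "$(psql --version | awk '{print $3}')" 9.6 || echo "PostgreSQL is older than 9.6"
# A 4GB machine reports slightly under 4096 MB, so the threshold leaves some slack.
# [ "$(mem_total_mb)" -ge 3500 ] || echo "Less than roughly 4GB of memory"
```

Taking the meminfo path as an optional argument keeps the helper easy to try against a sample file.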
Preparations
- Install and Configure PostgreSQL
# Install PostgreSQL (CentOS example)
sudo yum install -y postgresql-server postgresql-contrib
# Initialize the database
sudo postgresql-setup initdb
# Start the service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Create DolphinScheduler database and user
sudo -u postgres psql -c "CREATE USER dolphinscheduler WITH PASSWORD 'yourpassword';"
sudo -u postgres psql -c "CREATE DATABASE dolphinscheduler OWNER dolphinscheduler;"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE dolphinscheduler TO dolphinscheduler;"
# Modify pg_hba.conf
sudo vi /var/lib/pgsql/data/pg_hba.conf
# Add or modify the following line:
host all all 0.0.0.0/0 md5
# Modify postgresql.conf
sudo vi /var/lib/pgsql/data/postgresql.conf
# Change listen_addresses to:
listen_addresses = '*'
# Restart PostgreSQL
sudo systemctl restart postgresql
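The `0.0.0.0/0` rule above accepts connections from any IPv4 address. A narrower rule is possible. The sketch below is one assumption-laden alternative, not part of the original steps: the helper name `add_hba_rule` is hypothetical, and the subnet 192.168.1.0/24 is borrowed from the example node IPs used later in this guide.

```shell
# add_hba_rule FILE DB USER CIDR (hypothetical helper):
# append an md5 "host" rule to FILE unless an identical line already exists.
add_hba_rule() (
  file=$1
  rule="host $2 $3 $4 md5"
  grep -qxF "$rule" "$file" 2>/dev/null || printf '%s\n' "$rule" >> "$file"
)

# Example (run as a user allowed to write the file; the subnet is an assumption):
# add_hba_rule /var/lib/pgsql/data/pg_hba.conf dolphinscheduler dolphinscheduler 192.168.1.0/24
```

The rule limits access to the dolphinscheduler database and user from that subnet only. PostgreSQL must reload its configuration before a changed pg_hba.conf takes effect.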
- Install and Configure Zookeeper
# Download Zookeeper
wget https://downloads.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
tar -xzf apache-zookeeper-3.7.1-bin.tar.gz
mv apache-zookeeper-3.7.1-bin /opt/zookeeper
# Configure Zookeeper
cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
# Set data directory and server configuration (if clustered)
dataDir=/opt/zookeeper/data
# No need to change server settings for standalone mode
# Create data directory
mkdir -p /opt/zookeeper/data
# Start Zookeeper
/opt/zookeeper/bin/zkServer.sh start
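After starting Zookeeper, it can help to confirm which mode it reports. The small filter below is only a sketch: the name `zk_mode` is hypothetical, and it assumes the `Mode:` line that `zkServer.sh status` prints.

```shell
# zk_mode (hypothetical helper): reads `zkServer.sh status` output on stdin
# and prints the value of its "Mode:" line (for example "standalone").
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Example (part of the status output goes to stderr, hence 2>&1):
# /opt/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode
```

An empty result suggests the server is not running or not yet answering.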
Install and Configure DolphinScheduler 3.1.9
- Download and Extract
wget https://downloads.apache.org/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz
tar -xzf apache-dolphinscheduler-3.1.9-bin.tar.gz
mv apache-dolphinscheduler-3.1.9-bin /opt/dolphinscheduler
- Modify Configuration Files
Edit common.properties:
vi /opt/dolphinscheduler/conf/common.properties
Make the following changes:
# Database config
spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://your-postgresql-server:5432/dolphinscheduler
spring.datasource.username=dolphinscheduler
spring.datasource.password=yourpassword
# Zookeeper config
registry.plugin.name=zookeeper
registry.plugin.type=zookeeper
registry.servers=your-zookeeper-server:2181
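When the same properties have to be set on several machines, editing by hand gets error-prone. The following is a minimal sketch of a scripted edit, assuming keys are written as `key=value` with no spaces around `=`; the helper name `set_prop` is hypothetical.

```shell
# set_prop FILE KEY VALUE (hypothetical helper):
# replace every "KEY=..." line in FILE with "KEY=VALUE", or append it if absent.
# KEY and VALUE are passed through the environment so that slashes, colons and
# backslashes in JDBC URLs or passwords are written unchanged.
set_prop() (
  file=$1
  tmp=$(mktemp) || exit 1
  KEY=$2 VALUE=$3 awk -F= '
    $1 == ENVIRON["KEY"] { print ENVIRON["KEY"] "=" ENVIRON["VALUE"]; seen = 1; next }
    { print }
    END { if (!seen) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file permissions
  status=$?
  rm -f "$tmp"
  exit "$status"
)

# Example:
# set_prop /opt/dolphinscheduler/conf/common.properties registry.servers your-zookeeper-server:2181
```

Commented-out lines such as `#registry.servers=...` are left untouched, so the active value is appended rather than uncommented.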
Optional: Modify environment variables
vi /opt/dolphinscheduler/conf/env/dolphinscheduler_env.sh
Add or update Java environment variables:
export JAVA_HOME=/usr/java/jdk1.8.0_291
export PATH=$JAVA_HOME/bin:$PATH
- Initialize the Database
/opt/dolphinscheduler/script/create-dolphinscheduler.sh
- Start Services
Start Master Server
/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start master-server
Start Worker Server
/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start worker-server
Start API Server
/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start api-server
Start Alert Server
/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh start alert-server
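The four commands above can be wrapped in a loop. This is a sketch only: the function name `ds_all` and the `DS_DAEMON` override are hypothetical conveniences, and the default path assumes the `/opt/dolphinscheduler` install location used in this guide.

```shell
# ds_all ACTION (hypothetical helper): run the daemon script with ACTION for
# each service, in the same order as the commands above. Stops at the first failure.
# DS_DAEMON may be overridden, for example with "echo", to get a dry run.
ds_all() {
  for svc in master-server worker-server api-server alert-server; do
    "${DS_DAEMON:-/opt/dolphinscheduler/bin/dolphinscheduler-daemon.sh}" "$1" "$svc" || return 1
  done
}

# Examples:
# DS_DAEMON=echo ds_all start   # print what would be run
# ds_all start                  # actually start all four services
```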
Verify Installation
- Check process status:
ps -ef | grep dolphinscheduler
- Access the Web UI:
- Default Port: 12345
- Access URL: http://your-server-ip:12345/dolphinscheduler
- Default username/password: admin/dolphinscheduler123
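The API server can take a little while to come up, so a single request right after startup may fail. Below is a minimal sketch of a polling helper; `retry` is a hypothetical name, and the example treats any HTTP response from the port as "up" without checking the status code.

```shell
# retry ATTEMPTS DELAY CMD... (hypothetical helper):
# run CMD until it succeeds, sleeping DELAY seconds between tries.
# Fails after ATTEMPTS unsuccessful tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example: wait up to about a minute for the API port to answer.
# retry 30 2 curl -sS -o /dev/null http://your-server-ip:12345/dolphinscheduler
```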
Cluster Deployment Guide
Cluster Mode Deployment Steps
If you want to deploy in cluster mode, follow these steps:
- Deploy Worker Servers on Multiple Nodes
Node Requirements
- Deploy Worker Servers on at least 3 nodes (odd number recommended)
- Each node must have the same package version
- Recommended server specs:
- CPU: 4 cores or more
- Memory: 8GB or more
- Disk: 100GB+ (adjust based on data volume)
Example Deployment Plan
- Node 1 (Primary): Master Server + Worker Server
- IP: 192.168.1.101
- Role: Master + Worker
- Node 2 (Worker): Worker Server
- IP: 192.168.1.102
- Role: Worker
- Node 3 (Worker): Worker Server
- IP: 192.168.1.103
- Role: Worker
Installation Notes
- Run the same installation script on all nodes
- Ensure the installation paths are consistent across nodes
- Verify network connectivity between nodes (use ping/telnet)
- Configure registry.servers
Detailed Configuration Steps
- Edit common.properties on all nodes
- File path: /opt/dolphinscheduler/conf/common.properties
- Set registry.servers to your Zookeeper cluster addresses
- Example format:
registry.servers=zk1:2181,zk2:2181,zk3:2181
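The comma-separated value for registry.servers can be generated from a host list, which avoids typos when the ensemble grows. A small sketch follows; `zk_connect` is a hypothetical helper name, and the hosts zk1 to zk3 are the placeholders from the example format.

```shell
# zk_connect PORT HOST... (hypothetical helper):
# print the hosts as HOST:PORT pairs joined by commas.
zk_connect() (
  port=$1
  shift
  out=
  for host in "$@"; do
    out="${out:+$out,}$host:$port"   # add a comma only between entries
  done
  printf '%s\n' "$out"
)

# Example:
# echo "registry.servers=$(zk_connect 2181 zk1 zk2 zk3)"
```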
Configuration Verification
- Use zkCli.sh to verify Zookeeper config
./zkCli.sh -server zk1:2181
- Check node registration:
ls /dolphinscheduler/nodes
- Time Synchronization Configuration
Detailed Time Sync Plan
All nodes must maintain time sync (within 1-second drift). Recommended steps:
NTP Setup
- Install NTP:
yum install -y ntp
- Sync with NTP server (Aliyun example):
ntpdate ntp.aliyun.com
- Set auto-sync:
# Enable at startup
systemctl enable ntpd
# Start service
systemctl start ntpd
- Verify sync:
ntpq -p
date
Alternative Time Sync Option
If an external NTP server is inaccessible, set up an internal time server:
- Designate one server as the time source
- Sync all other nodes with that server
- Example config:
ntpdate 192.168.1.100
Time Sync Notes
- On nodes that do not run ntpd, set up a crontab job for periodic sync (ntpdate cannot use the NTP port while ntpd is running):
*/5 * * * * /usr/sbin/ntpdate ntp.aliyun.com >/dev/null 2>&1
- For systems sensitive to time (e.g., finance), maintain <100ms drift
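The drift limits above can be checked from the `ntpq` output instead of by eye. The following is a sketch under stated assumptions: the helper names are hypothetical, and it relies on the standard `ntpq -p` column layout in which the offset, in milliseconds, is the ninth field and the peer selected for synchronization is marked with `*`.

```shell
# ntp_offset_ms (hypothetical helper): reads `ntpq -pn` output on stdin and
# prints the offset in ms of the peer marked "*" (the selected sync source).
ntp_offset_ms() {
  awk '/^\*/ { print $9; exit }'
}

# within_drift OFFSET_MS LIMIT_MS (hypothetical helper):
# succeeds when the absolute value of OFFSET_MS is at most LIMIT_MS.
# An empty offset (no peer selected yet) counts as a failure.
within_drift() {
  [ -n "$1" ] || return 1
  awk -v o="$1" -v l="$2" 'BEGIN { if (o < 0) o = -o; exit !(o <= l) }'
}

# Example: warn when this node drifts more than 1 second (1000 ms).
# within_drift "$(ntpq -pn | ntp_offset_ms)" 1000 || echo "clock drift exceeds 1s"
```

For the stricter 100ms target, pass 100 as the limit.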
Common Troubleshooting
Database Connection Issues
- PostgreSQL Remote Access Config
- Check the pg_hba.conf file and ensure it includes:
host all all 0.0.0.0/0 md5
- Restart PostgreSQL after changes
- Credential Verification
- Test connection with psql:
psql -h [host] -U [username] -d [database]
- Ensure password is correct
- Firewall Check
- Check if port 5432 is open:
firewall-cmd --list-all
- Open the port if needed:
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --reload
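When debugging, it is worth testing the exact host and port that DolphinScheduler is configured with rather than retyping them. The sketch below extracts both from a PostgreSQL JDBC URL; `jdbc_host_port` is a hypothetical helper, and it only handles the simple `jdbc:postgresql://host[:port]/database` form used in this guide.

```shell
# jdbc_host_port URL (hypothetical helper):
# print "HOST PORT" for a URL like jdbc:postgresql://host:5432/db.
# Falls back to port 5432 when the URL has no explicit port.
jdbc_host_port() (
  rest=${1#jdbc:postgresql://}   # strip the scheme
  hostport=${rest%%/*}           # drop "/database" and anything after it
  host=${hostport%%:*}
  port=${hostport#*:}
  [ "$port" = "$hostport" ] && port=5432   # no ":" present
  printf '%s %s\n' "$host" "$port"
)

# Example: probe the configured endpoint with pg_isready.
# url=$(sed -n 's/^spring\.datasource\.url=//p' /opt/dolphinscheduler/conf/common.properties)
# set -- $(jdbc_host_port "$url")
# pg_isready -h "$1" -p "$2"
```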
Zookeeper Connection Issues
- Basic Connection Test
- Use telnet:
telnet your-zookeeper-server 2181
- Should show: “Connected to your-zookeeper-server”
- Log Check
- View Zookeeper logs:
tail -f /var/log/zookeeper/zookeeper.log
- Common issues:
- Insufficient disk space
- Low memory allocation
- Improper cluster config
Service Startup Issues
- Log Analysis
- Check main log file:
tail -n 100 /opt/dolphinscheduler/logs/dolphinscheduler-api-server.log
- Check other component logs:
/opt/dolphinscheduler/logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-api-server.log
├── dolphinscheduler-master-server.log
└── dolphinscheduler-worker-server.log
- Java Environment Check
- Verify Java version:
java -version
- Requirement: JDK 1.8+
- Check JAVA_HOME:
echo $JAVA_HOME
- Check memory settings:
jmap -heap <pid>
- Port Conflict Check
- Check port usage:
netstat -tunlp | grep [port]
- Default ports:
- Master Server: 5678
- Worker Server: 1234
- API Server: 12345
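The default ports listed above can be checked in one pass. This is a minimal sketch: `port_in_use` is a hypothetical helper that assumes the local address is the fourth column, which holds for `netstat -tnl` and for `ss -ltn`.

```shell
# port_in_use PORT (hypothetical helper): reads `netstat -tnl` or `ss -ltn`
# output on stdin and succeeds when a local address (4th column) ends in :PORT.
port_in_use() {
  awk -v port="$1" '
    { n = split($4, part, ":"); if (part[n] == port) found = 1 }
    END { exit !found }
  '
}

# Example: check the default ports before starting the services.
# for p in 5678 1234 12345; do
#   netstat -tnl | port_in_use "$p" && echo "port $p is already in use"
# done
```

Before startup a match points to a conflict. After startup a match on the same port simply means the service is listening.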