Cherry's blog: PostgreSQL - Streaming Replication (Hot Standby)

1. General information

Master Node

IP: 192.168.56.111

Hostname: postgres-master

Version: postgres (PostgreSQL) 16.4

$PGDATA=/var/lib/pgsql/16/data

postgres=# \db

List of tablespaces

Name | Owner | Location

------------+----------+------------------------------------

dvd_data | tucdt | /var/lib/pgsql/dvdrental_tbs/data

dvd_index | tucdt | /var/lib/pgsql/dvdrental_tbs/index

pg_default | postgres |

pg_global | postgres |

(4 rows)

Slave Node

IP: 192.168.56.222

Hostname: postgres-slave

Version: postgres (PostgreSQL) 16.4

$PGDATA=/var/lib/pgsql/16/data // empty

2. Preparation on Master/Slave

Disable Selinux : SELINUX=disabled
Stop Firewall
Config /etc/hosts

192.168.56.111 postgres-master.taolaoxibup.com postgres-master
192.168.56.222 postgres-slave.taolaoxibup.com postgres-slave

Setup SSH KEY-PAIR

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@192.168.56.[111/222]

3. Master - Configuration

3.1 Create user for replication

postgres=# CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'P@ssw0rd';

3.2 Create replication slot

Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows which could cause a recovery conflict even when the standby is disconnected.
In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command or archive_library

postgres=# select * from pg_replication_slots;

postgres=# select * from pg_create_physical_replication_slot('cherry');

postgres=# select * from pg_replication_slots;

-[ RECORD 1 ]-------+---------

slot_name | cherry

plugin |

slot_type | physical

datoid |

database |

temporary | f

active | f

active_pid |

xmin |

catalog_xmin |

restart_lsn |

confirmed_flush_lsn |

wal_status |

safe_wal_size |

two_phase | f

conflicting |

3.3 Configure parameters

alter system set wal_level TO 'hot_standby';

alter system set archive_mode to 'on';

alter system set archive_command = 'test ! -f /var/lib/pgsql/backup/%f && cp %p /var/lib/pgsql/backup/%f';

alter system set max_wal_senders to '10';

alter system set wal_keep_size = '1024 MB';

Restart cluster database to apply changes

[root@postgres-master ~]# systemctl restart postgresql-16.service

[root@postgres-master ~]# systemctl status postgresql-16.service

4. Slave - Replication from Master

$PGDATA must be empty
Do not create any tablespace locations. These locations will be created during replication processing

[postgres@postgres-slave ~]$ pg_basebackup -h postgres-master -U replicator -D $PGDATA -P -Xs -R -v

pg_basebackup: initiating base backup, waiting for checkpoint to complete

pg_basebackup: checkpoint completed

pg_basebackup: write-ahead log start point: 0/5000028 on timeline 1

pg_basebackup: starting background WAL receiver

pg_basebackup: created temporary replication slot "pg_basebackup_7922"

39497/39497 kB (100%), 3/3 tablespaces

pg_basebackup: write-ahead log end point: 0/5000100

pg_basebackup: waiting for background process to finish streaming ...

pg_basebackup: syncing data to disk ...

pg_basebackup: renaming backup_manifest.tmp to backup_manifest

pg_basebackup: base backup completed

💩 Verify the $PGDATA

[postgres@postgres-slave ~]$ cd $PGDATA

[postgres@postgres-slave data]$ ls -la

total 264

drwx------. 20 postgres postgres 4096 Sep 13 09:30 .

drwx------. 3 postgres postgres 18 Sep 13 09:27 ..

-rw------- 1 postgres postgres 225 Sep 13 09:30 backup_label

-rw------- 1 postgres postgres 198549 Sep 13 09:30 backup_manifest

drwx------ 6 postgres postgres 46 Sep 13 09:30 base

-rw------- 1 postgres postgres 30 Sep 13 09:30 current_logfiles

drwx------ 2 postgres postgres 4096 Sep 13 09:30 global

drwx------ 2 postgres postgres 136 Sep 13 09:30 log

drwx------ 2 postgres postgres 18 Sep 13 09:30 pg_commit_ts

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_dynshmem

-rw------- 1 postgres postgres 5458 Sep 13 09:30 pg_hba.conf

-rw------- 1 postgres postgres 2640 Sep 13 09:30 pg_ident.conf

drwx------ 4 postgres postgres 68 Sep 13 09:30 pg_logical

drwx------ 4 postgres postgres 36 Sep 13 09:30 pg_multixact

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_notify

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_replslot

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_serial

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_snapshots

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_stat

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_stat_tmp

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_subtrans

drwx------ 2 postgres postgres 32 Sep 13 09:30 pg_tblspc

drwx------ 2 postgres postgres 6 Sep 13 09:30 pg_twophase

-rw------- 1 postgres postgres 3 Sep 13 09:30 PG_VERSION

drwx------ 3 postgres postgres 60 Sep 13 09:30 pg_wal

drwx------ 2 postgres postgres 18 Sep 13 09:30 pg_xact

-rw------- 1 postgres postgres 727 Sep 13 09:30 postgresql.auto.conf

-rw------- 1 postgres postgres 29684 Sep 13 09:30 postgresql.conf

-rw------- 1 postgres postgres 0 Sep 13 09:30 standby.signal

👀standby.signal file:

The presence of this file triggers standby mode. PostgreSQL knows it should operate as a standby server and connect to the primary for replication.
The server will remain in standby mode as long as this file exists in the data directory.

[postgres@postgres-slave data]$ cat postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

max_worker_processes = '10'

max_replication_slots = '10'

shared_preload_libraries = 'pglogical'

track_commit_timestamp = 'on'

wal_level = 'hot_standby'

archive_mode = 'on'

archive_command = 'test ! -f /var/lib/pgsql/backup/%f && cp %p /var/lib/pgsql/backup/%f'

max_wal_senders = '10'

wal_keep_size = '1024 MB'

primary_conninfo = 'user=replicator passfile=''/var/lib/pgsql/.pgpass'' channel_binding=prefer host=''postgres-master'' port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable'

💩 Config postgresql.conf

primary_conninfo = 'host=postgres-master port=5432 user=replicator password=******'

primary_slot_name = 'cherry'

archive_cleanup_command = '/usr/pgsql-16/bin/pg_archivecleanup /var/lib/pgsql/backup %r'

restore_command = 'rsync -a postgres@192.168.56.111:/var/lib/pgsql/backup/%f %p' // using IP instead of hostname

recovery_target_timeline = 'latest'

💩 Start cluster database to begin replication

[root@postgres-slave ~]# systemctl stop postgresql-16.service

[root@postgres-slave ~]# systemctl start postgresql-16.service

[root@postgres-slave ~]# systemctl status postgresql-16.service

5. Verify replication status

💀 On Master Node

postgres=# select * from pg_replication_slots;

-[ RECORD 1 ]-------+----------

slot_name | cherry

plugin |

slot_type | physical

datoid |

database |

temporary | f

active | t

active_pid | 10897

xmin |

catalog_xmin |

restart_lsn | 0/70001C0

confirmed_flush_lsn |

wal_status | reserved

safe_wal_size |

two_phase | f

conflicting |

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------

pid | 10897

usesysid | 17430

usename | replicator

application_name | walreceiver

client_addr | 192.168.56.222

client_hostname |

client_port | 33438

backend_start | 2024-09-13 10:40:03.323266+07

backend_xmin |

state | streaming

sent_lsn | 0/70001C0

write_lsn | 0/70001C0

flush_lsn | 0/70001C0

replay_lsn | 0/70001C0

write_lag |

flush_lag |

replay_lag |

sync_priority | 0

sync_state | async

reply_time | 2024-09-13 10:44:42.739911+07

postgres=# select pg_current_wal_lsn();

postgres=# select pg_walfile_name(pg_current_wal_lsn());

Lag in Bytes

postgres=# select client_addr, pg_wal_lsn_diff(pg_stat_replication.sent_lsn, pg_stat_replication.replay_lsn) AS byte_lag from pg_stat_replication;

-[ RECORD 1 ]---------------

client_addr | 192.168.56.222

byte_lag | 0

Lag in Seconds

SELECT CASE

WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0

ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())

END AS log_delay;

💀 On Slave Node

postgres=# select pg_is_wal_replay_paused();

pg_is_wal_replay_paused

-------------------------

f // f = recovery is running , t = stopped

(1 row)

postgres=# select pg_is_in_recovery();

pg_is_in_recovery

-------------------

(1 row)

postgres=# select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

-[ RECORD 1 ]-----------------+----------

pg_last_wal_receive_lsn | 0/70001C0

pg_last_wal_replay_lsn | 0/70001C0

pg_last_xact_replay_timestamp |

postgres=# select * from pg_stat_wal_receiver;

-[ RECORD 1 ]---------+--------

pid | 10407

status | streaming

receive_start_lsn | 0/6000000

receive_start_tli | 1

written_lsn | 0/70001C0

flushed_lsn | 0/70001C0

received_tli | 1

last_msg_send_time | 2024-09-13 10:44:03.598586+07

last_msg_receipt_time | 2024-09-13 10:44:02.71503+07

latest_end_lsn | 0/70001C0

latest_end_time | 2024-09-13 10:40:03.374545+07

slot_name | cherry

sender_host | postgres-master

sender_port | 5432

conninfo | user=replicator passfile=/var/lib/pgsql/.pgpass channel_binding=prefer dbname=replication host=postgres-master port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable

postgres=# select pg_last_wal_receive_lsn();

postgres=# select pg_last_wal_replay_lsn();

postgres=# select pg_last_xact_replay_timestamp();

Note: We can not execute DML, DDL on Slave Node

dvdrental=# create table temp01 as select * from actor ;

ERROR: cannot execute CREATE TABLE AS in a read-only transaction

👽 Make data change and check:

[root@postgres-master ~]# ps -eaf | grep sender

postgres 10897 9704 0 10:40 ? 00:00:00 postgres: walsender replicator 192.168.56.222(33438) streaming 0/7020300

[postgres@postgres-slave ~]$ ps -eaf | egrep "receiver|startup"

root 872 1 0 08:31 ? 00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups

postgres 10406 10401 0 10:40 ? 00:00:00 postgres: startup recovering 000000010000000000000007

postgres 10407 10401 0 10:40 ? 00:00:00 postgres: walreceiver streaming 0/7020300

6. Change to SYNC mode
👉 On Master Node
postgres=# show synchronous_standby_names ;

postgres=# alter system set synchronous_standby_names to 'standby';
[root@postgres-master ~]# systemctl restart postgresql-16.service

Note: Convert to ASYNC mode with command

postgres=# alter system reset synchronous_standby_names ;

[root@postgres-master ~]# systemctl restart postgresql-16.service

👉 On Slave Node
postgres=# show cluster_name ;

postgres=# alter system set cluster_name = 'standby';

[root@postgres-slave ~]# systemctl restart postgresql-16.service

👉 Verify replication status on Master Node

postgres=# select * from pg_stat_replication;

-[ RECORD 1 ]----+------------------------------

pid | 23177

usesysid | 17430

usename | replicator

application_name | standby

client_addr | 192.168.56.222

client_hostname |

client_port | 37916

backend_start | 2024-09-13 15:02:18.940161+07

backend_xmin |

state | streaming

sent_lsn | 0/80000D8

write_lsn | 0/80000D8

flush_lsn | 0/80000D8

replay_lsn | 0/80000D8

write_lag |

flush_lag |

replay_lag |

sync_priority | 1

sync_state | sync

reply_time | 2024-09-13 15:02:58.081962+07

7. Failover

The Master node goes down and we can not fix the problem.
The Slave node will be promoted to be Master node

😈On Master Node

[root@postgres-master ~]# systemctl stop postgresql-16.service

💩On Slave Node

[postgres@postgres-slave data]$ pg_ctl promote -W // -W, --no-wait do not wait until operation completes

server promoting

Note: The standby.signal will be removed after promoting the Slave to Master

8. Switchover

Note: Master node must be in ASYNC mode

Step 1: On Master - Stop service

postgres=# select pg_switch_wal();

[root@postgres-master ~]# systemctl stop postgresql-16.service

Step 2: On Master - Backup postgresql.conf file

[postgres@postgres-master ~]$ cd $PGDATA

[postgres@postgres-master data]$ cp postgresql.conf postgresql.conf.bak

Step 3: On Slave - Check data synchronization

postgres=# select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();

pg_last_wal_receive_lsn | pg_last_wal_replay_lsn | pg_last_xact_replay_timestamp

-------------------------+------------------------+-------------------------------

0/F0000A0 | 0/F0000A0 | 2024-09-16 14:42:46.980097+07

(1 row)

Step 4: On Slave - Promote Slave to Master

[postgres@postgres-slave ~]$ pg_ctl promote

waiting for server to promote.... done

server promoted

postgres=# select pg_promote();

Step 5: On New Master - Create replication slot

Because of replication slot is not replicated to Slave. So we must create a replication slot into New Master node

postgres=# select * from pg_create_physical_replication_slot('cherry');

postgres=# select * from pg_replication_slots;

Step 6: On OLD Master - Convert OLD Master to Slave

💩 Config postgresql.conf

primary_conninfo = 'host=postgres-slave port=5432 user=replicator password=******'

primary_slot_name = 'cherry'

archive_cleanup_command = '/usr/pgsql-16/bin/pg_archivecleanup /var/lib/pgsql/backup %r'

recovery_target_timeline = 'latest'

💓 Create standby.signal file

[postgres@postgres-master ~]$ cd $PGDATA

[postgres@postgres-master data]$ touch standby.signal

[postgres@postgres-master data]$ chmod 600 standby.signal

💓 Restart service

[root@postgres-master ~]# systemctl stop postgresql-16.service

[root@postgres-master ~]# systemctl start postgresql-16.service

Ref:

Cherry's blog

Pages

Thursday, September 12, 2024

PostgreSQL - Streaming Replication (Hot Standby)