Thursday, September 19, 2024

PostgreSQL - Setting Parameters for Production

There are several key parameters that should be adjusted for optimal performance, security, and stability. The defaults are conservative, sized for small systems, and should be tuned to your hardware, workload, and specific use cases. Here's a breakdown of the important parameters:

1. Memory Settings
  • shared_buffers
    • Memory the database server uses for shared memory buffers
    • A common starting point is 25% of physical RAM when the system has more than 1GB of RAM
    • Larger settings for shared_buffers usually require a corresponding increase in max_wal_size, and may call for enabling huge_pages
  • wal_buffers
    • The amount of shared memory used for WAL data that has not yet been written to disk
    • Default value: -1, which auto-tunes to 1/32 (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment (typically 16MB)
    • The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases
  • work_mem
    • Used for sorting tuples in ORDER BY and DISTINCT operations, and for joining tables with merge-join and hash-join operations.
    • Be careful with high values: work_mem is allocated per sort or hash operation, not per query, so a single complex query running several such operations can consume multiples of this amount during heavy workloads
    • A common sizing formula: Total RAM * 0.25 / max_connections
  • temp_buffers
    • Used for temporary buffers within each database session and for storing temporary tables.
    • This setting can be changed within individual sessions, but only before the first use of temporary tables within the session; subsequent attempts to change the value will have no effect on that session
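
Putting these together, a sketch of postgresql.conf values for a hypothetical dedicated server with 16GB of RAM and max_connections = 100 (starting points only, not tuned recommendations):

shared_buffers = 4GB      # 25% of 16GB physical RAM
wal_buffers = -1          # auto-tune: 1/32 of shared_buffers, capped at one WAL segment
work_mem = 40MB           # 16GB * 0.25 / 100 connections; allocated per sort/hash operation
temp_buffers = 32MB       # per-session temporary buffers (default is 8MB)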

Thursday, September 12, 2024

PostgreSQL - Streaming Replication (Hot Standby)

1. General information

Master Node
IP: 192.168.56.111
Hostname: postgres-master
Version: postgres (PostgreSQL) 16.4
$PGDATA=/var/lib/pgsql/16/data

postgres=# \db
                    List of tablespaces
    Name    |  Owner   |              Location              
------------+----------+------------------------------------
 dvd_data   | tucdt    | /var/lib/pgsql/dvdrental_tbs/data
 dvd_index  | tucdt    | /var/lib/pgsql/dvdrental_tbs/index
 pg_default | postgres | 
 pg_global  | postgres | 
(4 rows)

Slave Node
IP: 192.168.56.222
Hostname: postgres-slave
Version: postgres (PostgreSQL) 16.4
$PGDATA=/var/lib/pgsql/16/data    // empty

2. Preparation on Master/Slave
  • Disable SELinux: set SELINUX=disabled in /etc/selinux/config
  • Stop the firewall (see the commands after this list)
  • Config /etc/hosts
    • 192.168.56.111  postgres-master.taolaoxibup.com postgres-master
    • 192.168.56.222  postgres-slave.taolaoxibup.com postgres-slave
  • Setup SSH KEY-PAIR
    • ssh-keygen -t rsa
    • ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@192.168.56.[111/222]
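
A sketch of the SELinux/firewall commands, run as root on both nodes (assumes a RHEL-family system with firewalld):

# setenforce 0                          # permissive for the current boot; SELINUX=disabled in /etc/selinux/config persists it
# systemctl disable --now firewalld     # stop the firewall and keep it off across reboots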
3. Master - Configuration
3.1 Create user for replication
postgres=# CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'P@ssw0rd';
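
The replicator user also needs a pg_hba.conf entry on the master so the standby can connect; a sketch (the auth method and address are assumptions, adjust to your environment):

host    replication    replicator    192.168.56.222/32    scram-sha-256

Reload the configuration afterwards, e.g. with: SELECT pg_reload_conf();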

3.2 Create replication slot
  • Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows which could cause a recovery conflict even when the standby is disconnected.
  • In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_size, or by storing the segments in an archive using archive_command or archive_library
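
A minimal sketch of creating a physical replication slot on the master (the slot name standby_slot is an assumption; the standby would reference it via primary_slot_name):

postgres=# SELECT pg_create_physical_replication_slot('standby_slot');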

Wednesday, September 11, 2024

PostgreSQL - Backup and Restore

  • pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers)
  • pg_dump only dumps a single database.
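
Because pg_dump covers only one database, cluster-wide objects (roles, tablespaces) are not included; pg_dumpall dumps the whole cluster, though it emits plain-text SQL only. For example (the output path is an assumption):

[postgres@postgres-master ~]$ pg_dumpall > /source/cluster.sql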
[postgres@postgres-master ~]$ pg_dump --help
pg_dump dumps a database as a text file or to other formats.

Usage:
  pg_dump [OPTION]... [DBNAME]

General options:
  -f, --file=FILENAME          output file or directory name
  -F, --format=c|d|t|p         output file format (custom, directory, tar, plain text (default))
  • p - plain: Output a plain-text SQL script file (the default). Use psql to restore from this script.
  • c - custom: Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default.
  • d - directory: Output a directory-format archive suitable for input into pg_restore. This will create a directory with one file for each table and large object being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with standard Unix tools; for example, files in an uncompressed archive can be compressed with the gzip, lz4, or zstd tools. This format is compressed by default using gzip and also supports parallel dumps.
  • t - tar : Output a tar-format archive suitable for input into pg_restore. The tar format is compatible with the directory format: extracting a tar-format archive produces a valid directory-format archive. However, the tar format does not support compression. Also, when using tar format the relative order of table data items cannot be changed during restore
  -j, --jobs=NUM               use this many parallel jobs to dump    // only supported by the directory format -Fd

Options controlling the output content:
  -a, --data-only              dump only the data, not the schema    // table contents only, no DDL
  -c, --clean                  clean (drop) database objects before recreating
  -n, --schema=PATTERN         dump the specified schema(s) only
  -N, --exclude-schema=PATTERN do NOT dump the specified schema(s)
  -s, --schema-only            dump only the schema, no data    // only metadata (DDL) of the database
  -t, --table=PATTERN          dump only the specified table(s)
  -T, --exclude-table=PATTERN  do NOT dump the specified table(s)

Connection options:
  -d, --dbname=DBNAME      database to dump
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)

Example:
[postgres@postgres-master ~]$ pg_dump dvdrental -j2 -Fd -f /source/dumpfile
[postgres@postgres-master ~]$ pg_dump dvdrental -f /source/dvdrental.sql
[postgres@postgres-master ~]$ pg_dump dvdrental -t cherry -t actor > cherry_actor.sql
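
For reference, matching restore commands might look like this (a sketch; assumes the target database already exists):

[postgres@postgres-master ~]$ pg_restore -d dvdrental -j 2 /source/dumpfile     # directory-format dump, 2 parallel jobs
[postgres@postgres-master ~]$ psql -d dvdrental -f /source/dvdrental.sql        # plain-text dump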

Monday, September 09, 2024

PostgreSQL - pglogical 2 extension

1. Overview 
  • The pglogical 2 extension provides logical streaming replication for PostgreSQL, using a publish/subscribe model.
  • To use pglogical, both the provider and the subscriber must be running PostgreSQL 9.4 or newer.
  • Use cases supported are:
    • Upgrades between major versions (subject to the version restrictions listed under Limitations below)
    • Full database replication
    • Selective replication of sets of tables using replication sets
    • Selective replication of table rows at either publisher or subscriber side (row_filter)
    • Selective replication of table columns at publisher side
    • Data gather/merge from multiple upstream servers
  • Architectural details:
    • pglogical works on a per-database level, not whole server level like physical streaming replication
    • One Provider may feed multiple Subscribers without incurring additional disk write overhead
    • One Subscriber can merge changes from several origins and detect conflict between changes with automatic and configurable conflict resolution (some, but not all aspects required for multi-master).
    • Cascading replication is implemented in the form of changeset forwarding.
  • Limitations and restrictions
    • To replicate multiple databases you must set up individual provider/subscriber relationships for each. There is no way to configure replication for all databases in a PostgreSQL install at once
    • UPDATEs and DELETEs cannot be replicated for tables that lack a PRIMARY KEY or other valid replica identity such as a UNIQUE constraint. Replication has no way to find the tuple that should be updated/deleted since there is no unique identifier (a workaround sketch follows this list).
    • Automatic DDL replication is not supported. Managing DDL so that the provider and subscriber database(s) remain compatible is the responsibility of the user. 
    • The state of sequences added to replication sets is replicated periodically and not in real-time.
    • PGLogical can replicate across PostgreSQL major versions. Despite that, long term cross-version replication is not considered a design target, though it may often work. Issues where changes are valid on the provider but not on the subscriber are more likely to arise when replicating across versions. 
    • It is safer to replicate from an old version to a newer version since PostgreSQL maintains solid backward compatibility but only limited forward compatibility. Initial schema synchronization is only supported when replicating between same version of PostgreSQL or from lower version to higher version
    • ...
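
As a workaround sketch for the replica-identity limitation above, a table without a primary key can be given a unique index and told to use it as its replica identity (the cherry table and its id column are assumed here; the indexed column must be NOT NULL):

postgres=# ALTER TABLE cherry ADD CONSTRAINT cherry_id_uq UNIQUE (id);
postgres=# ALTER TABLE cherry REPLICA IDENTITY USING INDEX cherry_id_uq;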
2. General information

Master Node
IP: 192.168.56.111
Hostname: postgres-master.taolaoxibup.com
OS: Red Hat Enterprise Linux release 9.4 (Plow)

postgres=# select datname from pg_database ;
  datname  
-----------
 postgres
 dvdrental        // we will replicate this database
 template1
 template0
(4 rows)

Slave Node
IP: 192.168.56.222
Hostname: postgres-slave.taolaoxibup.com
OS: Red Hat Enterprise Linux release 9.4 (Plow)

3. Installation on Master/Slave nodes
# dnf install -y postgresql16-server postgresql16-contrib
# dnf install pglogical_16
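
After installation, pglogical needs a few postgresql.conf settings on both nodes before the extension can be used; a minimal sketch based on the pglogical defaults (the worker/slot/sender counts are starting-point assumptions):

wal_level = 'logical'
shared_preload_libraries = 'pglogical'
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10

Restart PostgreSQL, then create the extension in the database to be replicated:

dvdrental=# CREATE EXTENSION pglogical;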