Wednesday, September 11, 2024

PostgreSQL - Backup and Restore

  • pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users accessing the database (readers or writers)
  • pg_dump only dumps a single database.
[postgres@postgres-master ~]$ pg_dump --help
pg_dump dumps a database as a text file or to other formats.

Usage:
  pg_dump [OPTION]... [DBNAME]

General options:
  -f, --file=FILENAME          output file or directory name
  -F, --format=c|d|t|p         output file format (custom, directory, tar, plain text (default))
  • p - plainOutput a plain-text SQL script file (the default). Using psql to restore this script.
  • c - custom: Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default.
  • d - directory: Output a directory-format archive suitable for input into pg_restore. This will create a directory with one file for each table and large object being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read. A directory format archive can be manipulated with standard Unix tools; for example, files in an uncompressed archive can be compressed with the gzip, lz4, or zstd tools. This format is compressed by default using gzip and also supports parallel dumps.
  • t - tar : Output a tar-format archive suitable for input into pg_restore. The tar format is compatible with the directory format: extracting a tar-format archive produces a valid directory-format archive. However, the tar format does not support compression. Also, when using tar format the relative order of table data items cannot be changed during restore
  -j, --jobs=NUM               use this many parallel jobs to dump    // only supported by the directory format -Fd

Options controlling the output content:
  -a, --data-only              dump only the data, not the schema    // only metadata of database
  -c, --clean                  clean (drop) database objects before recreating
  -n, --schema=PATTERN         dump the specified schema(s) only
  -N, --exclude-schema=PATTERN do NOT dump the specified schema(s)
  -s, --schema-only            dump only the schema, no data
  -t, --table=PATTERN          dump only the specified table(s)
  -T, --exclude-table=PATTERN  do NOT dump the specified table(s)

Connection options:
  -d, --dbname=DBNAME      database to dump
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)

Example:
[postgres@postgres-master ~]$ pg_dump dvdrental -j2 -Fd -f /source/dumpfile
[postgres@postgres-master ~]$ pg_dump dvdrental -f /source/dvdrental.sql
[postgres@postgres-master ~]$ pg_dump dvdrental -t cherry -t actor > cherry_actor.sql

  • using pg_dumpall to extract a PostgreSQL database cluster into a script file (text-format)
  • using pg_dumpall for upgrading a PostgreSQL server - Works well for a tiny database cluster
  • No parallel mode.
[postgres@postgres-master ~]$ pg_dumpall --help
pg_dumpall extracts a PostgreSQL database cluster into an SQL script file.

Usage:
  pg_dumpall [OPTION]...

General options:
  -f, --file=FILENAME          output file name

Options controlling the output content:
  -a, --data-only              dump only the data, not the schema
  -c, --clean                  clean (drop) databases before recreating
  -s, --schema-only            dump only the schema, no data
  -t, --tablespaces-only       dump only tablespaces, no databases or roles

Connection options:
  -d, --dbname=CONNSTR     connect using connection string
  -h, --host=HOSTNAME      database server host or socket directory
  -l, --database=DBNAME    alternative default database
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)

Example:
[postgres@postgres-master ~]$ pg_dumpall -f all_cluster.sql
[postgres@postgres-master ~]$ pg_dumpall -s -f all_metadata_cluster.sql

  • pg_restore — restore a PostgreSQL database from an archive file created by pg_dump
  • using pg_restore to restore an entire database or specific parts of it
[postgres@postgres-master ~]$ pg_restore --help
pg_restore restores a PostgreSQL database from an archive created by pg_dump.

Usage:
  pg_restore [OPTION]... [FILE]

General options:
  -d, --dbname=NAME        connect to database name
  -f, --file=FILENAME      output file name (- for stdout)
  -F, --format=c|d|t       backup file format (should be automatic)
  -l, --list               print summarized TOC of the archive

Options controlling the restore:
  -a, --data-only              restore only the data, no schema
  -c, --clean                  clean (drop) database objects before recreating
  -C, --create                 create the target database
  -j, --jobs=NUM               use this many parallel jobs to restore
  -n, --schema=NAME            restore only objects in this schema
  -N, --exclude-schema=NAME    do not restore objects in this schema
  -s, --schema-only            restore only the schema, no data
  -t, --table=NAME             restore named relation (table, view, etc.)
  --if-exists                  use IF EXISTS when dropping objects

Connection options:
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)

Example:
# pg_restore -U postgres -W -C -c -j2 --if-exists -d postgres /tmp/ecommerce_backup.tar
  • -C: This option tells pg_restore to create the database before restoring the data. If the database already exists, pg_restore will exit with an error unless the --if-exists option is also specified.
  • -c: This option specifies the "clean" mode for the restore. It tells pg_restore to drop database objects (tables, functions, etc.) before recreating them. This ensures that the restored database is in a clean state and matches the structure of the backup file.
  • -j or --jobs: Specifies the number of concurrent jobs for parallel restoration.
  • --if-exists: This option is used with the -C option. It tells pg_restore to ignore errors if the database being created already exists. If the database exists, pg_restore will proceed with the restore without attempting to create it again.
  • -d postgres: This option specifies the name of the database to which to connect initially. In this case, it's the Postgres database, which is the default database that typically exists in PostgreSQL installations. pg_restore needs to connect to an existing database to create the new database specified in the backup file.
  • /tmp/ecommerce_backup.tar: This is the path to the backup file that contains the database dump. It should be a valid backup file created by pg_dump in the "tar" format.
  • pg_basebackup is used to take a base backup of a running PostgreSQL database cluster. The backup is taken without affecting other clients of the database, and can be used both for point-in-time recovery and as the starting point for a log-shipping or streaming-replication standby server.
  • pg_basebackup makes an exact copy of the database cluster's files, while making sure the server is put into and out of backup mode automatically. Backups are always taken of the entire database cluster; it is not possible to back up individual databases or database objects. 
  • pg_basebackup can make a base backup from not only a primary server but also a standby.
  • Whenever pg_basebackup is taking a base backup, the server's pg_stat_progress_basebackup view will report the progress of the backup.
4.1 Cluster Database information
  • Create user for new database
  • Create tablespaces for data and index
  • Create database with owner user
  • Import data into database
4.2 Backup Cluster Database
[postgres@postgres-master ~]$ pg_basebackup -p 5432 -U postgres -D /postgres/backup/ -Ft -z -Xs -P -v
[postgres@postgres-master ~]$ ls -l /postgres/backup

4.3 Restore Cluster Database
  • Stop postgresql service
  • Delete all data in $PG_DATA
  • Create directory for each tablespace if needed
  • Restore data into each location (data, pg_wal, tablespace locations) - using tar command
  • Creating recovery.conf for recovering data 
    • restore_command
    • recovery_target_name
    • recovery_target_time
    • recovery_target_xid
    • recovery_target_lsn
    • recovery_target_action
  • Start postgresql service
  • Verify data 
Ref: