COMMAND NAME:  gpexpand

Expands an existing Apache Cloudberry across new hosts in the array.

******************************************************
SYNOPSIS
******************************************************

gpexpand 
      [-f <hosts_file>]
      | -i <input_file> [-B <batch_size>] [-V] [-t segment_tar_dir] [-S]
      | {-d <hh:mm:ss> | -e '<YYYY-MM-DD hh:mm:ss>'} 
        [-analyze] [-n <parallel_processes>]
      | --hba-hostnames
      | --rollback
      | --clean
[--verbose] [--silent]

gpexpand -? | -h | --help 

gpexpand --version


******************************************************
PREREQUISITES
******************************************************

* You are logged in as the Apache Cloudberry superuser (gpadmin).

* The new segment hosts have been installed and configured as per 
  the existing segment hosts. This involves:

  * Configuring the hardware and OS
  * Installing the Cloudberry software
  * Creating the gpadmin user account
  * Exchanging SSH keys. 

* Enough disk space on your segment hosts to temporarily hold a 
  copy of your largest table. 
* When redistributing data, Apache Cloudberry must be running in 
  production mode. Apache Cloudberry cannot be restricted mode or in 
  coordinator mode. The gpstart options -R or -m cannot be specified to start 
  Apache Cloudberry. 
  
  
******************************************************
DESCRIPTION
******************************************************

The gpexpand utility performs system expansion in two phases: segment 
initialization and then table redistribution.
 
In the initialization phase, gpexpand runs with an input file that 
specifies data directories, dbid values, and other characteristics 
of the new segments. You can create the input file manually, or by 
following the prompts in an interactive interview.

If you choose to create the input file using the interactive interview, 
you can optionally specify a file containing a list of expansion hosts. 
If your platform or command shell limits the length of the list of hostnames 
that you can type when prompted in the interview, specifying the hosts 
with -f may be mandatory. 

In addition to initializing the segments, the initialization phase 
performs these actions:
* Creates an expansion schema to store the status of the expansion 
  operation, including detailed status for tables.
* Changes the distribution policy for all tables to DISTRIBUTED RANDOMLY. 
  The original distribution policies are later restored in the 
  redistribution phase.

To begin the redistribution phase, you must run gpexpand with either 
the -d (duration) or -e (end time) options. Until the specified end 
time or duration is reached, the utility will redistribute tables in 
the expansion schema. Each table is reorganized using ALTER TABLE 
commands to rebalance the tables across new segments, and to set 
tables to their original distribution policy. If gpexpand completes 
the reorganization of all tables before the specified duration, 
it displays a success message and ends. 

NOTE: Data redistribution should be performed during low-use hours. 
Redistribution can divided into batches over an extended period. 

******************************************************
OPTIONS
******************************************************

-a | --analyze
 Run ANALYZE to update the table statistics after expansion. 
 The default is to not run ANALYZE.


-B <batch_size>
 Batch size of remote commands to send to a given host before 
 making a one-second pause. Default is 16. Valid values are 1-128.
 The gpexpand utility issues a number of setup commands that may exceed 
 the host's maximum threshold for authenticated connections as defined 
 by MaxStartups in the SSH daemon configuration. The one-second pause 
 allows authentications to be completed before gpexpand issues any 
 more commands. The default value does not normally need to be changed. 
 However, it may be necessary to reduce the maximum number of commands 
 if gpexpand fails with connection errors such as 
 'ssh_exchange_identification: Connection closed by remote host.'


-c | --clean
 Remove the expansion schema.


-d | --duration <hh:mm:ss>
 Duration of the expansion session from beginning to end.


-e | --end '<YYYY-MM-DD hh:mm:ss>'
 Ending date and time for the expansion session.


-f | --hosts-file <filename>
 Specifies the name of a file that contains a list of new hosts for 
 system expansion. Each line of the file must contain a single 
 host name. This file can contain hostnames with or without network 
 interfaces specified. The gpexpand utility handles either case, 
 adding interface numbers to end of the hostname if the original nodes 
 are configured with multiple network interfaces.


--hba-hostnames
 Optional. use hostnames instead of CIDR in pg_hba.conf


-i | --input <input_file>
 Specifies the name of the expansion configuration file, which contains 
 one line for each segment to be added in the format of:

  <hostname>|<address>|<port>|<datadir>|<dbid>|<content>|<preferred_role>

  ...


-n <parallel_processes>
 The number of tables to redistribute simultaneously. Valid values 
 are 1 - 96. Each table redistribution process requires two database
 connections: one to alter the table, and another to update the table's 
 status in the expansion schema. Before increasing -n, check the current 
 value of the server configuration parameter max_connections and make 
 sure the maximum connection limit is not exceeded.


-r | --rollback
 Roll back a failed expansion setup operation.


-s | --silent
 Runs in silent mode. Does not prompt for confirmation to proceed 
 on warnings.


-S | --simple_progress
 Show simple progress view.


-t | --tardir <directory>
 Specify the temporary directory on segment hosts to put tar file.


-v | --verbose
 Verbose debugging output. With this option, the utility will output 
 all DDL and DML used to expand the database.


--version
 Display the utility's version number and exit.


-V | --novacuum
 Do not vacuum catalog tables before creating schema copy.


-? | -h | --help
 Displays the online help.


******************************************************
EXAMPLES
******************************************************

Run gpexpand with an input file to initialize new segments and 
create the expansion schema in the default database:

  $ gpexpand -i input_file


Run gpexpand for sixty hours maximum duration to redistribute 
tables to new segments:

  $ gpexpand -d 60:00:00

******************************************************
SEE ALSO
******************************************************

gpssh-exkeys

