Filter during replication or edge-to-edge chaining
For performance reasons, you might want to ensure that replication occurs only where it is needed. Rules for replica filtering are therefore useful.
As part of an HA/DR solution, you typically want to ensure that all the metadata and all the versioned files are replicated. In most other use cases, particularly build servers and forwarding replicas, this leads to a great deal of redundant data being transferred.
It is often advantageous to configure your replica servers to filter data on client workspaces and file revisions. For example:
- Developers working on one project at a remote site do not typically need to know the state of every client workspace at other sites where other projects are being developed.
- Build servers don’t require access to the endless stream of changes to office documents and spreadsheets associated with a typical large enterprise.
- In the case of edge-to-edge chaining, the outer edge might need only a subset of what the inner edge has.
Filters only apply for metadata and files that are handled by the p4 pull threads defined in the startup.N configurables.
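As a point of reference, the pull threads mentioned above are defined by startup.N configurables on the replica; one hedged way to review them (the ServerID shown is a placeholder):

```shell
# Show the configurables set for a particular server, including any
# startup.N pull threads ("site1-1668" is a placeholder ServerID):
p4 configure show site1-1668
```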
Filtering applies to new revisions
Changes made to filtering rules are not applied retrospectively:
- If a file revision is on the replica, it remains on the replica after the filter is changed.
- If the metadata for file revisions has previously been filtered out and the filter is subsequently changed to allow those file revisions, the replica will still be missing any revisions created before the filter was changed, even after running the p4 verify -t command.
Therefore, filtering is more consistent and predictable if you re-seed the replica or edge server after an update to the server specification.
Starting with 2025.2, an exception to this rule is Automatic replica filter reconciliation.
Sensitive or unneeded librarian files (archive files stored and managed by the server's librarian subsystem) can be removed by running p4 cachepurge before reseeding the replica or edge server.
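As a sketch of how such a cleanup might look before reseeding (the server address is a placeholder; check p4 help cachepurge for the full set of options on your release):

```shell
# Preview which archive files would be removed from the replica's
# cache, without deleting anything:
p4 -p replica:1668 cachepurge -a -n

# Remove all archive files from the replica's cache:
p4 -p replica:1668 cachepurge -a
```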
The p4 verify -t command does not respect the ArchiveDataFilter that applies to a specific server instance. If metadata exists on that instance and p4 verify -t detects that an archive file is missing within the file[revRange] provided, the command schedules the file for transfer. This can have a performance impact, and it can populate the server with archive files it was never intended to contain.
Two ways to filter
| Exclude database tables | Filter by fields |
|---|---|
| The simplest way to filter metadata is to exclude entire database tables from replication. This is a coarse-grained method of managing the amount of data passed between servers: it requires some knowledge of which tables are most likely to be referred to during P4 Server command operations, and it offers no control over which versioned files are replicated. | You can have fine-grained control over what data is replicated by using the ClientDataFilter, RevisionDataFilter, and ArchiveDataFilter fields of the server spec. |
Example: Filtering out client workspace data and files
If workspaces for users at each of three sites are named site[123]-ws-username, a replica intended to act as a partial backup for users at site1 could be configured as follows:
ServerID:    site1-1668
Name:        site1-1668
Type:        server
Services:    replica
Address:     tcp:site1bak:1668
Description:
    Replicate all client workspace data, except the states of
    workspaces of users at sites 2 and 3.
    Automatically replicate .c files in anticipation of user
    requests. Do not replicate .mp4 video files, which tend
    to be large and impose high bandwidth costs.
ClientDataFilter:
    //...
    -//site2-ws-*/...
    -//site3-ws-*/...
RevisionDataFilter:
ArchiveDataFilter:
    //....c
    -//....mp4
When you start the replica, your p4 pull metadata
thread might resemble the following:
p4 configure set "site1-1668#startup.1=pull -i 30"
In this configuration, only those portions of db.have
that are associated with site1 are replicated. All
metadata concerning workspaces associated with site2 and
site3 is ignored.
All file-related metadata is replicated. All files in the depot are
replicated, except for those ending in .mp4. Files ending
in .c are transferred automatically to the replica when
submitted.
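The automatic transfer of .c files requires file-content threads in addition to the metadata thread shown above. A hedged sketch of a fuller startup configuration (intervals and thread counts are illustrative, not prescriptive):

```shell
# Metadata thread, as above:
p4 configure set "site1-1668#startup.1=pull -i 30"

# Archive-transfer threads (pull -u) that fetch the scheduled file
# content; run more than one to transfer files in parallel:
p4 configure set "site1-1668#startup.2=pull -u -i 1"
p4 configure set "site1-1668#startup.3=pull -u -i 1"
```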
To further illustrate the concept, consider a build server scenario. The ongoing work of the organization (such as code, business documents, or videos) can be stored anywhere in the depot, but this build farm is dedicated to building releasable products, and has no need to have the rest of the organization’s output:
Example: Replicating metadata and file contents for a subset of a depot
Releasable code is placed into //depot/releases/..., and automated builds are based on these changes. Changes to other portions of the depot, as well as the states of individual workers' client workspaces, are filtered out with exclusionary lines (those beginning with the - character).
ServerID:    builder-1669
Name:        builder-1669
Type:        server
Services:    build-server
Address:     tcp:built:1669
Description:
    Exclude all client workspace data.
    Replicate only revisions in release branches.
ClientDataFilter:
    -//...
RevisionDataFilter:
    //depot/releases/...
ArchiveDataFilter:
    //depot/releases/...
Exclude a subset of paths
If you want to exclude a subset of paths, first list the inclusionary line(s), then add the exclusionary line(s) below them. For example:
RevisionDataFilter:
//...
-//depot/releases/...
Seed the replica
To seed the replica, you can use a command like the following to create a filtered checkpoint:
p4d -r /p4/london -P builder-1669 -jd myCheckpoint
where london represents the server root directory of the target server, that is, the replica's immediately upstream server.
The filters specified for builder-1669 are used in
creating the checkpoint. You can then continue to update the replica
using the p4 pull command.
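After the filtered checkpoint has been created on the target server, it can be restored on the replica host; a minimal sketch, assuming illustrative paths:

```shell
# Copy myCheckpoint to the replica host, then restore it into the
# replica's server root ("/p4/builder" is an assumed path) before
# starting the replica:
p4d -r /p4/builder -jr myCheckpoint
```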
When you start the replica, your p4 pull metadata
thread might resemble the following:
p4 configure set "builder-1669#startup.1=pull -i 30"
This p4 pull thread retrieves metadata for replication, excluding all client workspace data (including the have lists) of all users.
The p4 pull -u threads ignore all changes on the
target server except those that affect revisions in the
//depot/releases/... branch, which are the only changes of
interest to a build server. The only metadata that is available is that
which concerns released code. All released code is automatically
transferred to the build server before any requests are made, so that
when the build server performs a p4 sync, the sync is
performed locally.
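To illustrate the effect, a sync against the build server's replicated branch is served from its local archives; a hedged usage sketch (the address comes from the spec above, the workspace mapping is assumed):

```shell
# From a build workspace mapped to //depot/releases/..., the sync is
# satisfied locally by the build server, with no file transfer from
# the upstream server at sync time:
p4 -p built:1669 sync //depot/releases/...
```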
Automatic replica filter reconciliation
If the filtering rules in a replica's server spec have changed, the change affects the replication of server data from the target server to the replica. In the case of edge servers, it affects what the upstream server replicates to the downstream replicas, where the upstream server is either the commit server or an upstream edge server in a chain of edge servers.
The replica filter reconcile feature, introduced in P4 Server 2025.2, combines the ability to purge records that are no longer needed with the ability to fetch records that have become necessary. Be aware that changes in replication might have a performance impact.
By default, the journal pull thread automatically performs the reconciliation when it detects a change in the replica filter.
These configurables control the behavior of the reconciliation:
| Purge database records | Fetch database records |
|---|---|
| To change the number of threads used when purging multiple database tables in parallel, use the rpl.filter.restrict.threads configurable. | To change the number of threads used to process database tables in parallel, use the rpl.filter.expand.threads configurable. |
| To change the number of database records that are processed in parallel, use the rpl.filter.restrict.batch configurable. | To change the block size used for calculating the checksum that compares the replica's database records to those of the upstream server, use the rpl.filter.expand.batch configurable. Blocks are used to avoid dumping the entire table to the journal if only a few records are missing on the replica. |
| Automatic purging can be disabled with the rpl.filter.restrict configurable. | Automatic fetching can be disabled with the rpl.filter.expand configurable. |
An administrator can run a manual replica filter reconciliation from the command line, which provides an option to specify which tables the command applies to. To learn more, see p4 admin replica-filter-reconcile in the P4 CLI Reference.
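The configurables above are set like any other; a hedged sketch (the ServerID and values are illustrative, and the table-selection options for the manual command are documented in the P4 CLI Reference):

```shell
# Tune reconciliation parallelism for a specific replica
# (values are illustrative, not recommendations):
p4 configure set "builder-1669#rpl.filter.restrict.threads=4"
p4 configure set "builder-1669#rpl.filter.expand.threads=4"

# Run a manual reconciliation; see the P4 CLI Reference for the
# options that restrict it to particular tables:
p4 admin replica-filter-reconcile
```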