Character Set Support

Character set support is enabled only for Unicode-enabled Helix Servers. In this mode, P4Java differentiates between Helix Server file content character sets (that is, the encodings used to read or write a file’s contents) and the character sets used for Helix Server file names, job specs, changelist descriptions, and so on.

This distinction follows from the way Java handles strings and basic I/O. File content encodings must be preserved so that the bytes written to or read from the local disk are correctly encoded, but P4Java does not need to know the encoding of file metadata or other string values. Helix Servers store and transmit all such metadata and strings in normalized UTF-8 form, and Java strings are always sequences of UTF-16 code units. Any conversion to or from other character sets (such as shiftjis) therefore happens outside P4Java, usually in the surrounding application, and is neither influenced by nor implemented in P4Java itself.

As a result, the character set passed to the IOptionsServer.setCharsetName method is used only to translate file content. Everything else, including file names, job specs, and changelist descriptions, is handled as native Java (UTF-16) strings, which your application may or may not need to translate into another encoding such as shiftjis or winansi.
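The following JDK-only sketch illustrates that split. It does not call P4Java: the UTF-8 byte array is an assumed stand-in for metadata (here, a changelist description) as the server would send it, and the class and method names are hypothetical. Decoding UTF-8 into a String is all Java needs; producing Shift_JIS bytes is shown as a separate, application-side step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: metadata arrives as UTF-8 bytes and becomes a normal
// Java String (UTF-16 code units). Re-encoding it for a legacy consumer is
// the application's responsibility, not P4Java's.
class MetadataEncodingSketch {

    // ASSUMPTION: utf8Bytes stands in for metadata as sent by the server.
    static String decodeServerMetadata(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Application-side translation out of Java's native string form.
    static byte[] encodeForLegacyConsumer(String text, String javaCharsetName) {
        return text.getBytes(Charset.forName(javaCharsetName));
    }

    public static void main(String[] args) {
        // "変更の説明" ("change description"), written as escapes so the
        // source compiles correctly under any platform default encoding.
        byte[] fromServer = "\u5909\u66f4\u306e\u8aac\u660e".getBytes(StandardCharsets.UTF_8);
        String description = decodeServerMetadata(fromServer);
        byte[] sjis = encodeForLegacyConsumer(description, "Shift_JIS");
        System.out.println(description.length() + " chars, "
                + fromServer.length + " UTF-8 bytes, "
                + sjis.length + " Shift_JIS bytes");
    }
}
```

Run as-is, main reports 5 characters, 15 UTF-8 bytes, and 10 Shift_JIS bytes: the same String has different byte forms, and choosing one for metadata is left to the caller.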

P4Java supports file content operations on files encoded in most, but not all, of the character sets supported by the Helix Server. Calling programs can obtain the list of supported Helix Server file content charsets from the PerforceCharsets.getKnownCharsets method. If you attempt to set an IOptionsServer object’s charset to a charset that is not supported by both the Helix Server and the local JDK installation, P4Java throws an appropriate exception; similarly, if you try to sync (for example) a file with an unsupported character set encoding, you also get an exception.
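A calling program can also check a charset name up front instead of waiting for the exception. The sketch below uses only the JDK. The KNOWN table is a hypothetical, hand-picked subset standing in for the list returned by PerforceCharsets.getKnownCharsets, its Java-name pairings are assumptions for illustration, and the IllegalArgumentException is the sketch's own choice, not necessarily the exception type P4Java throws.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical pre-check: a charset is usable for file content only if the
// server side knows it AND the local JDK supports the matching Java charset.
class CharsetPrecheck {

    // ASSUMPTION: illustrative Helix name -> Java name pairs, not P4Java's table.
    static final Map<String, String> KNOWN = Map.of(
            "utf8", "UTF-8",
            "iso8859-1", "ISO-8859-1",
            "winansi", "windows-1252",
            "shiftjis", "Shift_JIS",
            "eucjp", "EUC-JP");

    // Returns the Java Charset for a Helix name, or throws if either side
    // cannot handle it.
    static Charset requireUsable(String helixName) {
        String javaName = KNOWN.get(helixName);
        if (javaName == null) {
            throw new IllegalArgumentException("Unknown Helix Server charset: " + helixName);
        }
        if (!Charset.isSupported(javaName)) {
            throw new IllegalArgumentException("Not supported by this JDK: " + javaName);
        }
        return Charset.forName(javaName);
    }

    public static void main(String[] args) {
        System.out.println(requireUsable("utf8")); // prints UTF-8
        try {
            requireUsable("klingon");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking both conditions in one place keeps the failure close to user input (for example, a settings dialog) rather than surfacing later during a sync.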

The Helix Server uses non-standard names for several standard character sets (for example, winansi rather than windows-1252). P4Java also uses the Helix Server names for these character sets rather than the standard names.
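An application that starts from a standard Java charset may therefore need to translate it to the Helix Server name before calling IOptionsServer.setCharsetName. A minimal sketch, assuming a hypothetical hand-written JAVA_TO_HELIX table (not P4Java's own mapping). Charset.forName resolves Java aliases such as latin1 or Cp1252 to a canonical name before the lookup, so callers need not worry about which alias they were given.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

// Hypothetical translation from standard Java charset names (or any alias)
// to the Helix Server names that P4Java expects.
class HelixCharsetNames {

    // ASSUMPTION: illustrative canonical Java name -> Helix name pairs.
    private static final Map<String, String> JAVA_TO_HELIX = Map.of(
            "UTF-8", "utf8",
            "ISO-8859-1", "iso8859-1",
            "windows-1252", "winansi",
            "Shift_JIS", "shiftjis");

    // Empty if there is no known Helix equivalent. Charset.forName throws
    // UnsupportedCharsetException if the local JDK does not know the name.
    static Optional<String> toHelixName(String javaNameOrAlias) {
        String canonical = Charset.forName(javaNameOrAlias).name();
        return Optional.ofNullable(JAVA_TO_HELIX.get(canonical));
    }

    public static void main(String[] args) {
        System.out.println(toHelixName("latin1"));   // Optional[iso8859-1]
        System.out.println(toHelixName("UTF-16BE")); // Optional.empty
    }
}
```

Returning an Optional rather than guessing keeps unmapped charsets visible to the caller, which matches the point above that not every Java charset has a Helix Server counterpart usable by P4Java.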