                     Internationalization Notes for
                    P4D, the Helix Versioning Engine
                     and Helix client applications
                            Version 2025.1

        
Introduction
        
        The Helix clients and server have an optional mode of operation
        where all metadata and some file content are stored in the server
        in the UTF8 Unicode character set and are translated into another
        character set on the client.
        
        When running in internationalized mode, all non-file data (identifiers,
        descriptions, and so on), as well as the content of all files of type 
        "unicode" are translated between the character set specified by the
        P4CHARSET variable on the client and UTF8 in the server.
        
        
Server configuration
        
        Before you use Perforce in an internationalized environment, you must
        first instruct your server to run in internationalized mode.
        
        Setting your server to run in internationalized mode:
        
        1.  Run "p4d -xi"
        
        This will verify that any and all existing metadata is valid 
        UTF8 and set a protected counter "unicode" to instruct future
        invocations of p4d to operate in internationalized mode.  
        
        (The "p4d" process invoked by "p4d -xi" does *not* start a Perforce
        server; rather, it terminates after instructing future invocations 
        of p4d to run in internationalized mode.  After p4d -xi sets up 
        internationalized mode, you may then invoke p4d with your site's
        usual flags.)
        
        After setting the server to run in internationalized mode, your 
        users must also set the P4CHARSET environment variable.
        
        Once set on the server, internationalized mode cannot be deactivated.
        (That is, you cannot return to non-internationalized mode.)
        
Client configuration
        
	As of 2014.2, the P4CHARSET environment variable is no longer
	required.

	P4CHARSET may be set to avoid clients having to detect the
	Unicode nature of a server.  If it is not set, clients may
	detect and remember the Unicode nature by setting an
	environment variable of the form 'P4_<p4port>_CHARSET'.
	See the server release notes for 2014.2 or documentation
	for details.

	A P4CHARSET value of 'auto' indicates that the client should
	make operating system specfic checks to determine what
	character set should be used.

        The following table lists recommended P4CHARSET values for
        supported character sets and platforms.
        
        Language      Platform     Windows        Unix         P4CHARSET
        			  Code page      LOCALE         setting
        ------------------------------------------------------------------
        Japanese      Windows       932           n/a           shiftjis
        Japanese      UNIX          n/a           varies        eucjp
        Japanese      UNIX          n/a           varies        shiftjis
	Chinese	      All	    936		  varies	cp936
	Chinese	      All	    950		  varies	cp950
	Korean	      All	    949		  varies	cp949
        High-ASCII    Windows       1252          n/a           winansi
        High-ASCII    Windows       1250          n/a           cp1250
        High-ASCII    Windows       850           n/a           cp850
        High-ASCII    Windows       852           n/a           cp852
        High-ASCII    Windows       858           n/a           cp858
        High-ASCII    Windows       437           n/a           winoem
        High-ASCII    UNIX          n/a           varies        iso8859-1
        High-ASCII    UNIX          n/a           varies        iso8859-2
        High-ASCII    UNIX          n/a           varies        iso8859-15
        High-ASCII    MacOS         n/a           varies        macosroman
        untranslated  All           n/a           n/a           utf8*
	Cyrillic      All	    n/a		  varies	iso8859-5
	Cyrillic      All	    n/a		  varies	koi-r
	Cyrillic      All	    1251	  n/a		cp1251
	Greek	      Windows	    1253	  n/a		cp1253
	Greek	      UNIX	    n/a		  n/a		iso8859-7
	All	      All	    n/a		  n/a		utf16*
        
	Unicode		Client				Byte-Order-Mark
	P4CHARSET	Unicode				written to
	settings	Format				files
	------------------------------------------------------------------
	utf8		UTF-8				No
	utf8-bom	UTF-8				Yes
	utf8unchecked	UTF-8 (not validated)		No
	utf8unchecked-bom UTF-8	(not validated)		Yes
	utf16		UTF-16 in client byte order	Yes
	utf16le		UTF-16 in Little Endian order	Yes (Windows Style)
	utf16be		UTF-16 in Big Endian order	Yes
	utf16-nobom	UTF-16 in client byte order	No
	utf16le-nobom	UTF-16 in Little Endian order	No
	utf16be-nobom	UTF-16 in Big Endian order	No
	utf32		UTF-32 in client byte order	Yes
	utf32le		UTF-32 in Little Endian order	Yes
	utf32be		UTF-32 in Big Endian order	Yes
	utf32-nobom	UTF-32 in client byte order	No
	utf32le-nobom	UTF-32 in Little Endian order	No
	utf32be-nobom	UTF-32 in Big Endian order	No

        (Note that eucjp is not a supported P4CHARSET value under Windows.)
        
	*Note that utf16 and utf32 require that P4COMMANDCHARSET
	be set to a different (non-utf16 and non-utf32) charset for
	the p4 command line to function.  Also, many p4 api based
	applications will not be able to support utf16 or utf32
	charsets without special work.
	*Note also that utf16 can be one of utf16, utf16-nobom,
	utf16le, utf16le-nobom, utf16be, utf16be-nobom which indicate
	if Byte-Order-Marks (BOMs) are desired and a particular byte
	order is desired.  Windows platforms will probably want
	to use utf16le which matches most closely with Windows
	concept of Unicode files.  The following notes about P4CHARSET
	also apply to P4COMMANDCHARSET.  P4COMMANDCHARSET allows
	for a different charset for command input and output while
	allowing P4CHARSET to set the charset of file contents.

	*Note that utf8 is untranslated, but effective with clients
	 built with the p4 api of 2006.1 or later will validate that
	 file contents are in fact utf8.  Previous clients did not
	 validate utf8 file contents.

        Setting P4CHARSET on Windows:
        
        1.  Log in to Windows and open an MS-DOS command prompt.
        
        2.  Run the chcp ("CHangeCodePage") command without any arguments to see
            your current code page.
        
        3.  Display your active code page on Windows machines by issuing
            the "chcp" command.  Windows displays a message like the following:
        
               Active code page: 1252
        
        4.  Select the character set based on the active code page as follows:
        
            Code page    Set P4CHARSET to:
        	 1252           winansi
        	 932            shiftjis
		 949		cp949
		 1250		cp1250
		 1251		cp1251
		 850		cp850
		 852		cp852
		 858		cp858
		 437		winoem
		 1253		cp1253
          
            To set P4CHARSET for all users on this workstation, you will need
            Administrator privileges.  Issue the following command:
        
               p4 set -s P4CHARSET=[character_set]
        
            If you don't have Administrator privileges, you can use:
        
               p4 set P4CHARSET=[character_set]
        
            to set P4CHARSET for the user currently logged in.  Other users
            on the same machine will have to set P4CHARSET independently.
        
        Setting P4CHARSET on UNIX:
        
        1.  Set P4CHARSET to the proper value from either a command shell
            or in a startup script such as .kshrc, .cshrc, or .profile.
            You can determine the proper value for P4CHARSET by examining
            the current setting of the LANG or LOCALE environment variable.
        
            Sample $LANG value:    Set P4CHARSET to:
              en_US.ISO_8859-1          iso8859-1
              ja_JP.EUC                 eucjp
              ja_JP.PCK                 shiftjis
        
            In general:
        
            For a Japanese installation, set P4CHARSET to eucjp
            For a European installation, set P4CHARSET to iso8859-1
        
        
Unicode file type
        
        Files of type "unicode" are stored in the depot in UTF-8.  Perforce
        client programs use the P4CHARSET environment variable to determine
        how to translate the UTF-8 data in "unicode" files into the local
        character set.  Only files of type "unicode" are translated; Perforce
        ignores P4CHARSET when retrieving or storing files of other file types.
        
        The first time you try to submit any file to the depot, Perforce
        attempts to determine its type by examining a portion (currently
        the first 8192 bytes) of the file.
        
        If P4CHARSET is unset:
        
            Files are (by default) assigned the filetypes "text" or "binary"
            depending on the presence of characters with the high bit set in
            the first part of the file.  This is the default behavior of
            Perforce in a non-internationalized environment.
        
        If P4CHARSET is set:
        
            If nonprintable characters are detected, the file is assigned the
            type "binary".  If there are no nonprintable characters, and there
            are high-ASCII characters, *and* those high-ASCII characters are
            translatable in the defined P4CHARSET, the file is deemed to be
            "unicode".  Otherwise, the file is stored as type "text" (that is,
            both plain text files without high-ASCII characters, and files with
            high-ASCII characters that are undefined in the character set
            specified by P4CHARSET, are stored as type "text".)
        
        To override Perforce's default file type detection, you can:
        
            Specify the desired filetype on the command line, as in
            "p4 add -t unicode file.txt"
        
        or:
        
            Use the p4 typemap command to assign Perforce filetypes according
            to a file's extension. For example, the following table assigns
            the Perforce "unicode" filetype to text and html files, and the
            Perforce "binary" filetype to PDF files:
        
            Typemap:
        	     unicode     //....txt
        	     unicode     //....html
        	     binary      //....pdf
        
            For more about using the typemap feature, refer to the Perforce
            System Administrator's Guide, or the "p4 typemap" page of the
            Command Reference.


UTF16 file type

	Files of type "utf16" are stored in the depot in UTF-8.
	These files are only in utf16 in the client workspace.

	Commands which output file contents such as p4 diff, p4 annotate,
	etc will attempt to translate content from the UTF-16 file
	into the P4CHARSET when in unicode mode	rather than mixing
	UTF-16 content with non-UTF-16 content.  Note that "p4 print"
	with the "-o" flag will write a file as UTF-16 while without
	the "-o" flag output the command will attempt to translate
	the output to the P4CHARSET.

	When adding files, UTF-16 files will prefer to be stored
	with the "utf16" filetype rather than the "unicode" filetype
	even if P4CHARSET is set to a utf16 encoding.  This should allow
	UTF16 files to live side by side with other character sets.
	The automatic type detection requires a BOM be present at
	the start of the file.  Files without a BOM are assumed to
	be in client byte order.  When utf16 files are written to
	a client, such as with the 'p4 sync' command, they are
	written with a BOM and in client byte order.
	

Diffing files
        
        The p4 diff2 command, which compares two files, can
        only compare files that have the same Perforce file type, 
        either text or unicode. You cannot compare a text file to
        a unicode file.
        
        
"CANNOT TRANSLATE" error message
        
        This message is displayed if your client machine is 
        configured with a character set that does not include 
        characters being sent to it by the Perforce server.
        Your client machine cannot display unmapped characters.
        
        For example, if your client machine is configured to
        use the shift-JIS character set and your depot contains
        files named using characters from the Japanese EUC character
        set that do not have mappings in shift-JIS, you will
        see the "Cannot translate..." error message when you
        execute a p4 files or p4 changes command that lists those files.
        
        To avoid translation errors, do not use unmapped
        characters (Japanese EUC character set that do not have 
        mappings in shift-JIS) in the following Perforce elements:
        
        - user names or specifications
        - client names or specifications
        - jobs
        - file names

	Translation failures during file transfers will report
	a line number near where the translation failure occurred.

        
Length limit for Unicode identifiers
        
        The Perforce server has internal limits on the lengths of
        strings used to index job descriptions, specify filenames,
        control view mappings, and identify client names, label names,
        and other objects.
        
        The most common limit is 1024 bytes.  Because some characters
        in Unicode can expand to more than one byte, it's possible 
        for certain Unicode entries to exceed Perforce internal limits.
        
        Because no basic Unicode character expands to more than three bytes,
        dividing the Perforce internal limit by three will ensure
        that no Unicode sequence will exceed the limit.
        
        To ensure that no Unicode sequence exceeds the Perforce limit,
        do not create client names or view patterns that exceed 341 
        Unicode characters.
        
        Under normal usage conditions, this is not expected to pose a
        significant limitation.
        
Localization of error and informational messages
        
	The error and informational messages in Perforce have been
	internationalized.  This means that you can read messages
	in your native language, if a translation has been provided
	(localization).
        
        if P4LANGUAGE is unset:
        
        	By default all messages (info and error) are reported in 
        	English. 
        
        if P4LANGUAGE is set:
        
		If a localization is available and your administrator
		has loaded the language specific messages into the
		Perforce database then you can activate native
		messages by setting P4LANGUAGE.

		example

		To have your messages returned in French set
		P4LANGUAGE to "fr-FR".
        
        
Administrator Notes
        
        The Perforce server operates in either an internationalized or a non-
        internationalized mode.  For release 2001.2, internationalized mode
        is activated upon invocation of "p4d -xi" as described above.
        
        Only Perforce client programs at 2001.2 or above are able to interact
        with an internationalized server.  P4CHARSET must be set for all such
        clients.
        
	The command line client ("p4") has a new global flag (-C)
	that overrides P4CHARSET settings.  For instance:
        
           p4 -C winansi files //...
        
        displays all filenames in the depot, as translated using the winansi
        code page.
        
Instructions for Translators (system integrators)
        
        To get a copy of the "English" message text file for translation
        contact technical support.
        
        To build a localized version of this file edit the text strings, taking
        care not to change any of the key parameters (except for the language
        code - note "en" changed to "fr"). It is also important not to change
        the named parameters (specified between %'s)  i.e. %depot% must remain 
        %depot% (even if there is a valid translation).  Square braces
	also require special care.
        
        example
        
	@pv@ 0 @db.message@ @en@ 822220833 @Depot '%depot%' unknown
	- use 'depot' to create it.@

	to translate into French

	@pv@ 0 @db.message@ @fr@ 822220833 @Depot '%depot%' inconnu
	- utilisez 'depot' pour le creer.@

	The character set of this file should be utf8.

	Once this file has been completely translated it can be
	loaded into Perforce with the following command:
        
        	p4d -jr /fullpath/message.txt
        
        The user would have to set the correct language code to get the native 
        messages,  for this case P4LANGUAGE would be set to "fr".
        
        ---

No new functionality for 2025.1

No new functionality for 2024.1

No new functionality for 2023.2

No new functionality for 2023.1

No new functionality for 2022.2

No new functionality for 2022.1

No new functionality for 2021.2

No new functionality for 2021.1

No new functionality for 2020.2

No new functionality for 2020.1

No new functionality for 2019.2

No new functionality for 2019.1
