From 188b4a655f792995abc68afb7a6f894d96e53f76 Mon Sep 17 00:00:00 2001
From: Gerald Combs <gerald@wireshark.org>
Date: Thu, 3 Sep 2020 18:40:36 -0700
Subject: [PATCH] README.developer: Note that sources can use UTF-8.

We started allowing source files to be encoded as UTF-8 in April 2019 in
bd75f5af0a. Update README.developer to match.

README.developer no longer has a "Code style" section, so update the
Developer's Guide to point to the "Portability" section.
---
 doc/README.developer                          | 22 ++++++++++---------
 .../wsdg_src/WSDG_chapter_build_intro.adoc    |  2 +-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/doc/README.developer b/doc/README.developer
index bf15d68c4f..8c283f1cf4 100644
--- a/doc/README.developer
+++ b/doc/README.developer
@@ -501,16 +501,18 @@ automatically free()d when the dissection of the current packet ends so you
 don't have to worry about free()ing them explicitly in order to not leak memory.
 Please read README.wmem.
 
-Don't use non-ASCII characters in source files; not all compiler
-environments will be using the same encoding for non-ASCII characters,
-and at least one compiler (Microsoft's Visual C) will, in environments
-with double-byte character encodings, such as many Asian environments,
-fail if it sees a byte sequence in a source file that doesn't correspond
-to a valid character.  This causes source files using either an ISO
-8859/n single-byte character encoding or UTF-8 to fail to compile.  Even
-if the compiler doesn't fail, there is no guarantee that the compiler,
-or a developer's text editor, will interpret the characters the way you
-intend them to be interpreted.
+Source files can use UTF-8 encoding, but characters outside the ASCII
+range should be used sparingly. It should be safe to use non-ASCII
+characters in comments and strings, but some compilers (such as GCC
+versions prior to 10) may not support extended identifiers very well.
+There is also no guarantee that a developer's text editor will interpret
+the characters the way you intend them to be interpreted.
+
+The majority of Wireshark encodes strings as UTF-8. The main exception
+is the code that uses the Qt API, which uses UTF-16. Console output is
+UTF-8, but as with source code, non-ASCII characters should be used
+sparingly, since some consoles (most notably Windows' cmd.exe) have
+limited support for UTF-8.
 
 3. Robustness.
 
diff --git a/docbook/wsdg_src/WSDG_chapter_build_intro.adoc b/docbook/wsdg_src/WSDG_chapter_build_intro.adoc
index 43e1a4eddc..c8d9a9b8f7 100644
--- a/docbook/wsdg_src/WSDG_chapter_build_intro.adoc
+++ b/docbook/wsdg_src/WSDG_chapter_build_intro.adoc
@@ -28,7 +28,7 @@ the _/capchild_ and _/caputils directories
 
 === Coding Style
 
-The coding style guides for Wireshark can be found in the "Code style"
+The coding style guides for Wireshark can be found in the “Portability”
 section of the file _doc/README.developer_.
 
 [[ChCodeGLib]]