Message ID | 4b372b47487992fa0b4036b4bfbb6c879f497786.1620641727.git.mchehab+huawei@kernel.org (mailing list archive) |
---|---|
State | New |
Series | Get rid of UTF-8 chars that can be mapped as ASCII |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> While UTF-8 characters can be used in the Linux documentation,
> it is best to use them only when ASCII doesn't offer a good replacement.
> So, replace the occurrences of the following UTF-8 characters:
>
>     - U+00a0 (' '): NO-BREAK SPACE
>     - U+2013 ('–'): EN DASH
>     - U+2014 ('—'): EM DASH
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  Documentation/admin-guide/index.rst           |  2 +-
>  Documentation/admin-guide/module-signing.rst  |  4 +-
>  Documentation/admin-guide/ras.rst             | 94 +++++++++----------
>  .../admin-guide/reporting-issues.rst          | 12 +--
>  4 files changed, 56 insertions(+), 56 deletions(-)

Hi Mauro,

This patch misses one occurrence of U+2014 in
Documentation/admin-guide/sysctl/kernel.rst:1288.

There are also countless occurrences in Documentation/, outside of
Documentation/admin-guide. I suppose another patch in the series, which
I didn't receive, will fix them?

These characters will just reappear elsewhere, eventually. I'm not sure
what the gain is here, other than minor consistency improvements. But we
should add a warning during documentation generation (if there isn't one
already), to prevent them from spreading again.
On Mon, 10 May 2021 14:40:09 -0400, Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> >     - U+00a0 (' '): NO-BREAK SPACE
> >     - U+2013 ('–'): EN DASH
> >     - U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  Documentation/admin-guide/index.rst           |  2 +-
> >  Documentation/admin-guide/module-signing.rst  |  4 +-
> >  Documentation/admin-guide/ras.rst             | 94 +++++++++----------
> >  .../admin-guide/reporting-issues.rst          | 12 +--
> >  4 files changed, 56 insertions(+), 56 deletions(-)
>
> Hi Mauro,
>
> This patch misses one occurrence of U+2014 in
> Documentation/admin-guide/sysctl/kernel.rst:1288.

It ended up in a separate patch.

> There are also countless occurrences in Documentation/, outside of
> Documentation/admin-guide. I suppose another patch in the series, which
> I didn't receive, will fix them?

Yes. This series should fix all occurrences inside Documentation/ in
*.rst files and in ABI, except for Documentation/translations[1].

[1] It probably still makes sense to apply a subset of the changes from
    this series there, but touching non-Latin translations is riskier.

> These characters will just reappear elsewhere, eventually. I'm not sure
> what the gain is here, other than minor consistency improvements.

The main point here is that a large number of those UTF-8 characters
appeared as a result of document conversion from DocBook/LaTeX/Markdown.

As the conversion ended, I don't expect the need to redo a series
like that in the near future.

There are even some cases where the UTF-8 characters were doing the
wrong thing, like an EN DASH used instead of a hyphen to pass a command
line parameter, and the addition of non-printable BOM characters.

So, IMO, this is a necessary cleanup after the conversion.

> But we
> should add a warning during documentation generation (if there isn't one
> already), to prevent them from spreading again.

Not sure if it is worth it...

See: people can (and should) use UTF-8 characters when needed, for
instance Latin accented characters in names and translations, and Greek
letters when pertinent, like MICRO SIGN or GREEK SMALL LETTER MU to
represent microseconds.

On the other hand, using curly quotes instead of ASCII ones, and dashes
instead of -- and ---, only makes it harder for people to type documents
with normal editors without any gain, as Sphinx already converts those
into curly quotes and EN/EM DASH when it generates html/pdf docs.

Thanks,
Mauro
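The distinction drawn in this reply (flag only the characters that have a plain ASCII stand-in, and leave accented names and Greek letters alone) could be sketched as a small checker. This is only an illustration under stated assumptions: the helper name `find_replaceable` and the `FLAGGED` set are hypothetical, built from the characters named in this thread, and are not part of the kernel's documentation build.

```python
import unicodedata

# Code points named in this thread as having an ASCII stand-in or as
# unwanted debris. The set is an assumption for illustration only.
FLAGGED = {
    "\u00a0",  # NO-BREAK SPACE
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (the BOM character)
}


def find_replaceable(text):
    """Return (line, column, character name) for each flagged character.

    Lines and columns are 1-based. Any other non-ASCII character, such
    as an accented letter or MICRO SIGN, is deliberately left alone.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in FLAGGED:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits
```

A check like this would report the dashes in a file while staying silent about a contributor's accented name, which is the behaviour both sides of the exchange seem to want from any warning.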
On Wed, 2021-05-12 at 10:44 +0200, Mauro Carvalho Chehab wrote:
> The main point here is that a large number of those UTF-8 characters
> appeared as a result of document conversion from DocBook/LaTeX/Markdown.
>
> As the conversion ended, I don't expect the need to redo a series
> like that in the near future.
>
> There are even some cases where the UTF-8 characters were doing the
> wrong thing, like an EN DASH used instead of a hyphen to pass a command
> line parameter, and the addition of non-printable BOM characters.
>
> So, IMO, this is a necessary cleanup after the conversion.

That part — fixing characters that are *wrong*, such as converting a
UTF-8 U+2014 EM DASH to a UTF-8 U+002D HYPHEN-MINUS, is reasonable
enough.

But you're not "avoiding using UTF-8 chars" there, as it says in the
title of this patch. HYPHEN-MINUS encoded as 0x2D *is* UTF-8.

I think you meant "avoid using non-ASCII chars", and even *that* is an
entirely bogus reason for doing anything at all, as discussed.

Limit yourself to fixing characters which are actually wrong, and it's
fine. One level of pointless trivia below spelling errors, mind you,
but at least not actively wrong.
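The encoding point made in this message can be checked directly. The following is a minimal sketch using only Python's built-in string methods; the variable names are illustrative.

```python
# An ASCII character keeps its single-byte value when encoded as UTF-8,
# while EM DASH needs a three-byte sequence.
hyphen = "\u002d"   # HYPHEN-MINUS
em_dash = "\u2014"  # EM DASH

hyphen_bytes = hyphen.encode("utf-8")
em_dash_bytes = em_dash.encode("utf-8")

print(hyphen_bytes.hex())   # 2d
print(em_dash_bytes.hex())  # e28094

# Only the hyphen is also valid ASCII.
print(hyphen.isascii(), em_dash.isascii())  # True False
```

So a file containing only the replacement characters is still a UTF-8 file; what changes is that it is also a pure ASCII file.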
On Wed, 12 May 2021 10:25:35 +0100, David Woodhouse <dwmw2@infradead.org> wrote:

> On Wed, 2021-05-12 at 10:44 +0200, Mauro Carvalho Chehab wrote:
> > The main point here is that a large number of those UTF-8 characters
> > appeared as a result of document conversion from DocBook/LaTeX/Markdown.
> >
> > As the conversion ended, I don't expect the need to redo a series
> > like that in the near future.
> >
> > There are even some cases where the UTF-8 characters were doing the
> > wrong thing, like an EN DASH used instead of a hyphen to pass a command
> > line parameter, and the addition of non-printable BOM characters.
> >
> > So, IMO, this is a necessary cleanup after the conversion.
>
> That part — fixing characters that are *wrong*, such as converting a
> UTF-8 U+2014 EM DASH to a UTF-8 U+002D HYPHEN-MINUS, is reasonable
> enough.
>
> But you're not "avoiding using UTF-8 chars" there, as it says in the
> title of this patch. HYPHEN-MINUS encoded as 0x2D *is* UTF-8.

Yeah, you're right: ASCII is a subset of UTF-8, just as it is a subset
of other charsets[1].

[1] ASCII is a subset of all charsets mentioned at:
    https://man7.org/linux/man-pages/man7/charsets.7.html

A more precise title would be something like:

    Use ASCII instead of non-ASCII UTF-8 alternate symbols

or

    Use ASCII subset instead of UTF-8 alternate symbols

See, the goal of this series is to address the cases where there are
multiple UTF-8 alternate symbols with the same meaning as a character in
the original ASCII set. Most of them were introduced by tools like
DocBook/LaTeX/pandoc during document conversions[2], not by design, but
just because the non-ASCII UTF-8 symbols produce a nicer output in html
or pdf. In other words, it was a toolset decision to change them,
diverging from what the author originally typed.

[2] I suspect that a few of them could have been introduced as a result
    of someone using a text editor like libreoffice (or equivalent),
    which has a similar behavior.

With ReST, there's no need to use any of those, as the build tools
already do such conversions when generating html/pdf output. So, it is
better to stick with the ASCII subset in such cases, as it makes tools
like grep easier to use and makes such files easier to edit in editors
like vi, nano, emacs, etc.

Thanks,
Mauro
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index dc00afcabb95..b1692643718d 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -3,7 +3,7 @@ The Linux kernel user's and administrator's guide
 
 The following is a collection of user-oriented documents that have been
 added to the kernel over time. There is, as yet, little overall order or
-organization here — this material was not written to be a single, coherent
+organization here - this material was not written to be a single, coherent
 document! With luck things will improve quickly over time.
 
 This initial section contains overall information, including the README
diff --git a/Documentation/admin-guide/module-signing.rst b/Documentation/admin-guide/module-signing.rst
index 7d7c7c8a545c..bd1d2fef78e8 100644
--- a/Documentation/admin-guide/module-signing.rst
+++ b/Documentation/admin-guide/module-signing.rst
@@ -100,8 +100,8 @@ This has a number of options available:
      ``certs/signing_key.pem`` will disable the autogeneration of signing keys
      and allow the kernel modules to be signed with a key of your choosing.
      The string provided should identify a file containing both a private key
-     and its corresponding X.509 certificate in PEM form, or — on systems where
-     the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI as defined by
+     and its corresponding X.509 certificate in PEM form, or - on systems where
+     the OpenSSL ENGINE_pkcs11 is functional - a PKCS#11 URI as defined by
      RFC7512. In the latter case, the PKCS#11 URI should reference both a
      certificate and a private key.
 
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 7b481b2a368e..00445adf8708 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -40,10 +40,10 @@ it causes data loss or system downtime.
 
 Among the monitoring measures, the most usual ones include:
 
-* CPU – detect errors at instruction execution and at L1/L2/L3 caches;
-* Memory – add error correction logic (ECC) to detect and correct errors;
-* I/O – add CRC checksums for transferred data;
-* Storage – RAID, journal file systems, checksums,
+* CPU - detect errors at instruction execution and at L1/L2/L3 caches;
+* Memory - add error correction logic (ECC) to detect and correct errors;
+* I/O - add CRC checksums for transferred data;
+* Storage - RAID, journal file systems, checksums,
   Self-Monitoring, Analysis and Reporting Technology (SMART).
 
 By monitoring the number of occurrences of error detections, it is possible
@@ -443,49 +443,49 @@ A typical EDAC system has the following structure under
 
 	/sys/devices/system/edac/
 	├── mc
-	│   ├── mc0
-	│   │   ├── ce_count
-	│   │   ├── ce_noinfo_count
-	│   │   ├── dimm0
-	│   │   │   ├── dimm_ce_count
-	│   │   │   ├── dimm_dev_type
-	│   │   │   ├── dimm_edac_mode
-	│   │   │   ├── dimm_label
-	│   │   │   ├── dimm_location
-	│   │   │   ├── dimm_mem_type
-	│   │   │   ├── dimm_ue_count
-	│   │   │   ├── size
-	│   │   │   └── uevent
-	│   │   ├── max_location
-	│   │   ├── mc_name
-	│   │   ├── reset_counters
-	│   │   ├── seconds_since_reset
-	│   │   ├── size_mb
-	│   │   ├── ue_count
-	│   │   ├── ue_noinfo_count
-	│   │   └── uevent
-	│   ├── mc1
-	│   │   ├── ce_count
-	│   │   ├── ce_noinfo_count
-	│   │   ├── dimm0
-	│   │   │   ├── dimm_ce_count
-	│   │   │   ├── dimm_dev_type
-	│   │   │   ├── dimm_edac_mode
-	│   │   │   ├── dimm_label
-	│   │   │   ├── dimm_location
-	│   │   │   ├── dimm_mem_type
-	│   │   │   ├── dimm_ue_count
-	│   │   │   ├── size
-	│   │   │   └── uevent
-	│   │   ├── max_location
-	│   │   ├── mc_name
-	│   │   ├── reset_counters
-	│   │   ├── seconds_since_reset
-	│   │   ├── size_mb
-	│   │   ├── ue_count
-	│   │   ├── ue_noinfo_count
-	│   │   └── uevent
-	│   └── uevent
+	│   ├── mc0
+	│   │   ├── ce_count
+	│   │   ├── ce_noinfo_count
+	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
+	│   │   │   ├── dimm_dev_type
+	│   │   │   ├── dimm_edac_mode
+	│   │   │   ├── dimm_label
+	│   │   │   ├── dimm_location
+	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
+	│   │   │   ├── size
+	│   │   │   └── uevent
+	│   │   ├── max_location
+	│   │   ├── mc_name
+	│   │   ├── reset_counters
+	│   │   ├── seconds_since_reset
+	│   │   ├── size_mb
+	│   │   ├── ue_count
+	│   │   ├── ue_noinfo_count
+	│   │   └── uevent
+	│   ├── mc1
+	│   │   ├── ce_count
+	│   │   ├── ce_noinfo_count
+	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
+	│   │   │   ├── dimm_dev_type
+	│   │   │   ├── dimm_edac_mode
+	│   │   │   ├── dimm_label
+	│   │   │   ├── dimm_location
+	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
+	│   │   │   ├── size
+	│   │   │   └── uevent
+	│   │   ├── max_location
+	│   │   ├── mc_name
+	│   │   ├── reset_counters
+	│   │   ├── seconds_since_reset
+	│   │   ├── size_mb
+	│   │   ├── ue_count
+	│   │   ├── ue_noinfo_count
+	│   │   └── uevent
+	│   └── uevent
 	└── uevent
 
 In the ``dimmX`` directories are EDAC control and attribute files for
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 18d8e25ba9df..f691930e13c0 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -824,7 +824,7 @@ and look a little lower at the table. At its top you'll see a line starting with
 mainline, which most of the time will point to a pre-release with a version
 number like '5.8-rc2'. If that's the case, you'll want to use this mainline
 kernel for testing, as that where all fixes have to be applied first. Do not let
-that 'rc' scare you, these 'development kernels' are pretty reliable — and you
+that 'rc' scare you, these 'development kernels' are pretty reliable - and you
 made a backup, as you were instructed above, didn't you?
 
 In about two out of every nine to ten weeks, mainline might point you to a
@@ -866,7 +866,7 @@ How to obtain a fresh Linux kernel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 **Using a pre-compiled kernel**: This is often the quickest, easiest, and safest
-way for testing — especially is you are unfamiliar with the Linux kernel. The
+way for testing - especially is you are unfamiliar with the Linux kernel. The
 problem: most of those shipped by distributors or add-on repositories are build
 from modified Linux sources. They are thus not vanilla and therefore often
 unsuitable for testing and issue reporting: the changes might cause the issue
@@ -1248,7 +1248,7 @@ paragraph makes the severeness obvious.
 
 In case you performed a successful bisection, use the title of the change that
 introduced the regression as the second part of your subject. Make the report
-also mention the commit id of the culprit. In case of an unsuccessful bisection,
+also mention the commit id of the culprit. In case of an unsuccessful bisection,
 make your report mention the latest tested version that's working fine (say 5.7)
 and the oldest where the issue occurs (say 5.8-rc1).
 
@@ -1345,7 +1345,7 @@ about it to a chatroom or forum you normally hang out.
 
 **Be patient**: If you are really lucky you might get a reply to your report
 within a few hours. But most of the time it will take longer, as maintainers
-are scattered around the globe and thus might be in a different time zone – one
+are scattered around the globe and thus might be in a different time zone - one
 where they already enjoy their night away from keyboard.
 
 In general, kernel developers will take one to five business days to respond to
@@ -1388,7 +1388,7 @@ Here are your duties in case you got replies to your report:
 
 **Check who you deal with**: Most of the time it will be the maintainer or a
 developer of the particular code area that will respond to your report. But as
-issues are normally reported in public it could be anyone that's replying —
+issues are normally reported in public it could be anyone that's replying -
 including people that want to help, but in the end might guide you totally off
 track with their questions or requests. That rarely happens, but it's one of
 many reasons why it's wise to quickly run an internet search to see who you're
@@ -1716,7 +1716,7 @@ Maybe their test hardware broke, got replaced by something more fancy, or is so
 old that it's something you don't find much outside of computer museums
 anymore. Sometimes developer stops caring for their code and Linux at all, as
 something different in their life became way more important. In some cases
-nobody is willing to take over the job as maintainer – and nobody can be forced
+nobody is willing to take over the job as maintainer - and nobody can be forced
 to, as contributing to the Linux kernel is done on a voluntary basis. Abandoned
 drivers nevertheless remain in the kernel: they are still useful for people and
 removing would be a regression.
While UTF-8 characters can be used in the Linux documentation,
it is best to use them only when ASCII doesn't offer a good replacement.
So, replace the occurrences of the following UTF-8 characters:

    - U+00a0 (' '): NO-BREAK SPACE
    - U+2013 ('–'): EN DASH
    - U+2014 ('—'): EM DASH

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 Documentation/admin-guide/index.rst           |  2 +-
 Documentation/admin-guide/module-signing.rst  |  4 +-
 Documentation/admin-guide/ras.rst             | 94 +++++++++----------
 .../admin-guide/reporting-issues.rst          | 12 +--
 4 files changed, 56 insertions(+), 56 deletions(-)
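The substitution this commit message describes could be sketched as a single translation table. This is a hypothetical illustration, not the script actually used to produce the series: the names `REPLACEMENTS` and `to_ascii_equivalents` are made up here, and mapping both dashes to a single "-" is an assumption that simply follows what the diff shows.

```python
# Mapping built from the three characters listed in the commit message.
# Using one "-" for both dashes mirrors the diff; other ASCII spellings
# (such as "--" or "---") would be equally possible.
REPLACEMENTS = str.maketrans({
    "\u00a0": " ",  # NO-BREAK SPACE -> SPACE
    "\u2013": "-",  # EN DASH -> HYPHEN-MINUS
    "\u2014": "-",  # EM DASH -> HYPHEN-MINUS
})


def to_ascii_equivalents(text):
    """Replace the three listed characters and leave everything else as is."""
    return text.translate(REPLACEMENTS)
```

Because the table names only those three code points, other non-ASCII text such as accented names or the box-drawing characters in the EDAC tree passes through unchanged.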