MIME header question
Adam H. Kerman
2022-01-17 19:21:24 UTC
I asked in another newsgroup.

Eduardo, in your interpretation of the RFCs is declaring 7 bit on
Content Transfer Encoding in conflict with declaring UTF-8 as the
character set?

Logically it seems to me that the two headers should be set jointly, and
the character set should not be UTF-8 when there are no non-ASCII
characters and the transfer encoding is marked as 7 bit.

pine/alpine have always parsed for the lowest denomination character set
despite the user's settings. If there are no non-ASCII characters, then
the character set marking is US-ASCII and transfer encoding 7 bit.

I don't know of another client that performs that parsing.
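
The "lowest denomination" selection described above can be sketched roughly as follows. This is only an illustration, not Alpine's actual code: the function names, the candidate list, and the "8bit" label for the non-ASCII case are assumptions made for the example.

```python
# Hypothetical sketch: pick the narrowest charset that can represent the text.
# The candidate order and the helper names are assumptions for illustration.

def pick_charset(text, candidates=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first candidate charset that can encode every character."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
        return charset
    raise ValueError("no candidate charset can encode the text")


def mime_labels(text):
    """Return a (charset, transfer-encoding) pair for the text.

    Pure ASCII text is labelled us-ascii/7bit, as described in the post.
    Labelling everything else "8bit" is an assumption of this sketch; a
    real client might choose quoted-printable or base64 instead.
    """
    charset = pick_charset(text)
    encoding = "7bit" if charset == "us-ascii" else "8bit"
    return charset, encoding


print(mime_labels("hello"))
print(mime_labels("caf\u00e9"))
```

With this ordering, a message is only labelled UTF-8 when it contains a character that neither US-ASCII nor ISO-8859-1 can represent.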
Eduardo Chappa
2022-01-17 20:15:18 UTC
Post by Adam H. Kerman
Eduardo, in your interpretation of the RFCs is declaring 7 bit on
Content Transfer Encoding in conflict with declaring UTF-8 as the
character set?
Dear Adam,

I do not think there is a conflict here. Let me say it in a different way.
The Content-Transfer-Encoding here just tells you how to process the data.
It could have other values, such as base64, or quoted-printable, so the
value tells you what to do with the data. In the case of 7 bit just
interpret that 7 bit in the charset, in this case utf-8, which actually
means US-ASCII. In other words

7bit intersected with utf-8 = us-ascii,

so you could write us-ascii for the charset in this case, or utf-8. It
seems more like a question of style, not of correctness.
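
The "7bit intersected with utf-8 = us-ascii" point can be checked with a short sketch. The helper name same_under_both is made up for this example; it relies only on the fact that UTF-8 encodes the ASCII range as the same single octets.

```python
# Minimal sketch: octets below 0x80 mean the same thing in US-ASCII and UTF-8.
# The helper name is hypothetical, chosen only for this illustration.

def same_under_both(octets: bytes) -> bool:
    """True if the octets decode to identical text as US-ASCII and as UTF-8."""
    try:
        return octets.decode("us-ascii") == octets.decode("utf-8")
    except UnicodeDecodeError:
        return False  # at least one octet is not valid in one of the charsets


body = b"Plain text, no accents."
print(max(body) < 0x80)        # every octet fits in 7 bits
print(same_under_both(body))   # so either charset label gives the same text

accented = "caf\u00e9".encode("utf-8")
print(max(accented) < 0x80)    # the accented letter needs octets above 0x7F
print(same_under_both(accented))
```

Once a non-ASCII character is present, its UTF-8 encoding uses octets with the high bit set, so the body no longer fits a 7bit label.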

Having said that, I prefer to use us-ascii in this case because more
clients are likely to understand us-ascii instead of utf-8. Alpine did not
get utf-8 handling until very late, while many other clients understood
utf-8, so it was better for pine users to receive a message 7bit in
us-ascii than 7-bit in utf-8, because Pine could not handle the latter.

I doubt that there are Pine users still out there (although I can always
be proven wrong) but it is better to be conservative here in my opinion.
--
Eduardo
https://tinyurl.com/yc377wlh (web)
http://repo.or.cz/alpine.git (Git)
Adam H. Kerman
2022-01-17 21:49:50 UTC
Post by Eduardo Chappa
Post by Adam H. Kerman
Eduardo, in your interpretation of the RFCs is declaring 7 bit on
Content Transfer Encoding in conflict with declaring UTF-8 as the
character set?
I do not think there is a conflict here. Let me say it in a different way.
The Content-Transfer-Encoding here just tells you how to process the data.
It could have other values, such as base64, or quoted-printable, so the
value tells you what to do with the data. In the case of 7 bit just
interpret that 7 bit in the charset, in this case utf-8, which actually
means US-ASCII. In other words
7bit intersected with utf-8 = us-ascii,
so you could write us-ascii for the charset in this case, or utf-8. It
seems more like a question of style, not of correctness.
Thanks. This is why I asked you. I thought 7 bit was about the
communication channel and not the capabilities of the client and display
on the other end.

If the display interprets MIME headers, does that mean the same 7-bit
character is displayed ignoring the eighth bit or two characters are
displayed in a UTF-8 double byte character? All this time, when my
terminal emulation translation didn't match what was received (I have to
change it manually), I thought I was changing the assumed character set,
not the transfer encoding toggle.
Post by Eduardo Chappa
Having said that, I prefer to use us-ascii in this case because more
clients are likely to understand us-ascii instead of utf-8. Alpine did not
get utf-8 handling until very late, while many other clients understood
utf-8, so it was better for pine users to receive a message 7bit in
us-ascii than 7-bit in utf-8, because Pine could not handle the latter.
I doubt that there are Pine users still out there (although I can always
be proven wrong) but it is better to be conservative here in my opinion.
I certainly agree with you.
Eduardo Chappa
2022-01-17 23:29:21 UTC
Post by Adam H. Kerman
Thanks. This is why I asked you. I thought 7 bit was about the
communication channel and not the capabilities of the client and display
on the other end.
If the display interprets MIME headers, does that mean the same 7-bit
character is displayed ignoring the eighth bit or two characters are
displayed in a UTF-8 double byte character? All this time, when my
terminal emulation translation didn't match what was received (I have to
change it manually), I thought I was changing the assumed character set,
not the transfer encoding toggle.
Dear Adam,

I never used the word display to refer to how the message actually
displays on the screen. The headers tell the client what to do internally.
For example, if the content-transfer-encoding were base64, then this tells
the client to decode the encoded blob. Same with 7bit. It just tells the
client to interpret the 7 bit it finds in the given charset. This will become a
character on screen later on.
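
The two-step processing described here (undo the transfer encoding first, then interpret the resulting octets in the declared charset) can be sketched as below. This is an illustration only, not Alpine's implementation; decode_body is a hypothetical helper built on Python's standard base64 and quopri modules.

```python
import base64
import quopri

# Hypothetical helper: not taken from any mail client, just the two steps
# described above written out in order.

def decode_body(raw: bytes, transfer_encoding: str, charset: str) -> str:
    """Undo the Content-Transfer-Encoding, then decode with the charset."""
    cte = transfer_encoding.lower()
    if cte == "base64":
        octets = base64.b64decode(raw)
    elif cte == "quoted-printable":
        octets = quopri.decodestring(raw)
    elif cte in ("7bit", "8bit", "binary"):
        octets = raw  # identity encodings: the octets are used as they are
    else:
        raise ValueError(f"unsupported transfer encoding: {transfer_encoding}")
    # Only now does the charset come into play.
    return octets.decode(charset)


print(decode_body(b"aGVsbG8=", "base64", "utf-8"))
print(decode_body(b"caf=C3=A9", "quoted-printable", "utf-8"))
print(decode_body(b"hello", "7bit", "utf-8") == decode_body(b"hello", "7bit", "us-ascii"))
```

What the terminal then shows for the decoded text is a separate matter from either header.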

I have to acknowledge that I do not understand completely what you are
saying. There is no "transfer encoding toggle" in Alpine, nor is there an
"assumed character set", so I am not exactly sure what you are referring
to, but if I understand you correctly, you are asking what happens to
multibyte characters. Unless you make changes to the default configuration
in Alpine, Alpine will send to the terminal utf-8 codes, which the
terminal will display if it is utf-8 capable. Do you have Alpine and your
terminal configured differently?
--
Eduardo
https://tinyurl.com/yc377wlh (web)
http://repo.or.cz/alpine.git (Git)
Adam H. Kerman
2022-01-18 03:52:58 UTC
Post by Eduardo Chappa
Post by Adam H. Kerman
Thanks. This is why I asked you. I thought 7 bit was about the
communication channel and not the capabilities of the client and display
on the other end.
If the display interprets MIME headers, does that mean the same 7-bit
character is displayed ignoring the eighth bit or two characters are
displayed in a UTF-8 double byte character? All this time, when my
terminal emulation translation didn't match what was received (I have to
change it manually), I thought I was changing the assumed character set,
not the transfer encoding toggle.
I never used the word display to refer to how the message actually
displays on the screen. The headers tell the client what to do internally.
For example, if the content-transfer-encoding were base64, then this tells
the client to decode the encoded blob. Same with 7bit. It just tells the
client to interpret the 7 bit it finds in the given charset. This will become a
character on screen later on.
I have to acknowledge that I do not understand completely what you are
saying. There is no "transfer encoding toggle" in Alpine,
Sorry to be unclear. I just meant that the standard allows a choice of
encoding schemes, as you've been discussing.
Post by Eduardo Chappa
nor is there an "assumed character set",
The user can name a character set in .pinerc. Isn't that for the composer
as well as the display? If there are no non-ASCII characters, the MIME
header declares ASCII no matter how the user set this feature.

I liked the fact that alpine declares a lowest denomination character
set.
Post by Eduardo Chappa
so I am not exactly sure what you are referring
to, but if I understand you correctly, you are asking what happens to
multibyte characters. Unless you make changes to the default configuration
in Alpine, Alpine will send to the terminal utf-8 codes, which the
terminal will display if it is utf-8 capable. Do you have Alpine and your
terminal configured differently?
I usually have to change the translation between ISO-8859-1 and UTF-8
depending on what Usenet article I'm looking at. alpine isn't my
newsreader. Also, in followup, I like to get rid of the nonprinting
characters; translation mismatch can make them visible. I post in ASCII
whenever possible.

John Levine
2022-01-17 23:36:00 UTC
Post by Eduardo Chappa
Post by Adam H. Kerman
Eduardo, in your interpretation of the RFCs is declaring 7 bit on
Content Transfer Encoding in conflict with declaring UTF-8 as the
character set?
I'm not Eduardo, but it's clearly not valid. RFC 2045 says

An encoding type of 7BIT requires that the body
is already in a 7bit mail-ready representation.

Needless to say, UTF-8 is not 7bit mail-ready. I can believe that
some mail programs have tried to make sense of this, but it's utterly
ad-hoc and whatever they do with it is wrong. Maybe stuff declared to
be UTF-8 is in fact just ASCII in a particular message, but I wouldn't
count on it.
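
Whether a particular body meets the 7bit constraints can be checked on its octets. The sketch below follows the definition of 7bit data in RFC 2045 (no octets above 127, no NULs, CR and LF only as CRLF pairs, at most 998 octets between CRLFs); the function name is_7bit_body is made up for this example, and it says nothing about which charset label should accompany such a body.

```python
# Hypothetical checker for the RFC 2045 definition of 7bit data.
# It looks only at the octets, not at any header.

def is_7bit_body(body: bytes) -> bool:
    """True if the body satisfies the 7bit constraints on its octets."""
    # No octet above 127 and no NUL octet.
    if any(octet > 127 or octet == 0 for octet in body):
        return False
    for line in body.split(b"\r\n"):
        # At most 998 octets between CRLF sequences.
        if len(line) > 998:
            return False
        # CR and LF may only appear as part of a CRLF pair.
        if b"\r" in line or b"\n" in line:
            return False
    return True


print(is_7bit_body(b"hello\r\nworld\r\n"))
print(is_7bit_body("caf\u00e9\r\n".encode("utf-8")))
```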
Post by Eduardo Chappa
I doubt that there are Pine users still out there (although I can always
be proven wrong) but it is better to be conservative here in my opinion.
Probably not, although there are plenty of us Alpine users.

R's,
John
--
Regards,
John Levine, ***@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly