Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401040534.AA17784@necom830.cc.titech.ac.jp>
Subject: Re: Response to MIME charset issue
To: Glenn Adams <glenn@metis.com>
Date: Tue, 4 Jan 94 14:34:20 JST
Cc: John_Jenkins@taligent.com, ietf-822@dimacs.rutgers.edu, 
    unicored@unicode.org
In-Reply-To: <9312290722.AA02456@sapir.metis.com>; from "Glenn Adams" at Dec 29, 93 2:22 am

>   [Ohta]
>   : The problem is that ISO 10646/Unicode does not address the issue of
>   : displaying or comparing strings which contain more than a single
>   : "character".
> 
> Tell me, since when did any (international or national) character set
> define either display or sorting behaviour?  Does JIS X 0208 define
> sorting order?  Does ASCII?  Do ASMO 449 (Arabic) or ISCII (Indian)
> standards define display behaviour for what are clearly complex scripts?
> The answer is no.  None of them do.

No. While comparison would be important for NIR in general, RFC 1521
says nothing about that.

But RFC 1521/1522 do require profiling-free code->glyph mapping even
in individual headers.

It is fine for X.400 not to provide character set information in its
fields, of course. So what?

> The same answer applies to 10646:
> it should not define either.

That's why 10646, as is, is no good for MIME.

Or, do you have any reason why 10646 is any good for MIME charset
without additional profiling information?

> They are in the application domain.

They are at the bottom of the application domain.

> Although interrelated,
> such ancillary work needs to be separated from the character set standard
> itself, whose task is *only* to enumerate a set of characters assigned to
> code points according to some encoding scheme, *and no more*.

How can you say you can design the bottom, 10646, without knowing
what the upper layers will be?

>   [Ohta]
>   : I designed ICODE exactly for such a purpose. What do you think of it?
> 
> I don't know anything about it.  Perhaps you could send me a paper describing
> it.  [Metis Technology, Inc., 522 Atlantic Ave., Boston, MA 02210].  I would
> be interested in looking at it from an academic perspective, but not from
> any practical perspective since it won't go anywhere as a real standard.

I hope you see it from an *ENGINEERING* perspective.

I really hope you don't see it from the perspective of
ISO-is-the-real-and-the-only-standard.

>   [Ohta]
>   : But that was the job we expected of SC2.
> 
> Who are "we"?

Many. Before the first DIS was voted down, many people in Japan and
many of the people who developed MIME expected so.

> Perhaps *you* expected it, but you seemed
> to neglect to inform SC2 that that is what their charter should have been.

The charter was "to develop the universal character set".

The result is "the universal character set" with the meaning of
"character" minimized and the meaning of "universal" undefined.

> If you would bother to research the history of the charter of SC2/WG2 or of
> SC2 in general you would find that your expectations wildly contradict
> reality.

I'm proud that my reality contradicts that of ISO.

							Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa13463;
          10 Jan 94 20:29 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa13459;
          10 Jan 94 20:29 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa29464;
          10 Jan 94 20:29 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13952; Mon, 10 Jan 94 20:05:22 EST
Received: from taligent.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13948; Mon, 10 Jan 94 20:05:21 EST
Received: from david-goldsmith.taligent.com by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA28930; Mon, 10 Jan 94 17:02:48 -0800
	for 
Message-Id: <9401110102.AA28930@taligent.com>
X-Sender: dgold@banpeikun-rx.taligent.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 10 Jan 1994 17:02:50 -0800
To: unicored@unicode.org, ietf-822@dimacs.rutgers.edu
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: David Goldsmith <David_Goldsmith@taligent.com>
Subject: Possible changes to UTF-7

We are considering removing the portions of UTF-7 (the proposed 7 bit
encoding format of ISO 10646/Unicode which I posted to these lists in
December) which duplicate the quoted printable content transfer encoding.
This would involve removing rules 3, 4, and 5 regarding white space, line
breaks, and soft line breaks. This will make UTF-7 slightly simpler, and
UTF-7 data could still be passed with 7BIT content transfer encoding if it
met the criteria, just as for ASCII data. If lines were too long or there
were other problems, then UTF-7 could be passed through the quoted
printable content transfer encoding.

Since mailers will likely already be set up to examine messages to
determine which encoding to use, it seems like less work for implementers
to use existing code to handle line length and whitespace issues.
Converting 10646 to UTF-7 then basically consists of making it
ASCII-compatible, and leaving the rest to the content transfer encoding
software. This comes at a higher processing cost, however, since the
message stream must be processed twice.
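
The two-pass arrangement described above can be sketched as follows. This is
only an illustration under stated assumptions: it uses Python's built-in
"utf-7" codec, which follows the later RFC 2152 definition rather than the
December draft discussed here, and the standard quopri module as a stand-in
for a mailer's quoted-printable code. The names to_utf7 and wrap_for_mail are
hypothetical.

```python
import quopri

def to_utf7(text: str) -> bytes:
    # Pass 1: make the 10646/Unicode text ASCII-compatible.
    # Assumption: Python's "utf-7" codec (RFC 2152), not the 1993 draft.
    return text.encode("utf-7")

def wrap_for_mail(utf7: bytes) -> bytes:
    # Pass 2: leave line length and whitespace handling to the
    # quoted-printable content transfer encoding.
    return quopri.encodestring(utf7)

# A single long line of Japanese text (30 repetitions of three ideographs).
sample = " ".join(["\u65e5\u672c\u8a9e"] * 30)

utf7 = to_utf7(sample)
qp = wrap_for_mail(utf7)

# Reversing both passes recovers the original text.
restored = quopri.decodestring(qp).decode("utf-7")
print(restored == sample)
```

The UTF-7 step only has to produce 7-bit data; the soft line breaks come
entirely from the second pass, which is the extra processing cost mentioned
above.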

We would like to hear your opinion on this particular issue.

David Goldsmith and Mark Davis

----------------------------
David Goldsmith
david_goldsmith@taligent.com
Taligent, Inc.
10201 N. DeAnza Blvd.
Cupertino, CA  95014-2233


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa21824;
          11 Jan 94 0:22 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa21820;
          11 Jan 94 0:22 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa03380;
          11 Jan 94 0:22 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA19633; Mon, 10 Jan 94 23:53:56 EST
Received: from SIGURD.INNOSOFT.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA19629; Mon, 10 Jan 94 23:53:54 EST
Received: from SIGURD.INNOSOFT.COM by SIGURD.INNOSOFT.COM (PMDF V4.3-3 #1234)
 id <01H7ILHC6NZK9LVS0F@SIGURD.INNOSOFT.COM>; Mon, 10 Jan 1994 20:38:59 PDT
Date: Mon, 10 Jan 1994 20:37:14 -0700 (PDT)
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Ned Freed <NED@sigurd.innosoft.com>
Subject: Re: Possible changes to UTF-7
In-Reply-To: Your message dated "Mon, 10 Jan 1994 17:02:50 -0800"
 <9401110102.AA28930@taligent.com>
To: David_Goldsmith@taligent.com
Cc: unicored@unicode.org, ietf-822@dimacs.rutgers.edu
Message-Id: <01H7J4AK85JK9LVS0F@SIGURD.INNOSOFT.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7BIT

> We are considering removing the portions of UTF-7 (the proposed 7 bit
> encoding format of ISO 10646/Unicode which I posted to these lists in
> December) which duplicate the quoted printable content transfer encoding.
> This would involve removing rules 3, 4, and 5 regarding white space, line
> breaks, and soft line breaks. This will make UTF-7 slightly simpler, and
> UTF-7 data could still be passed with 7BIT content transfer encoding if it
> met the criteria, just as for ASCII data. If lines were too long or there
> were other problems, then UTF-7 could be passed through the quoted
> printable content transfer encoding.

> Since mailers will likely already be set up to examine messages to
> determine which encoding to use, it seems like less work for implementers
> to use existing code to handle line length and whitespace issues.
> Converting 10646 to UTF-7 then basically consists of making it
> ASCII-compatible, and leaving the rest to the content transfer encoding
> software. This comes at a higher processing cost, however, since the
> message stream must be processed twice.

> We would like to hear your opinion on this particular issue.

This sounds like a very good idea to me. As you say, mailers are already set
up to perform standard MIME encoding as needed. There's no need to reimplement
this particular wheel. Moreover, if you do implement the encoding inside
of the character set you are likely to end up with two levels of encoding
sometimes, which isn't at all nice.

				Ned


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa09946;
          12 Jan 94 12:54 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa09941;
          12 Jan 94 12:54 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa12633;
          12 Jan 94 12:54 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA18219; Wed, 12 Jan 94 12:39:09 EST
Received: from Mordor.Stanford.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA18215; Wed, 12 Jan 94 12:39:07 EST
Received: from localhost by Mordor.Stanford.EDU (8.6.4/inc-1.0)
	id JAA07428; Wed, 12 Jan 1994 09:39:06 -0800
Message-Id: <199401121739.JAA07428@Mordor.Stanford.EDU>
To: ietf-822@dimacs.rutgers.edu
Subject: EDI over Mime
Contact: phone:  +1 408 246 8253;  fax: +1 408 249 6205
Date: Wed, 12 Jan 94 09:39:06 -0800
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Dave Crocker <dcrocker@mordor.stanford.edu>
X-Mts: smtp

Folks,

The mailing list for specifying EDI encapsulation within Mime has been 
quite active.  However, my sense is that virtually none of the Mime
technical community is participating.

Not surprisingly, the EDI world has a view of how to do things and it
has differences from the IETF and Mime style.  To work through such
differences well, there needs to be adequate participation from both
cultures.

I urge you to join and participate.

To join, send a message to 

	listserv@byu.edu

with a body having the line:

	sub ietf-edi <yourname>

thanks.

d/


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa27133;
          13 Jan 94 1:09 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa27129;
          13 Jan 94 1:09 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa27165;
          13 Jan 94 1:09 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA03904; Thu, 13 Jan 94 00:42:52 EST
Received: from mcigateway.mcimail.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA03900; Thu, 13 Jan 94 00:42:49 EST
Received: from mcimail.com by MCIGATEWAY.MCIMail.com id ad23732;
          13 Jan 94 5:23 GMT
Date: Thu, 13 Jan 94 00:27 EST
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: "Robert G. Moskowitz" <0003858921@mcimail.com>
To: Dave Crocker <dcrocker@mordor.stanford.edu>
To: ietf 822 <ietf-822@dimacs.rutgers.edu>
Subject: Re: EDI over Mime
Message-Id: <72940113052727/0003858921NA2EM@mcimail.com>

>The mailing list for specifying EDI encapsulation within Mime has been 
>quite active.  However, my sense is that virtually none of the Mime
>technical community is participating.

>Not surprisingly, the EDI world has a view of how to do things and it
>has differences from the IETF and Mime style.  To work through such
>differences well, there needs to be adequate participation from both
>cultures.

Yes, please!  Would the Mime eXperts add some leavening to the EDI eXperts
so we get something out of this that will endure?

I am particularly concerned about the sloughing off of the need to forward the
EMail address to the EDI translator so it knows how to return mail.  They
are apparently expecting the EDI trading partners database to continue
forever.  Also the PEM cert.

Bob
  

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa03405;
          17 Jan 94 15:41 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa03401;
          17 Jan 94 15:41 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa21348;
          17 Jan 94 15:41 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA29292; Mon, 17 Jan 94 15:13:49 EST
Received: from taligent.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA29288; Mon, 17 Jan 94 15:13:48 EST
Received: from qm.taligent.com by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA16319; Mon, 17 Jan 94 12:12:28 -0800
	for 
Message-Id: <9401172012.AA16319@taligent.com>
Date: 17 Jan 1994 11:59:57 -0800
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Mark Davis <Mark_Davis@taligent.com>
Subject: Response
To: ietf-822 + <ietf-822@dimacs.rutgers.edu>, 
    Unicored + <unicored@unicode.org>

Subject:   Response                  Time: 11:53 AM   Date: 01/17/94
This is in response to an issue raised on BIDI a couple of weeks ago that was
sent to David.

Mark Davis

>> --------------------------------------
>> Date: 01/04/94 3:57 PM
>> David,
>> 
>> I need to slip on my Area Director hat and inject a few policy
>> statements into this evolving discussion. I want to stress that they are
>> policy statements, and not technical ones. Discussion needs to continue
>> on the technical issues until something reasonably approximating
>> consensus is reached.  I don't think anything here contradicts what
>> Nathaniel and Ned told you.
>> 
>> (1) Please be very clear about whether you are discussing UNICODE or IS
>> 10646-1:1993 ("BMP", henceforth in this note "10646").  While the code
>> point mappings are the same, the introductory and conformance texts
>> appear to be different.  It will be much easier for you to pick one and
>> use it than to prove that the two are the same and therefore, the two
>> can be taken as equivalent.  There is a deliberate bias in MIME toward
>> use of the ISO standard, so take that as a preference unless there are
>> strong reasons for using UNICODE.

The goal in producing the MIME proposal was to identify 10646 as the charset.
In the process of merging Unicode 1.0 and 10646, Unicode 1.1 has been brought
into full conformance with 10646. Unicode also adds additional information
that more exactly specifies the behavior of characters. As a practical matter,
I fully expect that the vast majority of 10646 implementations will be Unicode
implementations as well.

My personal preference (speaking without any official hats on) would be to
allow the specification of both 10646 and 10646/Unicode, where the latter
provides receivers with more information about the incoming text. In such a
case, a receiver that wants to distinguish between certain aspects of unmarked
10646 and Unicode can; others can ignore the additional information supplied.
In the absence of this information, receivers will probably assume that the
text is Unicode (but have no assurance of that fact).
 
>> (2) 10646 does not specify the presentation order of characters on,
>> e.g., a screen, relative to characters in the data stream.  For
>> languages whose characters are read from right to left, this implies a
>> profiling issue, since there are several methods in use of writing the
>> characters of those languages into the data stream.  This issue was
>> addressed in a special review at the Houston IETF and the conclusion was
>> that the character sets used with Hebrew and Arabic should be registered
>> three times each, to correspond to the presentation orders defined in
>> the relevant ECMA/ISO standard.  A "charset" definition for 10646 must
>> address this issue or it doesn't meet the profile-free MIME requirement.

The default internal ordering of characters within 10646 is logical order; it
also provides format codes for controlling the implicit and explicit
presentation order (0x200E, 0x200F, and 0x202A through 0x202C), or can be used
with other standards such as ISO/IEC 6429.
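
As a rough illustration of the format codes listed above, the sketch below
looks them up in a present-day Unicode character database through Python's
unicodedata module. The names and bidirectional classes it reports come from
whatever Unicode version ships with the interpreter, which is an assumption
here and is not the 10646/Unicode 1.1 text under discussion. The helper name
describe is hypothetical.

```python
import unicodedata

# The code points cited above: two implicit marks and three explicit codes.
FORMAT_CODES = [0x200E, 0x200F, 0x202A, 0x202B, 0x202C]

def describe(code_point: int) -> tuple:
    # Return (character name, bidirectional class) for one code point.
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.bidirectional(ch)

for cp in FORMAT_CODES:
    name, bidi_class = describe(cp)
    print(f"U+{cp:04X}  {bidi_class:<4} {name}")

# Text is stored in logical (reading) order; the Hebrew letters below carry
# the right-to-left class "R" even though no format code is present.
hebrew = "\u05e9\u05dc\u05d5\u05dd"
print([unicodedata.bidirectional(ch) for ch in hebrew])
```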

(Unicode provides a detailed algorithm for determining presentation order of
10646 characters within a line or paragraph--even in the absence of
presentation format codes. In the presence of presentation format codes, it
also specifies use of those codes, and their interaction with the implicit
algorithm.)

However, since the default internal ordering of the characters is specified,
the semantics of the text is preserved by 10646, and can be used to transmit
information correctly even in the absence of any further presentation
information. What is not specified by 10646 is the precise details of the
graphical expression of the text.

[A point that often seems to be lost in these discussions is that *no*
character set specifies the precise details of the graphical expression of the
text; neither JIS, nor ASCII, nor 8859/x etc. None of them specify the exact
bits used to draw a character, the exact width of a character, the exact
placement of successive characters, nor the exact shape of a character.]

>> (3) Your text includes the statements:
>>     The United States bodies X3L2 and X3V1 have recently developed a
>>     character/glyph model whose main purpose is to clarify the use of these
>>     terms and provide examples of their usage.  This character/glyph model
>>     was developed at the request of the relevant ISO bodies and has been
>>     forwarded both to SC2 and SC18 for formal approval.
>> 
>> As a general rule, IETF is very nervous about basing our work on
>> something that is halfway through the ISO process.  That rule has been
>> strongly reinforced, probably to "showstopper" level in this particular
>> area by the history of 10646.  When MIME was first being designed, the
>> working group was told, essentially, that 10646 was a done deal and
>> should be IETF-standardized on the basis of the DIS.  Unfortunately,
>> that was DIS-1, and the turmoil, DIS-2, and the current Standard had yet
>> to happen and were, at that time, unexpected.
>> 
>> Consequently, if the definition you are proposing depends on an emerging
>> piece of work in JTC1, and the quality or utility of that definition
>> would be changed significantly if JTC1 decides to do "something else",
>> your efforts are likely to go into IETF-hold until JTC1 does something
>> definitive.

The work being done in this area is to help further refine the distinctions
between characters and glyphs, and in no way implies that 10646 cannot
currently be implemented as it stands. The main goal of the character/glyph
model is to help the relevant subcommittees to categorize future proposed
additions to 10646 to ensure that they are not better coded by SC18. This will
have no effect on the characters currently in 10646, and have no impact on
consideration of 10646 within MIME.

>> (4) As Nathaniel mentioned, the use of 10646 was specifically intended
>> as soon as it settled down enough to be adequately defined.  It was not
>> included in-line in RFC 1521 and its predecessor along with US-ASCII and
>> the 8859 group only because those definitions were not in place.  
>> Precisely because it is considered very important, you should assume
>> that we will want to place any definitional, external profiling, or
>> restrictive information that goes with MIME use of 10646 on the IETF
>> standards track.
>> 
>>    --john
>> 


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa29571;
          24 Jan 94 2:16 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa29567;
          24 Jan 94 2:16 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa06525;
          24 Jan 94 2:16 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA02752; Mon, 24 Jan 94 01:48:02 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA02378; Mon, 24 Jan 94 01:47:50 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 24 Jan 94 15:42:22 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401240642.AA12485@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: Mark Davis <Mark_Davis@taligent.com>
Date: Mon, 24 Jan 94 15:42:20 JST
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org
In-Reply-To: <9401172012.AA16319@taligent.com>; from "Mark Davis" at Jan 17, 94 11:59 am
X-Mailer: ELM [version 2.3 PL11]

> Subject:   Response                             Time: 11:53 AM   Date:
> 01/17/94
> This is in response to an issue raised on BIDI a couple of weeks ago that was
> sent to David.

Have we agreed that ISO 10646 needs some profiling to be a MIME charset?

> The goal in producing the MIME proposal was to identify 10646 as the charset.

10646 here means DIS 10646-1.0 which is much better than the current IS.

> The default internal ordering of characters within 10646 is logical order; it

Where in the ISO 10646 is it described? It seems to me that it only
says logical ordering would be "usual".

> it
> also provides format codes for controlling the implicit and explicit
> presentation order (0x200E, 0x200F, and 0x202A through 0x202C),

They are in an informative annex.

> or can be used
> with other standards such as ISO/IEC 6429.

So? What is your choice?

> (Unicode provides a detailed algorithm for determining presentation order of
> 10646 characters within a line or paragraph--even in the absence of
> presentation format codes.

I have never seen any detailed algorithm in the Unicode books.

Where is it described?

> [A point that often seems to be lost in these discussions is that *no*
> character set specifies the precise details of the graphical expression of the
> text; neither JIS, nor ASCII, nor 8859/x etc.

I know that many people think the distinction between lower and upper case
is unnecessary; to them, ASCII is too luxurious.

But I think *YOU* need 8859/1, which contains characters that are identical
to characters in ASCII except in precise details.

> None of them specify the exact
> bits used to draw a character, the exact width of a character, the exact
> placement of successive characters, nor the exact shape of a character.]

Fonts are a different problem from plain text encoding, of course. So?

							Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa09634;
          24 Jan 94 12:09 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa09628;
          24 Jan 94 12:09 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa17937;
          24 Jan 94 12:09 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13624; Mon, 24 Jan 94 11:39:29 EST
Received: from SAPIR.METIS.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA12019; Mon, 24 Jan 94 11:38:36 EST
Received: from trubetzkoy.metis.com by sapir.metis.com (4.1/METIS-4.10) id AA06551; Mon, 24 Jan 94 11:37:47 EST
Received: by trubetzkoy.metis.com (NX5.67d/NX3.0M) id AA06999; Mon, 24 Jan 94 11:37:44 -0500
Date: Mon, 24 Jan 94 11:37:44 -0500
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Glenn Adams <glenn@metis.com>
Message-Id: <9401241637.AA06999@trubetzkoy.metis.com>
Received: by NeXT.Mailer (1.100)
Received: by NeXT Mailer (1.100)
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Subject: Re: Response
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org


  From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
  Date: Mon, 24 Jan 94 15:42:20 JST

  Have we agreed that ISO 10646 needs some profiling to be a MIME charset?

No, it does not need profiling to be used by MIME.  If it does, then
ASCII does too.

  10646 here means DIS 10646-1.0 which is much better than the current IS.

Excuse me, but what purpose does it serve to keep recalling DIS 1?  It
is dead and buried.  Your longings won't resurrect it.

  > (Unicode provides a detailed algorithm for determining presentation
  > order of 10646 characters within a line or paragraph--even in the
  > absence of presentation format codes.

  I have never seen any detailed algorithm in the Unicode books.
  Where is it described?

Well, it may not be your idea of "detailed", but Appendix A in volume I,
along with corrections in Appendix D of volume II and errata in TR #4
define an algorithm, which, if you possess only an average amount of
cleverness, will allow you to implement the Unicode BIDI algorithm in
real code without any problems.  Of course, it would help if you knew
something about Arabic or Hebrew too; so perhaps it will be a bit more
difficult for you.

Glenn Adams


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa25324;
          24 Jan 94 23:43 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa25320;
          24 Jan 94 23:43 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa04610;
          24 Jan 94 23:43 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA22471; Mon, 24 Jan 94 23:15:29 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA22187; Mon, 24 Jan 94 23:15:20 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 25 Jan 94 13:09:53 +0859
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401250410.AA15763@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: Glenn Adams <glenn@metis.com>
Date: Tue, 25 Jan 94 13:09:52 JST
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org
In-Reply-To: <9401241637.AA06999@trubetzkoy.metis.com>; from "Glenn Adams" at Jan 24, 94 11:37 am
X-Mailer: ELM [version 2.3 PL11]

>   From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
>   Date: Mon, 24 Jan 94 15:42:20 JST
> 
>   Have we agreed that ISO 10646 needs some profiling to be a MIME charset?
> 
> No, it does not need profiling to be used by MIME.  If it does, then
> ASCII does too.

I'm afraid that you don't understand the fact that *no* character
set specifies the precise details of the graphical expression of the
text.

Anyway, you should read RFC 1555.

>   10646 here means DIS 10646-1.0 which is much better than the current IS.
> 
> Excuse me, but what purpose does it serve to keep recalling DIS 1?

Ask Mark Davis who said that the intention of MIME was to identify
DIS 1 as the charset.

> It is dead and buried.

So bury Mark's longing with it.

>   > (Unicode provides a detailed algorithm for determining presentation
>   > order of 10646 characters within a line or paragraph--even in the
>   > absence of presentation format codes.
> 
>   I have never seen any detailed algorithm in the Unicode books.
>   Where is it described?
> 
> Well, it may not be your idea of "detailed", but Appendix A in volume I,
> along with corrections in Appendix D of volume II and errata in TR #4
> define an algorithm, which, if you possess only an average amount of
> cleverness, will allow you to implement the Unicode BIDI algorithm in
> real code without any problems.  Of course, it would help if you knew
> something about Arabic or Hebrew too; so perhaps it will be a bit more
> difficult for you.

Do you want to convince me that some necessary profiling information
for 10646 is provided by Unicode?

As I already know that I can't display Japanese text with UNICODE without
some profiling information, you don't have to do so.

						Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa29056;
          25 Jan 94 0:18 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa29052;
          25 Jan 94 0:18 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa05348;
          25 Jan 94 0:18 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA16691; Mon, 24 Jan 94 23:59:34 EST
Received: from netmail2.microsoft.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA16647; Mon, 24 Jan 94 23:59:33 EST
Received:  by netmail2.microsoft.com (5.65/25-eef)
	id AA03201; Mon, 24 Jan 94 20:59:58 -0800
Message-Id: <9401250459.AA03201@netmail2.microsoft.com>
Received: by netmail2 using fxenixd 1.0 Mon, 24 Jan 94 20:59:58 PST
X-Msmail-Message-Id:  B3FE8B1F
X-Msmail-Conversation-Id:  B3FE8B1F
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Asmus Freytag <asmusf@microsoft.com>
To: glenn@metis.com, mohta@necom830.cc.titech.ac.jp
Date: Mon, 24 Jan 94 22:07:54 PST
Subject: Re: Response
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org

Let's get this discussion back to where useful information is exchanged and
participants and observers learn something from the exchange of facts.

Maybe someone else can summarize the state of the discussion for all of
us, so we can see what issues are still open, and for which the discussion
has yielded some answers.

Thanks,
A.

PS: Since I find it highly unlikely that Mark Davis (who is the editor of
10646) would want to go back to DIS-1, perhaps the issue is one of
confusion? The standard is called ISO/IEC 10646-1, where -1 stands for
part one, and that, indeed, is the part of 10646 that contains the Base
Multilingual Plane and should be used as the charset. (There are no other
parts of 10646 published at this time.)




Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa06184;
          25 Jan 94 12:47 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa06180;
          25 Jan 94 12:47 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa07411;
          25 Jan 94 12:47 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA04502; Tue, 25 Jan 94 12:06:23 EST
Received: from taligent.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA04467; Tue, 25 Jan 94 12:06:21 EST
Received: from qm.taligent.com by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA27184; Tue, 25 Jan 94 09:01:09 -0800
	for 
Message-Id: <9401251701.AA27184@taligent.com>
Date: 25 Jan 1994 08:53:05 -0800
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Mark Davis <Mark_Davis@taligent.com>
Subject: Re: Response
To: Glenn Adams <glenn@metis.com>, 
    Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org

Reply to: RE>>Response                                   01/25/94     8:28 AM
>> Mark Davis who said that the intention of MIME was to identify DIS 1 as the
charset

I am afraid that I view the whole direction of this discussion as pointless;
there are many different approaches that one could take to the development of
any standard; there will always be people who are discontented with the
results, whatever they are. I have no desire to endlessly debate what could
have been.

However, I do need to respond when I see my name used. You have misunderstood
me; I have no longings whatsoever for DIS 1. There were a host of working
drafts, national contributions, individual contributions, draft proposals, and
draft international standards in the process of producing IS 10646. They are
now just history; it is the IS that has been approved and is being
implemented.

Mark Davis

P.S. 
>> As I already know that I can't display Japanese text with UNICODE without
>> some profiling information, you don't have to do so.

BTW, Unicode includes the full repertoire, code for code, of JIS 208 and 212,
as well as codes for compatibility with shift-JIS. It is not very productive
to continue without your having  read and understood the Unicode standard (V1,
V2 and TR#4). 

--------------------------------------
Date: 01/24/94 8:33 PM
To: Mark Davis
From: Masataka Ohta
>   From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
>   Date: Mon, 24 Jan 94 15:42:20 JST
> 
>   Have we agreed that ISO 10646 needs some profiling to be a MIME charset?
> 
> No, it does not need profiling to be used by MIME.  If it does, then
> ASCII does too.

I'm afraid that you don't understand the fact that *no* character
set specifies the precise details of the graphical expression of the
text.

Anyway, you should read RFC 1555.

>   10646 here means DIS 10646-1.0 which is much better than the current IS.
> 
> Excuse me, but what purpose does it serve to keep recalling DIS 1?

Ask Mark Davis who said that the intention of MIME was to identify
DIS 1 as the charset.

> It is dead and buried.

So bury Mark's longing with it.

>   > (Unicode provides a detailed algorithm for determining presentation
>   > order of 10646 characters within a line or paragraph--even in the
>   > absence of presentation format codes.
> 
>   I have never seen any detailed algorithm in Unicode books.
>   Where is it described?
> 
> Well, it may not be your idea of "detailed", but Appendix A in volume I,
> along with corrections in Appendix D of volume II and errata in TR #4
> define an algorithm, which, if you possess only an average amount of
> cleverness, will allow you to implement the Unicode BIDI algorithm in
> real code without any problems.  Of course, it would help if you knew
> something about Arabic or Hebrew too; so perhaps it will be a bit more
> difficult for you.

Do you want to convince me that some necessary profiling information
for 10646 is provided by Unicode?

As I already know that I can't display Japanese text with UNICODE without
some profiling information, you don't have to do so.

						Masataka Ohta

------------------ RFC822 Header Follows ------------------
Received: by qm.taligent.com with SMTP;24 Jan 1994 20:33:03 -0800
Received: from UNICODE.ORG by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA26934; Mon, 24 Jan 94 20:29:49 -0800
	for 
Received: from necom830.cc.titech.ac.jp by Unicode.ORG (NX5.67c/NX3.0M)
	id AA24750; Mon, 24 Jan 94 20:01:30 -0800
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Tue, 25 Jan 94
13:09:53 +0859
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401250410.AA15763@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: glenn@metis.com (Glenn Adams)
Date: Tue, 25 Jan 94 13:09:52 JST
Cc: ietf-822@dimacs.rutgers.edu, unicored@Unicode.ORG
In-Reply-To: <9401241637.AA06999@trubetzkoy.metis.com>; from "Glenn Adams" at
Jan 24, 94 11:37 am
X-Mailer: ELM [version 2.3 PL11]


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa06695;
          25 Jan 94 13:16 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa06691;
          25 Jan 94 13:16 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa08304;
          25 Jan 94 13:16 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA09117; Tue, 25 Jan 94 12:08:57 EST
Received: from SAPIR.METIS.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA08448; Tue, 25 Jan 94 12:08:35 EST
Received: from trubetzkoy.metis.com by sapir.metis.com (4.1/METIS-4.10) id AA06728; Tue, 25 Jan 94 12:07:39 EST
Received: by trubetzkoy.metis.com (NX5.67d/NX3.0M) id AA07275; Tue, 25 Jan 94 12:07:38 -0500
Date: Tue, 25 Jan 94 12:07:38 -0500
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Glenn Adams <glenn@metis.com>
Message-Id: <9401251707.AA07275@trubetzkoy.metis.com>
Received: by NeXT.Mailer (1.100)
Received: by NeXT Mailer (1.100)
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Subject: Re: Response
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org


  From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
  Date: Tue, 25 Jan 94 13:09:52 JST

  As I already know that I can't display Japanese text with UNICODE without
  some profiling information, you don't have to do so.

I disagree.  You *can* display Japanese text without any profiling information.
You have utterly failed to provide even a shred of evidence for this assertion
you are so fond of repeating.  If I may offer a quote from a more official
Japanese source regarding the use of 10646 (and by extension, of Unicode),
perhaps it will help settle the minds of readers who have been unnecessarily
alarmed by your unfounded statements:

The following are excerpts from the paper "ISO/IEC 10646-1 in Japan" by
Prof. Koji Shibano of Tokyo International University,  who, according to
my understanding, is relating the official position of JISC:

1. Countermeasure by Japan to the 10646

"Strenuous effort[s] have been made in JIS-related works, aiming at the
preparation of the draft of the JIS standards corresponding to ISO/IEC
10646 Universal Multiple-octe[t] Coded Character Set. Japan has been
assuming a negative attitude toward the 10646. However, such an attitude
by Japan is not preferable when taking into account the efforts for
the internationalization of Japan in other fields, such as the character
code and the consistency with the domestic character code problems.

"Hereafter in Japan, the dissemination of the 10646 is to be positively
promoted and the successive movement is promoted from the existing JIS
code to the JIS of the 10646. At the same time, full efforts should be
made in the expansion and improvement of the 10646 in ISO activities."

2.3 Problems with the present JIS code

[Discussion of various problems of JIS C6226-1978 and JIS X0208-1990.]

"As a result [of these problems], the efforts made by JIS for a long time
could not establish an effective standard, but they promoted confusion."

"The only solution for such situations is believed to be promotion of the
positive movement toward the 10646."

3. CJK Unified Ideograph

"The SC2 Japan Domestic Committee has opposed the CJK unified ideograph for
several reasons, however, it is now considering supporting this CJK unified
ideograph. The evidence of such ideas are derived from the fact that the
reasons agreeable to the CJK unified ideograph can be found, while the reasons
opposing cannot be found. A variety of opposing opinions exist in Japan, but
any one of them is incorrect."

4.2 Problems of unified ideograph

[discussion of minor problems such as lack of classification according to
frequency of use, lack of correlation between frequency and code position that
would facilitate more compact representation in a UTF or other transformation
of UCS, single arrangement order, identification (name) of unified ideographs
by code point, inclusion of source separation characters in main CJK ideo-
graph space rather than in compatibility zone.]

"There are several problems in the unified ideograph [summarized above] but
they never lead to a complete error such as non-unification."

5. Technology to use the 10646

"It is believed that the 10646 will occupy the position of the Bible in the
future ISO code systems in place of the conventional Bible [i.e., ISO 646]
as the new number indicates."

-----------------

I haven't included the full text here which does raise certain issues with
regard to unified CJK ideographs and other aspects of 10646. However, none
of these issues require that 10646 be used with profiling information in
order to use it for Japanese information processing.  Furthermore, the
tone of the text which I include above paints a rather stark difference
to the viewpoint you have consistently advocated, a difference which makes
me suspect you aren't a party to the consensus that appears to have formed
in official, and, may I say, better informed, JISC circles.

I would suggest that you are pushing your own personal agenda here rather
than offering factual information about 10646/Unicode.  Your ICODE system,
which you offer as a potential solution to certain issues you perceive to
be problematic, is really nothing more than a rich text system based on top
of 10646.  While I consider that it may be useful as a potential rich
text encoding based on 10646, it is not a viable plain text encoding.

If at any time you wish to provide some real evidence that 10646/Unicode
is not capable of processing Japanese without profiling information or
rich text information, then I would be eager to hear the details.  However,
until then I cannot accept your claims and I would urge others to do the
same.

Regards,
Glenn Adams


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa11465;
          25 Jan 94 16:48 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa11461;
          25 Jan 94 16:48 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa14584;
          25 Jan 94 16:48 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13861; Tue, 25 Jan 94 16:22:29 EST
Received: from taligent.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13829; Tue, 25 Jan 94 16:22:27 EST
Received: from qm.taligent.com by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA08325; Tue, 25 Jan 94 13:20:31 -0800
	for 
Message-Id: <9401252120.AA08325@taligent.com>
Date: 25 Jan 1994 13:13:57 -0800
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: David Goldsmith <David_Goldsmith@taligent.com>
Subject: Re: Response
To: Asmus Freytag <asmusf@microsoft.com>
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org

         Reply to:   RE>>Response
(Asmus Freytag writes:)
>Let's get this discussion back to where useful information is exchanged and
>participants and observers learn something from the exchange of facts.

>Maybe someone else can summarize the state of the discussion for all of
>us, so we can see what issues are still open, and for which the discussion
>has yielded some answers.

I will attempt to do so. A couple of people have stated that as defined,
ISO/IEC 10646 cannot be used as a MIME character set in the absence of
external profiling information due to issues revolving around directionality
of text and use of combining marks.

The directionality issue relates to Hebrew and Arabic, and specific reference was made
to RFC 1555 and RFC 1556. In those RFCs, variants of ISO-8859-6 and ISO-8859-8
are defined as multiple MIME charsets. Specifically, there are -i and -e
variants, -i signifying implicit ordering and -e signifying explicit ordering.
The claim was made that (at least) the same needs to be done for 10646.

This is incorrect. My reading of RFCs 1555 and 1556 is that the variants were
defined because control sequences for specifying directionality were added. A
text stream with such sequences added cannot be claimed to be ISO-8859-x,
since that standard exists and does not define those sequences. Since control
codes for specifying implicit and explicit directionality already exist in
ISO/IEC 10646, an amended character set is not necessary, nor is a separate
designation. It is possible, within the context of 10646, to accommodate all
three of the options discussed in RFC 1556: "visual mode" (what Mark Davis
called "logical order"), implicit mode, and explicit mode. Therefore the
single character set designation suffices.
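
The point about directionality controls can be illustrated with a minimal
sketch. It assumes Python's standard unicodedata module and the code points
assigned to the directional formatting characters in current Unicode; the
helper names bidi_classes and uses_explicit_mode are hypothetical and are not
taken from RFC 1555, RFC 1556, or the draft under discussion.

```python
import unicodedata

# Directional marks and formatting codes that belong to the 10646/Unicode
# repertoire itself (code points as assigned in current Unicode).
IMPLICIT_MARKS = {"\u200e", "\u200f"}            # LRM, RLM
EXPLICIT_CONTROLS = {"\u202a", "\u202b",         # LRE, RLE
                     "\u202c",                   # PDF
                     "\u202d", "\u202e"}         # LRO, RLO


def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def uses_explicit_mode(text):
    """True if the text carries embedding or override controls."""
    return any(ch in EXPLICIT_CONTROLS for ch in text)


# Hebrew letters stored in logical order, relying on implicit ordering.
implicit = "abc \u05d0\u05d1\u05d2"
# The same letters wrapped in RLE ... PDF, i.e. explicit ordering.
explicit = "abc \u202b\u05d0\u05d1\u05d2\u202c"

print(bidi_classes(implicit))
print(uses_explicit_mode(implicit), uses_explicit_mode(explicit))
```

Both strings are ordinary character sequences in the same coded character
set, which is why a single charset label can cover the implicit and the
explicit case.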

As for combining marks, the issue was the requirement that MIME charsets
unambiguously specify the translation from a byte stream to "glyphs" (the MIME
standard is silent as to what a glyph is). Assuming that 10646 is to be held
to the same standard as every other existing character set standard, including
all those already specified and accepted for use with MIME (ASCII and
ISO-8859-x), then there is no issue, as ISO 10646 specifies exactly as much
about the translation of character codes to screen display (i.e., almost
nothing) as all the other character set standards. No external profiling
information (other than the usual things like having the appropriate fonts
installed) is necessary to display a 10646 message. By external profiling
information, MIME means additional out-of-band information that must accompany
the text in a message. While the display of messages in scripts that use
combining marks is certainly complex, it is algorithmic and does not require
any information to be transmitted with the message beyond the 10646 character
sequence. Therefore I believe that the version of 10646 specified in our draft
document ("Encoding of ISO/IEC 10646-1/Unicode in MIME") meets the
requirements for a MIME charset.
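
As a rough illustration of "algorithmic, with nothing transmitted beyond the
character sequence", the sketch below groups each base character with the
combining marks that follow it. It assumes Python's unicodedata module and
uses a non-zero canonical combining class as the test for a combining mark,
which is a simplification (some marks have class 0). The function name
combining_clusters is hypothetical.

```python
import unicodedata


def combining_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the preceding cluster.
            clusters[-1] += ch
        else:
            # Base character (or a mark with nothing before it).
            clusters.append(ch)
    return clusters


# "e" + COMBINING ACUTE ACCENT, then "a" + COMBINING DIAERESIS, then "z".
sample = "e\u0301a\u0308z"
for cluster in combining_clusters(sample):
    print([unicodedata.name(ch) for ch in cluster])
```

A renderer would then pick or compose one glyph per cluster; the input is
only the character sequence.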

I must say that the purpose of my distributing the drafts of my documents was
to solicit feedback on the best way technically to encode 10646/Unicode within
MIME, not a debate on the merits of 10646. I am pursuing this because of a
practical need by people making commercial use of 10646 and Unicode to have a
means of transmitting it via electronic mail. The 10646 standard exists, is
finished, and is in commercial use. At this point we should be discussing how
to use it, not whether it should exist or is perfect. I have received a few
useful comments, for which I thank the persons involved.

The only pending change in the documents right now is the elimination of some
features of the UTF-7 encoding of 10646 which duplicated aspects of the quoted
printable content transfer encoding of MIME. In retrospect, we decided these
were redundant. I plan to update the documents and take them to the next level
of the standards process [as soon as I figure out what that is :-)]. If anyone
wishes to make comments during this informal review, there is not much time
left. Anyone who does not have copies of the documents in question and wants
to review them should contact me immediately via e-mail, and I can send you
plain text or Postscript versions.
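
For readers unfamiliar with UTF-7, here is a small sketch of the idea: a
10646/Unicode string is carried as 7-bit ASCII so that it survives mail
transport. It assumes Python's built-in "utf-7" codec, which follows the
later published form of the encoding and may differ in detail from the
draft discussed here. The helper name to_mail_safe is hypothetical.

```python
def to_mail_safe(text):
    """Encode text as UTF-7 bytes (7-bit ASCII only)."""
    return text.encode("utf-7")


message = "Hi Mom \u263a! \u65e5\u672c\u8a9e"
encoded = to_mail_safe(message)

print(encoded)
# Every byte fits in 7 bits, and decoding restores the original text.
print(all(b < 0x80 for b in encoded))
print(encoded.decode("utf-7") == message)
```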

David Goldsmith
david_goldsmith@taligent.com


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa16162;
          25 Jan 94 22:41 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa16158;
          25 Jan 94 22:41 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa20980;
          25 Jan 94 22:41 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA11525; Tue, 25 Jan 94 22:32:41 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA11395; Tue, 25 Jan 94 22:32:37 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 26 Jan 94 12:23:43 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta>
Message-Id: <9401260323.AA20899@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: David Goldsmith <David_Goldsmith@taligent.com>
Date: Wed, 26 Jan 94 12:23:42 JST
Cc: asmusf@microsoft.com, ietf-822@dimacs.rutgers.edu, unicored@unicode.org
In-Reply-To: <9401252120.AA08325@taligent.com>; from "David Goldsmith" at Jan 25, 94 1:13 pm
X-Mailer: ELM [version 2.3 PL11]


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa20365;
          25 Jan 94 23:42 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa20355;
          25 Jan 94 23:42 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa22022;
          25 Jan 94 23:42 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA10538; Tue, 25 Jan 94 22:48:01 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA10215; Tue, 25 Jan 94 22:47:50 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Wed, 26 Jan 94 12:42:37 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401260342.AA21098@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: Glenn Adams <glenn@metis.com>
Date: Wed, 26 Jan 94 12:42:36 JST
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org
In-Reply-To: <9401251707.AA07275@trubetzkoy.metis.com>; from "Glenn Adams" at Jan 25, 94 12:07 pm
X-Mailer: ELM [version 2.3 PL11]

>   As I already know that I can't display Japanese text with UNICODE without
>   some profiling information, you don't have to do so.
> 
> I disagree.  You *can* display Japanese text without any profiling information.

As long as you know the text is Japanese, which is profiling information.

> You have utterly failed to provide even a shred of evidence for this assertion
> you are so fond of repeating.

You have shown that, this time.

> The following are excerpts from the paper "ISO/IEC 10646-1 in Japan" by
> Prof. Koji Shibano of Tokyo International University,  who, according to
> my understanding, is relating the official position of JISC:

So? Conversion between JIS and 10646 is of course possible, as the
profiling information that the content is Japanese Han is given.

> I haven't included the full text here which does raise certain issues with
> regard to unified CJK ideographs and other aspects of 10646. However, none
> of these issues require that 10646 be used with profiling information in
> order to use it for Japanese information processing.

So? If the purpose is profiled to be "Japanese" information processing,
no further profiling is necessary, of course.

> Your ICODE system,

Neither MIME nor ICODE is for "Japanese information processing".

							Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa09635;
          26 Jan 94 13:23 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa09631;
          26 Jan 94 13:23 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa14120;
          26 Jan 94 13:23 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA14447; Wed, 26 Jan 94 13:03:13 EST
Received: from SAPIR.METIS.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13888; Wed, 26 Jan 94 13:02:53 EST
Received: from trubetzkoy.metis.com by sapir.metis.com (4.1/METIS-4.10) id AA07035; Wed, 26 Jan 94 13:01:57 EST
Received: by trubetzkoy.metis.com (NX5.67d/NX3.0M) id AA07607; Wed, 26 Jan 94 13:01:55 -0500
Date: Wed, 26 Jan 94 13:01:55 -0500
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Glenn Adams <glenn@metis.com>
Message-Id: <9401261801.AA07607@trubetzkoy.metis.com>
Received: by NeXT.Mailer (1.100)
Received: by NeXT Mailer (1.100)
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Subject: Re: Response
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org


  From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
  Date: Wed, 26 Jan 94 12:42:36 JST

  : [Ohta]
  | As I already know that I can't display Japanese text with UNICODE without
  | some profiling information, you don't have to do so.

  : [Adams] 

  | I disagree.  You *can* display Japanese text without any profiling
  | information.

  : [Ohta]
  | As long as you know the text is Japanese, which is profiling information.

Again, I disagree that you must know it is Japanese, and, therefore, that you
need profiling information to tell you this fact.  I shall prove this below.

Given a 10646/Unicode plain text without any profiling, then display the text
as follows:

1. If the system contains only Japanese fonts (say, e.g., a collection of
fonts which possesses the glyphs needed to display the characters in JIS X0201,
JIS X0208, and JIS X 0212), then for each 10646/Unicode character in the
text, map it to its JIS counterpart yielding the glyph code.  If no mapping is
available, then map it to a substitution glyph (e.g., a GETA MARK or an
empty box, etc.).

2. If the system contains a font covering all unified CJK ideographs in
10646/Unicode (i.e., a font which, for each of the 20,902 unified ideographs,
chooses a representative glyph for each ideograph drawn from any source),
then, for each unified CJK ideograph character contained in the sample text,
display that character with its corresponding glyph from the unified CJK
font.  For other characters having no potential glyph in an available font,
then display those characters with a substitution glyph.

3. If the system contains distinct Chinese, Japanese, and Korean fonts
which cover the respective glyphs contained in their national standards,
say, e.g., BIG5, GB2312, JIS X0208, JIS X 0212, and KS C 5601 fonts, then
perform one of the following:

  a. for a given unified CJK ideograph, determine if that ideograph has
     a mapping to one of the font encodings (BIG5, GB, JIS, KSC, etc.); if
     a mapping exists, then display the glyph corresponding to that mapping.

  b. parse the text being displayed in order to determine sequences of
     substrings which can be strongly associated with a particular writing
     system based on character content; using the results of such parse,
     choose the appropriate national font(s) to display each such sequence.
     Such a parser can be easily constructed based on the following
     statistical facts:

     -) sequences of Japanese text will with high probability contain at
        least one Kana character; whereas Chinese and Korean will with
        high probability not contain such a character;

     -) sequences of Korean text will with high probability contain at
        least one Hangul character; whereas Chinese and Japanese will not;

     -) long sequences containing CJK ideographs, say >20 characters,
        containing neither Kana nor Hangul characters are with high
        probability Chinese

     -) short sequences containing neither Kana nor Hangul characters may
        be resolved by determining whether every character in the sequence
        maps to some character in a particular national standard; in the
        case that every character in the sequence maps to more than one
        national standard, then choose the first standard given some
        prioritization based on locale (e.g., choose Japanese standard if
        Japanese locale)
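
Case (1) above can be sketched as follows. This is only an illustration: it
assumes Python's "euc_jp" codec as a stand-in for a mapping from
10646/Unicode to JIS X 0201, JIS X 0208 and JIS X 0212, and it treats the
resulting bytes as the "glyph code". The function name
to_japanese_glyph_codes is hypothetical.

```python
GETA_MARK = "\u3013"  # substitution glyph suggested in the text


def to_japanese_glyph_codes(text, codec="euc_jp"):
    """Map each character to its JIS code, or to the geta mark if none exists."""
    codes = []
    for ch in text:
        try:
            codes.append(ch.encode(codec))
        except UnicodeEncodeError:
            # No JIS counterpart: fall back to the substitution glyph.
            codes.append(GETA_MARK.encode(codec))
    return codes


# A Han character, a Hiragana letter, and a Hangul syllable
# (the last has no JIS counterpart).
print(to_japanese_glyph_codes("\u65e5\u306e\ud55c"))
```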

In each of the above cases, the text is displayable.  In case (1), given
that a single national font collection is available, no other solution is
possible in any case.  In case (2), the character may be displayed with
a national variant which does not match the text (e.g., a Chinese font's
glyph for a Japanese Kanji).  Case (3a) may produce the same results as
case (2).  Case (3b) will produce the best results for either monolingual
or multilingual CJK texts in the absence of language or font bindings
(i.e., profiling or rich text).
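
The heuristics of case (3b) might be sketched as below. Several things here
are assumptions rather than part of the proposal: the block ranges are those
of current Unicode (Hangul in particular was placed elsewhere in early
versions), Python codecs stand in for the national font encodings, the
threshold of 20 is taken literally, and choosing between GB and BIG5 for a
long run by coverage is an added guess. All names are hypothetical.

```python
NATIONAL_CODECS = {
    "ja": "euc_jp",        # JIS X 0208 / JIS X 0212
    "zh-Hans": "gb2312",   # GB 2312
    "zh-Hant": "big5",     # BIG5
    "ko": "euc_kr",        # KS C 5601
}


def is_kana(ch):
    return "\u3040" <= ch <= "\u30ff"      # Hiragana and Katakana blocks


def is_hangul(ch):
    return "\uac00" <= ch <= "\ud7a3" or "\u1100" <= ch <= "\u11ff"


def is_cjk_ideograph(ch):
    return "\u4e00" <= ch <= "\u9fff"      # unified CJK ideographs


def covers(run, codec):
    """True if every character of run maps into the given national encoding."""
    try:
        run.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def guess_language(run, locale_priority=("ja", "zh-Hans", "zh-Hant", "ko"),
                   long_run=20):
    """Guess which national font to use for one run of text."""
    if any(is_kana(ch) for ch in run):
        return "ja"
    if any(is_hangul(ch) for ch in run):
        return "ko"
    priority = list(locale_priority)
    if sum(is_cjk_ideograph(ch) for ch in run) > long_run:
        # Long run with neither Kana nor Hangul: try Chinese first.
        priority.sort(key=lambda lang: not lang.startswith("zh"))
    for lang in priority:
        if covers(run, NATIONAL_CODECS[lang]):
            return lang
    return None


print(guess_language("\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"))
print(guess_language("\u6771\u4eac"))
print(guess_language("\u6771\u4eac", locale_priority=("zh-Hant", "ja")))
```

The last two calls show that a short run of ideographs is resolved only by
the locale priority, which is the one place where the procedure depends on
information outside the text.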

The above algorithm thus proves that one *may* display any given
10646/Unicode plain code text in the absence of profiling information.
[It is also useful to note that nearly all existing systems fall into
case (1) above, so the issue of how to handle multilingual texts in
these cases is moot.]

The crux of the matter is whether or not such display is deemed to
be acceptable in the case that a wrong font (or glyph) is chosen to
display a given character (e.g., choosing a Japanese font to display
a unified CJK ideograph contained in a Chinese text).

As far as I know, MIME does not specify any criteria for typographic
acceptability.  In the absence of such criteria, it is not possible
to make a negative judgement about correctness or acceptability of
the above algorithm.  The purpose of the algorithm was to display
each character with some glyph and that this algorithm performs this
is plainly evident.  Therefore, in the absence of a criterion for
typographic quality, this algorithm *is* correct and serves the
requirement for a MIME client to display a 10646/Unicode text.

Should MIME decide that it will establish a criterion of typographic
acceptability for displaying character text, then it would have to
describe how a multilingual European text encoded with ISO 8859-1
could, without profiling, "acceptably" display distinct language
sequences with distinct fonts; or, how a multilingual Arabic and
Turkish text encoded with ISO 8859-6 could, without profiling,
"acceptably" display distinct language sequences using, say Naskh
versus Ruq`ah styles of Arabic as appropriate to Arabic vs. Turkish
written language customs.

Since MIME would not be able to accomplish the latter without
invalidating existing MIME usage, then it cannot enforce such
requirements of typographic acceptability on CJK usage.

Now I have presented a detailed argument stating how you *can* display
Chinese, Japanese, and Korean encoded with 10646/Unicode without the
use of profiling information of any kind.  Unless MIME wishes to
invalidate existing practices by retroactively enforcing an as yet
undetermined quality of typographic acceptability (a task which of
itself would prove extremely difficult), then I would suggest that
all discussion of requiring profiling information in order to use
10646/Unicode with MIME should cease.

If you have a detailed counterargument to the above, then please
provide it.

Regards,
Glenn Adams


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa13910;
          26 Jan 94 16:46 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa13906;
          26 Jan 94 16:46 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa21516;
          26 Jan 94 16:46 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA00299; Wed, 26 Jan 94 16:27:03 EST
Received: from WILMA.CS.UTK.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA00236; Wed, 26 Jan 94 16:27:00 EST
Received: from LOCALHOST by wilma.cs.utk.edu with SMTP (8.6.4/2.8c-UTK)
          id QAA14010; Wed, 26 Jan 1994 16:19:30 -0500
Message-Id: <199401262119.QAA14010@wilma.cs.utk.edu>
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Keith Moore <moore@cs.utk.edu>
To: Glenn Adams <glenn@metis.com>
Cc: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, 
    ietf-822@dimacs.rutgers.edu, unicored@unicode.org, moore@cs.utk.edu
Subject: Re: Response 
In-Reply-To: Your message of "Wed, 26 Jan 1994 13:01:55 EST."
             <9401261801.AA07607@trubetzkoy.metis.com> 
Date: Wed, 26 Jan 1994 16:19:29 -0500
X-Orig-Sender: moore@cs.utk.edu

Okay, I'll bite on this one.

> Again, I disagree that you must know it is Japanese, and, therefore, 
> that you need profiling information to tell you this fact.  I shall 
> prove this below.
> 
> Given a 10646/Unicode plain text without any profiling, then display the 
> text as follows:

[ clever algorithm deleted ]

>      -) short sequences containing neither Kana nor Hangul characters may
>         be resolved by determining whether every character in the sequence
>         maps to some character in a particular national standard; in the
>         case that every character in the sequence maps to more than one
>         national standard, then choose the first standard given some
>         prioritization based on locale (e.g., choose Japanese standard if
>         Japanese locale)

Hmmm.  Use of a locale sounds pretty much like "external profiling" to me.

However, I *do* appreciate your suggestions, because they form the first
concrete proposal I've seen for dealing with Ohta-san's complaints (as I
understand them) in a way that's compatible with MIME charset labelling.

> The crux of the matter is whether or not such display is deemed to
> be acceptable in the case that a wrong font (or glyph) is chosen to
> display a given character (e.g., choosing a Japanese font to display
> a unified CJK ideograph contained in a Chinese text).
> 
> As far as I know, MIME does not specify any criteria for typographic
> acceptability.  In the absence of such criteria, it is not possible
> to make a negative judgement about correctness or acceptability of
> the above algorithm.  The purpose of the algorithm was to display
> each character with some glyph and that this algorithm performs this
> is plainly evident.  Therefore, in the absence of a criterion for
> typographic quality, this algorithm *is* correct and serves the
> requirement for a MIME client to display a 10646/Unicode text.

My recollection is that the "unique mapping of characters to glyphs" prose
was invented to disallow ISO 646 variants where a given code point can mean
one character in one national variant and a completely different character in
another.  In retrospect, the word "Glyphs" was too specific (and not quite
what's intended), but "characters" would not have been precise enough.  The
"no external profiling" prose was intended to disallow lumping all of the ISO
2022 switching sequences into a single MIME charset.  It's impractical to
support the entirety of ISO 2022 (new charsets can always be added), so a
client needs to know which of the charsets are being used before it can
decide whether to display the body part.
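
Both problems can be shown in a few lines. The sketch below is illustrative
only: the variant table lists a handful of well-known ISO 646 national
differences rather than complete code charts, the helper names are
hypothetical, and Python's "iso2022_jp" codec is used merely to produce a
sample of ISO 2022 designation sequences.

```python
import re

# A few positions where ISO 646 national variants differ from US ASCII.
ISO646_VARIANTS = {
    "US": {},
    "GB": {0x23: "\u00a3"},                                   # pound sign
    "DE": {0x40: "\u00a7", 0x5B: "\u00c4", 0x5C: "\u00d6"},   # section, A/O umlaut
    "JP": {0x5C: "\u00a5"},                                   # yen sign
}


def decode_iso646(data, variant):
    """Decode 7-bit bytes under one national variant of ISO 646."""
    overrides = ISO646_VARIANTS[variant]
    return "".join(overrides.get(b, chr(b)) for b in data)


def designations(data):
    """List the ISO 2022 escape sequences found in a byte stream."""
    return re.findall(rb"\x1b[\x20-\x2f]+[\x30-\x7e]", data)


# The same byte means a different character in each variant.
for variant in ISO646_VARIANTS:
    print(variant, decode_iso646(b"\x5c", variant))

# A receiver must recognize each designated set before it can display the text.
print(designations("\u65e5\u672c\u8a9e".encode("iso2022_jp")))
```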

It's true that MIME doesn't specify criteria for typographic acceptability,
and for most purposes that's not a problem.  I don't care whether an upper
case A is rendered as a stick figure, or in Courier or Times Roman; my brain
sees that as an "A".  But substituting a Greek alpha would confuse me, even
though the two characters are similar in appearance and have a common
ancestry.

If the precise forms of the characters are important to those who use the
language, the unified ideographs may well be sufficiently different from the
character desired to violate the intent of the "unique mapping" MIME charset
requirement.  In short, I think Ohta-san has a valid point which should not
be dismissed out-of-hand or by claiming that it doesn't exist.

> Should MIME decide that it will establish a criterion of typographic
> acceptability for displaying character text, then it would have to
> describe how a multilingual European text encoded with ISO 8859-1
> could, without profiling, "acceptably" display distinct language
> sequences with distinct fonts; or, how a multilingual Arabic and
> Turkish text encoded with ISO 8859-6 could, without profiling,
> "acceptably" display distinct language sequences using, say Naskh
> versus Ruq`ah styles of Arabic as appropriate to Arabic vs. Turkish
> written language customs.

This is also a good point; the problem is not specific to 10646.

(But if we're being pedantic, does the *definition* of 8859/6 give multiple
possible appearances for some characters?)

------------------------------------------------------------------

I suggest that the question of whether 10646 violates the MIME spec
in minor ways is of secondary importance.

The more important question is: 

     +----------------------------------------------------------+
     | Is the Internet really better off without 10646 in MIME? |
     +----------------------------------------------------------+

THINK REALLY HARD ABOUT THIS BEFORE YOU ANSWER.

It appears that although 10646 is imperfect (some would say sorely lacking),
it's the best technical solution yet devised.  An improvement may be
forthcoming, but it's probably years away, and it would help to have some
real experience with using 10646 to guide the next version.

Registering 10646 as a MIME charset doesn't mean that everyone is going to
use it, or that the other charsets will go away.  Ultimately, people will use
what works best for them, if they have the opportunity to choose.

Registering 10646 in its current state doesn't mean that it won't be improved
later.  If the ideograph unification problem is annoying enough, someone will
devise a solution.

Regardless of what we say here, 10646 is not going to go away.  We don't have
that much clout.  NOT providing for 10646 might be really damaging to MIME,
in comparison to other email systems that allow 10646.  Do we want to take
this risk?

Finally, registration of 10646 as a MIME charset is NOT an endorsement of
10646, or any particular use of 10646.  It just gives a way to label it for
those who do want it.

If we decide that MIME really does want to be able to have 10646, then we may
want to change the "unique mapping" prose for the Full Standard to very
clearly allow it.

------------------------------------------------------------------

It seems to me that the best thing we can do is to make 10646 as good as
possible for MIME, without making it incompatible with other anticipated 
uses of 10646.  Glenn Adams's suggestions as to how 10646 might be
displayed seem to have the right intent -- though others may have better
ideas.

But I'm very tired of seeing the same arguments over and over.  So at this
point I'll strongly suggest that those who insist that we should prevent all
use of 10646 in MIME be silent for the time being.

Meanwhile, those who favor 10646 in MIME should continue making their
proposal as good as they can, keeping in mind the expressed concerns of those
who have problems with it.  

When the proposal is finalized those who don't like it can address their
concerns to the specifics of that proposal.

Now, as to rules:

Simple registration of a MIME charset doesn't require the consensus of any
working group.  For better or worse, the rules allow a non-standards-track
proposal to be submitted for publication as an RFC EVEN IF SOME PEOPLE HAVE
OBJECTIONS.

If the final 10646-in-MIME proposal as submitted for publication still isn't
acceptable to some people, they are better off taking their complaints to the
RFC editor, than trying to get agreement here that 10646 is a bad idea.

Regards,

Keith Moore


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa15146;
          26 Jan 94 17:54 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa15142;
          26 Jan 94 17:54 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa23020;
          26 Jan 94 17:54 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA00878; Wed, 26 Jan 94 17:30:48 EST
Received: from SAPIR.METIS.COM by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA00568; Wed, 26 Jan 94 17:30:35 EST
Received: from trubetzkoy.metis.com by sapir.metis.com (4.1/METIS-4.10) id AA07108; Wed, 26 Jan 94 17:29:09 EST
Received: by trubetzkoy.metis.com (NX5.67d/NX3.0M) id AA07669; Wed, 26 Jan 94 17:29:06 -0500
Date: Wed, 26 Jan 94 17:29:06 -0500
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Glenn Adams <glenn@metis.com>
Message-Id: <9401262229.AA07669@trubetzkoy.metis.com>
Received: by NeXT.Mailer (1.100)
Received: by NeXT Mailer (1.100)
To: Keith Moore <moore@cs.utk.edu>
Subject: Re: 10646 & MIME [was: Response] 
Cc: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, 
    ietf-822@dimacs.rutgers.edu, unicored@unicode.org, moore@cs.utk.edu


  From: Keith Moore <moore@cs.utk.edu>
  Date: Wed, 26 Jan 1994 16:19:29 -0500

  >        in the
  >         case that every character in the sequence maps to more than one
  >         national standard, then choose the first standard given some
  >         prioritization based on locale (e.g., choose Japanese standard if
  >         Japanese locale)

  Hmmm.  Use of a locale sounds pretty much like "external profiling" to me.

This rule is actually unnecessary, but serves to produce the best results
given its heuristic flavor.  Without it one could say choose an arbitrary
standard which maps the sequence.  The result of this is that for such a
short sequence, the result is equivalent to the result obtained by case (3a).
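
The quoted rule can be sketched in a few lines of Python.  This is only an
illustration of the prioritization idea, not the algorithm as it was posted:
the table REPERTOIRES, the LOCALE_PRIORITY lists, and the function name
choose_standard are hypothetical, and the tiny repertoires merely stand in
for the real national standards.

```python
# Hypothetical, tiny repertoires standing in for the national standards.
REPERTOIRES = {
    "chinese": set("漢字文"),
    "japanese": set("漢字文駅"),
    "korean": set("漢字文媤"),
}

# Assumed locale-to-priority mapping: the locale's own standard comes first.
LOCALE_PRIORITY = {
    "zh": ["chinese", "japanese", "korean"],
    "ja": ["japanese", "chinese", "korean"],
    "ko": ["korean", "chinese", "japanese"],
}


def choose_standard(run, locale):
    """Return the first standard, in the locale's priority order, that maps
    every character of the run, or None if no single standard covers it."""
    for name in LOCALE_PRIORITY[locale]:
        if all(ch in REPERTOIRES[name] for ch in run):
            return name
    return None
```

With these assumed tables, the same two-character run resolves to a
different standard under a "ja" locale than under a "ko" locale, which is
the sense in which the rule depends on information outside the message.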

  If the precise forms of the characters are important to those who use the
  language, the unified ideographs may well be sufficiently different from the
  character desired to violate the intent of the "unique mapping" MIME charset
  requirement.  In short, I think Ohta-san has a valid point which should not
  be dismissed out-of-hand or by claiming that it doesn't exist.

I'm not dismissing the fact that differences exist between national
conventions regarding the way that certain ideographs are depicted.  This
is an accepted fact and it is handled in the context of the ISO Ideographic
Rapporteur Group as a case of Z variation (known as font variants, a situation
identical to the distinction between a Helvetica 'a' and a Times Roman 'a').

What I do not accept from Ohta-san is his unsubstantiated assertion that
displaying an ideograph with one variant glyph versus another variant
glyph is valid cause for rejecting one over the other.  I will allow this
only in the case that a typographic criterion of acceptability is used to
derive such a judgement; however, in the absence of such a criterion, and
in the absence of any evidence that somehow the "wrong choice" would make the
display illegible (a statement for which I can supply considerable counter
evidence), the choice of glyphs is entirely arbitrary and any choice is
acceptable according to the criterion of legibility.  I have repeatedly asked
Ohta-san for evidence for his position, but, as his own peers have concluded
in the official position of Japan on 10646:

  "the reasons agreeable to the CJK unified ideograph can be found, while
   the reasons opposing cannot be found. A variety of opposing opinions
   exist in Japan, but any one of them are incorrect."

Ohta-san, sorry to say, is in the group expressing the "incorrect".

  > Should MIME decide that it will establish a criteria of typographic
  > acceptability for displaying character text, then it would have to
  > describe how an multilingual European text encoded with ISO 8859-1
  > could, without profiling, "acceptably" display distinct language
  > sequences with distinct fonts; or, how a multilingual Arabic and
  > Turkish text encoded with ISO 8859-6 could, without profiling,
  > "acceptably" display distinct language sequences using, say Naskh
  > versus Ruq`ah styles of Arabic as appropriate to Arabic vs. Turkish
  > written language customs.

  This is also a good point; the problem is not specific to 10646.
  (But if we're being pedantic, does the *definition* of 8859/6 give multiple
  possible appearances for some characters?)

This is an interesting question and one I hoped someone would ask.  As
you know, 8859/6 is an encoding of Arabic graphemes which requires that
the display system will choose among any number of distinct glyphs, all
present in the same font or in multiple fonts, for depicting a given
Arabic letter according to its context.  It is immaterial whether the
standard gives multiple appearances (in the standard); for it *requires*
that multiple appearances be employed to produce a minimally legible
display, a process, which in this case, can be done deterministically
(i.e., by a DFSM with some small amount of state) based on a contextual
analysis of the character content.  [By the way, depending on the style
of Arabic script supported by the font and display subsystem, the number
of glyphs required to depict a single character may be as many as 30-50.
The latter would be the case for certain complex Arabic styles such as
Ruq`ah and Nastaliq.]
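
As a rough illustration of the deterministic, small-state selection
described above, here is a minimal Python sketch.  It is an assumption-laden
toy, not a description of any real display system: it uses Unicode code
points rather than 8859/6 byte values, covers only a handful of letters in
the hypothetical JOINING table, returns form names instead of glyphs, and
ignores ligatures and the richer styles mentioned above.

```python
# Joining behaviour for a few letters (illustrative subset only).
# "D" = joins on both sides, "R" = joins only to the preceding letter.
JOINING = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062A": "D",  # TEH
    "\u062F": "R",  # DAL
    "\u0643": "D",  # KAF
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
}


def contextual_forms(word):
    """Return one form name per letter, given in logical (reading) order."""
    forms = []
    prev_joins_forward = False  # the only state carried between letters
    for i, ch in enumerate(word):
        kind = JOINING.get(ch)  # None means non-joining or unknown
        nxt = JOINING.get(word[i + 1]) if i + 1 < len(word) else None
        joins_prev = prev_joins_forward and kind in ("D", "R")
        joins_next = kind == "D" and nxt in ("D", "R")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
        prev_joins_forward = kind == "D"
    return forms


# KAF TEH ALEF BEH: the ALEF does not join forward, so BEH stands alone.
print(contextual_forms("\u0643\u062A\u0627\u0628"))
```

The point of the sketch is only that the choice of form follows from the
neighbouring characters, with one bit of carried state and one character of
lookahead, and needs no information from outside the text.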

  It seems to me that the best thing we can do is to make 10646 as good as
  possible for MIME, without making it incompatible with other anticipated
  uses of 10646.  Glenn Adams's suggestions as to how 10646 might be
  displayed seem to have the right intent -- though others may have better
  ideas.

Regarding "mak[ing] 10646 as good as possible", at this point, 10646 is
published and will not be changed until the first addendum comes along.
The first addendum will not substantially change anything already there,
though it may augment it.  [I am aware of at least one change being suggested
which removes a restriction; namely, the restriction on direct encoding of
C1 controls.]  What can be done is to further articulate various implementation
information so that persons implementing 10646 and/or Unicode systems can do
so more effectively and compatibly.  [As the editor of the newsletter of
the Unicode Consortium, Encoding, I can say that this latter goal is one of
my highest priorities.]

As you say, 10646/Unicode *will* be used whether MIME goes forward or not,
and it would be a shame if the developers of MIME and other Internet
facilities take an unnecessarily restrictive (and may I add, unwarranted)
stance toward it.  It may interest you to note that when the ISO SC22
(Programming Language) subgroups held an ad hoc meeting to attempt to
understand how 10646 would affect their future, they agreed to the
recommendations found below, which I believe it would also behoove MIME
developers and other Internet developers to consider.

[FYI, a full length article on 10646/Unicode which discusses many of the
above issues in more detail, including Ideograph Unification, may be found
in the first issue of the new ACM Journal, StandardView, issued Sep 93.]

Regards,
Glenn Adams

Excerpt of Report from SC22 Adhoc on Character Sets, held in Copenhagen
from 21-23 April 1993.

  Short Term Recommendations (1-3 years)

  1. That support be provided for ISO/IEC 10646 where the unit of processing
  is exactly one coded character - as defined in 10646. "Unit of processing"
  is the smallest unit a programming language or operating system can process
  for ISO/IEC 10646 coded character data.

  This means:

  - all coded characters in ISO/IEC 10646 level 3 are available for use
    by applications

  - this minimum level does not require the interpretation of composite
    sequences as logical processing units.

  2. That SC22 address interlanguage communication of ISO/IEC 10646 coded
  data.

  3. That FSS-UTF be registered within ISO 2375 (ECMA).

  Long Term Recommendations

  1. That programming languages and supporting environments provide support
  for composite sequences and CC-data-elements of ISO/IEC 10646 as logical
  processing units.  Considerations must be given to the relation between
  logical processing units and natural language and orthography, and as such
  may require a mechanism for their identification.

  2. That WG15 or WG20 address the need for announcement mechanisms for the
  different encodings, levels and subrepertoires of ISO/IEC 10646 (see
  section 17.1 of ISO/IEC 10646, second sentence).  The same mechanisms
  may also be used to announce other coded character sets.
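
To make the distinction between the short-term and long-term recommendations
concrete, here is a small Python sketch.  It is an approximation under stated
assumptions: the function names are hypothetical, and a nonzero canonical
combining class from Python's unicodedata module is used as a stand-in for
"combining character", which is not necessarily how 10646 itself delimits a
composite sequence.

```python
import unicodedata


def coded_characters(text):
    """Short-term view: every coded character is its own processing unit."""
    return list(text)


def composite_sequences(text):
    """Long-term view: a base character plus any following combining
    characters is treated as one logical processing unit."""
    units = []
    for ch in text:
        if units and unicodedata.combining(ch):
            units[-1] += ch  # attach the combining mark to its base
        else:
            units.append(ch)
    return units


sample = "e\u0301a"  # LATIN SMALL LETTER E, COMBINING ACUTE ACCENT, a
print(len(coded_characters(sample)))     # three coded characters
print(len(composite_sequences(sample)))  # two logical units
```

A system that meets only the short-term recommendation would hand an
application the three units; one that meets the long-term recommendation
would be able to hand it the two.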


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa15363;
          26 Jan 94 18:17 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa15359;
          26 Jan 94 18:17 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa23420;
          26 Jan 94 18:17 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA15226; Wed, 26 Jan 94 17:41:58 EST
Received: from taligent.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA15200; Wed, 26 Jan 94 17:41:56 EST
Received: from david-goldsmith.taligent.com by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA21327; Wed, 26 Jan 94 14:39:24 -0800
	for 
Message-Id: <9401262239.AA21327@taligent.com>
X-Sender: dgold@banpeikun-rx.taligent.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 26 Jan 1994 14:39:26 -0800
To: Keith Moore <moore@cs.utk.edu>
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: David Goldsmith <David_Goldsmith@taligent.com>
Subject: Re: Response
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org

>Hmmm.  Use of a locale sounds pretty much like "external profiling" to me.
>
I thought "external profiling" meant information that had to accompany the
message, but perhaps I was misinterpreting. Even assuming a locale is
"external profiling", this still only affects the latter stages of Glenn's
algorithm, which are aimed at achieving optimal as opposed to basic
legibility.

>It's true that MIME doesn't specify criteria for typographic acceptability,
>and for most purposes that's not a problem.  I don't care whether an upper
>case A is rendered as a stick figure, or in Courier or Times Roman; my brain
>sees that as an "A".  But substituting a Greek alpha would confuse me, even
>though the two characters are similar in appearance and have a common
>ancestry.
>
>If the precise forms of the characters are important to those who use the
>language, the unified ideographs may well be sufficiently different from the
>character desired to violate the intent of the "unique mapping" MIME charset
>requirement.  In short, I think Ohta-san has a valid point which should not
>be dismissed out-of-hand or by claiming that it doesn't exist.

This is indeed an important point. However, the Han unification in
10646/Unicode, accomplished under the aegis of the CJK-JRG
(China/Japan/Korea Joint Research Group, which was composed of nationals
from China, Japan, and Korea, all of whom were members of national
standards bodies of their countries), followed the principle that
characters were unified only under certain conditions, laid out in more
detail in "The Unicode Standard, Version 1.0, Volume 2", which itself took
the discussion from CJK-JRG Document 3-28, "Explanatory Notes for the
Unified Ideographic CJK Characters Repertoire and Ordering, Version 1.0".

There are many requirements that had to be met before two ideograms would
be unified, but one was definitely that the two characters have the same
"abstract shape", meaning that the component structure, number of
components, relative position of components in each complete character,
structure of a corresponding component, treatment in the source character
set, and radical contained in the component, all had to be identical or the
characters were not unified. Other rules which prevented unification
include characters having similar shapes that were unrelated by historical
derivation, and characters that are distinct within one of the source
national character sets.

The upshot of this is that legibility of such text does not require that a
font be used which corresponds to the language the text is written in, as
Glenn points out in the first step of his algorithm. Chinese text displayed
in a Japanese font, or vice versa, should be legible and comprehensible to
readers of those languages, even if the typographic quality is not optimal.
This is in keeping with the Unicode principle of minimal legibility. If
someone claims that there is a unified Han character which would not be
comprehensible and legible to a reader if displayed using the wrong font
(assuming the font does in fact have a glyph for that character, of
course), or that there is a string of text in Chinese, Japanese, or Korean
which would not be legible (given that the font contains glyphs for all the
characters in the text), would they please provide a concrete example
rather than speaking in generalities.

As an aside, I am told by people who are thoroughly familiar with and
literate in both Chinese and Japanese that the variations one finds in
characters among Japanese fonts are often greater than the typographic
variations between Japanese and Chinese fonts (for example).

>(lots of good advice deleted)

>Meanwhile, those who favor 10646 in MIME should continue making their
>proposal as good as they can, keeping in mind the expressed concerns of those
>who have problems with it.
>
>When the proposal is finalized those who don't like it can address their
>concerns to the specifics of that proposal.

That is why I posted the proposals to these mailing lists in the first
place. Unfortunately, I've received a lot of criticism of 10646 and
Unicode, but almost no comment on the proposals. I must therefore assume
they were perfect as distributed :-) :-).

Seriously, I would very much like to receive comments on the specifics of
the proposals that I posted. If anyone missed the documents when they were
originally posted, please e-mail me and I will be happy to send you
Postscript and/or plain text versions. The only technical (as opposed to
editorial) change we plan to make is the elimination of
"content-transfer-encoding-like" features of the UTF-7 mail-safe variant,
specifically the rules lifted from quoted-printable about line breaks,
white space, and so on.

----------------------------
David Goldsmith
david_goldsmith@taligent.com
Taligent, Inc.
10201 N. DeAnza Blvd.
Cupertino, CA  95014-2233


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa15498;
          26 Jan 94 18:44 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa15494;
          26 Jan 94 18:44 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa23770;
          26 Jan 94 18:44 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA26099; Wed, 26 Jan 94 18:11:11 EST
Received: from WILMA.CS.UTK.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA26075; Wed, 26 Jan 94 18:11:10 EST
Received: from LOCALHOST by wilma.cs.utk.edu with SMTP (8.6.4/2.8c-UTK)
          id SAA14119; Wed, 26 Jan 1994 18:03:50 -0500
Message-Id: <199401262303.SAA14119@wilma.cs.utk.edu>
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Keith Moore <moore@cs.utk.edu>
To: Glenn Adams <glenn@metis.com>
Cc: Keith Moore <moore@cs.utk.edu>, 
    Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, 
    ietf-822@dimacs.rutgers.edu, unicored@unicode.org
Subject: Re: 10646 & MIME [was: Response] 
In-Reply-To: Your message of "Wed, 26 Jan 1994 17:29:06 EST."
             <9401262229.AA07669@trubetzkoy.metis.com> 
Date: Wed, 26 Jan 1994 18:03:49 -0500
X-Orig-Sender: moore@cs.utk.edu

>   If the precise forms of the characters are important to those who use the
>   language, the unified ideographs may well be sufficiently different from 
>   the character desired to violate the intent of the "unique mapping" 
>   MIME charset requirement.  In short, I think Ohta-san has a valid 
>   point which should not be dismissed out-of-hand or by claiming 
>   that it doesn't exist.
> 
> I'm not dismissing the fact that differences exist between national
> conventions regarding the way that certain ideographs are depicted.  

Yes, but what you would call "differences between the way certain ideographs
are depicted", someone else might call "completely different characters".  We
will never decide which opinion is "right", and we don't need to.  Attempting
to do so is a waste of time.


>   It seems to me that the best thing we can do is to make 10646 as good as
>   possible for MIME, without making it incompatible with other anticipated 
>   uses of 10646.  Glenn Adams's suggestions as to how 10646 might be
>   displayed seem to have the right intent -- though others may have better
>   ideas.
> 
> Regarding "mak[ing] 10646 as good as possible", at this point, 10646 is
> published and will not be changed until the first addendum comes along.

To clarify: when I said "make 10646 as good as possible for MIME", I meant
"make the document that defines use of 10646 in MIME as good as possible at
adapting 10646 both to the constraints of MIME, and to the needs of the
Internet community that uses it."  Sorry if I was being unclear.


Regards,

Keith Moore


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa20902;
          26 Jan 94 23:14 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa20898;
          26 Jan 94 23:14 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa28826;
          26 Jan 94 23:14 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA25168; Wed, 26 Jan 94 20:53:17 EST
Received: from taligent.com by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA25131; Wed, 26 Jan 94 20:53:15 EST
Received: from lee-collins.taligent.com by taligent.com with SMTP (5.67/23-Oct-1991-eef)
	id AA29333; Wed, 26 Jan 94 17:50:45 -0800
	for 
Message-Id: <9401270150.AA29333@taligent.com>
Date: Wed, 26 Jan 1994 17:47:20 -0800
To: ietf-822@dimacs.rutgers.edu, unicored@unicode.org
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Lee Collins <lee_collins@taligent.com>
X-Sender: lcollins@blowfish.taligent.com
Subject: Re: Response

To clarify David's point about legibility of the unified Han, one of the
heuristics used in unification was that the maximum range of glyph
variation allowed in the unification of characters from different standards
should be no greater than the maximum variation acceptable within any one
standard. For example, JIS standards X 0208 and X 0212 both list the
range of glyph variation deemed acceptable for unification in a single code
point. This can be found in section 3.6 of the explanatory material. The
discussion shows that JIS itself unifies "minor variations" (wazukana
tigai). Chart 6 on page 74 of JIS X 0208-1990 illustrates these variations.


If you compare the unified Han in 10646 with the JIS chart, you will find
that the results of Han unification fall within the tolerance of the
"wazukana tigai" discussed in JIS. This is not surprising since virtually
the same set of distinctions exists and is recognized as "minor" by some
group of people in each of the CJK countries. If anything, the requirement
for a round trip mapping caused the CJK-JRG to err on the side of
disunification.

The problem, then, is not whether an individual accepts the
CJK-unifications in Unicode and 10646 as identical characters, but whether
he or she accepts the unifications implemented by the national standard
bodies in existing standards in his or her respective country. There are
Japanese who agree with JIS on the "wazukana tigai" and Japanese who don't.
Those who agree accept Han unification, the others don't. 

I see little point for this group to continue to debate whether 10646
satisfies a requirement that no other encoding, including JIS, even
attempts or claims to satisfy.


Lee


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa02817;
          27 Jan 94 8:41 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa02812;
          27 Jan 94 8:41 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa05642;
          27 Jan 94 8:41 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13713; Thu, 27 Jan 94 08:09:33 EST
Received: from ivory.educom.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA13676; Thu, 27 Jan 94 08:09:31 EST
Received: from [192.52.179.3] ([192.52.179.3]) by ivory.educom.edu with SMTP id <84434(1)>; Thu, 27 Jan 1994 13:09:22 -0000
Date: 	Thu, 27 Jan 1994 13:09:16 -0000
To: ietf-822@dimacs.rutgers.edu
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Jim Conklin <conklin@ivory.educom.edu>
X-Sender: conklin@educom.edu
Subject: Re: 10646 & MIME [was: Response]
Cc: Keith Moore <moore@cs.utk.edu>
Message-Id: <94Jan27.130922gmt.84434(1)@ivory.educom.edu>

  I suspect it's time to put aside our differences and move ahead, as Keith
suggests, to

>"make the document that defines use of 10646 in MIME as good as possible at
>adapting 10646 both to the constraints of MIME, and to the needs of the
>Internet community that uses it." 

realizing that this does not close other options, but does open the option
of using 10646 with MIME where it is deemed appropriate by the users and
developers.

Jim


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa03831;
          27 Jan 94 9:13 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa03827;
          27 Jan 94 9:13 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa06577;
          27 Jan 94 9:13 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA07563; Thu, 27 Jan 94 08:46:05 EST
Received: from domen.uninett.no by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA07517; Thu, 27 Jan 94 08:46:03 EST
Message-Id: <9401271346.AA07517@dimacs.rutgers.edu>
Received: from localhost by domen.uninett.no with SMTP (PP) 
          id <19089-0@domen.uninett.no>; Thu, 27 Jan 1994 14:45:11 +0100
To: Keith Moore <moore@cs.utk.edu>
Cc: Glenn Adams <glenn@metis.com>, 
    Masataka Ohta <mohta@necom830.cc.titech.ac.jp>, 
    ietf-822@dimacs.rutgers.edu, unicored@unicode.org
Subject: Re: 10646 & MIME [was: Response]
In-Reply-To: Your message of "Wed, 26 Jan 1994 18:03:49 EST." <199401262303.SAA14119@wilma.cs.utk.edu>
Date: Thu, 27 Jan 1994 14:45:09 +0100
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: "Harald T. Alvestrand" <Harald.T.Alvestrand@uninett.no>

As far as I can see, UNICODE requires one piece of profiling
information: Its encoding method.

According to Ohta-san, it would require one more sentence, saying
that "the algorithm described by ... and ... in <the Unicode book>
is followed to determine the context in which to render the text.
This may not always give the result intended by the sender when
the recipient and sender do not have the same context"

No more than this needs to be said, I think. And I think no more *can*
be said, until the standards are changed.

But why are we taking this to the 822 list, and not to -charsets?

                         Harald A


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa14163;
          27 Jan 94 18:54 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa14157;
          27 Jan 94 18:54 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa21105;
          27 Jan 94 18:54 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA15150; Thu, 27 Jan 94 18:18:16 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA15144; Thu, 27 Jan 94 18:18:14 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Fri, 28 Jan 94 08:13:20 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401272313.AA27199@necom830.cc.titech.ac.jp>
Subject: Re: 10646 & MIME [was: Response]
To: Jim Conklin <conklin@ivory.educom.edu>
Date: Fri, 28 Jan 94 8:13:19 JST
Cc: ietf-822@dimacs.rutgers.edu, moore@cs.utk.edu
In-Reply-To: <94Jan27.130922gmt.84434(1)@ivory.educom.edu>; from "Jim Conklin" at Jan 27, 94 1:09 pm
X-Mailer: ELM [version 2.3 PL11]

> >"make the document that defines use of 10646 in MIME as good as possible at
> >adapting 10646 both to the constraints of MIME, and to the needs of the
> >Internet community that uses it." 
> 
> realizing that this does not close other options, but does open the option
> of using 10646 with MIME where it is deemed appropriate by the users and
> developers.

Then, should we profile 10646 as

	ISO-10646-utf7-c-j-k

according to the locale based displaying algorithm Glenn suggests (save
his brain dead suggestion of Heuristics) and Microsoft actually uses?

He mentions ordering of countries so that if a Chinese font is available
it is used, and if not and if a Japanese font is available it is used,
and finally, if only a Korean font is available, it is used.

Then, if Korean text is displayed in such an environment, most of the Korean
Hans are represented in Chinese Han and most of the rest of the Hans are
represented by Japanese Han. Very few, if any, will be displayed as
Korean Han.
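
The effect described above can be reproduced with a small Python sketch.
Everything in it is an assumption made for illustration: the coverage sets in
FONT_COVERAGE are hypothetical and far smaller than any real font, the sample
characters are arbitrary placeholders, and pick_font is not taken from any
actual implementation.

```python
from collections import Counter

# Hypothetical glyph coverage; real fonts cover thousands of ideographs.
FONT_COVERAGE = {
    "chinese": set("漢字文書"),
    "japanese": set("漢字文書駅"),
    "korean": set("漢字文書駅媤"),
}

FIXED_ORDER = ["chinese", "japanese", "korean"]


def pick_font(ch, order):
    """Return the first font in the given order that has a glyph for ch."""
    for name in order:
        if ch in FONT_COVERAGE[name]:
            return name
    return None


def font_usage(text, order):
    """Count how many characters of text end up in each font."""
    return Counter(pick_font(ch, order) for ch in text)


text = "漢字文書駅媤"

# Fixed ordering: most characters land in the Chinese font.
print(font_usage(text, FIXED_ORDER))

# If the text is known to be Korean, putting that font first keeps it uniform.
print(font_usage(text, ["korean", "chinese", "japanese"]))
```

With a fixed ordering the text is split across three fonts; with the
language known in advance, a single font serves the whole run.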

I think it absurd. It's much better to just say the text is in Korean from
the beginning.

It's like displaying English with its uppercase letters in Cyrillic
and lowercase letters in Greek and only those letters which have no
counterparts in Cyrillic or Greek in Latin.

							Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa15104;
          27 Jan 94 21:16 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa15100;
          27 Jan 94 21:16 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa23218;
          27 Jan 94 21:12 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA16006; Thu, 27 Jan 94 20:43:26 EST
Received: from WILMA.CS.UTK.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA16002; Thu, 27 Jan 94 20:43:25 EST
Received: from LOCALHOST by wilma.cs.utk.edu with SMTP (8.6.4/2.8c-UTK)
          id UAA17906; Thu, 27 Jan 1994 20:36:14 -0500
Message-Id: <199401280136.UAA17906@wilma.cs.utk.edu>
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Keith Moore <moore@cs.utk.edu>
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: Jim Conklin <conklin@ivory.educom.edu>, ietf-822@dimacs.rutgers.edu, 
    moore@cs.utk.edu
Subject: Re: 10646 & MIME [was: Response] 
In-Reply-To: Your message of "Fri, 28 Jan 1994 08:13:19 +0200."
             <9401272313.AA27199@necom830.cc.titech.ac.jp> 
Date: Thu, 27 Jan 1994 20:36:13 -0500
X-Orig-Sender: moore@cs.utk.edu

> Then, should we profile 10646 as
> 
> 	ISO-10646-utf7-c-j-k

Whatever.

The algorithm for choosing the glyph to be displayed is too elaborate to be
coded into a MIME charset name.  It shouldn't be called simply ISO-10646,
but any unique suffix should work.   Maybe ISO-10646-RFC-XXXX, where XXXX is
the RFC number.  Then if someone comes up with a better algorithm, he/she
can publish it with a different charset name.
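
As a sketch of how a reader might act on such a label, the following Python
uses the standard library's email package to pull the charset parameter out
of a message and look up a display rule.  The DISPLAY_RULES table and the
function name display_rule_for are hypothetical, and the XXXX in the label
is the same unfilled placeholder used above, not a real RFC number.

```python
from email import message_from_string

# Hypothetical registry mapping a charset label to the document whose
# display rules a reader should follow.  Keys are lower-case because
# get_content_charset() returns the label in lower case.
DISPLAY_RULES = {
    "us-ascii": "no special display rules",
    "iso-10646-rfc-xxxx": "display rules from the RFC named in the label",
}


def display_rule_for(raw_message):
    """Return the display rule for the message's charset, or None if the
    label is missing or unknown to this reader."""
    msg = message_from_string(raw_message)
    charset = msg.get_content_charset()
    return DISPLAY_RULES.get(charset)


raw = (
    "Mime-Version: 1.0\n"
    'Content-Type: text/plain; charset="ISO-10646-RFC-XXXX"\n'
    "\n"
    "body text\n"
)
print(display_rule_for(raw))
```

A later document with a better algorithm would simply add another entry
under its own label, leaving readers of the earlier label unaffected.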

As for your other comments:

Most of the people on this list don't know enough to judge whether your
complaints have merit.  Besides yourself, those who do claim to have the
expertise seem to disagree with you.  The rest of us simply aren't in a
position to decide which group is right.


This conversation can serve no further purpose.  Goodbye.


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa24675;
          28 Jan 94 1:17 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa24671;
          28 Jan 94 1:17 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa27283;
          28 Jan 94 1:17 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA21430; Fri, 28 Jan 94 00:53:26 EST
Received: from etlpost.etl.go.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA21425; Fri, 28 Jan 94 00:52:58 EST
Received: from etlpom.etl.go.jp by etlpost.etl.go.jp (5.67+1.6W/2.7W)
	id AA09049; Fri, 28 Jan 94 14:52:39 JST
Received: by etlpom.etl.go.jp (4.1/6.4J.6-ETLpom.MASTER)
	id AA17719; Fri, 28 Jan 94 14:52:38 JST
Received: by etlken.etl.go.jp (4.1/6.4J.6-ETL.SLAVE)
	id AA25211; Fri, 28 Jan 94 14:52:30 JST
Date: Fri, 28 Jan 94 14:52:30 JST
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Kenichi Handa <handa@etlken.etl.go.jp>
Return-Path: <handa@etlken.etl.go.jp>
Message-Id: <9401280552.AA25211@etlken.etl.go.jp>
To: ietf-822@dimacs.rutgers.edu
In-Reply-To: Keith Moore's message of Thu, 27 Jan 1994 20:36:13 -0500 <199401280136.UAA17906@wilma.cs.utk.edu>
Subject: Re: 10646 & MIME [was: Response] 

On Thu, 27 Jan 1994 20:36:13 -0500, Keith Moore <moore@cs.utk.edu> said:
> Most of the people on this list don't know enough to judge whether your
> complaints have merit.  Besides yourself, those who do claim to have the
> expertise seem to disagree with you.  The rest of us simply aren't in a
> position to decide which group is right.

Since I'm now too busy to contribute to this discussion,
sorry for this short statement.

I do agree with Ohta's opinion about ISO10646/Unicode.
From my experience of writing the editor Mule (MULtilingual
Enhancement to GNU Emacs), ISO10646/Unicode without any
language tag is USELESS in a multilingual environment.

---
Ken'ichi HANDA
handa@etl.go.jp


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa00809;
          28 Jan 94 3:40 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa00805;
          28 Jan 94 3:40 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa01742;
          28 Jan 94 3:40 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA22575; Fri, 28 Jan 94 03:13:29 EST
Received: from ics.uci.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA22570; Fri, 28 Jan 94 03:13:21 EST
Received: from nma.com by q2.ics.uci.edu id aa20930; 28 Jan 94 0:13 PST
Received: from localhost by odin.nma.com id aa01750; 28 Jan 94 0:09 PST
To: ietf-822@dimacs.rutgers.edu
Subject: Re: 10646 & MIME [was: Response] 
In-Reply-To: Your message of "Thu, 27 Jan 1994 20:36:13 EST."
             <199401280136.UAA17906@wilma.cs.utk.edu> 
Reply-To: Stef=mime@nma.com
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Einar Stefferud <Stef=mime@nma.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 28 Jan 1994 00:09:40 -0800
Message-Id: <1748.759744580@odin.nma.com>
X-Orig-Sender: stef@nma.com

Hello Keith, et al ... I am one of those unable to contribute to a
proper technical decision, but it seems to me that Ohta-san has some
important logic on his side when we consider how the proposed
heuristics will work to produce, for example, Korean in a mixture of
Chinese, Japanese and Korean, with Korean as the last resort...

But, on the other hand, we have no reason to tell people that because
of this problematic performance of the proposed 10646 Heuristics,
10646 must not be used for anything by anyone, as you noted so well a
few messages back.

The question I have then is "How can we specify the use of 10646 in
MIME for those who find it has value for them?" and "How can we
avoid promoting inappropriate use where it is not well suited to its
users?"

What are the MIME Charset options for this situation?  It seems to me
that among other things, it is important for the MIME Charset=
parameter to provide the required "locale profile identifier".  Does
this mean that there should be separate MIME Charset values for Japan,
China, Korea, and the rest of the world?  Each would supply the
"locale" profile identifier, which should be specified in an RFC.

I hope this idea makes some sense, and will help to disengage the
protagonists and enable them to each go off to write their own RFCs.

Cheers...\Stef


From your message Thu, 27 Jan 1994 20:36:13 -0500:
}
}> Then, should we profile 10646 as
}> 
}> 	ISO-10646-utf7-c-j-k
}
}Whatever.
}
}The algorithm for choosing the glyph to be displayed is too elaborate to be
}coded into a MIME charset name.  It shouldn't be called simply ISO-10646,
}but any unique suffix should work.   Maybe ISO-10646-RFC-XXXX, where XXXX is
}the RFC number.  Then if someone comes up with a better algorithm, he/she
}can publish it with a different charset name.
}
}As for your other comments:
}
}Most of the people on this list don't know enough to judge whether your
}complaints have merit.  Besides yourself, those who do claim to have the
}expertise seem to disagree with you.  The rest of us simply aren't in a
}position to decide which group is right.


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa00731;
          29 Jan 94 3:18 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa00727;
          29 Jan 94 3:18 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa01635;
          29 Jan 94 3:17 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA10991; Sat, 29 Jan 94 02:56:16 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA10987; Sat, 29 Jan 94 02:56:13 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sat, 29 Jan 94 16:51:10 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401290751.AA02362@necom830.cc.titech.ac.jp>
Subject: Re: 10646 & MIME [was: Response]
To: Keith Moore <moore@cs.utk.edu>
Date: Sat, 29 Jan 94 16:51:09 JST
Cc: conklin@ivory.educom.edu, ietf-822@dimacs.rutgers.edu, moore@cs.utk.edu
In-Reply-To: <199401280136.UAA17906@wilma.cs.utk.edu>; from "Keith Moore" at Jan 27, 94 8:36 pm
X-Mailer: ELM [version 2.3 PL11]

> As for your other comments:
> 
> Most of the people on this list don't know enough to judge whether your
> complaints have merit.

No, of course.

> Besides yourself, those who do claim to have the
> expertise seem to disagree with you.  The rest of us simply aren't in a
> position to decide which group is right.

If you seriously want to know how Japanese should be processed on the
Internet, ask your question in Japanese language to people in Japan.

Not all the people in the world can use English.

"fj.kanji" is the appropriate news group.

As you might expect, there have been several discussions about UNICODE
in the news group. The summary is that most Japanese (including me)
rejected UNICODE while a few people (1 or 2) said they could use it with
profiling.

						Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa06645;
          29 Jan 94 22:20 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa06641;
          29 Jan 94 22:20 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa14955;
          29 Jan 94 22:20 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA24294; Sat, 29 Jan 94 21:41:13 EST
Received: from Mordor.Stanford.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA24290; Sat, 29 Jan 94 21:41:11 EST
Received: from localhost by Mordor.Stanford.EDU (8.6.4/inc-1.0)
	id SAA29529; Sat, 29 Jan 1994 18:40:54 -0800
Message-Id: <199401300240.SAA29529@Mordor.Stanford.EDU>
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: Keith Moore <moore@cs.utk.edu>, conklin@ivory.educom.edu, 
    ietf-822@dimacs.rutgers.edu
Subject: Re: 10646 & MIME [was: Response] 
Phone: +1 408 246 8253; fax: +1 408 249 6205
In-Reply-To: Your message of Sat, 29 Jan 94 16:51:09 +0200.
          <9401290751.AA02362@necom830.cc.titech.ac.jp> 
Date: Sat, 29 Jan 94 18:40:54 -0800
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Dave Crocker <dcrocker@mordor.stanford.edu>
X-Mts: smtp

I don't know anything about character sets.  I have no opinion about
the solutions to any of the problems being discussed.  I haven't even
been tracking them.

As usual, however, I have a comment on process:

    ---- Included message:

    If you seriously want to know how Japanese should be processed on the
    Internet, ask your question in Japanese language to people in Japan.
    
    Not all the people in the world can use English.
    
    "fj.kanji" is the appropriate news group.

The IETF does two things to try to ensure that its solutions are
appropriate for the end-user community.  The first is to conduct its
business in an open manner, inviting anyone and everyone to
participate.  To this end, those with an interest and with knowledge
need to assert themselves and come forward while the working group (or
the 'active group') is doing their development.

The second is to make our work available for review, prior to
standardization.  Our assumption is that serious and deep review is
conducted during development, but a final round of review also can be
helpful.

Hence, I suggest two actions, if they have not already been taken:

1.  Formulate specific proposals.  Whoever thinks they know the answer
or answers to this problem needs to write them down as an approximate
specification, if they haven't already.  (By "approximate" I mean that
it does not have to be ready for publication, but should describe all
of the essential features of the solution being proposed.) Circulate
those proposals before any and all bodies that are believed to be
appropriate, inside or outside of the IETF.

It is one thing to talk about solutions and another thing to specify
them.  The IETF works with specs.

2.  Circulate an "announcement" of the IETF activity to any and all
groups that are deemed relevant and appropriate, inviting them to
participate in the IETF discussion.

Please note that neither of these suggestions entails conducting a
discussion on a non-IETF mailing list.  (I happen to believe that
the current 822 list is entirely inappropriate, at this point, but
I don't care much whether the discussion is on this list, ietf-charsets,
or whatever, as long as it is run along IETF rules and style.)  My
suggestion only is to use the "other" lists for announcing this 
activity.

Dave

d/


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa15934;
          30 Jan 94 6:14 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa15930;
          30 Jan 94 6:14 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa20043;
          30 Jan 94 6:14 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA26415; Sun, 30 Jan 94 05:40:08 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA26411; Sun, 30 Jan 94 05:40:05 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 30 Jan 94 19:35:08 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401301035.AA04965@necom830.cc.titech.ac.jp>
Subject: Re: 10646 & MIME [was: Response]
To: Dave Crocker <dcrocker@mordor.stanford.edu>
Date: Sun, 30 Jan 94 19:35:07 JST
Cc: moore@cs.utk.edu, conklin@ivory.educom.edu, ietf-822@dimacs.rutgers.edu
In-Reply-To: <199401300240.SAA29529@Mordor.Stanford.EDU>; from "Dave Crocker" at Jan 29, 94 6:40 pm
X-Mailer: ELM [version 2.3 PL11]

> Hence, I suggest two actions, if they have not already been taken:

> 2.  Circulate an "announcement" of the IETF activity to any and all
> groups that are deemed relevant and appropriate, inviting them to
> participate in the IETF discussion.

Then, if, as Keith said:

	Besides yourself, those who do claim to have the
	expertise seem to disagree with you.

the judgement is made by the number of self-certified experts (IMHO, most
of them are not experts, at least for Japanese) who joined the discussion,
that is, by voting, it is unfair not to conduct the discussion on
Japanese language processing in the Japanese language.

For those who are not native speakers of English, it is difficult to
conduct discussion in English, so that you can't expect a lot of
participants. OK?

So, I'll be happy if all of you accept my summary of the discussions
among Japanese experts:

	As you might expect, there have been several discussions about UNICODE
	in the news group. The summary is that most Japanese (including me)
	rejected UNICODE while a few people (1 or 2) said they could use it with
	profiling.

as a fact.

						Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa16054;
          30 Jan 94 6:43 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa16050;
          30 Jan 94 6:43 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa20354;
          30 Jan 94 6:43 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA26486; Sun, 30 Jan 94 06:11:04 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA26482; Sun, 30 Jan 94 06:11:01 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Sun, 30 Jan 94 20:06:18 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401301106.AA05084@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: ietf-822@dimacs.rutgers.edu
Date: Sun, 30 Jan 94 20:06:17 JST
In-Reply-To: <199401262119.QAA14010@wilma.cs.utk.edu>; from "Keith Moore" at Jan 26, 94 4:19 pm
X-Mailer: ELM [version 2.3 PL11]

> I suggest that the question of whether 10646 violates the MIME spec
> in minor ways is of secondary importance.
> 
> The more important question is: 
> 
>      +----------------------------------------------------------+
>      | Is the Internet really better off without 10646 in MIME? |
>      +----------------------------------------------------------+
> 
> THINK REALLY HARD ABOUT THIS BEFORE YOU ANSWER.
> 
> It appears that although 10646 is imperfect (some would say sorely lacking),
> it's the best technical solution yet devised.

Do you have any technical reason to think 10646 is the best technical
solution? I think there is none.

> Finally, registration of 10646 as a MIME charset is NOT an endorsement of
> 10646,

So, why can't you register it with profiling?

							Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa18163;
          30 Jan 94 14:42 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa18159;
          30 Jan 94 14:42 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa25584;
          30 Jan 94 14:42 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA02941; Sun, 30 Jan 94 14:16:35 EST
Received: from WILMA.CS.UTK.EDU by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA02937; Sun, 30 Jan 94 14:16:32 EST
Received: from LOCALHOST by wilma.cs.utk.edu with SMTP (8.6.4/2.8c-UTK)
          id OAA22272; Sun, 30 Jan 1994 14:09:37 -0500
Message-Id: <199401301909.OAA22272@wilma.cs.utk.edu>
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Keith Moore <moore@cs.utk.edu>
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: ietf-822@dimacs.rutgers.edu, moore@cs.utk.edu
Subject: Re: Response 
In-Reply-To: Your message of "Sun, 30 Jan 1994 20:06:17 +0200."
             <9401301106.AA05084@necom830.cc.titech.ac.jp> 
Date: Sun, 30 Jan 1994 14:09:32 -0500
X-Orig-Sender: moore@cs.utk.edu

> > It appears that although 10646 is imperfect (some would say sorely lacking),
> > it's the best technical solution yet devised.
> 
> Do you have any technical reason to think 10646 is the best technical
> solution? I think there is none.

I apologize for being imprecise.  When I said "best technical 
solution yet devised", I meant "for a single worldwide character set".

My reasons for believing that 10646 is the best technical solution
yet devised are: (a) 10646 does appear to have been developed by
a great number of experts from all over the world, and (b) I haven't
seen an alternative that has a similar amount of expertise behind
it.  Have you?
 
> > Finally, registration of 10646 as a MIME charset is NOT an endorsement of
> > 10646,
> 
> So, why can't you register it with profiling?

MIME requires that the character set be completely specified by its
name.  This is so mail readers don't have to make complicated decisions
about whether they support a particular character set; they can just
compare character set strings.  Given that mail readers have to support
many different kinds of content-types, it still seems like this was
a good engineering decision to minimize the complexity in the mail reader.
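
As a rough illustration of the string comparison described above, a mail
reader's charset check might look like the following Python sketch.  It
relies on the standard library email package; the SUPPORTED_CHARSETS set
and the can_display name are assumptions made up for this example and do
not come from the MIME specification.

```python
from email import message_from_string

# Hypothetical list of charsets this particular reader can render.
SUPPORTED_CHARSETS = {"us-ascii", "iso-8859-1", "iso-2022-jp"}


def can_display(raw_message: str) -> bool:
    """Decide support purely by comparing the charset name."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns the charset parameter in lower case,
    # or the fallback when the header or parameter is absent.
    charset = msg.get_content_charset(failobj="us-ascii")
    return charset in SUPPORTED_CHARSETS


if __name__ == "__main__":
    sample = 'Content-Type: text/plain; charset="US-ASCII"\n\nhello\n'
    print(can_display(sample))
```

The reader never inspects the body to decide; an unknown name such as
ISO-10646-RFC-XXXX is simply reported as unsupported.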

For 10646 with profiling to fit within the MIME structure, it would 
have to be a different content-type.  Something like text/iso-10646.
It could then have whatever parameters it wanted to contain the
profiling information.
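
To make the idea concrete, here is a small Python sketch of how a reader
could pull profiling information out of such a content-type.  Both the
text/iso-10646 type and the "profile" parameter name are hypothetical;
nothing of the kind is registered, and the function name profile_of is
invented for this example.

```python
from email import message_from_string


def profile_of(raw_message: str):
    """Return the profile parameter of a hypothetical text/iso-10646 part.

    Returns None when the part has another content-type or carries no
    profile parameter.
    """
    msg = message_from_string(raw_message)
    # get_content_type() yields the lower-cased "maintype/subtype" string.
    if msg.get_content_type() != "text/iso-10646":
        return None
    # get_param() looks the parameter up in the Content-Type header.
    return msg.get_param("profile")


if __name__ == "__main__":
    sample = "Content-Type: text/iso-10646; profile=japan\n\n...\n"
    print(profile_of(sample))
```

Because the parameters belong to the new content-type rather than to
charset=, readers that only know text/plain would treat the part as an
unrecognized type instead of misrendering it.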

Keith Moore


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa25404;
          30 Jan 94 23:14 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa25400;
          30 Jan 94 23:14 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa02348;
          30 Jan 94 23:14 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA09860; Sun, 30 Jan 94 22:37:41 EST
Received: from necom830.cc.titech.ac.jp by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA09855; Sun, 30 Jan 94 22:37:33 EST
Received: by necom830.cc.titech.ac.jp (5.65+/necom-mx-rg); Mon, 31 Jan 94 12:32:12 +0900
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Return-Path: <mohta@necom830.cc.titech.ac.jp>
Message-Id: <9401310332.AA06977@necom830.cc.titech.ac.jp>
Subject: Re: Response
To: Keith Moore <moore@cs.utk.edu>
Date: Mon, 31 Jan 94 12:32:11 JST
Cc: ietf-822@dimacs.rutgers.edu, moore@cs.utk.edu
In-Reply-To: <199401301909.OAA22272@wilma.cs.utk.edu>; from "Keith Moore" at Jan 30, 94 2:09 pm
X-Mailer: ELM [version 2.3 PL11]

> > > It appears that although 10646 is imperfect (some would say sorely lacking),
> > > it's the best technical solution yet devised.
> > 
> > Do you have any technical reason to think 10646 is the best technical
> > solution? I think there is none.
> 
> I apologize for being imprecise.  When I said "best technical 
> solution yet devised", I meant "for a single worldwide character set".

I'm really bored of such nontechnical commercial hype.

If Unicode were a single worldwide character set, then so was ISO 646.
Some Japanese use JIS X 0201, an ASCII-incompatible Japanese variation
of ISO 646, with some difficulty in reading ASCII documents. On my terminal,
the ASCII backslash is represented by a Yen sign, a currency symbol, which
is not as irritating as seeing CJK unified Han.

Because of C/J/K unification and other reasons such as that there
is no single world wide policy to treat characters of different
languages, it is merely a single PanEuropean character set.

I really do want a technical reason for your opinion. But I don't
think you have a technical background in internationalization.

> My reasons for believing that 10646 is the best technical solution
> yet devised are: (a) 10646 does appear to have been developed by
> a great number of experts from all over the world,

A great number of experts for one or two languages! That's the reason
why Unicode is the worst. Most such experts are never experts in
internationalization.

Thus, just as with ISO 2022, each individual part of Unicode was designed
individually by individual experts for individual languages.

The result is that although, unlike ISO 2022, the encoding space is
shared, it is as useless as ISO 2022 for multilingual, internationalized
purposes.

> and (b) I haven't
> seen an alternative that has a similar amount of expertise behind
> it.  Have you?

Yes, of course. For example, ISO 2022 is only as bad as Unicode while
already being accepted in a large market. That's why we, the Asian Internet
community, have chosen it as the starting point of internationalization.

ICODE, as I have presented at Amsterdam IETF and JWCC at Taiwan, has
several desirable features for plain text processing. The features are
shared by ASCII and ICODE while UNICODE, because it lacks a single
policy, lacks all of them.

> > > Finally, registration of 10646 as a MIME charset is NOT an endorsement of
> > > 10646,
> > 
> > So, why can't you register it with profiling?
> 
> MIME requires that the character set be completely specified by its
> name.  This is so mail readers don't have to make complicated decisions
> about whether they support a particular character set; they can just
> compare character set strings. Given that mail readers have to support
> many different kinds of content-types, it still seems like this was
> a good engineering decision to minimize the complexity in the mail reader.

So, why can't you register it with profiling?

> For 10646 with profiling to fit within the MIME structure, it would 
> have to be a different content-type.  Something like text/iso-10646.
> It could then have whatever parameters it wanted to contain the
> profiling information.

OK. I'm tired of all this non-discussion. I've written an Internet Draft
for ISO 10646.

Let's see. You will be surprised how poorly Japanese Unicode is
supported on Windows/NT.

						Masataka Ohta


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa00862;
          31 Jan 94 5:41 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa00858;
          31 Jan 94 5:41 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa07860;
          31 Jan 94 5:41 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA11981; Mon, 31 Jan 94 05:15:03 EST
Received: from ics.uci.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA11977; Mon, 31 Jan 94 05:15:01 EST
Received: from nma.com by q2.ics.uci.edu id ad26501; 31 Jan 94 2:14 PST
Received: from localhost by odin.nma.com id aa01986; 30 Jan 94 23:58 PST
To: Keith Moore <moore@cs.utk.edu>
Cc: ietf-822@dimacs.rutgers.edu
Subject: Re: Response 
In-Reply-To: Your message of "Sun, 30 Jan 1994 14:09:32 EST."
             <199401301909.OAA22272@wilma.cs.utk.edu> 
Reply-To: Stef=mime@nma.com
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Einar Stefferud <Stef=mime@nma.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 30 Jan 1994 23:58:18 -0800
Message-Id: <1984.760003098@odin.nma.com>
X-Orig-Sender: stef@nma.com

Hello Keith, et al -- Taking Dave Crocker's advice, we need a proposal
for how to establish the appropriate Type/SubTypes and specify exactly
what they mean, including any necessary profiling.

We are told endlessly that Japan, Korea and China need profiling at
the MIME subtype level to distinguish among them, because of the HAN
Unification.  Even I understand the unification and profiling problem
and am totally bored with the alternating monologues conducted here.

I am in deep sympathy with the fact that UNICODE does not do for
Japan, Korea, and China what it does for (pretty much) all others in
the world, but I cannot do anything about that, short of killing off
those who did this dirty deed.  However, there is really no point in
doing this!

It is simpler to let the UNICODERS of the world go their way, and
declare in their RFC which specifies how to include UNICODE in MIME
that the Internet cannot reach consensus for how to label/profile CJK
text, so the specification does not apply to instances of CJK.

Then, if additional profiling is needed, it seems to me that this
profiling should be specified for each of China, Korea and Japan, with
a different MIME subtype for each, so there will be at least 4.

iso-10646-not-cjk
iso-10646-china
iso-10646-korea
iso-10646-japan

With a separate RFC for each.  Of course, if any of CJK do not wish to
define their profiles in Internet RFC's and register a subtype with
the IANA, then that is their problem, and not ours.
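
A reader that adopted the four labels above could select Han glyph
preferences with a simple lookup, as in this Python sketch.  The labels
are the ones proposed in this message and are not registered anywhere;
the language codes and the han_profile function are assumptions added
purely for illustration.

```python
# Hypothetical mapping from the proposed labels to a glyph preference.
# None means no CJK-specific profile is applied.
PROFILES = {
    "iso-10646-not-cjk": None,
    "iso-10646-china": "zh",
    "iso-10646-korea": "ko",
    "iso-10646-japan": "ja",
}


def han_profile(label: str):
    """Look up the glyph preference for a label, ignoring case."""
    key = label.strip().lower()
    if key not in PROFILES:
        # An unknown label gets no guess; the caller decides what to do.
        raise ValueError(f"no profile defined for {label!r}")
    return PROFILES[key]


if __name__ == "__main__":
    print(han_profile("iso-10646-japan"))
```

Each entry would correspond to one of the separate RFCs, so adding or
revising a profile touches only its own row of the table.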

This will let us proceed, by separating the efforts which should
eliminate the arguments about the separate profiles, or at least
confine the arguments to within the separate profile groups.

My logic here is that we cannot afford to let Japan veto the use of
ISO-10646(UNICODE) in MIME in the Internet by people who do not care
about the CJK problem.

For the record, I find this episode on ISO "progress" to be
particularly painful and sad, but I also know that it is pointless to
argue with them.  Better to open the flood gates and let the market
deal with them in the long term.

As I see it working out, even the Japanese Government will not be able
to force UNICODE to work in Japan by making decrees and mandating its
use.  It is just the GOSIP game all over again.  Some day they will
just have to give up if things are as flawed as they appear.

This is very unfortunate, but we still cannot allow anyone to have a
veto over allowing the use of ISO-10646(UNICODE) in MIME in the
Internet.

Remember, we elect to use "Rough Consensus and Running Code!"

It is not required that every specification do what every Internet
user wants it to do.  It is only required that it is proven to work,
and is a consensus product of some group that wants to get some useful
work done, provided it does not harm other groups trying to get other
useful work done.

So, who is going to write the various specs?  Surely not I!..\Stef

From your message Sun, 30 Jan 1994 14:09:32 -0500:
}
}> > It appears that although 10646 is imperfect (some would say sorely lacking),
}> > it's the best technical solution yet devised.
}> 
}> Do you have any technical reason to think 10646 is the best technical
}> solution? I think there is none.
}
}I apologize for being imprecise.  When I said "best technical 
}solution yet devised", I meant "for a single worldwide character set".
}
}My reasons for believing that 10646 is the best technical solution
}yet devised are: (a) 10646 does appear to have been developed by
}a great number of experts from all over the world, and (b) I haven't
}seen an alternative that has a similar amount of expertise behind
}it.  Have you?
} 
}> > Finally, registration of 10646 as a MIME charset is NOT an endorsement of
}> > 10646,
}> 
}> So, why can't you register it with profiling?
}
}MIME requires that the character set be completely specified by its
}name.  This is so mail readers don't have to make complicated decisions
}about whether they support a particular character set; they can just
}compare character set strings.  Given that mail readers have to support
}many different kinds of content-types, it still seems like this was
}a good engineering decision to minimize the complexity in the mail reader.
}
}For 10646 with profiling to fit within the MIME structure, it would 
}have to be a different content-type.  Something like text/iso-10646.
}It could then have whatever parameters it wanted to contain the
}profiling information.
}
}Keith Moore


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa07700;
          31 Jan 94 12:12 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa07695;
          31 Jan 94 12:12 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa07685;
          31 Jan 94 12:12 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA15879; Mon, 31 Jan 94 11:39:43 EST
Received: from ics.uci.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA15874; Mon, 31 Jan 94 11:39:41 EST
Received: from nma.com by q2.ics.uci.edu id aa26339; 31 Jan 94 8:39 PST
Received: from localhost by odin.nma.com id aa02391; 31 Jan 94 8:09 PST
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: ietf-822@dimacs.rutgers.edu
Subject: Re: Response 
In-Reply-To: Your message of "Mon, 31 Jan 1994 12:32:11 +0200."
             <9401310332.AA06977@necom830.cc.titech.ac.jp> 
Reply-To: Stef=mime@nma.com
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Einar Stefferud <Stef=mime@nma.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 31 Jan 1994 08:09:10 -0800
Message-Id: <2389.760032550@odin.nma.com>
X-Orig-Sender: stef@nma.com

Ohtasan -- We are also bored (beyond belief) with 

	"I'm really bored of such nontechnical commercial hype."

Where is the technical content of your statement?

Where is your technical contribution of a specification of the required
profile information that must be written in an RFC to precisely state
how the MIME 

	Content-type: text/unicode-japan
    or 	Content-type: text/iso-10646-japan

should be handled to render it properly?

Then someone else should write one for Korea, and another for
China, and another for all the rest of the world.

I do not expect the authors of these to agree much with each other,
though the CJK group might well collaborate.  

Perhaps they could consolidate on a single set of rules, and
differentiate with a charset= parameter.

Or maybe all the distinctions should be done with charset=???

But, it is clear to me at this point that the MIME iso-10646/unicode
problem cannot be resolved by unification, since ISO failed to achieve
unification (regardless of how much they might claim that they did).

What we need to do here is clean up the mess as best we can by
packaging it in profiles with separate MIME labels (to put it in MIME)
and get on with our other business.

Best...\Stef


Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa14365;
          31 Jan 94 15:56 EST
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa14361;
          31 Jan 94 15:56 EST
Received: from dimacs.rutgers.edu by CNRI.Reston.VA.US id aa14567;
          31 Jan 94 15:56 EST
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA24437; Mon, 31 Jan 94 15:26:58 EST
Received: from black-ice.cc.vt.edu by dimacs.rutgers.edu (5.59/SMI4.0/RU1.5/3.08) 
	id AA24433; Mon, 31 Jan 94 15:26:57 EST
Received: from localhost (valdis@localhost) by black-ice.cc.vt.edu (8.6.4/8.6.4) id PAA13312; Mon, 31 Jan 1994 15:26:08 -0500
Message-Id: <199401312026.PAA13312@black-ice.cc.vt.edu>
To: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Cc: ietf-822@dimacs.rutgers.edu, unicored@unicode.org
Subject: Re: Response 
In-Reply-To: Your message of "Wed, 26 Jan 1994 12:23:42 EST."
             <9401260323.AA20899@necom830.cc.titech.ac.jp> 
Sender:ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Valdis.Kletnieks@vt.edu
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 31 Jan 1994 15:26:07 -0500
X-Orig-Sender: valdis@black-ice.cc.vt.edu

On Wed, 26 Jan 1994 12:23:42 EST, Masataka Ohta said:
> > >Maybe someone else can summarize the state of the discussion for all of
> > >us, so we can see what issues are still open, and for which the discussion
> > >has yielded some answers.
> 
> As most of the Unicoders have absolutely no knowledge of directionality, it
> is impossible.

Masataka:

Somebody else asserted that Unicode/ISO10646 *did* contain the
necessary directionality features.  Now, if I didn't mis-parse something,
this looks like you are saying "since most implementations don't
implement the standard, it's impossible".

What's wrong with this picture?  And assuming that we *do* specify
"the Masataka solution", why are we to expect that *that* will be
implemented any more widely?

				Valdis Kletnieks
				Computer Systems Engineer
				Virginia Tech