(j3.2006) For consideration at 200 - 21 more proposals

Van Snyder
Wed Nov 28 20:24:50 EST 2012


This is the kind of valuable technical discussion that we need to have
offline between meetings.

All of the processors we use work only with ASCII as the default
character kind, and support no other kinds.  For ASCII, the definition
of UPPER_CASE is that the 26 lowercase Latin letters in the required
Fortran character set would be converted, and other characters would
remain unchanged.
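
A minimal sketch of that ASCII-only definition, for illustration.  The
name ascii_upper is hypothetical (it is not a standard intrinsic and is
not taken from the paper under discussion), and the sketch assumes
default character kind.  It uses IACHAR, ACHAR and IAND, the intrinsics
mentioned in the quoted reply below:

```fortran
program demo_ascii_upper
  implicit none
  print '(a)', ascii_upper('Fortran 2008, j3')   ! prints FORTRAN 2008, J3
contains
  ! Hypothetical helper: converts only the 26 letters a-z and leaves
  ! every other character unchanged.
  pure function ascii_upper(s) result(u)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: u
    integer :: i, code
    do i = 1, len(s)
      code = iachar(s(i:i))                  ! position in the ASCII collating sequence
      if (code >= iachar('a') .and. code <= iachar('z')) then
        u(i:i) = achar(iand(code, 223))      ! clear the bit worth 32: 'a' (97) -> 'A' (65)
      else
        u(i:i) = s(i:i)
      end if
    end do
  end function ascii_upper
end program demo_ascii_upper
```

Because the range test is on ASCII codes, any character outside a-z
(including every accented letter in a wider character set) passes through
unchanged, which is exactly the limitation at issue in this thread.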

It's useful to have advice from someone who works with other character sets.

This discussion will be useful to convince my colleagues to give up on
this one.

On Thu, 2012-11-29 at 09:40 +0900, Malcolm Cohen wrote:

> >     2.  Caseless INDEX, SCAN, VERIFY;  UPPER_CASE and LOWER_CASE -
> 
> Sorry but asking for so-called "caseless" SCAN and VERIFY is what I would
> call "fundamentally misguided".  Just list the set of characters you want to
> scan/verify for...
> 
> As for UPPER_CASE and LOWER_CASE, which in the paper was described as "at
> least for ASCII", words fail me.   It is easy to write the one that *you*
> want, not so easy to write the one that *someone else* wants (since you
> don't actually know their requirements).  To write one that works "only for
> ASCII" is trivial and does not warrant inclusion in the standard.  (If you
> really give no care to non-US folk it is a one-liner with ACHAR, IACHAR and
> IAND/IOR.)
> 
> What is not trivial is handling the locale appropriately, and C's locale
> machinery is both complicated and not exactly a shining example of elegant
> success.
> 
> To take just one particular point, what is UPPER_CASE supposed to do with a
> character that has no uppercase variant?  Some will want it turned into the
> two uppercase characters that it represents; others will want it left as
> the single lowercase combined character.  "Caseless" INDEX has a similar
> conundrum handling such characters.  What if the input data is in
> ISO-Latin-5 instead of ISO-Latin-1?  Confusion abounds.
> 
> The Unicode consortium have an interesting selection of Technical Reports
> and the like on their website concerning issues like case handling and
> collation.
> 
> IMO, these intrinsics, which are guaranteed to give the wrong answer for
> ISO-Latin-1, let alone every other character set in the world apart from
> US-ASCII, are not suitable for inclusion in an international standard.
> 
> Cheers,
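
The suggestion above to "just list the set of characters you want to
scan/verify for" can be sketched as follows.  The strings and the program
name are made up for illustration; SCAN and VERIFY are the standard
intrinsics, used here with both cases listed explicitly in the set:

```fortran
program demo_scan_set
  implicit none
  ! Both cases are spelled out in the set, so no "caseless" variant is needed.
  character(len=*), parameter :: vowels = 'aeiouAEIOU'
  character(len=*), parameter :: hexset = '0123456789abcdefABCDEFxX'

  ! Position of the first character of the string that is in the set.
  print '(i0)', scan('Rhythm And blues', vowels)   ! prints 8 (the 'A')

  ! Position of the first character NOT in the set; 0 means all are in it.
  print '(i0)', verify('0x1F', hexset)             ! prints 0
end program demo_scan_set
```

The caller decides which characters count as equivalent, so nothing here
depends on a processor-defined notion of case.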
