
Force 'char' to be implemented as 'unsigned char'?

 
 
Matthias Kluwe
Posted: Sun Aug 17, 2008 3:28 am    Post subject: Force 'char' to be implemented as 'unsigned char'?
       
Hi!

I came across a coding standard manual
(http://www.codingstandard.com/HICPPCM/index.html) and decided to make up
my mind about the suggestions in there.

I quickly found that I don't understand the justifications for many of
the given rules very well. Here's an example (item 2.2):

"Specify in your compiler configuration that plain 'char' is implemented
as 'unsigned char'. Justification: Support 8-bit ASCII for
internationalisation. The size and sign of char is implementation-
defined. If the range of type char corresponds to 7-bit ASCII, and 8-bit
characters are used, unpredictable behaviour may result. Otherwise prefer
to use wchar_t type."

Hmm, I don't feel very well forcing my compiler in an area where the
language does not enforce. Second, I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?

Regards,
Matthias


--
[ See LINK for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 

 
Jack Klein
Posted: Sun Aug 17, 2008 2:50 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
On Sat, 16 Aug 2008 21:28:31 CST, Matthias Kluwe <mkluwe@gmail.com>
wrote in comp.lang.c++.moderated:

Quote:
Hi!

I came across a coding standard manual
(http://www.codingstandard.com/HICPPCM/index.html) and decided to make up
my mind about the suggestions in there.

I quickly found that I don't understand the justifications for many of
the given rules very well. Here's an example (item 2.2):

"Specify in your compiler configuration that plain 'char' is implemented
as 'unsigned char'. Justification: Support 8-bit ASCII for
internationalisation. The size and sign of char is implementation-
defined. If the range of type char corresponds to 7-bit ASCII, and 8-bit
characters are used, unpredictable behaviour may result. Otherwise prefer
to use wchar_t type."

I would be suspicious of someone pontificating on coding standards who
can't even be bothered to get either terminology, or logic, right.
Here are just a few examples.

"8-bit ASCII" -- no such thing. ASCII, which is pretty much an
obsolete concept today, always was, is, and always will be a 7-bit
code. There is no such thing as "8-bit ASCII", and never has been.

"If the range of type char corresponds to 7-bit ASCII" -- no such C++
system exists, so the wording is nonsensical. The C++ standard
requires a conforming implementation to provide a minimum range of
values for type char, depending on whether it is signed or unsigned:
either -127 to +127, or 0 to 255. In either case, plain char can hold at
least 127 values that are not in the ASCII character set.

"Otherwise prefer to use wchar_t type", which is neither required nor
guaranteed to be different from the plain char type.
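
A minimal sketch of how a program can inspect what its implementation chose for plain char, rather than assuming it. The constant names are made up for illustration; CHAR_MIN, CHAR_MAX, CHAR_BIT and std::numeric_limits are standard. Compilers commonly expose the choice as an option as well (for example -funsigned-char and -fsigned-char in GCC, /J in MSVC), which is what the quoted rule asks you to set.

```cpp
#include <climits>
#include <limits>

// True when plain char behaves like signed char on this implementation.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// The range plain char actually has here.
constexpr int plain_char_min = CHAR_MIN;
constexpr int plain_char_max = CHAR_MAX;

static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");
static_assert(plain_char_is_signed
                  ? (plain_char_min <= -127 && plain_char_max >= 127)
                  : (plain_char_min == 0 && plain_char_max >= 255),
              "plain char covers at least the minimum required range");
```

Code that must behave the same either way can branch on plain_char_is_signed, or avoid the question by converting to unsigned char before using the value as a number.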

Quote:
Hmm, I don't feel very well forcing my compiler in an area where the
language does not enforce. Second, I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?

Careless mixing of signed and unsigned types can cause unexpected
problems, and not only with signed char or with plain char when it is
signed.

--
Jack Klein
 

 
Mathias Gaunard
Posted: Sun Aug 17, 2008 4:58 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
On 17 Aug, 05:28, Matthias Kluwe <mkl...@gmail.com> wrote:

Quote:
Hmm, I don't feel very well forcing my compiler in an area where the
language does not enforce. Second, I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?

It's not harmful unless you use arithmetic or bitwise operations on it.
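
As one possible illustration of the bitwise case, here is a sketch that builds a 16-bit value from two bytes. The function names are hypothetical, signed char is spelled out so the behaviour does not depend on the compiler's choice for plain char, and a two's complement representation is assumed. The naive version is only meaningful for a non-negative high byte.

```cpp
// Naive: both bytes are promoted to int with their sign, so a negative
// low byte sets all the upper bits of the result.
int combine_naive(signed char hi, signed char lo) {
    return (hi << 8) | lo;
}

// Going through unsigned char first keeps each byte in the range 0..255.
int combine_safe(signed char hi, signed char lo) {
    return (static_cast<unsigned char>(hi) << 8) |
           static_cast<unsigned char>(lo);
}
```

With hi = 0x12 and lo = -23 (the bit pattern 0xE9), combine_safe yields 0x12E9, while combine_naive yields -23 because the sign-extended low byte overwrites the high byte.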


 

 
Francis Glassborow
Posted: Sun Aug 17, 2008 4:58 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
Matthias Kluwe wrote:

Quote:
Hmm, I don't feel very well forcing my compiler in an area where the
language does not enforce. Second, I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?


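#include <iostream>

bool foo(unsigned char c, char d) {
    return c < d;
}

int main() {
    // 130 does not fit in a signed 8-bit char, so d depends on char's signedness.
    std::cout << foo(3, 130);
}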

What should the output be? Of course this program is silly but it
illustrates the problem of char being implemented in one of two ways.
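
To make the two outcomes concrete, here is a sketch with the signedness of the second parameter spelled out. The function names are hypothetical. The value -126 is what 130 typically becomes when stored in an 8-bit signed char; that conversion is itself implementation-defined before C++20, so it is passed explicitly here.

```cpp
// What foo does when plain char is unsigned: both arguments are promoted
// to int, and 3 < 130 holds.
bool less_when_char_is_unsigned(unsigned char c, unsigned char d) {
    return c < d;
}

// What foo does when plain char is signed: d arrives as a negative value,
// so the comparison is 3 < (negative number).
bool less_when_char_is_signed(unsigned char c, signed char d) {
    return c < d;
}
```

Calling less_when_char_is_unsigned(3, 130) gives true, while less_when_char_is_signed(3, -126) gives false, which is why the original program can print either 1 or 0.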


--
Note that robinton.demon.co.uk addresses are no longer valid.

 

 
Daniel T.
Posted: Sun Aug 17, 2008 4:58 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
Matthias Kluwe <mkluwe@gmail.com> wrote:

Quote:
I came across a coding standard manual
(http://www.codingstandard.com/HICPPCM/index.html) and decided to make up
my mind about the suggestions in there.

I quickly found that I don't understand the justifications for many of
the given rules very well. Here's an example (item 2.2):

"Specify in your compiler configuration that plain 'char' is implemented
as 'unsigned char'. Justification: Support 8-bit ASCII for
internationalisation. The size and sign of char is implementation-
defined. If the range of type char corresponds to 7-bit ASCII, and 8-bit
characters are used, unpredictable behaviour may result. Otherwise prefer
to use wchar_t type."

Hmm, I don't feel very well forcing my compiler in an area where the
language does not enforce. Second, I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?

Since I have to deal with this on a regular basis, I'll give you an
example...

On the Nintendo DS and Nintendo Wii, I have an array of images, each one
corresponding to a particular letter. Displaying a word amounts to
drawing the correct images in the correct order...

// It's a bit more complex than the below of course, but this gets
// the point across.

void display( const char* c, int x, int y ) {
    while ( *c ) {
        image[*c].draw( x, y );
        x += image[*c].width(); // advance horizontally by the glyph width
        ++c;
    }
}

Do you see the problem with the above if a char is signed and someone
tries to draw "René"? (That's 0x52, 0x65, 0x6E, 0xE9.)

Also, converting from a char to a wchar_t leads to surprising (and
incorrect) results sometimes:

char c = 0xE9; // 'é'
wchar_t wc = c;
assert( wc == c ); // this will succeed
assert( wc == 0xE9 ); // this will fail if char is signed
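
One common way around the sign extension is to convert through unsigned char before widening. This is only a sketch: widen_byte is a hypothetical helper, and it treats the byte as a plain number (which matches Latin-1), not as a general character-set conversion.

```cpp
// Convert through unsigned char so values above 0x7F are not
// sign-extended when plain char is signed.
wchar_t widen_byte(char c) {
    return static_cast<wchar_t>(static_cast<unsigned char>(c));
}
```

With this helper, widen_byte(c) == 0xE9 holds for the byte 0xE9 whether plain char is signed or unsigned.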

 

 
Jouko Koski
Posted: Sun Aug 17, 2008 6:53 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
"Matthias Kluwe" <mkluwe@gmail.com> wrote:
Quote:
I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?

Consider implementing a function for checking "good" characters (in the
spirit of isalpha, isdigit etc.);

bool isgood(char c)
{
    static bool const good[] = { false, true, true /* etc. whatever... */ };
    return good[c];
}

Of course, this sort of simple implementation may not be portable for
other reasons, but it silently assumes only non-negative char values.
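
A sketch of a variant that does not make that assumption: the table has one slot per unsigned char value and is indexed with the unsigned value. The rule that only decimal digits are "good" and the names GoodTable and make_good_table are made up for illustration.

```cpp
#include <array>
#include <climits>

using GoodTable = std::array<bool, UCHAR_MAX + 1>;

// Hypothetical rule for illustration: only decimal digits are "good".
GoodTable make_good_table() {
    GoodTable table{};              // value-initialized: every slot is false
    for (int d = 0; d <= 9; ++d) {
        table['0' + d] = true;      // digit characters are never negative
    }
    return table;
}

bool isgood(char c) {
    static const GoodTable good = make_good_table();
    // Index with the unsigned value so a negative char cannot reach
    // outside the table.
    return good[static_cast<unsigned char>(c)];
}
```

A byte such as 0xE9 then indexes slot 233 instead of slot -23.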

--
Jouko

 

 
Alf P. Steinbach
Posted: Mon Aug 18, 2008 1:28 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
* Matthias Kluwe:
Quote:
I don't see why an '8-bit character'
having a 'negative value' (encoded as a char) could ever be harmful. Do
you know any examples?

This should probably be a FAQ.

I find it a bit curious that respondents so far have not mentioned the
standard library.

#include <ctype.h>
#include <iostream>

int main()
{
    using namespace std;

    cout << isdigit( (unsigned char)'æ' ) << endl; // OK
    cout << isdigit( 'æ' ) << endl; // !OK, UB if char is signed.
}
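
A small wrapper is one way to keep the cast in a single place. This is a sketch, and is_digit is a hypothetical name; std::isdigit from <cctype> requires its argument to be representable as unsigned char or equal to EOF.

```cpp
#include <cctype>

// Safe for any char value: the argument is converted to unsigned char
// before it reaches the C library function.
bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

The same pattern applies to the other <cctype> classification and conversion functions, such as std::isalpha and std::toupper.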

Cheers, & hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


 

 
Guest
Posted: Mon Aug 18, 2008 9:08 pm    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
On Aug 17, 7:50 am, Jack Klein <jackkl...@spamcop.net> wrote:
Quote:
On Sat, 16 Aug 2008 21:28:31 CST, Matthias Kluwe <mkl...@gmail.com>
wrote in comp.lang.c++.moderated:
I quickly found that I don't understand the justifications for many of
the given rules very well. Here's an example (item 2.2):

"Specify in your compiler configuration that plain 'char' is implemented
as 'unsigned char'. Justification: Support 8-bit ASCII for
internationalisation. The size and sign of char is implementation-
defined. If the range of type char corresponds to 7-bit ASCII, and 8-bit
characters are used, unpredictable behaviour may result. Otherwise prefer
to use wchar_t type."

I would be suspicious of someone pontificating on coding standards who
can't even be bothered to get either terminology, or logic, right.
Here are just a few examples.

"8-bit ASCII" -- no such thing. ASCII, which is pretty much an
obsolete concept today, always was, is, and always will be a 7-bit
code. There is no such thing as "8-bit ASCII", and never has been.

"If the range of type char corresponds to 7-bit ASCII" -- no such C++
system exists, so the wording is non-sensical. The C++ standard
requires a conforming implementation to provide a minimum range of
values for type char, depending on whether it is signed or unsigned,
or -127 to +127, or 0 to 255. In either case, plain char can hold at
least 127 values that are not in the ASCII character set.

Now, IIRC, there are some implementations out there which:
1- Are hardcoded to use ASCII
2- Use 8 bit bytes
3- Decided to use the unused 8th bit as a parity check bit.

Thus, if you're working on one of these machines, converting 'a' to a
UTF8 character may not produce the intended result, because the
compiler and hardware will just copy the 8 bit byte over, including
the 8th parity bit, which is not the intended UTF8 character.

However, if that was the intent of the document quoted by the OP, they
did a pisspoor job trying to get that across.

 

 
Greg Herlihy
Posted: Tue Aug 19, 2008 12:42 am    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
On Aug 18, 2:08 pm, JoshuaMaur...@gmail.com wrote:
Quote:
On Aug 17, 7:50 am, Jack Klein <jackkl...@spamcop.net> wrote:

"If the range of type char corresponds to 7-bit ASCII" -- no such C++
system exists, so the wording is non-sensical. The C++ standard
requires a conforming implementation to provide a minimum range of
values for type char, depending on whether it is signed or unsigned,
or -127 to +127, or 0 to 255. In either case, plain char can hold at
least 127 values that are not in the ASCII character set.

Now, IIRC, there are some implementations out there which:
1- Are hardcoded to use ASCII
2- Use 8 bit bytes
3- Decided to use the unused 8th bit as a parity check bit.

There are no "unused" bits in a C++ char type. Instead, every bit must
participate in the char's value representation, so the possibility that
a C++ char could have any kind of parity bit is ruled out. For
unsigned char, the requirements are even stricter: every possible bit
pattern must represent a valid number, whereas a parity bit would
mean that certain bit patterns do not represent valid numbers.
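
The requirement for unsigned char can be checked at compile time. A minimal sketch, where unsigned_char_values is a name introduced here and CHAR_BIT is assumed to be smaller than the width of unsigned long long:

```cpp
#include <climits>
#include <limits>

// Every bit of an unsigned char is a value bit...
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char has no padding bits");

// ...so it can hold exactly 2^CHAR_BIT distinct values.
constexpr unsigned long long unsigned_char_values = 1ULL << CHAR_BIT;
static_assert(UCHAR_MAX == unsigned_char_values - 1,
              "UCHAR_MAX is 2^CHAR_BIT - 1");
```

A byte that reserved one bit for parity could not satisfy the second assertion, because half of its bit patterns would not stand for distinct values.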

Quote:
Thus, if you're working on one of these machines, converting 'a' to a
UTF8 character may not produce the intended result, because the
compiler and hardware will just copy the 8 bit byte over, including
the 8th parity bit, which is not the intended UTF8 character.

So under these circumstances, converting a char value to, say, an int,
will - for certain char values - also wind up adding 128 to the value
of the int?

Greg



 

 
Guest
Posted: Tue Aug 19, 2008 9:43 am    Post subject: Re: Force 'char' to be implemented as 'unsigned char'?
       
On Aug 18, 5:42 pm, Greg Herlihy <gre...@mac.com> wrote:
Quote:
On Aug 18, 2:08 pm, JoshuaMaur...@gmail.com wrote:

On Aug 17, 7:50 am, Jack Klein <jackkl...@spamcop.net> wrote:

"If the range of type char corresponds to 7-bit ASCII" -- no such C++
system exists, so the wording is non-sensical. The C++ standard
requires a conforming implementation to provide a minimum range of
values for type char, depending on whether it is signed or unsigned,
or -127 to +127, or 0 to 255. In either case, plain char can hold at
least 127 values that are not in the ASCII character set.

Now, IIRC, there are some implementations out there which:
1- Are hardcoded to use ASCII
2- Use 8 bit bytes
3- Decided to use the unused 8th bit as a parity check bit.

There are no "unused" bits in a C++ char type. Instead, every bit must
participate in the char's value representation. So the possibility that
a C++ char could have any kind of parity bit - is ruled out. And for
unsigned chars, the requirements are even more strict: for an unsigned
char, every possible bit pattern must represent a valid number.
Whereas the presence of a parity bit in a char would necessitate that
certain bit patterns do not represent valid numbers.

If I'm reading the standard correctly, which I think I am, it is not
specified how 'a' gets encoded. It may be ASCII. It may not be. Thus:
char a = 'a';
int32_t codepoint = a; // only correct if 'a' is encoded as ASCII
utf8string str; // some UTF-8 string class
str.append(codepoint);
Most implementations encode English string literals as ASCII, so the
literals are also valid UTF-8. However, an implementation need not encode
English string literals in ASCII.

For example, a machine may encode string literals as an extended
ASCII, the least significant 7 bits as ASCII with the 8th bit as a
parity check, and this is allowed by the standard. (Within some
limitation. I think the standard requires that the character digits
'0', '1', ..., '9' map to integers '0', '0'+1, ..., '0'+9, so ASCII
encoding with an 8th bit parity check wouldn't actually be allowed...)
This would result in the above code fragment not doing what you
wanted. It would be annoying to catch, because it would work on most
machines, as most of them encode English string literals in ASCII.

I brought the 8th bit parity encoding up because I think I remember
reading about real systems out there which do that, and this gotcha
may have been what the quote in the OP was talking about. (Though
using my '0', '1', ..., '9' argument, such a system is probably the
figment of my imagination, thus confirming the source in the OP has no
clue what it's talking about.)
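
The two points above can be put side by side in a short sketch. digit_value and literal_a_is_ascii are hypothetical names. The first relies on the guarantee that '0' through '9' are contiguous; the second merely reports whether this implementation happens to encode 'a' as ASCII does.

```cpp
// Guaranteed by the standard: '0'..'9' are contiguous and increasing,
// so subtracting '0' yields the digit's numeric value.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Not guaranteed: that letters use their ASCII codes. This flag is true
// only when the execution character set agrees with ASCII for 'a'.
constexpr bool literal_a_is_ascii = ('a' == 0x61);
```

Code that copies char values straight into code points could guard that step with static_assert(literal_a_is_ascii, "...") so that a non-ASCII implementation fails at compile time instead of producing wrong text.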

Quote:
Thus, if you're working on one of these machines, converting 'a' to a
UTF8 character may not produce the intended result, because the
compiler and hardware will just copy the 8 bit byte over, including
the 8th parity bit, which is not the intended UTF8 character.

So under these circumstances, converting a char value to, say, an int,
will - for certain char values - also wind up adding 128 to the value
of the int?

No, but you might not get the usual ASCII encoding.

 
