Original poster:
ReiFu21 (ReiFu)
2016-09-12 15:06:45
Platform: (Ex: VC++, GCC, Linux, ...)
Code Blocks C++
The character "A" is encoded as one byte in UTF-8.
Hexadecimal: 41
Decimal: 65
The character "您" is encoded as three bytes in UTF-8.
Hexadecimal: E6 82 A8
Decimal: 230 130 168
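The byte values listed above can be checked with a small sketch like the following. The helper names `to_hex` and `to_decimal` are made up for illustration and are not part of the original post.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Format every byte of `s` as two uppercase hex digits, separated by spaces.
std::string to_hex(const std::string& s) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}

// Format every byte of `s` as a decimal number in 0..255, separated by spaces.
std::string to_decimal(const std::string& s) {
    std::ostringstream out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        out << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

Calling `to_decimal("\xE6\x82\xA8")` on the three bytes of "您" should give `230 130 168`, and `to_hex("A")` should give `41`.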
UTF8.txt contains: 您 \n A
I now want to convert the contents of UTF8.txt to decimal notation.
#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <cstdlib>
using namespace std;
int main(void)
{
    int x;int y;
    char txt[80]="";
    ifstream ifile("C:\\Users\\Gon\\Desktop\\UTF8.txt",ios::binary);
    if(ifile.is_open())
    {
        while(!ifile.eof())
        {
            ifile >> txt;
            cout << txt<< endl;
            x=char (txt[0]);
            switch(x)
            {
            case 0-127:
                cout <<"1st byte~ " <<x << endl;
                break;
            case 240-247:
                cout <<"1st byte~ " <<x << endl;
                y=char (txt[1]);
                cout <<"2nd byte~ " <<y << endl;
                y=char (txt[2]);
                cout <<"3rd byte~ " <<y << endl;
                break;
            default:
                cout <<"1st byte~ " <<x << endl;
                y=char (txt[1]);
                cout <<"2nd byte~ " <<y << endl;
                y=char (txt[2]);
                cout << "3rd byte~ " <<y << endl;
                y=char (txt[3]);
                cout << "4th byte~ " <<y << endl;
                y=char (txt[4]);
                cout << "5th byte~ " <<y << endl;
                y=char (txt[5]);
                cout << "6th byte~ " <<y << endl;
            }
        }
    }
    else
        cout << "fail to open file" << endl;
    ifile.close(); // close file
    system("pause");
    return 0;
}
The output I want is:
您
1st byte~ 230
2nd byte~ 130
3rd byte~ 168
A
1st byte~ 65
But the output I actually get is:
您
1st byte~ -26
2nd byte~ -126
3rd byte~ -88
4th byte~ 0
5th byte~ 0
6th byte~ 0
A
1st byte~ 65
2nd byte~ 0
3rd byte~ -88
4th byte~ 0
5th byte~ 0
6th byte~ 0
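A few questions:
1. The 1st byte of "A" is 65, so it should match case 0-127, but it actually falls into the default case. Why?
2. "A" comes out as a single byte with the correct value 65, but "您" comes out as three bytes whose values are completely wrong. How should I fix this?
Could someone please point out where the problem is? Thanks!!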
Author: yvb 2016-09-12 15:15:00
Change x=char (txt[0]); to x = (unsigned char) txt[0];
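To illustrate why this cast matters: the -26 in the output suggests that plain `char` is signed on the poster's platform, so the byte 0xE6 is read as -26 instead of 230. The sketch below compares the two interpretations. The function names are hypothetical and chosen only for this example.

```cpp
// Interpret a raw byte as an unsigned value in 0..255,
// which is the effect of the suggested (unsigned char) cast.
int as_unsigned_value(char c) {
    return static_cast<unsigned char>(c);
}

// Interpret the same byte as a signed 8-bit value in -128..127,
// which is what the original code gets when plain char is signed.
int as_signed_value(char c) {
    return static_cast<signed char>(c);
}
```

For the first byte of "您", `as_unsigned_value('\xE6')` gives 230 while `as_signed_value('\xE6')` gives -26. ASCII bytes such as 'A' give 65 either way, which matches "A" looking correct in the original output.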
Author:
PkmX (阿猫)
2016-09-12 15:21:00
C and C++ have no range-case syntax... case m ... n: is a compiler extension
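A sketch of what this means for the original switch: `case 0-127:` is a single label whose value is the subtraction 0 - 127, that is -127, so 65 never matches it. The function names below are illustrative only. The second function shows one portable way to test ranges with plain comparisons instead of relying on an extension.

```cpp
#include <string>

// Same labels as the original switch. `0-127` and `240-247` are ordinary
// subtractions, so the labels are the single values -127 and -7.
std::string branch_taken(int x) {
    switch (x) {
        case 0-127:   return "first case";
        case 240-247: return "second case";
        default:      return "default";
    }
}

// Portable alternative: test each range with comparisons.
std::string branch_by_range(int x) {
    if (x >= 0 && x <= 127)   return "first case";
    if (x >= 240 && x <= 247) return "second case";
    return "default";
}
```

With these definitions, `branch_taken(65)` returns "default", which is the behavior asked about in question 1, while `branch_by_range(65)` returns "first case".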
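Author:
Caesar08 (Caesar)
2016-09-12 16:30:00
Why does the default branch print 6 bytes? And then there is also a branch that prints a 3-byte UTF-8 sequence? Also, writing your while loop and ifstream that way will go wrong. Uh, standard UTF-8 does not go up to 6 bytes. Also, I would still recommend handling the 2-byte and 4-byte cases as well.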
Author:
uranusjr (←這人是超級笨蛋)
2016-09-13 19:28:00
The UTF-8 encoding rules can encode up to 6 bytes, but only encodings of up to 4 bytes are actually used, because the longer ones are no longer needed.
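As a rough illustration of the encoding rules mentioned here, the sketch below encodes a single code point into 1 to 4 bytes. It assumes the current limit of U+10FFFF (RFC 3629), which is why the 5-byte and 6-byte forms are never produced. `encode_utf8` is a made-up helper name, and returning an empty string for invalid input is just a choice made for this example.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not encodable
    if (cp <= 0x7F) {                               // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                       // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {                      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                    // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encode_utf8(0x60A8)` (the code point of "您") yields the bytes E6 82 A8, and `encode_utf8(0x41)` yields the single byte 41.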