[Question] Garbled output from a Python crawler (araymilesli, PTT)


Original poster: araymilesli (为善不欲人知) 2014-08-08 17:17:08

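Hi everyone. I'm a newcomer who has only recently started learning Python, and I adapted an example I found online into a crawler of my own. The first print displays the text correctly, but when I split the string on every comma, store the pieces in a list, and print that list, I get the following output (only a small excerpt is shown):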
(5\xe5\xb9\xb4)', '189.01', '0.00', '0.00%', '08/07', '',
'\xe5\xa5\xa7\xe5\x9c\xb0\xe5\x88\xa9CDS(5\xe5\xb9\xb4)', '34.65', '0.15', '0.43%', '08/07', '',
'\xe4\xb8\xad\xe5\x9c\x8bCDS(5\xe5\xb9\xb4)', '138.00', '16.00', '13.11%', '08/07', '',
'\xe4\xb8\xb9\xe9\xba\xa5CDS(5\xe5\xb9\xb4)', '25.475', '0.36', '1.43%', '08/07', '',
'\xe5\x8d\x97\xe9\x9f\x93CDS(5\xe5\xb9\xb4)', '59.00', '-0.63', '-1.06%', '08/07', '',
'\xe7\x91\x9e\xe5\xa3\xabCDS(5\xe5\xb9\xb4)', '30.01', '0.00', '0.00%', '08/07', '',
'\xe7\xbe\x8e\xe5\x9c\x8bCDS(5\xe5\xb9\xb4)', '16.38', '0.39', '2.44%', '08/07', ''
The code is as follows:
# -*- coding: utf-8 -*-
from sgmllib import SGMLParser
import urllib, re, os, sys


class handleFuturePrice(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.number = 0  # 1 while inside a <tr> whose class is row1 or row2
        self.new = ""    # text collected from the matching rows

    def start_tr(self, attrs):
        if attrs == [('class', 'row1')] or attrs == [('class', 'row2')]:
            self.number = 1

    def handle_data(self, data):
        if self.number == 1:
            self.new += data

    def end_tr(self):
        self.number = 0


def getFuturePrice(url, parser):
    try:
        URLprice = urllib.urlopen(url)
        parser.feed(URLprice.read())
        URLprice.close()
    except:
        return


def readOneLine(new):
    lines = ""
    returnLines = []
    lines += new.readline()
    returnLines.append(lines)
    return returnLines


a = []
startURL = "http://www.stockq.org/"
parser = handleFuturePrice()
getFuturePrice(startURL, parser)
new = parser.new.replace('\n', ',')
print new  # this displays correctly
# Below, splitting on commas, storing the result in the list a,
# and printing it produces the garbled output
a = new.split(',')
print a
I'm using Python 2.7.8. I would appreciate any help clearing this up; I have been trying to fix it for three days.
Thanks for your help, everyone!

Author: uranusjr (←這人是超級笨蛋) 2014-08-08 17:37:00

https://github.com/moskytw/uniout
Actually, it isn't garbled. If you print a[0], you'll see that it displays correctly.
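
Why a[0] prints correctly while the whole list does not can be sketched as follows. When Python prints a list, it shows the repr of each element, and the repr of a byte string escapes every non-ASCII byte as \xNN; printing a single element sends the bytes themselves to the console. This is a minimal sketch rather than the poster's code: the sample bytes are copied from the output in the question, UTF-8 is assumed as the page encoding, and the helper name readable_list is hypothetical. It uses b'' literals so that it behaves the same way under Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sample bytes copied from the output in the question (assumed to be UTF-8).
raw = b'\xe7\xbe\x8e\xe5\x9c\x8bCDS(5\xe5\xb9\xb4),16.38,0.39,2.44%'

fields = raw.split(b',')

# Printing a list shows repr() of every element, so non-ASCII bytes
# appear as \xNN escapes. The data itself is intact.
listing = repr(fields)
print(listing)


def readable_list(items, encoding='utf-8'):
    """Hypothetical helper: decode each byte string and join for display."""
    return u', '.join(item.decode(encoding) for item in items)


# Decoding gives the real text back; print this instead of the list itself.
readable = readable_list(fields)
```

Printing readable (or any single decoded element) shows the Chinese names, provided the console can display them; only the repr of the list looks garbled.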

Original poster: araymilesli (为善不欲人知) 2014-08-08 17:46:00

But why do I get the garbled output shown above when I run it in the GUI?

Author: uranusjr (←這人是超級笨蛋) 2014-08-08 17:53:00

http://d.pr/qUrn
No need to thank me.

Author: carylorrk (carylorrk) 2014-08-11 22:15:00

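Encoding problems trouble every beginner at first.
There is the environment encoding, the file encoding, the encoding of the data source or storage, and the encoding of the internal representation.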
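
The four kinds of encoding listed above can be seen side by side in a short sketch. It is only an illustration under stated assumptions: the sample bytes come from the output in the question, UTF-8 is assumed for both the data source and the storage, and the variable names are made up for this example.

```python
# -*- coding: utf-8 -*-
# File encoding: the line above tells the interpreter how to read this source file.
import locale
import sys

# Environment encoding: what the terminal or GUI console expects to receive.
# stdout may have no encoding (for example when piped), so fall back to the locale.
env_encoding = getattr(sys.stdout, 'encoding', None) or locale.getpreferredencoding()

# Data source encoding: bytes exactly as they arrive from a web page or a file.
raw = b'\xe4\xb8\xad\xe5\x9c\x8bCDS'  # assumed to be UTF-8

# Internal representation: decode once, at the boundary, into Unicode text.
text = raw.decode('utf-8')

# Storage encoding: encode again only when writing the text back out.
stored = text.encode('utf-8')

print(len(raw), len(text))  # byte count versus character count
```

Keeping text as Unicode inside the program and converting only at the edges is a common way to keep these four layers from being mixed up.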
