[Python syntax and semantics] Error decoding the content of a UTF-8 web page fetched with httplib
ewong
2011-05-24
    # -*- coding: utf-8 -*-
    import codecs
    import httplib
    import sys

    #reload(sys)
    #sys.setdefaultencoding('utf8')
    #print sys.getdefaultencoding()

    conn=httplib.HTTPConnection('www.douban.com',80)
    conn.request('GET', '/')
    resp = conn.getresponse()

    f = codecs.open('C:\\tmp\\web.log', 'w', 'utf8')
    f.write(resp.read().decode('utf8'))
    f.close()
    conn.close()

I tried three sites: www.douban.com, www.google.com.hk and www.iteye.com. All of them are UTF-8 encoded, but only Douban is saved successfully. The other two fail with the following error:

    Traceback (most recent call last):
      File "test.py", line 15, in <module>
        f.write(resp.read().decode('utf8'))
      File "D:\Python25\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 16960-16961: invalid data

Could someone help me work out what is causing this?
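One way to narrow this down is to look at the bytes the traceback points to. The sketch below is only an illustration: it is written for Python 3 (the thread uses Python 2.5), `find_bad_bytes` is a hypothetical helper name, and the sample body is made up rather than taken from any of the three sites.

```python
def find_bad_bytes(body, encoding='utf-8', context=8):
    """Return (start, end, surrounding bytes) for the first undecodable
    sequence in body, or None if the whole body decodes cleanly."""
    try:
        body.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start / err.end delimit the offending bytes inside body.
        lo = max(err.start - context, 0)
        return err.start, err.end, body[lo:err.end + context]
    return None

# Made-up sample: two bytes in the middle that are not valid UTF-8.
sample = b'abc' + b'\xd6\xd0' + b'def'
result = find_bad_bytes(sample)
clean = find_bad_bytes(b'hello')
```

If the surrounding bytes look like text in another encoding, the page (or part of it) is probably not UTF-8 after all, whatever the site is expected to serve.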
|
suxianbaozi
2011-08-31
No need to decode; just save it directly.
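A minimal sketch of what "save it directly" could look like, assuming Python 3 and a file opened in binary mode. `save_raw`, the output file name and the sample body are hypothetical; in the original script the bytes would come from `resp.read()`.

```python
import os
import tempfile

def save_raw(body, path):
    """Write the response body exactly as received, with no decode step."""
    with open(path, 'wb') as f:  # binary mode: bytes go to disk untouched
        f.write(body)

# Made-up body containing bytes that are not valid UTF-8.
sample = b'<html>\xd6\xd0\xce\xc4</html>'
out_path = os.path.join(tempfile.gettempdir(), 'web_raw.log')
save_raw(sample, out_path)

with open(out_path, 'rb') as f:
    saved = f.read()
```

Note that this pairs with a plain binary-mode `open`. As far as I know, in Python 2 a file opened with `codecs.open(..., 'utf8')` would first try to decode a byte string as ASCII when writing it, so dropping `.decode('utf8')` alone may still fail there.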
|
|
乖睡觉
2012-07-02
Removing .decode('utf8') should be enough.
|
|
edisonlz
2012-07-16
Use a larger character set: gb18030.
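A rough sketch of this suggestion, assuming Python 3: try UTF-8 first, then fall back to gb18030. `decode_body` and the candidate list are illustrative assumptions, not something from the thread, and whether gb18030 is the right fallback depends on what the server actually sent.

```python
def decode_body(body, encodings=('utf-8', 'gb18030')):
    """Try each candidate encoding in order; return (text, encoding used)."""
    for enc in encodings:
        try:
            return body.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes (lossy).
    return body.decode('utf-8', 'replace'), 'utf-8 (lossy)'

# Made-up samples: the same two characters encoded two different ways.
gb_bytes = '\u4e2d\u6587'.encode('gb18030')
utf8_bytes = '\u4e2d\u6587'.encode('utf-8')

text_gb, used_gb = decode_body(gb_bytes)
text_utf8, used_utf8 = decode_body(utf8_bytes)
```

Checking the charset in the response's Content-Type header, when the server sends one, is usually more reliable than guessing.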
|