Question: how do I scrape web data with a Python crawler when decoding the Chinese text fails?

flyingba

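I want to scrape the list of "delisted companies" (终止上市公司) from the Shenzhen Stock Exchange website, at this URL:
http://www.szse.cn/market/companys/suspend/index.html
The page has a "Download" (下载) button. Clicking it downloads an Excel file.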

Right-clicking the "Download" button and choosing "Copy link address" gives:
http://www.szse.cn/api/report/ShowReport?SHOWTYPE=xlsx&CATALOGID=1793_ssgs&TABKEY=tab2&random=0.4682778939187813

When I fetch this link with Python's request module, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 47: invalid start byte
Switching to other encodings does not help either:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xc4 in position 51: illegal multibyte sequence
UnicodeDecodeError: 'gb2312' codec can't decode byte 0xad in position 47: illegal multibyte sequence
UnicodeDecodeError: 'gb18030' codec can't decode byte 0xc3 in position 51: illegal multibyte sequence
UnicodeDecodeError: 'big5' codec can't decode byte 0xad in position 47: illegal multibyte sequence
UnicodeDecodeError: 'big5hkscs' codec can't decode byte 0xad in position 47: illegal multibyte sequence
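One likely explanation, assuming the link returns the workbook that its SHOWTYPE=xlsx parameter suggests: an .xlsx file is a ZIP archive of XML parts, i.e. binary data rather than text, so every codec in the list above is bound to fail on it. A minimal sketch (the helper names are hypothetical) that checks the leading bytes for the ZIP signature before attempting any decode:

```python
ZIP_SIGNATURE = b"PK\x03\x04"  # local-file-header magic at the start of every ZIP (and .xlsx) file


def looks_like_zip(data: bytes) -> bool:
    """Return True if the payload starts with the ZIP local-file-header signature."""
    return data[:4] == ZIP_SIGNATURE


def try_decode(data: bytes, encodings=("utf-8", "gbk", "gb18030", "big5")) -> dict:
    """Try each text codec and record whether it succeeded or where it failed."""
    results = {}
    for enc in encodings:
        try:
            data.decode(enc)
            results[enc] = "ok"
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte the codec rejected
            results[enc] = f"failed at byte {exc.start}"
    return results
```

Calling looks_like_zip() on the downloaded bytes before any .decode() shows whether the response is a file rather than an HTML page; if it is, trying more codecs will not help.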

With 'ignore' added, the data does get read, but the printed output is all garbage.
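This fits the same explanation: 'ignore' only discards the bytes a codec rejects, and what remains is still compressed ZIP data, so no readable text can appear. A sketch, again assuming the response is the .xlsx file itself, that keeps the body as raw bytes and writes it to disk in binary mode instead of decoding it. It uses the standard-library urllib.request; with the requests library, response.content returns the same raw bytes. The User-Agent header and the helper names are assumptions, not something the site is known to require.

```python
import io
import urllib.request
import zipfile

URL = ("http://www.szse.cn/api/report/ShowReport"
       "?SHOWTYPE=xlsx&CATALOGID=1793_ssgs&TABKEY=tab2&random=0.4682778939187813")


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download the response body as raw bytes, without decoding it as text."""
    # Assumption: a browser-like User-Agent as a precaution; the site may not need it.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def save_workbook(data: bytes, path: str) -> None:
    """Write the bytes to disk unchanged so Excel can open the file."""
    with open(path, "wb") as f:  # binary mode: no text encoding is involved
        f.write(data)


def list_xlsx_parts(data: bytes) -> list:
    """Treat the payload as a ZIP container and list its internal parts (sanity check)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()
```

Usage: data = fetch_bytes(URL), then save_workbook(data, "szse_delisted.xlsx"), where the filename is a placeholder. If pandas and openpyxl are installed, pd.read_excel(io.BytesIO(data)) should load the sheet into a DataFrame; whether the sheet needs extra options such as skiprows cannot be told from the post.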

Could anyone tell me how to solve this?

Thanks.


0 replies to this thread. Last reply: 2019-8-27 10:14
