(一) Functional description

Goal: fetch the names and trading information of all stocks listed on the Shanghai and Shenzhen stock exchanges.
Output: save the results to a file.
Technical route: requests-bs4-re.

Candidate data sites:
    Sina Finance: http://finance.sina.com.cn/stock/
    Baidu Stocks: https://gupiao.baidu.com/stock/
Site selection criteria: the stock information should exist statically in the HTML page (not generated by JavaScript code) and should not be restricted by a robots protocol.

Program structure:
    Step 1: obtain the stock list from the Eastmoney site.
    Step 2: for each stock in the list, fetch its detail page from Baidu Stocks.
    Step 3: store the results in a file.

(二) Code

import requests
from bs4 import BeautifulSoup
import traceback
import re

def getHTMLText(url, code='utf-8'):
    # Fetch a page and return its text, or an empty string on any failure.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except:
        return ""

def getStockList(lst, stockURL):
    # Extract exchange-prefixed stock codes (e.g. sh600000, sz000001) from the
    # links on the Eastmoney stock list page, which is GB2312-encoded.
    html = getHTMLText(stockURL, 'GB2312')
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except:
            continue

def getStockInfo(lst, stockURL, fpath):
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict.update({'股票名称': name.text.split()[0]})  # stock name
            # Each quoted field on the page is a <dt> label followed by a <dd> value.
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
                count = count + 1
                # '\r' returns the cursor to the start of the line so the
                # progress percentage ("当前进度") is overwritten in place.
                print("\r当前进度:{:.2f}%".format(count * 100 / len(lst)), end="")
        except:
            traceback.print_exc()
            continue

def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'https://gupiao.baidu.com/stock/'
    output_file = 'H://BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

main()
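To make the code-extraction step in getStockList concrete, here is a minimal, self-contained sketch of how the r"[s][hz]\d{6}" pattern pulls exchange-prefixed stock codes out of href attributes. The sample links are made up for illustration; they are not taken from the real list page.

import re

# Hypothetical hrefs of the kind found on the Eastmoney list page.
sample_hrefs = [
    "http://quote.eastmoney.com/sh600000.html",    # Shanghai-listed stock
    "http://quote.eastmoney.com/sz000001.html",    # Shenzhen-listed stock
    "http://quote.eastmoney.com/center/list.html"  # no stock code in the link
]

for href in sample_hrefs:
    codes = re.findall(r"[s][hz]\d{6}", href)
    # getStockList takes codes[0]; links without a match raise IndexError
    # there and are skipped by the surrounding try/except.
    print(href, "->", codes)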
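The parsing logic inside getStockInfo is easier to see on a small, made-up HTML fragment: the stock name sits in the bets-name element, and each quoted field is a <dt> label paired with the following <dd> value. The fragment and figures below are illustrative, not real market data; zipping the dt/dd lists is equivalent to the index loop used in getStockInfo.

from bs4 import BeautifulSoup

sample_html = """
<div class="stock-bets">
  <a class="bets-name">平安银行 (000001)</a>
  <dl><dt>今开</dt><dd>10.50</dd></dl>
  <dl><dt>成交量</dt><dd>12.3万手</dd></dl>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
# split() drops the code in parentheses, keeping only the name.
info = {'股票名称': stockInfo.find(attrs={'class': 'bets-name'}).text.split()[0]}
for dt, dd in zip(stockInfo.find_all('dt'), stockInfo.find_all('dd')):
    info[dt.text] = dd.text
print(info)  # {'股票名称': '平安银行', '今开': '10.50', '成交量': '12.3万手'}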
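Since each record is written to the output file as the str() of a Python dict, one way to load the results back for later analysis is ast.literal_eval. This is a minimal sketch, assuming the H://BaiduStockInfo.txt file produced by main() already exists.

import ast

records = []
with open('H://BaiduStockInfo.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:
            # Each line is the printed form of one dict, e.g. {'股票名称': ...}
            records.append(ast.literal_eval(line))

print(len(records), "records loaded")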