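Python's requests library is built on top of urllib3 (not the standard-library urllib), but it is much more convenient to use, with an API about as concise as jQuery's.
Internally, all functionality goes through the Session class: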
class Session(SessionRedirectMixin):
    def request(self, method, url,
                params=None,
                data=None,
                headers=None,
                cookies=None,
                files=None,
                auth=None,
                timeout=None,
                allow_redirects=True,
                proxies=None,
                hooks=None,
                stream=None,
                verify=None,
                cert=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.
        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary or bytes to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of 'filename': file-like-objects
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) Float describing the timeout of the
            request.
        :param allow_redirects: (optional) Boolean. Set to True by default.
        :param proxies: (optional) Dictionary mapping protocol to the URL of
            the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) if ``True``, the SSL cert will be verified.
            A CA_BUNDLE path can also be provided.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        """
Once wrapped in the module-level API, though, it looks a lot like jQuery's ajax, for example:
import requests
requests.get()
requests.post()
requests.put()
...
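Anyone who has used urllib knows that parameters have to be encoded with urllib.parse.urlencode (urllib.urlencode in Python 2); requests takes care of this automatically: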
requests.get('https://www.example.com/xxx', verify=False, params={'type': 'test'})
requests.post('https://www.example.com/xxx', verify=False, data={'type': '测试'})
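One way to see what requests does with these dictionaries, without sending anything, is to build the request and inspect its prepared form. This is a minimal sketch that reuses the placeholder URL above; `requests.Request(...).prepare()` only constructs the request and performs no network I/O.

```python
from urllib.parse import urlencode

import requests

URL = 'https://www.example.com/xxx'  # placeholder URL from the examples above

# prepare() builds the final URL, headers and body but does not send anything.
get_req = requests.Request('GET', URL, params={'type': 'test'}).prepare()
post_req = requests.Request('POST', URL, data={'type': '测试'}).prepare()

print(get_req.url)                       # query string appended to the URL
print(post_req.headers['Content-Type'])  # form encoding chosen for a dict body
print(post_req.body)                     # percent-encoded UTF-8 form body

# With plain urllib the same body needs an explicit urlencode() call.
manual_body = urlencode({'type': '测试'})
```

The non-ASCII value is encoded as UTF-8 and percent-escaped, the same result an explicit `urlencode()` call produces.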
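You can also POST JSON. The json= parameter serializes the dict and sets the Content-Type: application/json header, which data=json.dumps(...) on its own does not:
requests.post('https://www.example.com/xxx', verify=False, json={'type': '测试'})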
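The difference between passing a pre-serialized string through data= and letting requests serialize through json= shows up in the prepared headers. The sketch below uses the placeholder URL and, as far as I can tell from the library's behavior, sends nothing over the network because it stops at `prepare()`.

```python
import json

import requests

URL = 'https://www.example.com/xxx'  # placeholder URL
payload = {'type': '测试'}

# Built but never sent: prepare() does no network I/O.
as_data = requests.Request('POST', URL, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', URL, json=payload).prepare()

print(as_data.headers.get('Content-Type'))  # nothing set for a pre-serialized string
print(as_json.headers.get('Content-Type'))  # set by the json= parameter

# Both bodies decode to the same object.
same_payload = json.loads(as_data.body) == json.loads(as_json.body) == payload
```

A server that dispatches on Content-Type may reject or misparse the first form, so json= is usually the safer choice when the endpoint expects JSON.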
As with jQuery's ajax, the response can be read as plain text or decoded from JSON automatically:
r = requests.get(...)
print(r.text)
print(r.json())
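Since `r.json()` raises when the body is not valid JSON, a small fallback helper can be handy. The sketch below is only an illustration: `fake_response` and `json_or_text` are hypothetical helpers, and the Response objects are built by hand so the example runs offline instead of calling a real server.

```python
import io

import requests


def fake_response(body: bytes, status: int = 200) -> requests.Response:
    """Hypothetical helper: build a Response by hand so the demo runs offline."""
    r = requests.Response()
    r.status_code = status
    r.encoding = 'utf-8'
    r.raw = io.BytesIO(body)  # Response reads its content from .raw
    return r


def json_or_text(r: requests.Response):
    """Return the decoded JSON body, falling back to the plain text."""
    try:
        return r.json()
    except ValueError:  # raised when the body is not valid JSON
        return r.text


ok = json_or_text(fake_response(b'{"name": "demo"}'))
not_json = json_or_text(fake_response(b'<html>oops</html>'))
print(ok, not_json)
```

Catching `ValueError` is a deliberate choice here: the JSON decode errors raised by requests derive from it, so the helper does not depend on a specific requests version.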
This is where Python's terse syntax really shines: you can read a value straight out of a URL API, as if no network request were involved at all:
name = requests.get('https://www.example.com/api/xxx/', verify=False).json()['name']
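File uploads are just as convenient, and the requests module covers the common cases. Open the file in binary mode so the bytes are sent unchanged:
files = {'video': open('/tmp/test.video', 'rb')}
r = requests.post(url, files=files)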
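The files value can take several shapes, namely: Dictionary of 'name': file-like-objects (or {'name': ('filename', fileobj)}). If fileobj is a str or bytes value, requests uses it directly as the file content (older versions wrapped it in StringIO or BytesIO for you), which saves some boilerplate: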
files = {'score': ('test.txt', '测试一下')}
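To check how these tuple forms end up in the multipart body, the request can be prepared and inspected without being sent. In this sketch the upload URL and the 'report' field are made up for illustration; only the 'score' entry comes from the example above.

```python
import io

import requests

URL = 'https://www.example.com/upload'  # hypothetical upload endpoint

files = {
    # (filename, content): a str or bytes value is used as the file body
    'score': ('test.txt', '测试一下'),
    # (filename, file-like object, content type)
    'report': ('report.csv', io.BytesIO(b'a,b\n1,2\n'), 'text/csv'),
}

# prepare() encodes the multipart body without sending it.
prepared = requests.Request('POST', URL, files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data plus a generated boundary
body = prepared.body                     # the encoded multipart payload as bytes
print(len(body))
```

The three-element tuple is useful when the server checks the content type of each part; the two-element form leaves it unset.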
cookies is a dict-like object that supports r.cookies.get() and r.cookies.get_dict(). Most importantly, when you make several requests to the same domain through one session, the cookies are sent along automatically:
s = requests.Session()
r = s.get('https://www.example.com/login', verify=False)
# headers is a dict of extra request headers defined elsewhere
r = s.post('https://www.example.com/login',
           data={'username': 'xxx', 'password': 'xxx', '_xsrf': s.cookies.get('_xsrf')},
           verify=False,
           headers=headers)
r = s.get('https://www.example.com/index', verify=False)
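The cookie handling can be observed offline by putting a cookie into the session by hand and preparing requests against two different hosts. This is a sketch under assumptions: the cookie value and the second host are made up, and `Session.prepare_request()` is used so that nothing is actually sent.

```python
import requests

s = requests.Session()

# Stand-in for a cookie the server would set on the login page
# (the value and the domain are made up for this demo).
s.cookies.set('_xsrf', 'abc123', domain='www.example.com', path='/')

# prepare_request() merges session state into the request without sending it.
same_site = s.prepare_request(requests.Request('GET', 'https://www.example.com/index'))
other_site = s.prepare_request(requests.Request('GET', 'https://other.example.org/'))

print(same_site.headers.get('Cookie'))   # session cookie attached
print(other_site.headers.get('Cookie'))  # nothing attached for a different domain
print(s.cookies.get_dict())
```

The cookie jar applies the usual domain matching, so a cookie stored for one host is not leaked to an unrelated one.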
Authentication helpers such as HTTPProxyAuth and HTTPDigestAuth are supported (they live in requests.auth).
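A short sketch of how these auth classes are passed in, assuming a hypothetical protected URL and made-up credentials. Basic and proxy auth compute their header up front, so it is visible in the prepared request; digest auth depends on the server's challenge, so it is only constructed here and not exercised.

```python
import base64

import requests
from requests.auth import HTTPBasicAuth, HTTPDigestAuth, HTTPProxyAuth

URL = 'https://www.example.com/private'  # hypothetical protected endpoint

# Basic auth: the header is computed up front, so it shows in the prepared request.
basic = requests.Request('GET', URL, auth=HTTPBasicAuth('user', 'pass')).prepare()
# Proxy auth uses the same encoding but a different header name.
proxy = requests.Request('GET', URL, auth=HTTPProxyAuth('user', 'pass')).prepare()

print(basic.headers['Authorization'])
print(proxy.headers['Proxy-Authorization'])

# Digest auth needs the server's 401 challenge first, so it is only handed
# to the call; requests answers the challenge when the request is sent.
digest = HTTPDigestAuth('user', 'pass')
# requests.get(URL, auth=digest)  # not executed here: requires a real server

# What the Basic scheme should produce for these credentials.
expected = 'Basic ' + base64.b64encode(b'user:pass').decode('ascii')
```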
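Any network request can fail. A safer pattern, shown here as the body of a function, is to raise an exception on 4xx and 5xx responses (raise_for_status() does not raise for other non-200 codes such as 201 or 302):
try:
    r = requests.get(...)
    r.raise_for_status()
except requests.RequestException as e:
    print(e)
else:
    return r.json()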
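Which status codes actually trigger the exception can be checked offline. In the sketch below, `status_raises` is a hypothetical helper and the Response objects are built by hand rather than returned by a server.

```python
import requests


def status_raises(code: int) -> bool:
    """Return True if raise_for_status() raises for this status code."""
    r = requests.Response()  # built by hand so the check runs offline
    r.status_code = code
    try:
        r.raise_for_status()
    except requests.HTTPError:
        return True
    return False


for code in (200, 201, 204, 302, 404, 500):
    print(code, status_raises(code))
```

`requests.HTTPError` is a subclass of `requests.RequestException`, so the `except requests.RequestException` clause above also catches connection errors and timeouts in the same place.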
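Sometimes, to speed up requests, you can address the server by IP directly instead of doing a DNS lookup every time. Because the HTTPS certificate is issued for the hostname rather than the IP, verification would fail, so it is turned off here as well: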
requests.get('https://xx.xx.xx.xx', headers={'Host':'test.com'}, verify=False)
With stream=True you can download large files as a stream instead of loading them into memory:
def download_file(url, local_filename):
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 32):
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk:
                f.write(chunk)
                print('.', sep='', end='', flush=True)
    return local_filename