Asynchronous Requests with Python requests

I tried the sample provided in the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content

Answer 1

Note

The answer below does not apply to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as is to reflect the original question, which was about using requests < v0.13.0.

To do multiple tasks with async.map asynchronously, you have to:

Define a function for what you want to do with each object (your task)

Add that function as an event hook in your request

Call async.map on a list of all the requests / actions

Example:

from requests import async
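# requests >= v0.13.0 no longer ships async: install grequests and use
# grequests.get / grequests.map instead (async is also a reserved
# keyword since Python 3.7, so this import cannot work there)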

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do.
    #
    # Note the lack of parentheses after do_something: the function object
    # itself is passed, and the response is supplied as its first argument
    # automatically.
    action_item = async.get(u, hooks = {'response' : do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)
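On current versions of requests the async module is gone, but the per-request hook mechanism used above still exists. Below is a minimal sketch of the same "task as a response hook" pattern without grequests, assuming Python 3.7+ and an installed requests. A ThreadPoolExecutor stands in for async.map (a swapped-in technique, not the original API), and current requests calls response hooks as hook(response, **kwargs). The local EchoPathHandler server, fetch and the collected dict are hypothetical helpers added only so the example runs without network access.

```python
import concurrent.futures
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler: replies to every GET with the request path."""

    def do_GET(self):
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


collected = {}
collected_lock = threading.Lock()


def do_something(response, *args, **kwargs):
    # the task: requests passes the response first, plus keyword arguments
    # such as timeout; returning None leaves the response unchanged
    with collected_lock:
        collected[response.url] = response.text


def fetch(url):
    # the hook runs on the worker thread as soon as this response arrives
    return requests.get(url, hooks={"response": do_something}, timeout=10)


server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:{}".format(server.server_address[1])
urls = [base + "/page{}".format(i) for i in range(4)]

# pool.map runs fetch concurrently and keeps results in the order of urls
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(fetch, urls))

server.shutdown()
server.server_close()

for r in responses:
    print(r.status_code, r.text)
```

The page content the question asks about is available both inside the hook (response.text or response.content) and on each returned Response object.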

Answer 2

async is now an independent module: grequests.

See here: https://github.com/spyoungtech/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

$ pip install grequests

Usage:

Build a stack:

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)

Send the stack:

grequests.map(rs)

The result looks like this:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]

Note that by default grequests does not limit the number of concurrent requests, e.g. when several requests are sent to the same server; grequests.map accepts a size argument that caps how many requests are made at a time.
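The same idea of capping concurrency can be sketched in plain asyncio with a semaphore. This is a swapped-in technique, not grequests: fake_fetch, MAX_CONCURRENT and the delay are hypothetical stand-ins for real network calls so the sketch runs offline (asyncio.run needs Python 3.7+). The sketch also records the peak number of in-flight calls to show that the cap holds.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical cap; choose one appropriate for the target server

in_flight = 0
peak_in_flight = 0


async def fake_fetch(url):
    # hypothetical stand-in for a real HTTP call (e.g. an aiohttp session.get)
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.05)  # simulated network latency
    in_flight -= 1
    return "body of " + url


async def fetch_limited(semaphore, url):
    # at most MAX_CONCURRENT coroutines can be inside this block at once
    async with semaphore:
        return await fake_fetch(url)


async def main(urls):
    # create the semaphore inside the running event loop
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    # gather returns results in the same order as urls
    return await asyncio.gather(*(fetch_limited(semaphore, u) for u in urls))


urls = ["https://example.com/{}".format(i) for i in range(10)]
results = asyncio.run(main(urls))
print(results[0], peak_in_flight)
```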

Answer 3

I tested both requests-futures and grequests. grequests is faster but brings monkey patching and additional dependency problems. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor, and it was almost as fast as grequests, but without additional external dependencies.

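import requests
import concurrent.futures

def get_urls():
    # placeholder list; substitute the URLs you actually want to fetch
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# counters must exist before the loop increments them
resp_ok = 0
resp_err = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    # submit one load_url(url, 10) call per URL and map each future back to its URL
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    # handle results in completion order, not submission order
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1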
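To see why the thread pool helps, the same submit/as_completed pattern can be wrapped in a helper and exercised offline. fetch_all and slow_loader are hypothetical names: slow_loader sleeps to simulate network latency in place of requests.get and raises for one URL so the error path is exercised. Because the work is waiting rather than computing, the threads' waits overlap despite the GIL.

```python
import concurrent.futures
import time


def fetch_all(urls, loader, max_workers=20):
    """Run loader(url) for every URL on a thread pool.

    Returns (results, errors): dicts keyed by URL.
    """
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(loader, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc  # keep the exception instead of only counting it
    return results, errors


def slow_loader(url):
    # hypothetical stand-in for requests.get(url, timeout=10)
    time.sleep(0.2)  # simulated network latency
    if url.endswith("bad"):
        raise ValueError("simulated failure for " + url)
    return "content of " + url


urls = ["url{}".format(i) for i in range(9)] + ["url-bad"]

start = time.perf_counter()
results, errors = fetch_all(urls, slow_loader, max_workers=10)
elapsed = time.perf_counter() - start

# ten 0.2 s calls overlap on ten threads, so this takes roughly 0.2 s rather than 2 s
print(len(results), len(errors), round(elapsed, 1))
```

Returning dicts instead of counters keeps both the page content and the individual exceptions available to the caller.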

Answer 4

Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want true asynchronous requests, you must use other tooling that provides it. One such solution is aiohttp (Python 3.5.3+). In my experience it works well with the Python 3.7 async/await syntax. Below I write three implementations of performing n web requests:

Purely synchronous requests (sync_requests_get_all) using the Python requests library

Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio

A truly asynchronous implementation (async_aiohttp_get_all) using the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio

"""
Tested in Python 3.5.10
"""

import time
import asyncio
import requests
import aiohttp

from asgiref import sync

def timed(func):
    """
    records approximate durations of function calls
    """
    def wrapper(*args, **kwargs):
        start = time.time()
        print('{name:<30} started'.format(name=func.__name__))
        result = func(*args, **kwargs)
        duration = "{name:<30} finished in {elapsed:.2f} seconds".format(
            name=func.__name__, elapsed=time.time() - start
        )
        print(duration)
        timed.durations.append(duration)
        return result
    return wrapper

timed.durations = []


@timed
def sync_requests_get_all(urls):
    """
    performs synchronous get requests
    """
    # use session to reduce network overhead
    session = requests.Session()
    return [session.get(url).json() for url in urls]


@timed
def async_requests_get_all(urls):
    """
    asynchronous wrapper around synchronous requests
    """
    session = requests.Session()
    # wrap requests.get into an async function
    def get(url):
        return session.get(url).json()
    async_get = sync.sync_to_async(get)

    async def get_all(urls):
        return await asyncio.gather(*[
            async_get(url) for url in urls
        ])
    # call get_all as a sync function to be used in a sync context
    return sync.async_to_sync(get_all)(urls)

@timed
def async_aiohttp_get_all(urls):
    """
    performs asynchronous get requests
    """
    async def get_all(urls):
        async with aiohttp.ClientSession() as session:
            async def fetch(url):
                async with session.get(url) as response:
                    return await response.json()
            return await asyncio.gather(*[
                fetch(url) for url in urls
            ])
    # call get_all as a sync function to be used in a sync context
    return sync.async_to_sync(get_all)(urls)


if __name__ == '__main__':
    # this endpoint takes ~3 seconds to respond,
    # so a purely synchronous implementation should take
    # little more than 30 seconds and a purely asynchronous
    # implementation should take little more than 3 seconds.
    urls = ['https://postman-echo.com/delay/3']*10

    async_aiohttp_get_all(urls)
    async_requests_get_all(urls)
    sync_requests_get_all(urls)
    print('----------------------')
    for duration in timed.durations:
        print(duration)

On my machine, this is the result:

async_aiohttp_get_all          started
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         started
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          started
sync_requests_get_all          finished in 30.59 seconds
----------------------
async_aiohttp_get_all          finished in 3.20 seconds
async_requests_get_all         finished in 30.61 seconds
sync_requests_get_all          finished in 30.59 seconds
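The gap between the two requests-based variants and the aiohttp variant can be reproduced offline. In this sketch, assuming Python 3.9+ for asyncio.to_thread, time.sleep stands in for a blocking requests.get and asyncio.sleep for a non-blocking aiohttp call; DELAY, N and all function names are hypothetical. Calling the blocking function directly inside coroutines serializes them, because it never yields to the event loop. asyncio.to_thread overlaps the waits by running each blocking call on a worker thread, while the native coroutine version overlaps them without threads.

```python
import asyncio
import time

DELAY = 0.2  # simulated per-request latency in seconds (hypothetical)
N = 5        # number of simulated requests (hypothetical)


def blocking_get(i):
    # stand-in for requests.get(...): blocks the calling thread
    time.sleep(DELAY)
    return i


async def non_blocking_get(i):
    # stand-in for an aiohttp request: yields to the event loop while waiting
    await asyncio.sleep(DELAY)
    return i


async def blocking_in_coroutines():
    async def wrapper(i):
        # no await around the blocking call, so the event loop is stuck until it returns
        return blocking_get(i)
    return await asyncio.gather(*(wrapper(i) for i in range(N)))


async def blocking_in_threads():
    # Python 3.9+: each blocking call runs in the loop's default thread pool
    return await asyncio.gather(*(asyncio.to_thread(blocking_get, i) for i in range(N)))


async def native_async():
    return await asyncio.gather(*(non_blocking_get(i) for i in range(N)))


results, timings = {}, {}
for func in (blocking_in_coroutines, blocking_in_threads, native_async):
    start = time.perf_counter()
    results[func.__name__] = asyncio.run(func())
    timings[func.__name__] = time.perf_counter() - start
    print("{:<25} finished in {:.2f} seconds".format(func.__name__, timings[func.__name__]))
```

Expect roughly N × DELAY for the first variant and roughly DELAY for the other two, mirroring the 30 s versus 3 s split in the measurements above.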