Downloading Files in Python with aiohttp

Feb 22, 2024 · 3 min read

Python's aiohttp library provides a simple way to download files asynchronously. In this guide, we'll walk through how to download a file from a URL and save it locally with aiohttp.

Why aiohttp for Downloading Files?

The aiohttp library is great for downloading files for a few reasons:

  • It's asynchronous and non-blocking - aiohttp is built on asyncio, so while one download is waiting on the network, the event loop can run other work, such as additional downloads. This is what makes it fast when fetching many files.
  • Simple API - A basic download takes only a few lines of code.
  • Streaming support - It can read a response body incrementally, so you don't have to load the entire file into memory.

    Downloading a File

    Here's a simple example to download a file and save it locally:

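    import aiohttp
    import asyncio

    async def download_file(url, filename):
        # The session manages the connection pool; "async with" closes it afterwards
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                # Raise aiohttp.ClientResponseError on 4xx/5xx instead of saving an error page
                response.raise_for_status()
                # Read the whole body into memory, then write it out in one go
                data = await response.read()
                with open(filename, "wb") as f:
                    f.write(data)

    asyncio.run(download_file("https://example.com/image.png", "image.png"))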

    We create an aiohttp.ClientSession(), which handles making HTTP requests and managing connections. It is designed to be used in an async with block so that it is closed properly when the block exits.

    We make a GET request to the URL, read the entire response body into memory with response.read(), and then write it to the file in a single call.

    The response.read() is asynchronous, so control will yield back to the event loop while waiting for the download, avoiding any blocking.
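Because each download spends most of its time waiting on the network, several of them can overlap on one event loop. The sketch below assumes you share a single ClientSession and start one coroutine per (url, filename) pair with asyncio.gather; the helper names fetch_to_file and download_all are hypothetical, not part of aiohttp.

```python
import asyncio

import aiohttp


async def fetch_to_file(session: aiohttp.ClientSession, url: str, filename: str) -> int:
    """Download one URL into `filename` and return the number of bytes written."""
    async with session.get(url) as response:
        response.raise_for_status()
        data = await response.read()  # yields to the event loop while waiting
    with open(filename, "wb") as f:
        f.write(data)
    return len(data)


async def download_all(jobs: list[tuple[str, str]]) -> list[int]:
    """Run one download per (url, filename) pair concurrently over a shared session."""
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(
            *(fetch_to_file(session, url, filename) for url, filename in jobs)
        )
```

To try it, call asyncio.run(download_all([...])) with your own (url, filename) pairs; gather returns the byte counts in the same order as the input list. Reusing one session lets all downloads share its connection pool instead of creating a new one per file.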

    That's all there is to the basics! Now let's go over some tweaks and optimizations.

    Handling Large Files

    When dealing with larger downloads, we may not want to load the entire file into memory at once.

    We can stream the response directly to disk as data comes in by iterating through the response content:

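    async def download_file(url, filename, chunk_size=4096):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                response.raise_for_status()
                with open(filename, "wb") as f:
                    # Write each chunk as soon as it arrives instead of buffering the whole body
                    async for data in response.content.iter_chunked(chunk_size):
                        f.write(data)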

    This reads the response body in chunks of at most 4 KB (4,096 bytes) and writes each one to disk as it arrives, so memory usage stays small and roughly constant even for very large downloads.
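To check that iter_chunked really caps the chunk size, here is a hedged variant of the loop that also records the largest chunk it sees. The function name stream_to_file and its (total, largest) return value are illustrative assumptions, not aiohttp API; it expects the caller to pass in an open ClientSession.

```python
import aiohttp


async def stream_to_file(
    session: aiohttp.ClientSession, url: str, filename: str, chunk_size: int = 4096
) -> tuple[int, int]:
    """Stream `url` into `filename`; return (total bytes written, largest chunk seen)."""
    total = 0
    largest = 0
    async with session.get(url) as response:
        response.raise_for_status()
        with open(filename, "wb") as f:
            # iter_chunked yields pieces of at most chunk_size bytes as they arrive
            async for chunk in response.content.iter_chunked(chunk_size):
                f.write(chunk)
                total += len(chunk)
                largest = max(largest, len(chunk))
    return total, largest
```

Only one chunk is held at a time, which is why the file size never dictates memory use.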

    Progress Reporting

    We can report download progress by checking the Content-Length header on the response:

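    async def download_file(url, filename, chunk_size=4096):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                response.raise_for_status()
                # Content-Length may be missing (e.g. chunked responses), so default to 0
                total_size = int(response.headers.get("Content-Length", 0))
                downloaded = 0

                with open(filename, "wb") as f:
                    while True:
                        # read() returns at most chunk_size bytes, and b"" at end of stream
                        chunk = await response.content.read(chunk_size)
                        if not chunk:
                            break
                        f.write(chunk)
                        downloaded += len(chunk)
                        print(f"Downloaded {downloaded}/{total_size} bytes")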

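    We add up the bytes written and compare them against the total size to display a simple progress meter. If the server omits the Content-Length header, total_size stays at 0 and only the running byte count is meaningful.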
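The raw byte counts can be turned into a percentage with a little arithmetic. Here is a minimal sketch of a hypothetical format_progress helper that falls back to a plain byte count when Content-Length was missing (a total of 0):

```python
def format_progress(downloaded: int, total: int) -> str:
    """Build a progress line; a total of 0 or less means the size is unknown."""
    if total <= 0:
        return f"Downloaded {downloaded} bytes (total size unknown)"
    percent = downloaded / total * 100
    return f"Downloaded {downloaded}/{total} bytes ({percent:.1f}%)"
```

For example, format_progress(2048, 8192) returns "Downloaded 2048/8192 bytes (25.0%)". Inside the read loop, print(format_progress(downloaded, total_size)) could replace the plain print call.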

    Handling Errors

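    It's good practice to handle exceptions for problems such as connection failures or HTTP error responses (for example, a 404 for a URL that doesn't exist):

    try:
        async with session.get(url) as response:
            # Without this call, 4xx/5xx responses do not raise ClientResponseError
            response.raise_for_status()
            ...
    except aiohttp.ClientConnectionError:
        print("Connection error")
    except aiohttp.ClientResponseError as e:
        print(f"Bad response: HTTP {e.status}")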

    This way a failed download prints a short message instead of crashing the script with a traceback.
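Putting the pieces together, here is a hedged sketch of a downloader that reports failures instead of raising. The name safe_download and its True/False return value are assumptions for illustration. It also catches asyncio.TimeoutError, which is how aiohttp signals that a request exceeded its timeout.

```python
import asyncio

import aiohttp


async def safe_download(url: str, filename: str, chunk_size: int = 4096) -> bool:
    """Stream `url` into `filename`; return True on success, False on a handled error."""
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                # Turn 4xx/5xx statuses into aiohttp.ClientResponseError
                response.raise_for_status()
                # The file is only created once we know the status is OK
                with open(filename, "wb") as f:
                    async for chunk in response.content.iter_chunked(chunk_size):
                        f.write(chunk)
    except aiohttp.ClientResponseError as e:
        print(f"Bad response: HTTP {e.status}")
        return False
    except aiohttp.ClientConnectionError:
        print("Connection error")
        return False
    except asyncio.TimeoutError:
        # aiohttp raises asyncio.TimeoutError (or a subclass) when a timeout expires
        print("Request timed out")
        return False
    return True
```

Calling asyncio.run(safe_download(url, filename)) returns False and prints a message for a 404, a refused connection, or a timeout, rather than raising.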

    Conclusion

    The aiohttp library provides an efficient way to download files in Python without blocking the event loop. Streaming the body in chunks keeps memory usage low for large files, and the same read loop makes it easy to add progress reporting and error handling for robust download scripts.

    Some next things to explore are:

  • Downloading multiple files concurrently for faster transfers
  • Handling authentication if downloading protected files
  • Resuming partial downloads