Mastering User Agents with Python Requests

Oct 22, 2023 · 6 min read

Ready to level up your Python requests skills? Setting the user agent is one of the most important things you need to do to make your requests look legit. In this guide, we'll cover everything you need to know about user agents in requests to help you become a pro!

What's a User Agent?

A user agent is a string that identifies the application, browser, and operating system making a request to a web server. Here's an example:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36

This tells the server that the request is coming from Chrome browser version 74 on Windows 10.

By default, Python requests identifies itself in the user agent:

python-requests/2.22.0

That's a dead giveaway that you're not a real browser! Many sites block or throttle obvious bot traffic, so we need to set a real browser user agent.
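You can see this for yourself by hitting an echo service like httpbin.org, which reflects the headers it receives back as JSON:

import requests

# httpbin.org echoes back the headers it received
response = requests.get('https://httpbin.org/headers')
print(response.json()['headers']['User-Agent'])
# -> python-requests/2.x.y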

Setting a User Agent in Requests

Setting a user agent in requests is simple - just pass it as a header. Here's an example:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

response = requests.get('https://www.website.com', headers=headers)

This will make your request look like it's coming from a desktop Chrome browser.

For convenience, you can also set headers at the session level so they are applied to all requests from that session:

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0...'})

Picking Random User Agents

Using the same user agent for all your requests makes your traffic easy to detect. A better technique is to pick user agents randomly from a list to appear more human.

Start by compiling a list of various desktop and mobile user agents. You can easily find these online.

Then in your code, choose one randomly for each request:

import requests
import random

user_agents = ['Mozilla/5.0...',
               'Mozilla/5.0...',
               ...]

user_agent = random.choice(user_agents)

headers = {'User-Agent': user_agent}

response = requests.get(url, headers=headers)

This makes every request look like it's coming from a different device and browser.
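In a crawl loop, move the random.choice() call inside the loop so every request really does get a fresh identity. A quick sketch, continuing from the snippet above (the URLs are placeholders):

urls = ['https://www.website.com/page1',
        'https://www.website.com/page2']  # placeholder URLs

for url in urls:
    headers = {'User-Agent': random.choice(user_agents)}
    response = requests.get(url, headers=headers)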

Beyond User Agent - Full Headers

Sophisticated anti-bot systems look beyond just the user agent to determine if a request is automated. Real browsers send additional headers that identify the platform, accepted encodings, languages, and more.

You can capture the headers a real browser sends from your browser's developer tools: open the Network tab, pick a request, and copy its request headers (Chrome even offers "Copy as cURL"). A typical Chrome request includes headers like:

GET / HTTP/1.1
Host: www.website.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br

Then insert these headers into your requests to appear more legitimate:

headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
  'Accept-Encoding': 'gzip, deflate, br'
}
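You can confirm exactly which headers went out by echoing the request back through httpbin.org:

response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json()['headers'])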

The Session Approach

Dealing with headers on every request can get tedious. Sessions allow us to set headers just once and have them applied to all requests from that session.

session = requests.Session()
session.headers.update({
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5'
})

response = session.get('https://www.website.com')

Much cleaner! Using update() preserves the session's useful default headers (like Accept-Encoding), and the session keeps cookies between requests as well.

I like to create a Session with headers, cookies, and other settings configured for each site I'm scraping:

# Session for website A
session_a = requests.Session()
session_a.headers.update({'User-Agent': 'Mozilla/5.0... Chrome/74.0.3729.169...'})

# Session for website B
session_b = requests.Session()
session_b.headers.update({'User-Agent': 'Mozilla/5.0... Firefox/110.0'})

This approach makes it easy to customize each scraper.
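If you scrape many sites, a small factory function keeps the setup in one place. Here's a minimal sketch - the make_session name and its arguments are illustrative, not part of requests:

import requests

def make_session(user_agent, extra_headers=None):
    # Hypothetical helper: build a pre-configured session for one site
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    if extra_headers:
        session.headers.update(extra_headers)
    return session

session_a = make_session('Mozilla/5.0... Chrome/74.0.3729.169...')
session_b = make_session('Mozilla/5.0... Firefox/110.0')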

Pro Tips and Tricks

Here are some pro tips I've picked up for mastering user agents with Python requests:

  • Use a list of 10-20 recent real user agents and rotate through them randomly. Reusing the same one or two makes you easy to detect.
  • Refresh your user agent list once a month to keep up with new browser versions.
  • Use sessions and group requests by browser type - e.g. one session for Chrome, one for Firefox, etc.
  • Vary other headers too - Accept, Accept-Encoding, Accept-Language - don't just change the user agent.
  • Use a proxy or VPN as well, so all your traffic doesn't come from a single IP.
  • Monitor for blocks - if you start seeing captchas or 503 errors, rotate sessions and proxies (see the sketch after this list).
  • Consider using a browser automation tool like Selenium to generate real browser traffic.
  • For serious scraping, use commercial tools with advanced browser and traffic spoofing.
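Here's a minimal sketch of that rotate-on-block idea. The status codes checked and the retry count are illustrative assumptions - tune them to the sites you're scraping:

import random
import requests

def fetch_with_rotation(url, user_agents, max_attempts=3):
    # Retry with a fresh session and user agent when the response looks blocked
    response = None
    for _ in range(max_attempts):
        session = requests.Session()
        session.headers.update({'User-Agent': random.choice(user_agents)})
        response = session.get(url)
        if response.status_code not in (403, 429, 503):
            break
    return response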
FAQ

    What is the default user agent for Python Requests?

    The default user agent for Python Requests is something like "python-requests/2.26.0". You can access it via requests.utils.default_user_agent().

    import requests
    print(requests.utils.default_user_agent())
    

    How do I change or set a custom user agent in Python Requests?

    You can set a custom user agent by passing a headers dict with a 'User-Agent' key to requests methods like get() and post().

    import requests
    
    url = 'https://www.example.com'
    custom_user_agent = 'My User Agent'
    
    response = requests.get(url, headers={'User-Agent': custom_user_agent})
    

    How can I spoof a user agent in Python Requests?

    To spoof a user agent like a browser, device or bot, simply set the User-Agent header to match the one you want to imitate. Be aware of ethical concerns around spoofing.

    headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G930F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.101 Mobile Safari/537.36'}
    
    response = requests.get(url, headers=headers)
    

    What are some common user agents I can spoof in Python Requests?

    Some common user agents to spoof:

  • Browsers: Chrome, Safari, Firefox
  • Devices: iPhone, Android phones
  • Bots: Googlebot, Bingbot

    How do I set a mobile user agent in Python Requests?

    Use a user agent string from a mobile browser like Safari iOS or Chrome Android.

    mobile_ua = 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/80.0.3987.95 Mobile/15E148 Safari/604.1'
    

    What is the purpose of setting a user agent in Python Requests?

    Setting a user agent can help mimic a browser or device to get past blocks on certain user agents. It also helps web servers identify the client.

    What are best practices around spoofing user agents?

    Avoid spoofing user agents for unethical purposes. Only modify user agents for testing or if required for access.

    How do I set a browser user agent like Chrome or Firefox in Python Requests?

    Use the browser's user agent string. For example:

    firefox_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0'
    

    How can I detect and save user agents from requests using Python Requests?

    The user agent that was sent is available on the response's request object, via response.request.headers['User-Agent']. You can save it to a file.

    import requests
    
    response = requests.get(url)
    user_agent = response.request.headers['User-Agent']
    
    with open('user_agents.txt', 'a') as f:
        f.write(user_agent + '\n')
    

    How do user agents work with authentication in Python Requests?

    User agents are sent as normal headers even when using authentication. Just pass your headers dict alongside the auth argument.
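
    For example, with HTTP basic auth (the URL and credentials here are placeholders):

    import requests
    from requests.auth import HTTPBasicAuth

    headers = {'User-Agent': 'Mozilla/5.0...'}

    # The custom User-Agent header and authentication travel together
    response = requests.get('https://www.example.com', headers=headers,
                            auth=HTTPBasicAuth('user', 'pass'))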

    How can I use a proxy and set a user agent in Python Requests?

    Pass the proxy URL to the proxies parameter and set the User-Agent header as normal:

    proxies = {'http': 'http://10.10.1.10:3128'}
    headers = {'User-Agent': 'Mozilla/5.0...'}
    
    response = requests.get(url, proxies=proxies, headers=headers)
    

    What is the difference between a user agent and other headers in Python Requests?

    The user agent provides info about the client while other headers like Accept provide info about the request being made.

    Are there any libraries that help manage user agents in Python Requests?

    Yes - libraries such as fake-useragent and user_agent can generate and manage user agents for you.
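
    For example, a minimal sketch using fake-useragent (install it first with pip install fake-useragent):

    from fake_useragent import UserAgent

    ua = UserAgent()

    # ua.random returns a randomly chosen real-world user agent string
    headers = {'User-Agent': ua.random}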
