Getting Past "Access Denied" Errors with Selenium and Requests

Apr 2, 2024 · 3 min read

Have you ever tried to scrape or test a website with Selenium or Requests in Python, only to be greeted by pesky "Access Denied" errors? These forbidden access messages can be frustrating, but with the right approach you can often bypass them.

Common Causes of Access Errors

There are a few main reasons why you might encounter access errors:

  • Blocking by IP address - Many sites block traffic from certain IP ranges to prevent abuse. If your code runs from a blocked IP, you'll get errors.
  • Missing browser headers - Modern sites often check headers like User-Agent to ensure requests come from real browsers. Headless Selenium and Requests don't send these by default.
  • Bot protection systems - Some sites use systems like Cloudflare to detect bot traffic and deny access. These can be tricky to bypass.

Tips for Bypassing Access Errors

    Here are some tips for handling "Access Denied" errors:

    Use a Proxy or VPN

    One easy fix is to route your traffic through a proxy service or VPN. This gives your code a different IP address that may not be blocked:

import requests

proxied_session = requests.Session()
# Proxy both HTTP and HTTPS traffic (replace with your proxy's address)
proxied_session.proxies = {"http": "http://192.168.1.1:3128", "https": "http://192.168.1.1:3128"}
response = proxied_session.get("https://example.com")

With Selenium, you can configure the browser's proxy settings to achieve the same effect.

    Mimic a Real Browser

    For headless Selenium or Requests, mimic a real web browser by adding browser User-Agent and other headers:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"}

response = requests.get("https://example.com", headers=headers)

    This makes your code appear to the site as a regular browser.
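A User-Agent alone is sometimes not enough; sites may also check headers such as Accept and Accept-Language. Here is a sketch of a requests session with a fuller, Chrome-like header set (the values are typical browser defaults, not taken from any particular site):

```python
import requests

# Headers a desktop Chrome browser would typically send
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

session = requests.Session()
session.headers.update(browser_headers)  # sent with every request from this session
```

Using a session also reuses the underlying connection, which itself looks more like a real browser than a series of one-off requests.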

    Use a "Real" Browser with Selenium

    Consider using a normal Selenium-controlled browser like Chrome or Firefox instead of headless mode. Many sites have better bot protection against headless browsers.

    A real GUI browser can more easily bypass protections. Just be careful about scaling this approach up.

    Slow Down Requests

    Sometimes simple rate limiting does the trick. Sites may block you if they detect unusually fast automated access:

import time

import requests

for page in range(10):
    response = requests.get(f"https://example.com/page{page}")
    time.sleep(5)  # pause 5 seconds between requests

    This crawling pattern appears more human.
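A fixed five-second pause is itself an easily detected pattern, though. Adding random jitter between requests (a sketch, with arbitrary timing values) looks more natural:

```python
import random
import time

def polite_sleep(base=3.0, jitter=4.0):
    """Pause for a random, human-looking interval between requests."""
    delay = base + random.uniform(0, jitter)  # e.g. between 3 and 7 seconds
    time.sleep(delay)
    return delay
```

Call `polite_sleep()` between page fetches instead of a constant `time.sleep(5)`.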

    Cache and Reuse Cookies

For sites that track your session, reuse cookies from a real browser session instead of starting a fresh, cookie-less session on every run:

# After logging into the site manually in the Selenium-driven browser...
cookies = selenium_driver.get_cookies()

# Copy every cookie into one requests session, then reuse that session
session = requests.Session()
for cookie in cookies:
    session.cookies.set(cookie["name"], cookie["value"])

response = session.get("https://example.com")

    This lets your code reuse an authenticated session.
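To keep the session across runs, you can also persist the cookie list to disk. A minimal sketch using pickle (the file name is arbitrary):

```python
import pickle

def save_cookies(cookies, path="cookies.pkl"):
    """Persist the list of cookie dicts returned by driver.get_cookies()."""
    with open(path, "wb") as f:
        pickle.dump(cookies, f)

def load_cookies(path="cookies.pkl"):
    """Reload cookies previously saved by save_cookies()."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

On the next run, load the saved cookies and feed them into your requests session or Selenium driver instead of logging in again.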

    When All Else Fails...

    Sometimes elaborate bot protection will still block everything. If you absolutely must access the site and the methods above don't work, consider automating an actual browser instead of headless mode.

    This uses more resources, but tools like Selenium allow controlling a real Chrome browser to bypass protections websites apply specifically against headless browsers and bots.

    Key Takeaways

    Here are some key tips to remember:

  • Use proxies, VPNs, or browser headers to mimic a real user
  • Slow down request speed to appear human
  • Cache and reuse browser cookies to reuse authenticated sessions
  • Use a real browser instead of headless where you can
  • When advanced protection blocks everything else, consider automating a visible browser instead of headless

With the right approach, you can often find a way past pesky access errors while scraping and testing sites. The methods above should give you some options to try the next time you get denied.


    The easiest way to do Web Scraping

Get HTML from any page with a simple API call. We handle proxy rotation, browser identities, automatic retries, CAPTCHAs, JavaScript rendering, and more, automatically for you.


    Try ProxiesAPI for free

    curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"

    <!doctype html>
    <html>
    <head>
        <title>Example Domain</title>
        <meta charset="utf-8" />
        <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
    ...
