Solving Cloudflare Errors with Selenium and Undetected Chromedriver

Apr 2, 2024 · 3 min read

Selenium is a powerful tool for web scraping and automation. However, sites protected by Cloudflare often detect it and respond with errors or challenge pages instead of the content. This is where undetected_chromedriver comes in.

What is Undetected Chromedriver?

Undetected Chromedriver is a Python package that patches ChromeDriver so that Selenium can drive Chrome without being flagged as a bot as easily. It helps get past Cloudflare and other anti-bot measures.

Common Cloudflare Errors

When using Selenium with a regular Chrome webdriver, you might encounter Cloudflare errors like:

  • "Access denied"
  • "Checking your browser before accessing"
  • CAPTCHA challenges

These errors occur because Cloudflare detects the automated browser.
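
One way to notice these responses in a script is to check the loaded page for the phrases listed above. The sketch below is a rough heuristic, not an official Cloudflare or Selenium API: the function name `looks_like_cloudflare_block` and the marker list are assumptions made for illustration, and real challenge pages may use different wording.

```python
# Hypothetical helper: the phrases come from the error list above.
# Real Cloudflare pages may word these differently, so treat this as a heuristic.
BLOCK_MARKERS = (
    "access denied",
    "checking your browser before accessing",
    "captcha",
)


def looks_like_cloudflare_block(page_source: str) -> bool:
    """Return True if the HTML contains any of the known block-page phrases."""
    text = page_source.lower()
    return any(marker in text for marker in BLOCK_MARKERS)
```

With Selenium, the HTML of the current page is available as `driver.page_source`, so `looks_like_cloudflare_block(driver.page_source)` can be used to decide whether to retry or back off.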

Using Undetected Chromedriver

To solve these issues, we can use undetected_chromedriver instead of the regular Chrome webdriver. Here's how:

    from undetected_chromedriver import Chrome

    # Launch a patched Chrome instance and load the page
    driver = Chrome()
    driver.get("https://example.com")
    

This creates a Chrome instance that looks like a regular user's browser to Cloudflare.

Benefits of Undetected Chromedriver

Using undetected_chromedriver has several advantages:

1. Bypasses Cloudflare's anti-bot detection in many cases
2. Reduces the chances of getting blocked
3. Allows scraping websites protected by Cloudflare

Headless Mode

Undetected Chromedriver also supports headless mode, which runs the browser without a visible UI. This is useful for running scripts on servers or saving system resources.

    from undetected_chromedriver import Chrome, ChromeOptions

    options = ChromeOptions()

    # headless is a keyword argument of undetected_chromedriver's Chrome class
    driver = Chrome(options=options, headless=True)
    

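Handling CAPTCHAs

Even with undetected_chromedriver, you might occasionally face CAPTCHA challenges. To solve them, you can:

1. Use a CAPTCHA solving service
2. Implement a CAPTCHA solver using image recognition
3. Retry the request after a delay

Here's an example of retrying after a delay. Note that driver.get() only raises an exception when navigation itself fails (for example, on a timeout). A CAPTCHA page loads like any other page, so a real script also needs to inspect the page content before deciding whether to retry: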

    import time

    from selenium.common.exceptions import WebDriverException

    MAX_RETRIES = 3

    for attempt in range(MAX_RETRIES):
        try:
            driver.get("https://example.com")
            break  # Page loaded, stop retrying
        except WebDriverException:
            time.sleep(5)  # Wait for 5 seconds before retrying
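
A fixed five-second wait can also be replaced with a delay that grows on each attempt. The following is a minimal sketch of that idea and is not part of undetected_chromedriver: `retry_with_backoff`, its parameters, and the `fetch` callable are hypothetical names, with `fetch` standing in for whatever wraps `driver.get(...)` and raises when the page is blocked.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to max_retries times, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            # base_delay, 2x, 4x, ... plus up to one second of random jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise ValueError("max_retries must be at least 1")
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use the default `time.sleep` applies.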
    

Best Practices

When using undetected_chromedriver, follow these best practices:

  • Use a pool of IP addresses or proxies
  • Add random delays between requests
  • Rotate user agents and other headers
  • Avoid making too many requests in a short period
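
Two of these practices, random delays and rotating browser identities, can be sketched with the standard library alone. In this sketch the helper names, the user-agent strings, and the proxy addresses are placeholders invented for illustration. `--user-agent` and `--proxy-server` are standard Chrome command-line switches, which an options object can receive through `add_argument`.

```python
import random
from typing import List, Sequence

# Placeholder values: substitute user agents and proxies you are allowed to use.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
]
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]


def random_delay(min_s: float = 2.0, max_s: float = 6.0) -> float:
    """Pick a random pause length, in seconds, to wait between two requests."""
    return random.uniform(min_s, max_s)


def chrome_args(user_agents: Sequence[str], proxies: Sequence[str]) -> List[str]:
    """Build Chrome switches with a randomly chosen user agent and proxy."""
    args = [f"--user-agent={random.choice(user_agents)}"]
    if proxies:  # The proxy switch is skipped when no proxies are configured
        args.append(f"--proxy-server={random.choice(proxies)}")
    return args
```

Each string returned by `chrome_args(USER_AGENTS, PROXIES)` would be passed to `options.add_argument(...)` before the driver is created, and `time.sleep(random_delay())` between page loads spaces out the requests.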

Limitations

While undetected_chromedriver is effective, it has some limitations:

1. It may not work for all websites
2. Cloudflare may still detect and block the browser in some cases
3. It is slower than using a regular webdriver

Conclusion

Undetected Chromedriver is a valuable tool for web scraping when faced with Cloudflare protection. By mimicking a regular user's browser, it helps bypass anti-bot measures and allows scraping websites that would otherwise block Selenium.

However, it's important to use it responsibly and follow best practices to avoid getting blocked. With proper implementation, undetected_chromedriver can greatly enhance your web scraping capabilities.
