Building a Simple Proxy Rotator with PHP and SimpleHTMLDOM

Oct 2, 2023 · 4 min read

In the early stages of a web crawling project, or when you only need to scale to a few hundred requests, you may want a simple proxy rotator that periodically repopulates itself from the free proxy pools available on the internet.

We can use a website like https://sslproxies.org/ to fetch public proxies every few minutes and use them in our PHP projects.

If you inspect the page's HTML, you will see that the proxy list is contained in a table with the id proxylisttable.

The IP and port are the first and second elements in each row.

The code in the following sections selects the table, iterates over its rows, and pulls the first and second cells out of each row.

Fetching the Proxies

First, we'll need to install SimpleHTMLDOM to parse the HTML from the proxy site. You can install it via Composer:

composer require sunra/php-simple-html-dom-parser

Then we can fetch the HTML using cURL:

<?php

require 'vendor/autoload.php';

use Sunra\PhpSimple\HtmlDomParser;

$url = 'https://sslproxies.org/';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// The Sunra wrapper exposes static helpers; str_get_html() parses an HTML string
$dom = HtmlDomParser::str_get_html($html);

This will fetch the HTML from sslproxies.org and load it into a SimpleHTMLDOM object.
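Free proxy sites are not always reachable, so it can help to fail early when the download goes wrong. The helper below is a minimal sketch of that idea. The name fetchHtml() and its timeout parameter are assumptions for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper (not from the original article): download a page and
// return null on any transport error or non-200 response.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit time spent connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit total request time
    ]);

    $html   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($html === false || $status !== 200) {
        return null;
    }

    return $html;
}
```

A caller can then skip parsing when fetchHtml('https://sslproxies.org/') returns null and keep using the previous list instead.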

Parsing the Proxies

Next, we need to parse the table on the page to extract the proxies. The IPs and ports are in the first and second columns:

$proxies = [];

foreach($dom->find('table#proxylisttable tr') as $row) {
  $cols = $row->find('td');

  if(count($cols) > 1) {
    $proxies[] = [
      'ip' => $cols[0]->plaintext,
      'port' => $cols[1]->plaintext
    ];
  }
}

This loops through the table rows, gets the TD elements, and extracts the IP and port if there are at least 2 columns.
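Scraped cells can contain stray whitespace or junk rows, so it may be worth validating each entry before using it. The sketch below runs against a small made-up HTML sample and, so that it works without Composer, uses PHP's built-in DOMDocument and DOMXPath instead of SimpleHTMLDOM. The function name extractProxies() and the sample addresses are assumptions for illustration.

```php
<?php
// Made-up markup standing in for the live page; the real table may differ.
$sampleHtml = <<<HTML
<table id="proxylisttable">
  <tr><th>IP Address</th><th>Port</th></tr>
  <tr><td>203.0.113.10</td><td>8080</td></tr>
  <tr><td>not-an-ip</td><td>3128</td></tr>
  <tr><td>198.51.100.7</td><td>99999</td></tr>
  <tr><td> 192.0.2.44 </td><td> 3128 </td></tr>
</table>
HTML;

// Hypothetical helper: extract only well-formed IP/port pairs from the table.
function extractProxies(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings from imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath   = new DOMXPath($doc);
    $proxies = [];

    foreach ($xpath->query('//table[@id="proxylisttable"]//tr') as $row) {
        $cells = $xpath->query('td', $row);
        if ($cells->length < 2) {
            continue; // header row or malformed row
        }

        $ip   = trim($cells->item(0)->textContent);
        $port = trim($cells->item(1)->textContent);

        $validIp   = filter_var($ip, FILTER_VALIDATE_IP) !== false;
        $validPort = filter_var($port, FILTER_VALIDATE_INT, [
            'options' => ['min_range' => 1, 'max_range' => 65535],
        ]) !== false;

        if ($validIp && $validPort) {
            $proxies[] = ['ip' => $ip, 'port' => (int) $port];
        }
    }

    return $proxies;
}

$proxies = extractProxies($sampleHtml);
```

With this sample, the row with a non-IP value and the row with an out-of-range port are dropped, and the padded row is kept after trimming.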

Using a Random Proxy

To use a random proxy from the list, we can pick one at random:

$randomProxy = $proxies[array_rand($proxies)];

$proxyIp = $randomProxy['ip'];
$proxyPort = $randomProxy['port'];

Then we can use it in a cURL request:

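$ch = curl_init('https://example.com');

// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, $proxyIp . ':' . $proxyPort);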

$result = curl_exec($ch);

curl_close($ch);
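Free proxies fail often, so a single random pick will frequently return nothing. One possible approach is to try a few different proxies before giving up. The sketch below assumes the list has the same ['ip' => ..., 'port' => ...] shape as above. The names fetchViaProxy() and fetchWithRotation(), the timeout, and the attempt limit are all hypothetical.

```php
<?php
// Hypothetical helper: one request through one proxy, null on failure.
function fetchViaProxy(string $url, array $proxy, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy['ip'] . ':' . $proxy['port'],
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    return $body === false ? null : $body;
}

// Hypothetical helper: shuffle the list and try up to $maxAttempts proxies.
// $fetcher is injectable so the rotation logic can be tested without a network.
function fetchWithRotation(
    string $url,
    array $proxies,
    int $maxAttempts = 3,
    ?callable $fetcher = null
): ?string {
    $fetcher = $fetcher ?? 'fetchViaProxy';
    shuffle($proxies); // random order, each proxy tried at most once

    foreach (array_slice($proxies, 0, $maxAttempts) as $proxy) {
        $body = $fetcher($url, $proxy);
        if ($body !== null) {
            return $body; // first working proxy wins
        }
    }

    return null; // every attempt failed, or the list was empty
}
```

Calling fetchWithRotation('https://example.com', $proxies) returns the first successful response body, or null when every attempted proxy fails. It also handles an empty list, which array_rand() does not.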

Full Code

Here is the full code to fetch proxies and use a random one:

<?php

require 'vendor/autoload.php';

use Sunra\PhpSimple\HtmlDomParser;

function getProxies() {

  $url = 'https://sslproxies.org/';

  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $html = curl_exec($ch);
  curl_close($ch);

  // The Sunra wrapper exposes static helpers; str_get_html() parses an HTML string
  $dom = HtmlDomParser::str_get_html($html);

  $proxies = [];

  foreach($dom->find('table#proxylisttable tr') as $row) {
    $cols = $row->find('td');

    if(count($cols) > 1) {
      $proxies[] = [
        'ip' => $cols[0]->plaintext,
        'port' => $cols[1]->plaintext
      ];
    }
  }

  return $proxies;
}

$proxies = getProxies();

$randomProxy = $proxies[array_rand($proxies)];

$proxyIp = $randomProxy['ip'];
$proxyPort = $randomProxy['port'];

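$ch = curl_init('https://example.com');

// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, $proxyIp . ':' . $proxyPort);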

$result = curl_exec($ch);

curl_close($ch);

This provides a simple way to implement a rotating proxy in PHP. The getProxies() function can be called every few minutes to refresh the list.
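Refreshing every few minutes implies keeping the list somewhere between runs. The sketch below shows one way to do that with a JSON file and a time-to-live. The function name getCachedProxies(), the cache file path, and the 300-second default are assumptions, and the loader argument is any callable that returns the proxy array, such as getProxies().

```php
<?php
// Hypothetical helper: reuse a cached proxy list until it is older than
// $ttlSeconds, then call $loader to rebuild it.
function getCachedProxies(callable $loader, string $cacheFile, int $ttlSeconds = 300): array
{
    clearstatcache(true, $cacheFile); // avoid stale file metadata

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        $cached = json_decode(file_get_contents($cacheFile), true);
        if (is_array($cached) && $cached !== []) {
            return $cached; // cache is fresh and usable
        }
    }

    $proxies = $loader();

    // Only overwrite the cache when the refresh actually produced proxies.
    if ($proxies !== []) {
        file_put_contents($cacheFile, json_encode($proxies), LOCK_EX);
    }

    return $proxies;
}
```

For example, getCachedProxies('getProxies', __DIR__ . '/proxies.json') would hit the proxy site at most once every five minutes.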

If you want to use this in production and want to scale to thousands of links, then you will find that many free proxies won't hold up under the speed and reliability requirements. In this scenario, using a rotating proxy service to rotate IPs is almost a must.

Otherwise, you tend to get IP blocked a lot by automatic location, usage, and bot detection algorithms.

Our rotating proxy server Proxies API provides a simple API that can solve all IP Blocking problems instantly.

• With millions of high speed rotating proxies located all over the world
• With our automatic IP rotation
• With our automatic User-Agent-String rotation (which simulates requests from different, valid web browsers and web browser versions)
• With our automatic CAPTCHA solving technology

Hundreds of our customers have successfully solved the headache of IP blocks with a simple API.

The whole service can be accessed from any programming language with a simple API call like the one below.

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"
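In PHP, the same call might look like the sketch below. It assumes only the endpoint and the key and url query parameters shown in the curl command above. The function name buildProxiesApiUrl() is hypothetical, and API_KEY is a placeholder. Building the query with http_build_query() URL-encodes the target, which matters once the target URL has its own query string.

```php
<?php
// Hypothetical helper: build the request URL from the two parameters shown
// in the curl example (key and url), encoding the target URL safely.
function buildProxiesApiUrl(string $apiKey, string $targetUrl): string
{
    return 'http://api.proxiesapi.com/?' . http_build_query([
        'key' => $apiKey,
        'url' => $targetUrl,
    ]);
}

// API_KEY is a placeholder, as in the curl example.
$requestUrl = buildProxiesApiUrl('API_KEY', 'https://example.com/search?q=php&page=2');
```

The resulting $requestUrl can be passed to curl_init() in the same way as the earlier examples, with no CURLOPT_PROXY setting.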

We have a running offer of 1000 API calls completely free. Register and get your free API Key here.
