Scraping Booking.com Property Listings with PHP in 2023

In this article, we will learn how to scrape property listings from Booking.com using PHP. We will use common PHP libraries to fetch the HTML content and then parse and extract key information like property name, location, ratings, etc.

Prerequisites

To follow along, you will need:

PHP 7.2.5 or higher (the minimum that Guzzle 7 supports)

Composer installed to add PHP packages

Basic knowledge of PHP and HTML

Installing Dependencies

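We will use three PHP packages: Guzzle for sending HTTP requests, Symfony DomCrawler for parsing HTML, and Symfony CssSelector, which DomCrawler needs for the CSS selectors passed to its filter() method.

Install them using Composer:

composer require guzzlehttp/guzzle symfony/dom-crawler symfony/css-selector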

This will download the packages into the vendor folder.

Including Dependencies

At the top of your PHP script, include the Composer autoloader and the packages:

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use Symfony\Component\DomCrawler\Crawler;

The autoloader will load the classes when needed.

Defining the Target URL

We will scrape listings from this URL on Booking.com:

$url = 'https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2';

You can modify the parameters as needed.
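If you change the parameters often, it can be easier to build the query string from an array than to edit the URL by hand. The following is a minimal sketch; buildSearchUrl() is a hypothetical helper name, and the parameter names are simply the ones that appear in the URL above.

```php
<?php

// Hypothetical helper (not part of the original script): builds the search URL
// from an array of query parameters.
function buildSearchUrl(array $params): string
{
    $base = 'https://www.booking.com/searchresults.en-gb.html';

    // PHP_QUERY_RFC1738 encodes spaces as "+", matching the URL shown above.
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC1738);
}

$url = buildSearchUrl([
    'ss'           => 'New York',
    'checkin'      => '2023-03-01',
    'checkout'     => '2023-03-05',
    'group_adults' => 2,
]);

echo $url, "\n";
```

Changing the city or the dates then only means editing the array values.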

Setting User Agent

We send a browser-like User-Agent string, because requests that carry an HTTP client's default User-Agent may be blocked:

$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36';

Fetching the HTML Page

Use Guzzle to send a GET request and get the response:

$client = new Client(['headers' => ['User-Agent' => $userAgent]]);

$response = $client->request('GET', $url);

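$html = (string) $response->getBody();

We configure Guzzle with the User-Agent header and fetch the page. Because getBody() returns a stream object rather than a string, we cast it to a string before handing it to the parser.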
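Requests can fail because of timeouts, connection problems, or error responses. One way to handle this, sketched below, is to wrap the request in a small function that returns null on failure. The function name fetchHtml() and the 10-second timeout are assumptions made for illustration, not part of the original script.

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

// Hypothetical helper: returns the page HTML, or null if the request fails.
function fetchHtml(Client $client, string $url): ?string
{
    try {
        // The timeout value is an arbitrary choice for this sketch.
        $response = $client->request('GET', $url, ['timeout' => 10]);
    } catch (GuzzleException $e) {
        // Covers connection errors, timeouts, and 4xx/5xx responses
        // (Guzzle throws on those while http_errors is enabled, the default).
        fwrite(STDERR, 'Request failed: ' . $e->getMessage() . "\n");
        return null;
    }

    // getBody() returns a stream, so cast it to a string.
    return (string) $response->getBody();
}
```

You would call it as fetchHtml($client, $url) and skip parsing when the result is null.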

Parsing the HTML

Use DomCrawler to parse the HTML:

$crawler = new Crawler($html);

This creates a Crawler instance with the document structure.

Extracting Property Cards

The property cards have a data-testid attribute of property-card:

$cards = $crawler->filter('div[data-testid="property-card"]');

This extracts all divs with that attribute into a Crawler collection.

Looping Through Cards

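Loop through the cards with the Crawler's each() method, which passes every card to the callback as its own Crawler instance:

$cards->each(function (Crawler $card) {

  // Extract data from $card

});

Inside the callback we can extract information from each $card. A plain foreach over $cards would yield raw DOMElement objects, which do not have the filter() method used in the following steps.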
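To try the iteration without sending a request, you can run it against a small HTML fragment. The markup below is made up to mimic the card structure and is not real Booking.com HTML. The sketch uses the Crawler's each() method, which hands the callback a Crawler for every matched node, so filter() is available on each card.

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Made-up markup that mimics the property-card structure.
$sampleHtml = <<<'HTML'
<html><body>
  <div data-testid="property-card"><h3>Hotel Alpha</h3></div>
  <div data-testid="property-card"><h3>Hotel Beta</h3></div>
</body></html>
HTML;

$crawler = new Crawler($sampleHtml);

// each() calls the closure once per matched node, passing a Crawler that wraps
// only that node, and returns an array of the closure's return values.
$titles = $crawler
    ->filter('div[data-testid="property-card"]')
    ->each(function (Crawler $card, $i) {
        return $card->filter('h3')->text();
    });

print_r($titles);
```

Because each() returns the callback's results as an array, it is also a convenient way to collect values instead of printing them.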

Extracting Property Name

The title is in an h3 element:

$title = $card->filter('h3')->text();

Get the h3 element from the card and extract its text.

Extracting Location

The location is in a span:

$location = $card->filter('span[data-testid="address"]')->text();

Filter by the data-testid attribute to find the span.

Extracting Rating

Get the aria-label attribute of the star rating div:

$rating = $card->filter('div.e4755bbd60')->attr('aria-label');

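Filter by the CSS class name. Class names such as e4755bbd60 appear to be auto-generated and may change when the site is updated, so check them against the live page in your browser's developer tools.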

Extracting Review Count

Get the text of the review count div:

$reviewCount = $card->filter('div.abf093bdfe')->text();

Again filter by class name.
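The review count comes back as text, so you may want to turn it into an integer before storing it. The sketch below assumes the text is a label such as "1,234 reviews" that contains only the count; that format is an assumption, and parseReviewCount() is a hypothetical helper name.

```php
<?php

// Hypothetical helper: pulls the first number out of a label such as
// "1,234 reviews" (the label format is an assumption, not confirmed by the page).
function parseReviewCount(string $text): ?int
{
    // Match a run of digits that may contain comma, dot, or space separators.
    if (!preg_match('/\d[\d,.\s]*/u', $text, $m)) {
        return null;
    }

    // Drop everything that is not a digit before converting.
    return (int) preg_replace('/\D/', '', $m[0]);
}

var_dump(parseReviewCount('1,234 reviews'));  // int(1234)
var_dump(parseReviewCount('No reviews yet')); // NULL
```

Returning null for labels without digits keeps a missing count distinguishable from a count of zero.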

Extracting Description

Get the description div text:

$description = $card->filter('div.d7449d770c')->text();
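Not every card is guaranteed to contain every element, and text() throws an exception when the selector matches nothing. A minimal sketch of a guard follows; textOrNull() is a hypothetical helper name, and the card markup is made up for the example.

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: returns the trimmed text of the first match, or null when
// the selector matches nothing (text() would throw on an empty node list).
function textOrNull(Crawler $node, string $selector): ?string
{
    $match = $node->filter($selector);

    return $match->count() > 0 ? trim($match->text()) : null;
}

// Made-up card markup with a title but no description element.
$card = new Crawler('<div data-testid="property-card"><h3> Hotel Alpha </h3></div>');

var_dump(textOrNull($card, 'h3'));
var_dump(textOrNull($card, 'div.d7449d770c'));
```

The same check on count() works before attr() when reading the rating's aria-label.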

Printing the Data

Print out the extracted information:

echo "Name: $title\n";
echo "Location: $location\n";
echo "Rating: $rating\n";
echo "Review Count: $reviewCount\n";
echo "Description: $description\n\n";

This prints the key details for each property listing card.

You can also store the data in an array instead of printing.
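As a rough sketch of the array approach, each card's values can be appended as an associative array and the whole list serialized as JSON at the end. The rows below are sample values standing in for scraped data, and the key names are arbitrary choices for this example.

```php
<?php

$properties = [];

// In the real script these rows would be appended inside the loop, using
// $title, $location, $rating, and the other extracted variables.
$properties[] = [
    'name'     => 'Hotel Alpha',
    'location' => 'Manhattan, New York',
    'rating'   => '4 out of 5',
];
$properties[] = [
    'name'     => 'Hotel Beta',
    'location' => 'Brooklyn, New York',
    'rating'   => '3 out of 5',
];

// Serialize the collected rows; JSON_UNESCAPED_UNICODE keeps accented
// characters in property names readable.
$json = json_encode($properties, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);

echo $json, "\n";
```

The resulting string can be written to a file with file_put_contents() or passed to another program.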

Full Script

Here is the full scraping script:

<?php

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use Symfony\Component\DomCrawler\Crawler;

$url = 'https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2';

$userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36';

// Send the browser-like User-Agent with every request
$client = new Client(['headers' => ['User-Agent' => $userAgent]]);

$response = $client->request('GET', $url);

// getBody() returns a stream, so cast it to a string for the Crawler
$html = (string) $response->getBody();

$crawler = new Crawler($html);

$cards = $crawler->filter('div[data-testid="property-card"]');

// each() passes every card to the callback as its own Crawler instance
$cards->each(function (Crawler $card) {

  $title = $card->filter('h3')->text();

  $location = $card->filter('span[data-testid="address"]')->text();

  $rating = $card->filter('div.e4755bbd60')->attr('aria-label');

  $reviewCount = $card->filter('div.abf093bdfe')->text();

  $description = $card->filter('div.d7449d770c')->text();

  echo "Name: $title\n";
  echo "Location: $location\n";
  echo "Rating: $rating\n";
  echo "Review Count: $reviewCount\n";
  echo "Description: $description\n\n";

});

This script scrapes and prints key details from Booking.com property listings using Guzzle and DomCrawler. The same technique can be applied to other sites that deliver their listings as server-rendered HTML.

While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

With the power of Proxies API combined with PHP libraries like Guzzle and DomCrawler, you can scrape data at scale without getting blocked.
