Scraping Real Estate Listings From Realtor in Java

In this post, we'll walk through code that scrapes real estate listing data from Realtor.com using a Java library called Jsoup.

Why Scrape Realtor.com?

Realtor.com contains rich listing information for properties across the United States. By scraping this data, we can analyze real estate trends programmatically or build applications using large-scale housing data.

Importing Jsoup

We first import the Jsoup Java library that enables web scraping:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Jsoup handles connecting to web pages, parsing HTML, finding DOM elements, and extracting data, which covers the core steps of scraping a static page.

We also import IOException, which Jsoup throws when a request fails:

import java.io.IOException;

With the imports set up, let's look at the main logic.

Connecting to the Webpage

We define the Realtor URL we want to scrape:

String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";

This URL searches for San Francisco listings on Realtor.com.
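If you want to search other cities, the URL can be built from its parts. The sketch below assumes that every search URL follows the same City-Name_ST pattern seen in the San Francisco example; that pattern is inferred from this one URL, and the searchUrl helper is a hypothetical name introduced here for illustration.

```java
public class SearchUrlSketch {
    // Hypothetical helper. Assumes the path pattern "<City-With-Hyphens>_<STATE>",
    // which is inferred from the single San Francisco URL used in this post.
    static String searchUrl(String city, String stateCode) {
        String citySlug = city.trim().replaceAll("\\s+", "-");
        String state = stateCode.trim().toUpperCase();
        return "https://www.realtor.com/realestateandhomes-search/" + citySlug + "_" + state;
    }

    public static void main(String[] args) {
        // Rebuilds the URL used in this post from its parts
        System.out.println(searchUrl("San Francisco", "ca"));
    }
}
```

Check any generated URL in a browser before scraping it, since the site may format some locations differently.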

Next, we use Jsoup to send a GET request to this URL:

Document doc = Jsoup.connect(url)
   .userAgent("Mozilla/5.0...")
   .get();

Jsoup.connect(url) creates a connection object for our URL. No request is sent at this point.

.userAgent() sets the user agent header to mimic a real browser request.

.get() sends the request and fetches the HTML content.

The returned Document contains the parsed DOM structure of the Realtor webpage, ready for data extraction.
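In practice the request can fail or be refused, so it may help to set a timeout and handle error responses separately. The following is a minimal sketch, not the only way to do it: the buildConnection helper, the 10-second timeout, and the shortened user agent string are illustrative choices that do not come from the original code.

```java
import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class FetchSketch {
    // Hypothetical helper: configures the request without sending it.
    static Connection buildConnection(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") // illustrative value
                .timeout(10_000); // milliseconds; an assumed value, tune as needed
    }

    public static void main(String[] args) {
        String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
        try {
            Document doc = buildConnection(url).get();
            System.out.println("Fetched page titled: " + doc.title());
        } catch (HttpStatusException e) {
            // Jsoup throws this when the server answers with an error status such as 403 or 404
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            // Covers timeouts and other network failures
            System.err.println("Network error: " + e.getMessage());
        }
    }
}
```

HttpStatusException is a subclass of IOException, so it has to be caught first for the more specific message to be printed.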

Extracting Listing Data

Inspecting the element

When we inspect a listing in Chrome's developer tools, we can see that each listing block is wrapped in a div with the class BasePropertyCard_propertyCardWrap__J0xUj.

With the base DOM parsed, we can now query elements and extract information.

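Realtor.com loads listings dynamically via JavaScript. Jsoup does not execute JavaScript, so it only sees the listings that are present in the HTML returned by the server. To locate the raw listing blocks, we use this selector: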

Elements listingBlocks = doc.select("div.BasePropertyCard_propertyCardWrap__J0xUj");

This fetches all div elements with the class BasePropertyCard_propertyCardWrap__J0xUj, which contain the listing data.
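Class names with suffixes like __J0xUj look like build-generated hashes, so they may change when the site is redeployed. That is an assumption about how the site is built, not something confirmed here. Under that assumption, a substring attribute selector that matches only the stable part of the class name is one way to make the lookup less brittle. The sketch below runs against made-up markup rather than the live page, and findListingBlocks is a hypothetical helper name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorSketch {
    // Hypothetical helper: matches any div whose class attribute contains
    // the stable part of the class name, whatever suffix follows it.
    static Elements findListingBlocks(Document doc) {
        return doc.select("div[class*=BasePropertyCard_propertyCardWrap]");
    }

    public static void main(String[] args) {
        // Made-up markup imitating two listing cards with different suffixes
        String html = "<div class=\"BasePropertyCard_propertyCardWrap__J0xUj\">A</div>"
                + "<div class=\"BasePropertyCard_propertyCardWrap__Zz9Qq\">B</div>"
                + "<div class=\"Other\">C</div>";
        Document doc = Jsoup.parse(html);

        Elements blocks = findListingBlocks(doc);
        System.out.println("Listing blocks found: " + blocks.size());

        if (blocks.isEmpty()) {
            // On the live site this would suggest the markup changed
            // or the listings were not part of the initial HTML
            System.out.println("No listing blocks matched the selector.");
        }
    }
}
```

Checking for an empty result before looping makes it easier to tell a changed page layout apart from a search with no listings.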

We loop through each listing:

for (Element listingBlock : listingBlocks) {

  // Extract listing data...

}

And inside this loop, we extract various fields using additional selectors:

// Broker information
Element brokerInfo = listingBlock.selectFirst("div.BrokerTitle_brokerTitle__ZkbBW");

// Status
String status = listingBlock.selectFirst("div.message").text().trim();

// Price
String price = listingBlock.selectFirst("div.card-price").text().trim();

// Beds
String beds = listingBlock.select("li[data-testid=property-meta-beds]").text().trim();

// Baths
String baths = listingBlock.select("li[data-testid=property-meta-baths]").text().trim();

// Address
String address = listingBlock.selectFirst("div.card-address").text().trim();

Let's analyze the beds selector:

listingBlock.select("li[data-testid=property-meta-beds]")

listingBlock contains a single listing's DOM subtree.

select() queries this subtree to find elements matching the CSS selector.

li[data-testid=property-meta-beds] looks for li tags having a data-testid attribute value equal to property-meta-beds.

Calling .text() on the result of select() returns the combined text of the matched elements, and trim() strips any surrounding whitespace.

The other selectors work the same way to extract additional fields.
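One difference between the two lookup methods is worth noting: select() returns an empty collection when nothing matches, while selectFirst() returns null, so chaining .text() onto selectFirst() throws a NullPointerException for a card that lacks that field. A small null-safe helper avoids this. The sketch below uses made-up markup, and textOrEmpty is a hypothetical helper name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SafeTextSketch {
    // Hypothetical helper: trimmed text of the first match, or "" when nothing matches.
    static String textOrEmpty(Element parent, String cssQuery) {
        Element match = parent.selectFirst(cssQuery);
        return match == null ? "" : match.text().trim();
    }

    public static void main(String[] args) {
        // Made-up card that has a price but no address
        Element card = Jsoup.parse(
                "<div class=\"card\"><div class=\"card-price\"> $1,250,000 </div></div>").body();

        System.out.println("Price: " + textOrEmpty(card, "div.card-price"));
        System.out.println("Address: '" + textOrEmpty(card, "div.card-address") + "'");
    }
}
```

Returning an empty string keeps the loop running when a field is missing; returning an Optional would be a reasonable alternative if you need to tell "missing" apart from "blank".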

Finally, we print the output:

System.out.println("Beds: " + beds);
System.out.println("Price: " + price);
// etc...

The listing fields are now extracted from Realtor.com using Jsoup and a handful of CSS selectors. From here, the data can be stored, aggregated, or fed into other applications.
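The extracted values are plain strings, so any numeric analysis needs a conversion step first. Here is a minimal sketch for the price field, assuming the text looks like "$1,250,000" (a currency symbol, digits, and thousands separators). That format is an assumption, abbreviated forms such as "$1.25M" are not handled, and parsePrice is a hypothetical helper name.

```java
import java.util.OptionalLong;

public class PriceSketch {
    // Hypothetical helper: keeps only the digits of a price string such as "$1,250,000".
    // Returns empty when the text has no digits or too many to fit in a long.
    static OptionalLong parsePrice(String priceText) {
        String digits = priceText.replaceAll("[^0-9]", "");
        if (digits.isEmpty() || digits.length() > 18) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(digits));
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("$1,250,000"));    // value present
        System.out.println(parsePrice("Contact agent")); // no digits, so empty
    }
}
```

Using OptionalLong instead of a sentinel such as -1 forces the caller to decide what to do with listings that have no usable price.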

Full Code

Here is the complete runnable code to scrape Realtor listings with Jsoup in Java:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class RealtorScraper {
    public static void main(String[] args) {
        // Define the URL of the Realtor.com search page
        String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";

        try {
            // Send a GET request to the URL
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
                    .get();

            // Find all the listing blocks using the provided class name
            Elements listingBlocks = doc.select("div.BasePropertyCard_propertyCardWrap__J0xUj");

            // Loop through each listing block and extract information
            for (Element listingBlock : listingBlocks) {
                // Extract the broker information (some cards may not have a broker block)
                Element brokerInfo = listingBlock.selectFirst("div.BrokerTitle_brokerTitle__ZkbBW");
                String brokerName = brokerInfo == null
                        ? ""
                        : textOrEmpty(brokerInfo, "span.BrokerTitle_titleText__20u1P");

                // Extract the status (e.g., For Sale)
                String status = textOrEmpty(listingBlock, "div.message");

                // Extract the price
                String price = textOrEmpty(listingBlock, "div.card-price");

                // Extract other details like beds, baths, sqft, and lot size
                // (select() returns an empty collection when nothing matches, so these are null-safe)
                String beds = listingBlock.select("li[data-testid=property-meta-beds]").text().trim();
                String baths = listingBlock.select("li[data-testid=property-meta-baths]").text().trim();
                String sqft = listingBlock.select("li[data-testid=property-meta-sqft]").text().trim();
                String lotSize = listingBlock.select("li[data-testid=property-meta-lot-size]").text().trim();

                // Extract the address
                String address = textOrEmpty(listingBlock, "div.card-address");

                // Print the extracted information
                System.out.println("Broker: " + brokerName);
                System.out.println("Status: " + status);
                System.out.println("Price: " + price);
                System.out.println("Beds: " + beds);
                System.out.println("Baths: " + baths);
                System.out.println("Sqft: " + sqft);
                System.out.println("Lot Size: " + lotSize);
                System.out.println("Address: " + address);
                System.out.println("-".repeat(50));  // Separating listings (String.repeat needs Java 11+)
            }

        } catch (IOException e) {
            System.err.println("Failed to retrieve the page.");
            e.printStackTrace();
        }
    }

    // Returns the trimmed text of the first element matching the selector, or "" if none matches.
    // selectFirst() returns null when there is no match, so this avoids a NullPointerException.
    private static String textOrEmpty(Element parent, String cssQuery) {
        Element match = parent.selectFirst(cssQuery);
        return match == null ? "" : match.text().trim();
    }
}
