Scraping Real Estate Listings From Realtor in Java

Jan 9, 2024 · 5 min read

In this post, we'll walk through code that scrapes real estate listing data from Realtor.com using a Java library called Jsoup.

Why Scrape Realtor.com?

Realtor.com contains rich listing information for properties across the United States. By scraping this data, we can analyze real estate trends programmatically or build applications using large-scale housing data.


Importing Jsoup

We first import the Jsoup Java library that enables web scraping:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Jsoup handles connecting to web pages, parsing HTML, finding DOM elements, and extracting data, which covers everything this scraper needs.

We also import IOException, which Jsoup throws when a request fails:

import java.io.IOException;

With the imports set up, let's look at the main logic.

Connecting to the Webpage

We define the Realtor URL we want to scrape:

String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";

This URL searches for San Francisco listings on Realtor.com.

Next, we use Jsoup to send a GET request to this URL:

Document doc = Jsoup.connect(url)
   .userAgent("Mozilla/5.0...")
   .get();
  • Jsoup.connect(url) creates a connection object for the URL we defined.
  • .userAgent() sets the user agent header to mimic a real browser request.
  • .get() sends the request and fetches the HTML content.
  • The returned Document contains the parsed DOM structure of the Realtor webpage, ready for data extraction.
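
As a hedged sketch, the request above could be made a little more defensive. The ListingFetcher class, the newConnection helper, and the 10-second timeout are illustrative assumptions rather than part of the original code. Jsoup's get() throws HttpStatusException when the server answers with an error status, so it can be caught separately from other I/O failures:

```java
import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

class ListingFetcher {
    // Hypothetical helper: configures the request without sending it.
    static Connection newConnection(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0...") // same placeholder as in the snippet above
                .timeout(10_000);            // assumed value: give up after 10 seconds
    }

    public static void main(String[] args) {
        String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
        try {
            Document doc = newConnection(url).get();
            System.out.println("Fetched page titled: " + doc.title());
        } catch (HttpStatusException e) {
            // The server replied, but with an error status such as 403 or 404
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            // Timeouts, DNS failures, and other network problems end up here
            System.err.println("Network error: " + e.getMessage());
        }
    }
}
```

Separating the two catch blocks makes it easier to tell a blocked or missing page apart from a connectivity problem.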

    Extracting Listing Data

    Inspecting the element

    When we inspect the page in Chrome's developer tools, we can see that each listing block is wrapped in a div whose class name starts with BasePropertyCard_propertyCardWrap.

    With the base DOM parsed, we can now query elements and extract information.

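    Realtor.com loads listings dynamically via JavaScript, and Jsoup does not execute JavaScript, so this approach only sees listing cards that are present in the HTML the server returns. To locate those listing blocks, we use this selector: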

    Elements listingBlocks = doc.select("div.BasePropertyCard_propertyCardWrap__J0xUj");
    

    This fetches all div elements with a class name matching BasePropertyCard_propertyCardWrap__J0xUj, which contain listing data.
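
Because the listings are loaded dynamically, it may be worth confirming that the selector matched anything before going further. The sketch below runs against a small hand-written HTML string (an assumption standing in for the real page). It also uses an attribute-substring selector that does not depend on the generated suffix J0xUj, on the assumption that such suffixes can change between site builds. ListingBlockFinder and CARD_SELECTOR are hypothetical names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class ListingBlockFinder {
    // Matches any div whose class attribute contains the stable part of the name,
    // whatever the generated suffix is (hypothetical fallback selector).
    static final String CARD_SELECTOR = "div[class*=BasePropertyCard_propertyCardWrap]";

    static Elements findListingBlocks(Document doc) {
        return doc.select(CARD_SELECTOR);
    }

    public static void main(String[] args) {
        // Hand-written sample markup, not the real Realtor.com response.
        String html = "<div class='BasePropertyCard_propertyCardWrap__J0xUj'>A</div>"
                + "<div class='BasePropertyCard_propertyCardWrap__Zz9Qq'>B</div>"
                + "<div class='other'>C</div>";
        Document doc = Jsoup.parse(html);

        Elements listingBlocks = findListingBlocks(doc);
        if (listingBlocks.isEmpty()) {
            System.err.println("No listing blocks matched; the markup may have changed or be rendered client-side.");
        } else {
            System.out.println("Matched " + listingBlocks.size() + " listing blocks");
        }
    }
}
```

An empty result here is a signal to re-inspect the page rather than a reason to assume there are no listings.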

    We loop through each listing:

    for (Element listingBlock : listingBlocks) {
    
      // Extract listing data...
    
    }
    

    And inside this loop, we extract various fields using additional selectors:

    // Broker information
    Element brokerInfo = listingBlock.selectFirst("div.BrokerTitle_brokerTitle__ZkbBW");
    
    // Status
    String status = listingBlock.selectFirst("div.message").text().trim();
    
    // Price
    String price = listingBlock.selectFirst("div.card-price").text().trim();
    
    // Beds
    String beds = listingBlock.select("li[data-testid=property-meta-beds]").text().trim();
    
    // Baths
    String baths = listingBlock.select("li[data-testid=property-meta-baths]").text().trim();
    
    // Address
    String address = listingBlock.selectFirst("div.card-address").text().trim();
    

    Let's analyze the beds selector:

    listingBlock.select("li[data-testid=property-meta-beds]")
    
  • listingBlock contains a single listing's DOM subtree.
  • select() queries this subtree to find elements matching the CSS selector.
  • li[data-testid=property-meta-beds] looks for li tags having a data-testid attribute value equal to property-meta-beds.
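  • Calling .text() on the result returns the combined text of the matched elements (an empty string if nothing matched), and trim() removes surrounding whitespace.
  • The other selectors work the same way, except that selectFirst() returns a single Element, or null when nothing matches.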
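
A small helper can absorb the missing-element case so that one incomplete listing does not stop the whole loop. This is a minimal sketch: SafeText and textOrEmpty are hypothetical names, and the sample card is hand-written rather than taken from the real page:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class SafeText {
    // Hypothetical helper: trimmed text of the first match,
    // or an empty string when the selector matches nothing.
    static String textOrEmpty(Element root, String cssQuery) {
        Element match = root.selectFirst(cssQuery);
        return match == null ? "" : match.text().trim();
    }

    public static void main(String[] args) {
        // Hand-written sample card; the real markup may differ.
        Element listingBlock = Jsoup.parse(
                "<div id='card'><div class='card-price'> $1,250,000 </div></div>")
                .getElementById("card");

        // Present in the sample, so the price text is returned
        System.out.println("Price: " + textOrEmpty(listingBlock, "div.card-price"));
        // Absent in the sample, so an empty string is returned instead of throwing
        System.out.println("Status: [" + textOrEmpty(listingBlock, "div.message") + "]");
    }
}
```

Returning an empty string keeps the printing code unchanged; returning an Optional would be a reasonable alternative if callers need to distinguish "missing" from "blank".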

    Finally, we print the output:

    System.out.println("Beds: " + beds);
    System.out.println("Price: " + price);
    // etc...
    

    The listing data is now extracted from Realtor.com using Jsoup and a handful of CSS selectors. From here it can be stored, aggregated, or fed into the kind of trend analysis mentioned at the start.
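
For any numeric analysis, the scraped strings first need to become numbers. The sketch below shows one way to do that for the price field, assuming prices appear as whole-dollar text such as "$1,250,000"; abbreviated forms like "$1.2M" would need extra handling. PriceParser and parsePrice are hypothetical names:

```java
import java.util.OptionalLong;

class PriceParser {
    // Hypothetical helper: keeps only the digits, so "$1,250,000" becomes 1250000.
    // Returns an empty OptionalLong when the text contains no digits at all.
    static OptionalLong parsePrice(String priceText) {
        String digits = priceText.replaceAll("[^0-9]", "");
        return digits.isEmpty()
                ? OptionalLong.empty()
                : OptionalLong.of(Long.parseLong(digits));
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("$1,250,000"));
        System.out.println(parsePrice("Contact for price"));
    }
}
```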

    Full Code

    Here is the complete runnable code to scrape Realtor listings with Jsoup in Java:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    
    import java.io.IOException;
    
    public class RealtorScraper {
        public static void main(String[] args) {
            // Define the URL of the Realtor.com search page
            String url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
    
            try {
                // Send a GET request to the URL
                Document doc = Jsoup.connect(url)
                        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36")
                        .get();
    
                // Find all the listing blocks using the provided class name
                Elements listingBlocks = doc.select("div.BasePropertyCard_propertyCardWrap__J0xUj");
    
                // Loop through each listing block and extract information
                for (Element listingBlock : listingBlocks) {
                    // Extract the broker information
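                    Element brokerInfo = listingBlock.selectFirst("div.BrokerTitle_brokerTitle__ZkbBW");
                    // selectFirst returns null when nothing matches, so guard before reading the name
                    Element brokerTitle = brokerInfo == null ? null : brokerInfo.selectFirst("span.BrokerTitle_titleText__20u1P");
                    String brokerName = brokerTitle == null ? "" : brokerTitle.text().trim();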
    
                    // Extract the status (e.g., For Sale)
                    String status = listingBlock.selectFirst("div.message").text().trim();
    
                    // Extract the price
                    String price = listingBlock.selectFirst("div.card-price").text().trim();
    
                    // Extract other details like beds, baths, sqft, and lot size
                    String beds = listingBlock.select("li[data-testid=property-meta-beds]").text().trim();
                    String baths = listingBlock.select("li[data-testid=property-meta-baths]").text().trim();
                    String sqft = listingBlock.select("li[data-testid=property-meta-sqft]").text().trim();
                    String lotSize = listingBlock.select("li[data-testid=property-meta-lot-size]").text().trim();
    
                    // Extract the address
                    String address = listingBlock.selectFirst("div.card-address").text().trim();
    
                    // Print the extracted information
                    System.out.println("Broker: " + brokerName);
                    System.out.println("Status: " + status);
                    System.out.println("Price: " + price);
                    System.out.println("Beds: " + beds);
                    System.out.println("Baths: " + baths);
                    System.out.println("Sqft: " + sqft);
                    System.out.println("Lot Size: " + lotSize);
                    System.out.println("Address: " + address);
                    System.out.println("-".repeat(50));  // Separating listings
                }
    
            } catch (IOException e) {
                System.err.println("Failed to retrieve the page.");
                e.printStackTrace();
            }
        }
    }
    
