Finding Headers in BeautifulSoup

Oct 6, 2023 · 2 min read

When parsing HTML and XML documents, accessing and working with headers is a common task. In BeautifulSoup, header tags like <h1> to <h6> have some particular behaviors and access patterns that are useful to understand.

Finding Headers

To find header tags, you can use:

soup.find('h1')
soup.find_all('h2')
soup.select('h3')

These match the first h1, all h2 tags, and all h3 tags, respectively. Note that find returns None when nothing matches, while find_all and select return an empty list.
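
As a minimal sketch of the three calls above, the following runs them against a small made-up document (the HTML and the HEADING_NAMES list are assumptions for illustration, not part of the original article). It also shows that find_all accepts a list of tag names, which is one way to collect every heading level at once:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = """
<h1>Guide</h1>
<p>Intro</p>
<h2>Setup</h2>
<h3>Details</h3>
<h2>Usage</h2>
"""

soup = BeautifulSoup(html, "html.parser")

first_h1 = soup.find("h1")      # a single Tag, or None if there is no h1
all_h2 = soup.find_all("h2")    # a list of Tags
all_h3 = soup.select("h3")      # CSS selector; also returns a list of Tags
missing = soup.find("h4")       # None, because the sample has no h4

# Passing a list of names matches any of them, in document order.
HEADING_NAMES = ["h1", "h2", "h3", "h4", "h5", "h6"]
all_headings = soup.find_all(HEADING_NAMES)
heading_names = [tag.name for tag in all_headings]
```

Because find can return None, it is worth checking the result before reading attributes from it.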

Contents Access

The main contents of a header tag can be accessed through the .string attribute:

h1 = soup.find('h1')
title_text = h1.string

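The .text attribute also works but handles nested tags differently: .string returns None when the header has more than one child (for example, text plus an <em> tag), while .text concatenates the text of all descendants.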
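
To make the difference concrete, here is a small sketch using an invented snippet with one plain header and one header that contains a nested tag. The sample HTML and variable names are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one plain header, one with a nested <em> tag.
html = "<h1>Plain title</h1><h2>Nested <em>title</em></h2>"
soup = BeautifulSoup(html, "html.parser")

plain = soup.find("h1")
nested = soup.find("h2")

plain_string = plain.string      # the single text child of the h1
nested_string = nested.string    # None: the h2 has two children (text and <em>)
nested_text = nested.get_text()  # joins the text of all descendants
```

In practice, get_text() (or .text) tends to be the safer default when headers may contain links, emphasis, or other inline markup.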

Stripping Whitespace

Header tags often contain extra whitespace around them. You can strip whitespace with:

title = h1.get_text(strip=True)

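Note that strip=True strips each piece of text inside the header before joining them. To trim only the leading and trailing whitespace and keep the spacing inside a multiline header: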

title = h1.text.strip()
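
The two approaches give different results when a header spans several lines or contains nested tags. The sketch below compares them on a made-up header and adds two common alternatives, passing a separator to get_text and collapsing whitespace with split and join. The sample HTML and variable names are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical multiline header with a nested tag.
html = "<h1>\n   Quarterly\n   <span>Sales</span>  Report\n</h1>"
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

# Strips every text fragment, then joins them with no separator.
stripped_pieces = h1.get_text(strip=True)

# Trims only the outer whitespace; inner newlines and spaces remain.
outer_only = h1.text.strip()

# Strips every fragment and joins them with a single space.
with_separator = h1.get_text(" ", strip=True)

# Collapses every run of whitespace into a single space.
normalized = " ".join(h1.get_text().split())
```

For display purposes, the last two forms usually produce the most readable title.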

Heading Levels

To get the heading level (e.g. 1 for <h1>), use:

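level = int(h1.name[1])

This extracts the digit from the tag name and converts it to an integer. Without int(), the result is the string '1' rather than the number 1.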
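
Indexing into the tag name assumes the tag really is a heading. A slightly more defensive sketch is shown below; heading_level and HEADING_RE are hypothetical names introduced here, not part of BeautifulSoup. It validates the tag name with a regular expression and builds a simple outline of (level, text) pairs:

```python
import re
from bs4 import BeautifulSoup

# Matches exactly h1 through h6 and captures the digit.
HEADING_RE = re.compile(r"^h([1-6])$")

def heading_level(tag):
    """Return 1-6 for h1-h6 tags, or None for any other tag (hypothetical helper)."""
    match = HEADING_RE.match(tag.name or "")
    return int(match.group(1)) if match else None

# Hypothetical sample document.
html = "<h1>Guide</h1><h2>Setup</h2><h3>Details</h3><p>Text</p>"
soup = BeautifulSoup(html, "html.parser")

# find_all accepts a compiled regex and matches it against tag names.
outline = [
    (heading_level(tag), tag.get_text(strip=True))
    for tag in soup.find_all(HEADING_RE)
]
p_level = heading_level(soup.find("p"))
```

The anchored pattern avoids accidental matches on tags such as hr, head, or header.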

Next Sibling

A common pattern is finding a header and then extracting the next sibling element:

h1 = soup.find('h1')
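content = h1.find_next_sibling()  # next tag; .next_sibling may be whitespace text

This gets the element immediately following the header. The plain .next_sibling attribute returns the next node of any kind, which is often a whitespace-only string between tags, so find_next_sibling() is usually what you want.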
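
A related pattern is gathering everything that belongs to a header, up to the next header. The sketch below illustrates one way to do that; section_after and HEADING_NAMES are hypothetical names, and the sample HTML is invented. It also shows the difference between next_sibling and find_next_sibling():

```python
from bs4 import BeautifulSoup

HEADING_NAMES = {"h1", "h2", "h3", "h4", "h5", "h6"}

def section_after(heading):
    """Collect sibling tags after `heading` until the next heading (hypothetical helper)."""
    section = []
    for sibling in heading.find_next_siblings():
        if sibling.name in HEADING_NAMES:
            break
        section.append(sibling)
    return section

# Hypothetical sample document.
html = """
<h1>Guide</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<h2>Setup</h2>
<p>Setup notes.</p>
"""
soup = BeautifulSoup(html, "html.parser")
h1 = soup.find("h1")

raw_sibling = h1.next_sibling        # the newline text node between the tags
next_tag = h1.find_next_sibling()    # the first <p> tag
section_text = [tag.get_text() for tag in section_after(h1)]
```

This only walks siblings, so it assumes the header and its content share the same parent element; documents that wrap headers in their own containers would need a different traversal.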

Conclusion

In summary, remember headers can be accessed like any other tag but have some useful attributes and patterns like:

  • Using .string for contents
  • Stripping whitespace
  • Extracting the heading level
  • Grabbing next siblings

Mastering these header nuances will help you better parse and process documents in BeautifulSoup.
