Formatting HTML with BeautifulSoup's prettify()

Oct 6, 2023 · 2 min read

When parsing HTML using BeautifulSoup in Python, the prettify() method is handy for formatting and printing the HTML in a more readable way.

What prettify() Does

The prettify() method is available on a BeautifulSoup object (and on any tag inside it). It returns a string containing the parsed HTML with each tag and each piece of text on its own line, indented according to its nesting depth.

For example:

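from bs4 import BeautifulSoup

# Sample input: a whole page on one line (stand-in for HTML you fetched or loaded)
html_doc = "<html><head><title>Page Title</title></head><body><h1>Main Heading</h1><p>Lorem ipsum dolor sit amet.</p></body></html>"

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())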

Instead of printing a long single line of HTML text, it will print something like:

<html>
 <head>
  <title>
   Page Title
  </title>
 </head>
 <body>
  <h1>
   Main Heading
  </h1>
  <p>
   Lorem ipsum dolor sit amet.
  </p>
 </body>
</html>

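Each tag and each piece of text gets its own line, indented by one space per nesting level, which makes the structure much easier to read.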
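prettify() is not limited to the whole document: it can also be called on a single tag to format just that subtree. The snippet below is a minimal sketch; the sample markup and the variable names (html_doc, paragraph) are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html_doc = "<html><body><h1>Main Heading</h1><p>Lorem ipsum dolor sit amet.</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

# soup.p is the first <p> tag; prettify() formats only that subtree.
paragraph = soup.p.prettify()
print(paragraph)
```

This is handy when a page is large and you only want to inspect the element you are about to scrape.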

Specifying the Encoding

By default, prettify() returns a regular Unicode string (str). To get encoded bytes instead, pass the encoding argument:

print(soup.prettify(encoding="latin-1"))
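To make the return types concrete, here is a small sketch using the encoding keyword of prettify(). The sample markup and the names as_text and as_bytes are illustrative assumptions, not part of any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line document containing a non-ASCII character (e with acute accent).
soup = BeautifulSoup("<p>caf\u00e9</p>", "html.parser")

# No argument: the result is a Unicode string.
as_text = soup.prettify()

# With encoding=...: the result is a bytes object in that encoding.
as_bytes = soup.prettify(encoding="latin-1")

print(type(as_text).__name__, type(as_bytes).__name__)
```

Printing a bytes result shows its b'...' representation, so for on-screen inspection the plain string form is usually the more readable choice.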

Output to a File

To store the formatted HTML in a file, open the file in binary mode and write the bytes that prettify() returns when given an encoding:

with open("formatted.html", "wb") as file:
    file.write(soup.prettify(encoding="utf-8"))

This persists the reformatted HTML to disk.
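If you would rather let Python handle the encoding, one alternative is to write the string form in text mode. This is only a sketch: the temp-directory location and the out_path name are assumptions chosen so the example does not overwrite a real file.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello</p>", "html.parser")

# Hypothetical output location: a fresh temp directory, to avoid clobbering real files.
out_path = Path(tempfile.mkdtemp()) / "formatted.html"

# prettify() without arguments returns str, so open the file in text mode
# and let open() encode the text as UTF-8 on the way to disk.
with open(out_path, "w", encoding="utf-8") as file:
    file.write(soup.prettify())
```

Either approach works; the main thing is not to mix them, since writing a str to a file opened with "wb" (or bytes to one opened with "w") raises a TypeError.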

Limitations

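One caveat is that prettify() doesn't repair anything itself. It only serializes the tree the parser built, so any cleanup of malformed markup (such as closing unclosed tags) comes from the parser and can differ between parsers. It also adds newlines and indentation around every tag and string, which can change how whitespace-sensitive content renders, so treat the output as a debugging view rather than as HTML to publish.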
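A quick way to see that prettify() is not a lossless reformatter is to parse its output again and compare the text. The sketch below uses a made-up snippet; original_text and round_trip are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with inline markup inside a sentence.
soup = BeautifulSoup("<p>Hello <b>world</b>!</p>", "html.parser")
original_text = soup.get_text()

# Re-parse the prettified output and extract the text again.
round_trip = BeautifulSoup(soup.prettify(), "html.parser")
round_trip_text = round_trip.get_text()

# The second value contains the newlines and indentation prettify() added.
print(repr(original_text))
print(repr(round_trip_text))
```

The two strings differ because the added whitespace becomes part of the document's text, which is why the prettified form is best kept for inspection.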

Overall, prettify() is invaluable for debugging and visually inspecting HTML during web scraping with BeautifulSoup.
