Downloading Images from a Website with Objective-C and Ono

Oct 15, 2023 · 4 min read

In this article, we will learn how to use Objective-C and the AFNetworking and Ono libraries to download all the images from a Wikipedia page.

---

Overview

The goal is to extract the name, breed group, local name, and image URL for every dog breed listed in the table on the Wikipedia page. We will store the image URLs, download each image, and save it to a local folder.

Here are the key steps we will cover:

  1. Import required frameworks
  2. Send HTTP request to fetch the Wikipedia page
  3. Parse the page HTML using Ono
  4. Find the table with dog breed data using a CSS selector
  5. Iterate through the table rows
  6. Extract data from each column
  7. Download images and save locally
  8. Print/process extracted data

Let's go through each of these steps in detail.

Imports

We need these frameworks:

    #import <AFNetworking/AFNetworking.h>
    #import "Ono.h"

  • AFNetworking - Sends HTTP requests
  • Ono - Parses HTML/XML

Send HTTP Request

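To download the web page:

    NSURL *url = [NSURL URLWithString:@"https://commons.wikimedia.org/wiki/List_of_dog_breeds"];

    AFHTTPSessionManager *manager = [AFHTTPSessionManager manager];
    // The default serializer expects JSON; ask for the raw bytes (NSData) instead
    manager.responseSerializer = [AFHTTPResponseSerializer serializer];

    [manager GET:url.absoluteString
      parameters:nil
         headers:@{@"User-Agent": @"MyApp"}
        progress:nil
         success:^(NSURLSessionDataTask *task, id responseObject) {

        // Parse HTML

    } failure:^(NSURLSessionDataTask *task, NSError *error) {
        NSLog(@"Request failed: %@", error.localizedDescription);
    }];

We make a GET request and provide a custom User-Agent header. The call uses the AFNetworking 4.x signature. Because AFHTTPSessionManager expects JSON by default, we switch to AFHTTPResponseSerializer so that responseObject arrives as raw NSData containing the HTML.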
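
A failed request is easier to diagnose when the HTTP status code and the error message are reported together. The following is a minimal sketch using only Foundation; DescribeFetchFailure is a hypothetical helper name, not part of AFNetworking.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a one-line summary of a failed request from the
// HTTP response (nil if no response arrived) and the NSError.
static NSString *DescribeFetchFailure(NSHTTPURLResponse *response, NSError *error) {
    NSString *status = response
        ? [NSString stringWithFormat:@"HTTP %ld", (long)response.statusCode]
        : @"no HTTP response";
    NSString *reason = error.localizedDescription ?: @"unknown error";
    return [NSString stringWithFormat:@"%@ (%@)", status, reason];
}
```

Inside the failure block it could be called as NSLog(@"%@", DescribeFetchFailure((NSHTTPURLResponse *)task.response, error));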

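Parse HTML

To parse the HTML:

    // responseObject must be NSData holding the raw HTML bytes
    NSError *parseError = nil;
    ONOXMLDocument *ono = [ONOXMLDocument HTMLDocumentWithData:responseObject error:&parseError];

The ono object (an ONOXMLDocument) allows querying the document with CSS selectors or XPath. If parsing fails, it is nil and parseError describes the problem.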

Find Breed Table

We use a CSS selector to find the table element:

    ONOXMLElement *table = [ono firstChildWithCSS:@"table.wikitable.sortable"];

This selects the first <table> element with the required CSS classes.
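
A class selector such as table.wikitable.sortable corresponds to an XPath test on each token of the class attribute. The sketch below builds that XPath string with Foundation only; XPathForTagWithClasses is a hypothetical helper, and the space-padded contains() test is the common idiom for matching class tokens, not necessarily the exact expression Ono generates internally.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an XPath that matches <tag> elements carrying
// every class in `classes`. Padding the class attribute with spaces makes
// "sortable" match as a whole token rather than as a substring.
static NSString *XPathForTagWithClasses(NSString *tag, NSArray<NSString *> *classes) {
    NSMutableString *xpath = [NSMutableString stringWithFormat:@"//%@", tag];
    for (NSString *cls in classes) {
        [xpath appendFormat:@"[contains(concat(' ', normalize-space(@class), ' '), ' %@ ')]", cls];
    }
    return xpath;
}
```

The resulting string could be passed to Ono's firstChildWithXPath: method as an alternative to the CSS selector.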

Iterate Through Rows

We loop through the rows:

    for (ONOXMLElement *row in [table CSS:@"tr"]) {

      // Extract data

    }

We iterate through the <tr> elements within the table.

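Extract Column Data

Inside the loop, we get the column data:

    // The direct children of a <tr> are its <td>/<th> cells
    NSArray<ONOXMLElement *> *cells = row.children;
    if (cells.count < 4) continue;      // skip rows without the four expected columns

    NSString *name = [[cells[0] firstChildWithCSS:@"a"] stringValue];
    if (name.length == 0) continue;     // no linked name, e.g. a header row
    NSString *group = [cells[1] stringValue];

    ONOXMLElement *localNameNode = cells[2];
    NSString *localName = [[localNameNode firstChildWithCSS:@"span"] stringValue] ?: @"";

    // The src attribute lives on the <img> inside the cell, not on the cell itself
    ONOXMLElement *imgNode = [cells[3] firstChildWithCSS:@"img"];
    NSString *photograph = [imgNode valueForAttribute:@"src"];

We use stringValue to read an element's text and valueForAttribute: to read an attribute. The src attribute belongs to the <img> element inside the cell, so we look that element up first.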
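
Text extracted from a table cell often carries line breaks and runs of spaces from the page markup. A minimal Foundation-only sketch for tidying it follows; CleanCellText is a hypothetical helper name, and collapsing every whitespace run to a single space is an assumption about what the cleaned value should look like.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trims the string and collapses each run of whitespace
// or newlines into a single space. Returns @"" for nil input.
static NSString *CleanCellText(NSString *raw) {
    if (raw == nil) return @"";
    NSCharacterSet *space = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    for (NSString *part in [raw componentsSeparatedByCharactersInSet:space]) {
        if (part.length > 0) [words addObject:part];   // drop empty pieces between separators
    }
    return [words componentsJoinedByString:@" "];
}
```

It could be applied to each extracted string before storing it, for example CleanCellText(group).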

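Download Images

To download and save images:

    if (photograph.length > 0) {

      // Image src values are often protocol-relative (//host/path), so add a scheme
      NSString *imageURLString = [photograph hasPrefix:@"//"] ? [@"https:" stringByAppendingString:photograph] : photograph;
      NSData *imageData = [NSData dataWithContentsOfURL:[NSURL URLWithString:imageURLString]];

      // writeToFile: does not create missing folders, so make sure dog_images exists
      [[NSFileManager defaultManager] createDirectoryAtPath:@"dog_images" withIntermediateDirectories:YES attributes:nil error:nil];
      [imageData writeToFile:[NSString stringWithFormat:@"dog_images/%@.jpg", name] atomically:YES];

    }

We download the image data and write it to a file in the dog_images folder. Note that dataWithContentsOfURL: is synchronous, so it blocks the calling thread until each download finishes.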
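
Two details of the download step can be handled more generally: resolving the image src against the page URL, and keeping breed names from producing invalid file paths. The sketch below uses only Foundation; AbsoluteImageURL and SafeFileName are hypothetical helper names, and the set of characters treated as unsafe is an assumption.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: resolves an <img> src against the page URL, so that
// protocol-relative ("//host/x.jpg") and path-relative ("/x.jpg") values both
// become absolute URLs. Returns nil for an empty src.
static NSURL *AbsoluteImageURL(NSString *src, NSURL *pageURL) {
    if (src.length == 0) return nil;
    return [NSURL URLWithString:src relativeToURL:pageURL].absoluteURL;
}

// Hypothetical helper: replaces path separators and colons with "_" so the
// name can be used as a single file name component.
static NSString *SafeFileName(NSString *name) {
    NSCharacterSet *unsafe = [NSCharacterSet characterSetWithCharactersInString:@"/\\:"];
    return [[name componentsSeparatedByCharactersInSet:unsafe] componentsJoinedByString:@"_"];
}
```

With these, the file path could be built as [NSString stringWithFormat:@"dog_images/%@.jpg", SafeFileName(name)] and the download URL as AbsoluteImageURL(photograph, url).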

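Store Extracted Data

We store the extracted data:

    // The arrays must be NSMutableArray, and addObject: throws on nil,
    // so missing values are stored as empty strings
    [names addObject:name ?: @""];
    [groups addObject:group ?: @""];
    [localNames addObject:localName ?: @""];
    [photographs addObject:photograph ?: @""];

The arrays can then be processed as needed.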
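
Four parallel arrays stay usable only while every row adds exactly one value to each of them. One alternative, sketched here with Foundation only, is to keep a single record per row; BreedRecord and its dictionary keys are hypothetical names chosen for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: bundles the four extracted values for one row into a
// dictionary, substituting @"" for any value that is nil.
static NSDictionary<NSString *, NSString *> *BreedRecord(NSString *name,
                                                         NSString *group,
                                                         NSString *localName,
                                                         NSString *photograph) {
    return @{
        @"name": name ?: @"",
        @"group": group ?: @"",
        @"localName": localName ?: @"",
        @"photograph": photograph ?: @""
    };
}
```

Each row would then add one object to a single NSMutableArray, for example [breeds addObject:BreedRecord(name, group, localName, photograph)], where breeds is an assumed array name.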

And that's it! Here is the full code:

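    // Imports
    #import <AFNetworking/AFNetworking.h>
    #import "Ono.h"

    // The request completes asynchronously, so call this from an app or from
    // a tool that keeps its run loop alive.
    void ScrapeDogBreeds(void) {

      // Mutable arrays to store data
      NSMutableArray<NSString *> *names = [NSMutableArray array];
      NSMutableArray<NSString *> *groups = [NSMutableArray array];
      NSMutableArray<NSString *> *localNames = [NSMutableArray array];
      NSMutableArray<NSString *> *photographs = [NSMutableArray array];

      // Send HTTP request
      NSURL *url = [NSURL URLWithString:@"https://commons.wikimedia.org/wiki/List_of_dog_breeds"];

      AFHTTPSessionManager *manager = [AFHTTPSessionManager manager];
      manager.responseSerializer = [AFHTTPResponseSerializer serializer]; // raw NSData, not JSON

      [manager GET:url.absoluteString
        parameters:nil
           headers:@{@"User-Agent": @"MyApp"}
          progress:nil
           success:^(NSURLSessionDataTask *task, id responseObject) {

        // Parse HTML
        NSError *parseError = nil;
        ONOXMLDocument *ono = [ONOXMLDocument HTMLDocumentWithData:responseObject error:&parseError];
        if (!ono) {
          NSLog(@"Parse failed: %@", parseError);
          return;
        }

        // Find table
        ONOXMLElement *table = [ono firstChildWithCSS:@"table.wikitable.sortable"];

        // Create the output folder once; writeToFile: does not create it
        [[NSFileManager defaultManager] createDirectoryAtPath:@"dog_images" withIntermediateDirectories:YES attributes:nil error:nil];

        // Iterate rows
        for (ONOXMLElement *row in [table CSS:@"tr"]) {

          // Get cells; skip rows without the four expected columns
          NSArray<ONOXMLElement *> *cells = row.children;
          if (cells.count < 4) continue;

          // Extract data
          NSString *name = [[cells[0] firstChildWithCSS:@"a"] stringValue];
          if (name.length == 0) continue; // no linked name, e.g. a header row
          NSString *group = [cells[1] stringValue] ?: @"";
          NSString *localName = [[cells[2] firstChildWithCSS:@"span"] stringValue] ?: @"";
          NSString *photograph = [[cells[3] firstChildWithCSS:@"img"] valueForAttribute:@"src"] ?: @"";

          // Download image (synchronous; blocks this thread per image)
          if (photograph.length > 0) {
            NSString *imageURLString = [photograph hasPrefix:@"//"] ? [@"https:" stringByAppendingString:photograph] : photograph;
            NSData *imageData = [NSData dataWithContentsOfURL:[NSURL URLWithString:imageURLString]];
            [imageData writeToFile:[NSString stringWithFormat:@"dog_images/%@.jpg", name] atomically:YES];
          }

          // Store data
          [names addObject:name];
          [groups addObject:group];
          [localNames addObject:localName];
          [photographs addObject:photograph];

        }

        NSLog(@"Extracted %lu breeds", (unsigned long)names.count);

      } failure:^(NSURLSessionDataTask *task, NSError *error) {
        NSLog(@"Request failed: %@", error.localizedDescription);
      }];
    }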
    

This provides a complete Objective-C solution using AFNetworking and Ono to scrape data and images from HTML tables. The same approach can be applied to many websites.

While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

This allows scraping at scale without the headaches of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

With Proxies API combined with a parsing library such as Ono, you can scrape data at scale without getting blocked.
