Web scraper for airline data

Overview

The Aero scraper is a Playwright-based tool that extracts airline data from FlightRadar24. It collects airline names, IATA/ICAO codes, and fleet information to populate the database.

Use at your own risk. Web scraping may violate terms of service. Ensure you have permission to scrape data from the target website.

Architecture

The scraper uses Playwright with Chromium to navigate FlightRadar24 and extract structured airline data:

┌─────────────────────────────────────────┐
│       Playwright (Chromium)             │
│  Headless browser automation            │
└─────────────┬───────────────────────────┘
              │
              │ Navigate to page
              ▼
┌─────────────────────────────────────────┐
│    FlightRadar24 Airlines Page          │
│  https://flightradar24.com/data/airlines│
└─────────────┬───────────────────────────┘
              │
              │ Extract table data
              ▼
┌─────────────────────────────────────────┐
│        data.json output                 │
│  Structured airline information         │
└─────────────────────────────────────────┘

Location

The scraper is located in the scraper/ directory:

scraper/
├── index.ts         # Entry point
├── scraper.ts       # Main scraper logic
├── package.json     # Dependencies
├── tsconfig.json    # TypeScript config
└── bun.lockb        # Lock file

Prerequisites

The scraper requires:

Bun runtime (v1.1.42 or higher)
Playwright (automatically installed with dependencies)

Installation

Navigate to scraper directory

cd scraper

Install dependencies

bun install

This installs:

Playwright (v1.49.1+)
TypeScript types

Install Playwright browsers

bunx playwright install chromium

Running the scraper

Execute the scraper with a single command:

bun run index.ts

The scraper will:

Launch a Chromium browser
Navigate to FlightRadar24 airlines page
Wait for the table to load
Extract airline data from table rows
Save results to data.json
Close the browser

The scraper runs in non-headless mode by default (headless: false), so you can see the browser in action. This is useful for debugging.

Implementation details

Entry point (index.ts)

import { scrapeAirlines } from "./scraper";

await scrapeAirlines().catch(console.error);

Core scraper logic (scraper.ts)

import { chromium } from "playwright";

export async function scrapeAirlines() {
  // Launch browser
  const browser = await chromium.launch({
    headless: false, // Show the browser window; set to true for unattended runs
    timeout: 0, // 0 disables the browser launch timeout
  });

  try {
    // Create new page
    const page = await browser.newPage();

    // Navigate to airlines page
    await page.goto("https://www.flightradar24.com/data/airlines", {
      waitUntil: "domcontentloaded",
    });

    // Wait for the table to load
    await page.waitForSelector("table");

    // Extract airline data
    const airlines = await page.evaluate(() => {
      const rows = document.querySelectorAll("table tr");
      return Array.from(rows)
        .map((row) => {
          const cells = row.querySelectorAll("td");
          if (cells.length < 3) return null;

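          // Extract airline info. Both name and code read cells[2], so they are
          // identical in the output; cells[3] holds the IATA/ICAO pair
          // (see "Output format").
          return {
            name: cells[2]?.textContent?.trim() || "",
            code: cells[2]?.textContent?.trim() || "",
            fleet: cells[3]?.textContent?.trim() || "",
          };
        })
        // Drop skipped rows and rows whose fleet cell has 3 characters or fewer
        .filter((airline) => airline !== null && airline.fleet.length > 3);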
    });

    // Output results
    console.log("Found airlines:", airlines.length);
    await Bun.write("data.json", JSON.stringify(airlines, null, 4));

    return airlines;
  } catch (error) {
    console.error("Error scraping airlines:", error);
    throw error;
  } finally {
    // Always close browser
    await browser.close();
  }
}

Configuration options

Browser settings

You can customize the browser launch options:

const browser = await chromium.launch({
  headless: true,        // Run without UI
  timeout: 30000,        // Set timeout in ms
  slowMo: 100,          // Slow down operations for debugging
});
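
One way to switch between a visible browser and headless mode without editing the source is to derive the launch options from environment variables. The following is a minimal sketch, not part of the Aero scraper: the helper launchOptionsFromEnv and the variable names SCRAPER_HEADLESS, SCRAPER_TIMEOUT_MS, and SCRAPER_SLOW_MO are hypothetical. Unlike the scraper's current default, this sketch assumes headless mode unless it is explicitly disabled.

```typescript
// Hypothetical helper: builds browser launch options from environment variables.
// The variable names are illustrative assumptions, not part of the scraper.
interface LaunchSettings {
  headless: boolean;
  timeout: number; // milliseconds to wait for the browser to start
  slowMo: number; // milliseconds added between operations
}

function launchOptionsFromEnv(
  env: Record<string, string | undefined>,
): LaunchSettings {
  // Parse a non-negative integer, falling back when the value is unset or invalid.
  const toInt = (value: string | undefined, fallback: number): number => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
  };

  return {
    // Headless unless explicitly disabled with SCRAPER_HEADLESS=false.
    headless: env["SCRAPER_HEADLESS"] !== "false",
    timeout: toInt(env["SCRAPER_TIMEOUT_MS"], 30000),
    slowMo: toInt(env["SCRAPER_SLOW_MO"], 0),
  };
}
```

The result can be passed straight to the launch call, for example chromium.launch(launchOptionsFromEnv(process.env)).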

Alternative browsers

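The scraper can also run on Firefox or WebKit. Install the corresponding browser first (for example, bunx playwright install firefox), then launch one of them:

import { firefox, webkit } from "playwright";

// Use Firefox
const firefoxBrowser = await firefox.launch({ headless: false });

// Use WebKit
const webkitBrowser = await webkit.launch({ headless: false });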

Output format

The scraper writes an array of objects to data.json. With the cell indices used in scraper.ts, code repeats the airline name and fleet holds the IATA/ICAO pair:

[
  {
    "name": "American Airlines",
    "code": "American Airlines",
    "fleet": "AA/AAL"
  },
  {
    "name": "Delta Air Lines",
    "code": "Delta Air Lines",
    "fleet": "DL/DAL"
  },
  {
    "name": "United Airlines",
    "code": "United Airlines",
    "fleet": "UA/UAL"
  }
]
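
Before consuming data.json, it can help to check that each entry has the expected shape. The sketch below assumes every entry carries exactly the three string fields shown above. The AirlineRecord type and the helpers isAirlineRecord, parseAirlines, and splitCodePair are hypothetical names, and the slash-separated pair format is taken from the sample rather than from any guarantee about the site.

```typescript
// Hypothetical type mirroring the three string fields in the sample output.
interface AirlineRecord {
  name: string;
  code: string;
  fleet: string;
}

// Type guard: true only for objects whose name, code and fleet are strings.
function isAirlineRecord(value: unknown): value is AirlineRecord {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["name"] === "string" &&
    typeof record["code"] === "string" &&
    typeof record["fleet"] === "string"
  );
}

// Parses the JSON text of data.json and keeps only well-formed entries.
function parseAirlines(jsonText: string): AirlineRecord[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected data.json to contain a JSON array");
  }
  return parsed.filter(isAirlineRecord);
}

// Splits a slash-separated pair such as "AA/AAL" into its two parts.
// Returns null when the value does not have exactly two non-empty parts.
function splitCodePair(pair: string): { iata: string; icao: string } | null {
  const parts = pair.split("/").map((part) => part.trim());
  const [iata, icao] = parts;
  if (parts.length !== 2 || !iata || !icao) return null;
  return { iata, icao };
}
```

Malformed entries are dropped silently here; a stricter variant could throw instead so that a changed page layout is noticed immediately.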

Using scraped data

After scraping, you can use the data management scripts to import the data into your database:

cd ../scripts
bun run json.ts

This will parse data.json and insert the airlines into the database.

Customizing the scraper

Scrape additional fields

Modify the page.evaluate() function to extract more data:

const airlines = await page.evaluate(() => {
  const rows = document.querySelectorAll("table tr");
  return Array.from(rows)
    .map((row) => {
      const cells = row.querySelectorAll("td");
      if (cells.length < 3) return null;

      return {
        name: cells[2]?.textContent?.trim() || "",
        code: cells[2]?.textContent?.trim() || "",
        fleet: cells[3]?.textContent?.trim() || "",
        country: cells[4]?.textContent?.trim() || "", // Add country
        founded: cells[5]?.textContent?.trim() || "", // Add founded year
      };
    })
    .filter((airline) => airline !== null);
});
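
As more fields are added, keeping the cell indices in one place makes them easier to adjust when the page layout changes. The sketch below is one possible arrangement, not the scraper's actual code: COLUMNS and rowToRecord are hypothetical names, and the indices are copied from the example above, so the country and founded columns in particular should be verified against the live table.

```typescript
// Maps each output field to the index of the table cell it is read from.
// Indices are copied from the example above; verify them against the live page.
const COLUMNS = {
  name: 2,
  code: 2,
  fleet: 3,
  country: 4,
  founded: 5,
} as const;

// Builds a record from the text of one row's cells; missing cells become "".
function rowToRecord(
  cellTexts: string[],
  columns: Record<string, number>,
): Record<string, string> {
  const record: Record<string, string> = {};
  for (const [field, index] of Object.entries(columns)) {
    record[field] = (cellTexts[index] ?? "").trim();
  }
  return record;
}
```

Functions defined outside page.evaluate() are not available inside the browser context, so with this approach the callback would return only the cell texts of each row, and the mapping would run afterwards in the scraper process.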

Change target URL

To scrape different pages:

await page.goto("https://www.flightradar24.com/data/airports", {
  waitUntil: "domcontentloaded",
});

Add error handling

Enhance error handling with retries:

// Assumes `page` and `url` are already defined in the surrounding code
const maxRetries = 3;
let attempt = 0;

while (attempt < maxRetries) {
  try {
    await page.goto(url, { waitUntil: "domcontentloaded" });
    break;
  } catch (error) {
    attempt++;
    console.log(`Retry ${attempt}/${maxRetries}`);
    if (attempt === maxRetries) throw error;
  }
}
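
The loop above retries immediately. A reusable variant that waits longer after each failure could look like the following sketch. The helper names sleep and withRetry and the default values are assumptions chosen for illustration; the doubling delay is a common backoff scheme, not something the scraper currently implements.

```typescript
// Resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `task` up to `maxAttempts` times, doubling the delay after each failure.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        console.log(
          `Attempt ${attempt}/${maxAttempts} failed, retrying in ${delayMs} ms`,
        );
        await sleep(delayMs);
      }
    }
  }
  throw lastError;
}
```

With this helper, the navigation step becomes await withRetry(() => page.goto(url, { waitUntil: "domcontentloaded" })).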

Troubleshooting

Playwright not installed

If you see errors about missing browsers, install them (append chromium to install only the browser the scraper uses):

bunx playwright install

Timeout errors

Increase the navigation timeout (Playwright's default is 30 seconds), or wait for a specific element as shown under "Selector not found":

await page.goto(url, {
  waitUntil: "networkidle",
  timeout: 60000, // 60 seconds
});

Selector not found

Inspect the page to find the correct selectors:

// Wait for specific selector
await page.waitForSelector(".airline-table", { timeout: 10000 });

// Take screenshot for debugging
await page.screenshot({ path: "debug.png" });
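
When the page markup changes, trying several candidate selectors in order can keep the scraper working while the primary selector is updated. The following is a sketch under stated assumptions: firstMatchingSelector and the SelectorWaiter interface are hypothetical, and the interface only describes the page.waitForSelector(selector, { timeout }) call already used above, so the helper can be tested without a browser.

```typescript
// Minimal structural type covering the one page method used here,
// so the helper can be exercised with a stub instead of a real browser.
interface SelectorWaiter {
  waitForSelector(
    selector: string,
    options?: { timeout?: number },
  ): Promise<unknown>;
}

// Tries each selector in order and returns the first one that appears.
// Any error from waitForSelector is treated as "not found".
// Throws if no selector appears within `timeoutMs` each.
async function firstMatchingSelector(
  page: SelectorWaiter,
  selectors: string[],
  timeoutMs = 10000,
): Promise<string> {
  for (const selector of selectors) {
    try {
      await page.waitForSelector(selector, { timeout: timeoutMs });
      return selector;
    } catch {
      console.warn(`Selector not found: ${selector}`);
    }
  }
  throw new Error(`None of the selectors matched: ${selectors.join(", ")}`);
}
```

For example, firstMatchingSelector(page, [".airline-table", "table"]) would fall back to the generic table selector if the more specific class is missing. In the worst case the total wait is the timeout multiplied by the number of selectors.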

Rate limiting

Add delays between requests to avoid rate limiting:

await page.waitForTimeout(2000); // Wait 2 seconds
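
A fixed delay produces evenly spaced requests. Adding a random component spreads them out. The helper below is a small sketch with a hypothetical name, jitteredDelayMs; the random source is injectable only so the function can be tested deterministically.

```typescript
// Returns a whole number of milliseconds from baseMs up to (but not including)
// baseMs + jitterMs, given whole-number inputs and a `random` in [0, 1).
function jitteredDelayMs(
  baseMs: number,
  jitterMs: number,
  random: () => number = Math.random,
): number {
  return Math.floor(baseMs + random() * jitterMs);
}
```

It can be combined with the call above, for example await page.waitForTimeout(jitteredDelayMs(2000, 1000)) to wait between two and three seconds.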

Best practices

Respect robots.txt - Check the website’s robots.txt file
Add delays - Don’t overwhelm the target server
Handle errors gracefully - Implement proper error handling and retries
Use headless mode in production - Set headless: true for automated runs
Cache results - Store scraped data to avoid repeated requests
Monitor changes - Website structure may change; update selectors accordingly
Legal compliance - Ensure you have permission to scrape the data

Alternative data sources

Instead of web scraping, consider using official APIs:

OpenSky Network - Free ADS-B flight tracking data
Aviation Stack - Commercial flight API (used in Aero)
AeroDataBox - Comprehensive aviation data API
FlightAware - Official FlightAware API (used in Aero)

These APIs provide reliable, structured data without the fragility of web scraping.
