Overview
The Aero scraper is a Playwright-based tool that extracts airline data from FlightRadar24. It collects airline names, IATA/ICAO codes, and fleet information to populate the database.
Use at your own risk. Web scraping may violate terms of service. Ensure you have permission to scrape data from the target website.
Architecture
The scraper uses Playwright with Chromium to navigate FlightRadar24 and extract structured airline data:
┌─────────────────────────────────────────┐
│          Playwright (Chromium)          │
│       Headless browser automation       │
└────────────────────┬────────────────────┘
                     │
                     │ Navigate to page
                     ▼
┌─────────────────────────────────────────┐
│       FlightRadar24 Airlines Page       │
│ https://flightradar24.com/data/airlines │
└────────────────────┬────────────────────┘
                     │
                     │ Extract table data
                     ▼
┌─────────────────────────────────────────┐
│            data.json output             │
│     Structured airline information      │
└─────────────────────────────────────────┘
Location
The scraper is located in the scraper/ directory:
scraper/
├── index.ts # Entry point
├── scraper.ts # Main scraper logic
├── package.json # Dependencies
├── tsconfig.json # TypeScript config
└── bun.lockb # Lock file
Prerequisites
The scraper requires:
- Bun runtime (v1.1.42 or higher)
- Playwright (automatically installed with dependencies)
Installation
Navigate to the scraper directory
cd scraper
Install dependencies
bun install
This installs:
- Playwright (v1.49.1+)
- TypeScript types
Install Playwright browsers
bunx playwright install chromium
Running the scraper
Execute the scraper by running its entry point from the scraper/ directory:
bun run index.ts
The scraper will:
- Launch a Chromium browser
- Navigate to FlightRadar24 airlines page
- Wait for the table to load
- Extract airline data from table rows
- Save results to data.json
- Close the browser
The scraper runs in non-headless mode by default (headless: false), so you can see the browser in action. This is useful for debugging.
Implementation details
Entry point (index.ts)
import { scrapeAirlines } from "./scraper";
await scrapeAirlines().catch(console.error);
Core scraper logic (scraper.ts)
import { chromium } from "playwright";
export async function scrapeAirlines() {
  // Launch browser
  const browser = await chromium.launch({
    headless: false, // Show the browser window; set to true for unattended runs
    timeout: 0, // 0 disables the launch timeout
  });
  try {
    // Create new page
    const page = await browser.newPage();
    // Navigate to airlines page
    await page.goto("https://www.flightradar24.com/data/airlines", {
      waitUntil: "domcontentloaded",
    });
    // Wait for the table to load
    await page.waitForSelector("table");
    // Extract airline data
    const airlines = await page.evaluate(() => {
      const rows = document.querySelectorAll("table tr");
      return Array.from(rows)
        .map((row) => {
          const cells = row.querySelectorAll("td");
          if (cells.length < 3) return null;
          // Note: `code` reads the same cell as `name`, and `fleet` reads the
          // next cell, which holds the IATA/ICAO codes in the sample output.
          return {
            name: cells[2]?.textContent?.trim() || "",
            code: cells[2]?.textContent?.trim() || "",
            fleet: cells[3]?.textContent?.trim() || "",
          };
        })
        // Keep data rows whose `fleet` text is longer than 3 characters
        .filter((airline) => airline !== null && airline.fleet.length > 3);
    });
    // Output results
    console.log("Found airlines:", airlines.length);
    // Wait for the write to finish before the browser is closed
    await Bun.write("data.json", JSON.stringify(airlines, null, 4));
    return airlines;
  } catch (error) {
    console.error("Error scraping airlines:", error);
    throw error;
  } finally {
    // Always close browser
    await browser.close();
  }
}
Configuration options
Browser settings
You can customize the browser launch options:
const browser = await chromium.launch({
  headless: true, // Run without UI
  timeout: 30000, // Max time to wait for the browser to start, in ms
  slowMo: 100, // Slow down each operation by 100 ms for debugging
});
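One way to switch between these settings without editing the source is to derive them from environment variables. The sketch below is only an illustration: launchSettingsFromEnv, SCRAPER_HEADLESS, and SCRAPER_SLOWMO are hypothetical names, not something the scraper currently reads.

```typescript
// Hypothetical helper: derive Playwright launch options from environment
// variables. SCRAPER_HEADLESS and SCRAPER_SLOWMO are illustrative names.
interface LaunchSettings {
  headless: boolean;
  slowMo: number; // milliseconds added between Playwright operations
  timeout: number; // max time to wait for the browser to start, in ms
}

function launchSettingsFromEnv(
  env: Record<string, string | undefined>,
): LaunchSettings {
  // Default to headless; only "false" or "0" opts into a visible window.
  const rawHeadless = (env.SCRAPER_HEADLESS ?? "true").toLowerCase();
  const headless = !(rawHeadless === "false" || rawHeadless === "0");

  // Ignore missing, non-numeric, or negative slow-motion values.
  const parsedSlowMo = Number(env.SCRAPER_SLOWMO ?? "0");
  const slowMo =
    Number.isFinite(parsedSlowMo) && parsedSlowMo > 0 ? parsedSlowMo : 0;

  return { headless, slowMo, timeout: 30000 };
}

// Usage (inside scraper.ts):
//   const browser = await chromium.launch(launchSettingsFromEnv(process.env));
```

Taking the environment as a parameter keeps the helper easy to test and leaves the decision about where the values come from to the caller.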
Alternative browsers
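The scraper can use Firefox or WebKit instead of Chromium. Install the matching browser first with bunx playwright install firefox (or webkit), then launch one of them:
import { firefox, webkit } from "playwright";
// Use Firefox
const browser = await firefox.launch({ headless: false });
// Or use WebKit instead
// const browser = await webkit.launch({ headless: false });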
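Output format
The scraper writes its results to data.json as an array of objects with name, code, and fleet fields. With the cell indices used in scraper.ts, code repeats the airline name and fleet holds the IATA/ICAO codes: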
[
    {
        "name": "American Airlines",
        "code": "American Airlines",
        "fleet": "AA/AAL"
    },
    {
        "name": "Delta Air Lines",
        "code": "Delta Air Lines",
        "fleet": "DL/DAL"
    },
    {
        "name": "United Airlines",
        "code": "United Airlines",
        "fleet": "UA/UAL"
    }
]
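Anything that consumes this file may want to check its shape first. The following is a minimal sketch of a typed reader for the format shown above; AirlineRecord, isAirlineRecord, and parseAirlines are hypothetical names and are not part of the scraper.

```typescript
// Shape of one entry in data.json, as shown in the sample above.
interface AirlineRecord {
  name: string;
  code: string;
  fleet: string;
}

// Type guard: true only for objects whose three fields are all strings.
function isAirlineRecord(value: unknown): value is AirlineRecord {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.name === "string" &&
    typeof candidate.code === "string" &&
    typeof candidate.fleet === "string"
  );
}

// Parse the file contents and keep only well-formed records.
function parseAirlines(json: string): AirlineRecord[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("data.json must contain an array");
  }
  return parsed.filter(isAirlineRecord);
}

// Usage with Bun:
//   const airlines = parseAirlines(await Bun.file("data.json").text());
```

Malformed entries are dropped silently here; a stricter variant could throw instead, depending on how much the downstream import should tolerate.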
Using scraped data
After scraping, you can use the data management scripts to import the data into your database:
cd ../scripts
bun run json.ts
This will parse data.json and insert the airlines into the database.
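json.ts is not shown here, so the following is only a sketch of one normalization step an importer might apply before inserting rows: splitting the combined "AA/AAL" string from the fleet field into separate IATA and ICAO values. The splitCodes helper and the AirlineCodes type are hypothetical names; the patterns rely on IATA airline designators being two characters and ICAO designators being three letters.

```typescript
// Separate IATA and ICAO airline designators.
interface AirlineCodes {
  iata: string | null; // two characters, letters or digits
  icao: string | null; // three letters
}

// Split a combined string such as "AA/AAL" into its two designators.
// Parts that match neither pattern are ignored.
function splitCodes(raw: string): AirlineCodes {
  const codes: AirlineCodes = { iata: null, icao: null };
  for (const part of raw.split("/")) {
    const token = part.trim().toUpperCase();
    if (/^[A-Z0-9]{2}$/.test(token)) {
      codes.iata = token;
    } else if (/^[A-Z]{3}$/.test(token)) {
      codes.icao = token;
    }
  }
  return codes;
}

// Usage: const { iata, icao } = splitCodes(record.fleet);
```

Classifying each part by its length, instead of by its position, keeps the helper working when only one of the two codes is present.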
Customizing the scraper
Scrape additional fields
Modify the callback passed to page.evaluate() to extract more data. The country and founded indices below are placeholders; confirm that the table actually has these columns before relying on them:
const airlines = await page.evaluate(() => {
  const rows = document.querySelectorAll("table tr");
  return Array.from(rows)
    .map((row) => {
      const cells = row.querySelectorAll("td");
      if (cells.length < 3) return null;
      return {
        name: cells[2]?.textContent?.trim() || "",
        code: cells[2]?.textContent?.trim() || "",
        fleet: cells[3]?.textContent?.trim() || "",
        country: cells[4]?.textContent?.trim() || "", // Add country (placeholder index)
        founded: cells[5]?.textContent?.trim() || "", // Add founded year (placeholder index)
      };
    })
    .filter((airline) => airline !== null);
});
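Hard-coded cell indices break as soon as a column is added or reordered. One alternative, sketched below under the assumption that the table has a header row, is to look columns up by header text. buildColumnIndex and rowToRecord are hypothetical helpers, and the header labels in the usage note are made up for illustration. Because functions passed to page.evaluate() run in the browser and cannot call helpers defined in Node, the idea is to return plain string arrays from page.evaluate() and map them afterwards.

```typescript
// Map each wanted field to the index of the column whose header matches.
// Matching is case-insensitive; fields without a matching header are skipped.
function buildColumnIndex(
  headers: string[],
  wanted: Record<string, string>,
): Record<string, number> {
  const normalized = headers.map((header) => header.trim().toLowerCase());
  const index: Record<string, number> = {};
  for (const [field, label] of Object.entries(wanted)) {
    const position = normalized.indexOf(label.trim().toLowerCase());
    if (position !== -1) index[field] = position;
  }
  return index;
}

// Build one record from a row's cell texts using the column index.
function rowToRecord(
  cells: string[],
  index: Record<string, number>,
): Record<string, string> {
  const record: Record<string, string> = {};
  for (const [field, position] of Object.entries(index)) {
    record[field] = (cells[position] ?? "").trim();
  }
  return record;
}

// Usage (header labels are hypothetical):
//   const index = buildColumnIndex(headerTexts, { name: "Airline", country: "Country" });
//   const airlines = rowTexts.map((cells) => rowToRecord(cells, index));
```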
Change target URL
To scrape a different page, change the URL and adjust the selectors and cell indices to match that page's table:
await page.goto("https://www.flightradar24.com/data/airports", {
  waitUntil: "domcontentloaded",
});
Add error handling
Enhance error handling with retries:
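const url = "https://www.flightradar24.com/data/airlines";
const maxRetries = 3;
let attempt = 0;
while (attempt < maxRetries) {
  try {
    await page.goto(url, { waitUntil: "domcontentloaded" });
    break; // Success: stop retrying
  } catch (error) {
    attempt++;
    console.log(`Retry ${attempt}/${maxRetries}`);
    // Give up and rethrow after the final attempt
    if (attempt === maxRetries) throw error;
  }
}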
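The loop above retries immediately. A common refinement is to wait longer after each failure (exponential backoff) and to wrap the logic in a reusable function. The sketch below shows one way to do that; withRetry, backoffDelay, and the default delays are illustrative choices, not part of the scraper.

```typescript
// Delay before the next try after failed attempt `attempt` (1-based):
// baseMs * 2^(attempt - 1), capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Resolve after `ms` milliseconds.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `task` up to `maxAttempts` times, sleeping between failed attempts.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}

// Usage:
//   await withRetry(() => page.goto(url, { waitUntil: "domcontentloaded" }));
```

With the defaults, the waits after the first and second failures are 500 ms and 1000 ms; the cap only matters for larger attempt counts.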
Troubleshooting
Playwright not installed
If you see errors about missing browsers, install the Chromium build again:
bunx playwright install chromium
Timeout errors
Increase the timeout or wait for specific elements:
await page.goto(url, {
  waitUntil: "networkidle",
  timeout: 60000, // 60 seconds
});
Selector not found
Inspect the page to find the correct selectors:
// Wait for specific selector
await page.waitForSelector(".airline-table", { timeout: 10000 });
// Take screenshot for debugging
await page.screenshot({ path: "debug.png" });
Rate limiting
Add delays between requests to avoid rate limiting:
await page.waitForTimeout(2000); // Wait 2 seconds
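A fixed delay produces a very regular request pattern. If several pages are fetched in a row, a randomized delay within a range is a common alternative. The helper below is a small sketch; jitteredDelay and the example bounds are hypothetical.

```typescript
// Pick a delay between minMs and maxMs (inclusive of minMs).
// `random` is injectable so the function can be tested deterministically.
function jitteredDelay(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return Math.round(minMs + random() * (maxMs - minMs));
}

// Usage:
//   await page.waitForTimeout(jitteredDelay(1500, 3000));
```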
Best practices
- Respect robots.txt - Check the website’s robots.txt file
- Add delays - Don’t overwhelm the target server
- Handle errors gracefully - Implement proper error handling and retries
- Use headless mode in production - Set headless: true for automated runs
- Cache results - Store scraped data to avoid repeated requests
- Monitor changes - Website structure may change; update selectors accordingly
- Legal compliance - Ensure you have permission to scrape the data
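As an illustration of the "Cache results" item, the sketch below decides whether an existing data.json is recent enough to reuse instead of scraping again. isCacheFresh and the one-day limit are hypothetical choices. The usage note assumes Bun's file API, where Bun.file() exposes exists() and a lastModified timestamp in milliseconds.

```typescript
// True when the cached file is recent enough to reuse.
function isCacheFresh(
  lastModifiedMs: number,
  nowMs: number,
  maxAgeMs: number,
): boolean {
  const ageMs = nowMs - lastModifiedMs;
  // A negative age means the timestamp lies in the future; treat it as stale.
  return ageMs >= 0 && ageMs <= maxAgeMs;
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Usage sketch with Bun (assumes data.json sits in the working directory):
//   const file = Bun.file("data.json");
//   if ((await file.exists()) && isCacheFresh(file.lastModified, Date.now(), ONE_DAY_MS)) {
//     // Reuse the cached file instead of calling scrapeAirlines()
//   }
```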
Alternative data sources
Instead of web scraping, consider using official APIs:
- OpenSky Network - Free ADS-B flight tracking data
- Aviation Stack - Commercial flight API (used in Aero)
- AeroDataBox - Comprehensive aviation data API
- FlightAware - Official FlightAware API (used in Aero)
These APIs provide reliable, structured data without the fragility of web scraping.