The listing scrape phase visits individual property URLs and extracts structured data. This is the second step in the scraping workflow.

Overview

Listing spiders read URLs from CSV files (typically generated by URL collection spiders) and extract detailed property information from each listing page. Data is written incrementally to CSV files in outputs/data/.

CSV Path Requirements

Both spiders require a csv_path argument pointing to a CSV file containing a url column.
csv_path (string, required)
Path to a CSV file containing listing URLs. Can be absolute or relative to the project root.
Expected format:
  • Must have a url column
  • Optional fetch_date column (Jiji only)
Default paths:
  • Jiji: outputs/urls/jiji_urls.csv
  • Meqasa: outputs/urls/meqasa_urls.csv
If the CSV file doesn’t exist or contains no URLs, the spider will log an error and exit without scraping.
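The validation described above can be sketched as a small helper. This is a hypothetical illustration of how a spider might load and validate the CSV, not the project's actual implementation; the function name `load_listing_urls` is an assumption.

```python
import csv
from pathlib import Path

def load_listing_urls(csv_path):
    """Read listing URLs from a CSV file, requiring a 'url' column.

    Hypothetical sketch: the real spiders may handle errors differently
    (e.g. logging and exiting rather than raising).
    """
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"CSV file not found: {csv_path}")
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # The 'url' column is required; fetch_date is optional (Jiji only)
        if reader.fieldnames is None or "url" not in reader.fieldnames:
            raise ValueError("CSV must contain a 'url' column")
        urls = [row["url"] for row in reader if row.get("url")]
    if not urls:
        raise ValueError("CSV contains no URLs")
    return urls
```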

Jiji Listing Spider

Extracts detailed property information from Jiji listing pages.

Output

File: outputs/data/jiji_data.csv
Fields:
  • url - Listing URL
  • fetch_date - Date when listing was scraped
  • title - Property title
  • location - Property location/region
  • house_type - Type of property (apartment, house, etc.)
  • bedrooms - Number of bedrooms
  • bathrooms - Number of bathrooms
  • price - Rental price
  • properties - JSON object with additional attributes
  • amenities - JSON array of available amenities
  • description - Property description text
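Because the properties and amenities fields are serialized JSON, consumers of jiji_data.csv need to decode them before use. A minimal sketch, assuming the fields hold a JSON object and a JSON array respectively (the example values are hypothetical, not real scraped data):

```python
import json

# A hypothetical row as it might appear in jiji_data.csv
row = {
    "title": "2 Bedroom Apartment",
    "properties": '{"Parking Space": "Yes", "Condition": "Newly Built"}',
    "amenities": '["Wi-Fi", "Air Conditioning"]',
}

# Decode the JSON-encoded columns into native Python structures
properties = json.loads(row["properties"])  # dict of extra attributes
amenities = json.loads(row["amenities"])    # list of amenity names
```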

Examples

Scrape from default URL CSV:
scrapy crawl jiji_listings
Scrape from custom CSV:
scrapy crawl jiji_listings -a csv_path=outputs/urls/jiji_urls.csv
Scrape from absolute path:
scrapy crawl jiji_listings -a csv_path=/path/to/custom_urls.csv

Automatic Cleaning

After Jiji listing scrapes complete (when run through main.py), the project automatically runs a cleaning script that processes jiji_data.csv and outputs a cleaned version.
Cleaned output: outputs/data/raw.csv
This cleaning step:
  • Standardizes price formats
  • Normalizes location names
  • Parses structured fields from JSON
  • Removes duplicates and invalid entries
Automatic cleaning only runs when using the interactive main.py runner, not when running spiders directly via scrapy crawl.
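The cleaning steps listed above could look roughly like the following pandas sketch. This is a hypothetical illustration of the kind of transformations involved (price standardization, location normalization, deduplication), not the project's actual cleaning script, and the assumed price format is invented:

```python
import pandas as pd

def clean_listings(df):
    """Hypothetical cleaning pass over scraped listing rows."""
    df = df.copy()
    # Standardize price strings (e.g. "GHS 1,500") to numeric values
    df["price"] = (
        df["price"].astype(str)
        .str.replace(r"[^\d.]", "", regex=True)
        .replace("", pd.NA)
        .astype("Float64")
    )
    # Normalize location names to a consistent casing
    df["location"] = df["location"].str.strip().str.title()
    # Drop duplicate listings and rows with unparseable prices
    df = df.drop_duplicates(subset="url")
    df = df.dropna(subset=["price"])
    return df
```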

Meqasa Listing Spider

Extracts property details from Meqasa listing pages.

Output

File: outputs/data/meqasa_data.csv
Fields:
  • url - Listing URL
  • Title - Property title
  • Price - Rental price
  • Rate - Price period (per month, per year, etc.)
  • Description - Property description
  • fetch_date - Date when listing was scraped
  • Categories - Property categories
  • Lease options - Available lease options
  • Bedrooms - Number of bedrooms
  • Bathrooms - Number of bathrooms
  • Garage - Garage/parking information
  • Furnished - Furnished status
  • Amenities - Available amenities
  • Address - Property address
  • Reference - Listing reference ID
  • details - JSON object with all extracted table data
Meqasa extracts data from dynamic table structures. The details field contains all key-value pairs found in listing detail tables, allowing for flexible schema evolution.
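One way to take advantage of the flexible details field is to flatten it into columns after the fact, letting new table keys appear as new columns. A hedged sketch with invented example rows (the key names shown are illustrative):

```python
import json
import pandas as pd

# Hypothetical rows as they might appear in meqasa_data.csv
rows = [
    {"url": "https://example.com/listing-1",
     "details": '{"Bedrooms": "3", "Furnished": "Yes"}'},
    {"url": "https://example.com/listing-2",
     "details": '{"Bedrooms": "2", "Garage": "1"}'},
]
df = pd.DataFrame(rows)

# Expand the JSON object into one column per key; missing keys become NaN,
# so the schema grows as new detail-table keys are encountered
details = df["details"].apply(json.loads).apply(pd.Series)
flat = pd.concat([df.drop(columns="details"), details], axis=1)
```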

Examples

Scrape from default URL CSV:
scrapy crawl meqasa_listings
Scrape from custom CSV:
scrapy crawl meqasa_listings -a csv_path=outputs/urls/meqasa_urls.csv
Scrape from absolute path:
scrapy crawl meqasa_listings -a csv_path=/path/to/custom_urls.csv

Progress Tracking

Both spiders display real-time progress with:
  • Current item number
  • Total items to scrape
  • Failed requests count
  • Estimated completion percentage
Progress is updated after each successfully scraped listing.
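A progress counter matching the behavior above can be sketched as a small class. This is a hypothetical illustration, not the spiders' actual tracking code; the class and method names are assumptions:

```python
class ProgressTracker:
    """Track scraped/failed counts against a known total of listing URLs."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self.failed = 0

    def record_success(self):
        self.done += 1

    def record_failure(self):
        self.failed += 1

    def summary(self):
        # Completion counts both successes and failures as processed items
        processed = self.done + self.failed
        pct = 100 * processed / self.total if self.total else 0
        return f"{self.done}/{self.total} scraped, {self.failed} failed ({pct:.0f}%)"
```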
