
Quick Start

This guide will walk you through scraping your first rental property listings from Jiji Ghana or Meqasa using the interactive CLI.
Make sure you’ve completed the installation before proceeding.

Two-Phase Workflow

ScrapeAccraProperties uses a two-phase approach:
  1. Collect listing URLs from search/result pages
  2. Scrape listing details by visiting each URL
This separation allows you to validate URLs before scraping and enables efficient resume operations.
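The two phases can be pictured as two small functions with a URL pool handed from one to the other. This is an illustrative sketch, not the project's actual API; the function names and the `fetch` callback are assumptions for the example:

```python
def collect_urls(result_pages):
    """Phase 1: flatten the listing URLs found on each search/result page."""
    return [url for page in result_pages for url in page]

def scrape_details(url_pool, fetch):
    """Phase 2: visit each collected URL and build one record per listing."""
    return [fetch(url) for url in url_pool]
```

Because the pool is persisted between phases (as a CSV in this project), you can inspect or trim it before phase 2, and re-runs only need to diff the pool against what has already been scraped.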

Your First Scrape

Step 1: Launch the interactive CLI

Start the interactive runner:
python main.py
You’ll see the main menu:
╭─────────────────────── main.py ────────────────────────╮
│ Accra Property Scraper                                 │
│ - Interactive multi-spider runner                      │
│ - Listing resume mode queues only missing URLs         │
│ - CSV writes happen item-by-item during crawl          │
│ - Jiji listings are cleaned to outputs/data/raw.csv    │
╰────────────────────────────────────────────────────────╯

Choose action
  1. Collect listing URLs
  2. Scrape listing details
  3. Resume listing scrape (missing URLs only)
  4. Exit
Enter choice [1]:
Step 2: Collect listing URLs

Select option 1 to collect listing URLs, then choose your source:
Select source
  1. Jiji only
  2. Meqasa only
  3. Both Jiji and Meqasa
Enter choice [3]:
For this quickstart, choose 1 (Jiji only). Next, configure pagination:
Jiji start page [1]: 1

Jiji URL mode
  1. Auto detect total pages
  2. Fixed number of pages
  3. Convert expected listings to page count
Enter choice [1]: 2

Jiji max pages [5]: 2
This will collect listing URLs from the first 2 pages of Jiji rental results.
Start with a small number of pages (2-5) for your first run to test the workflow.
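URL mode 3 converts an expected listing count into a page count. A plausible sketch of that conversion, where the listings-per-page value is an assumption for illustration (the real spider derives it from the site):

```python
import math

def listings_to_pages(expected_listings, listings_per_page=20):
    """Round up so the last partial page of results is still visited."""
    return math.ceil(expected_listings / listings_per_page)
```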
The spider will start crawling and display progress:
2024-03-03 10:15:32 [scrapy.core.engine] INFO: Spider opened
2024-03-03 10:15:33 [scrapy.core.engine] INFO: Crawled 1 pages (at 1 pages/min)
...
URLs are saved to: outputs/urls/jiji_urls.csv
Step 3: Scrape listing details

Run the CLI again and select option 2 to scrape listing details:
python main.py
Select option 2 (Scrape listing details), then choose your source:
Select source
  1. Jiji only
  2. Meqasa only
  3. Both Jiji and Meqasa
Enter choice [3]: 1
Specify the URL CSV path:
Jiji URL CSV [outputs/urls/jiji_urls.csv]:
Press Enter to use the default path. The spider will:
  • Read URLs from outputs/urls/jiji_urls.csv
  • Visit each listing page
  • Extract structured data (title, location, price, bedrooms, amenities, etc.)
  • Write incrementally to outputs/data/jiji_data.csv
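Writing item-by-item means an interrupted crawl keeps everything scraped so far. A minimal standard-library sketch of that pattern (a simplified stand-in, not the project's actual pipeline code):

```python
import csv
import os

def append_items(path, items, fields):
    """Append each scraped item to the CSV as it arrives; write the header
    only when the file doesn't exist yet or is empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        if need_header:
            writer.writeheader()
        for item in items:
            writer.writerow(item)
            f.flush()  # each row hits disk right away, preserving progress on a crash
```

Appending with a header-once guard is also what makes resume mode cheap: later runs can keep adding rows to the same file.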
After scraping completes, Jiji listings are automatically cleaned:
Jiji cleaned CSV saved: outputs/data/raw.csv (143 rows)
Done.
The cleaning step (producing raw.csv) is currently Jiji-only.
Step 4: Explore your data

Check the outputs/ directory for your scraped data:
tree outputs/
outputs/
├── urls/
│   └── jiji_urls.csv
└── data/
    ├── jiji_data.csv
    └── raw.csv
View the data:
head -n 5 outputs/data/raw.csv
You’ll see rental listings with fields like:
  • url - Listing URL
  • title - Property title
  • location - Area/neighborhood
  • house_type - Apartment, house, etc.
  • bedrooms - Number of bedrooms
  • bathrooms - Number of bathrooms
  • price - Rental price
  • amenities - List of amenities
  • description - Full listing description
  • fetch_date - When the data was scraped
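If you'd rather explore the data in Python than with `head`, a standard-library loader is enough (this helper is illustrative, not part of the project):

```python
import csv

def load_listings(path):
    """Read a listings CSV into a list of dicts, one per listing."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, `load_listings("outputs/data/raw.csv")` gives you dicts keyed by the field names above, ready for filtering or handing to pandas.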

Resume Scraping

If your scrape is interrupted, you can resume without re-scraping existing listings:
Step 1: Select resume mode

Run the CLI and choose option 3, Resume listing scrape (missing URLs only):
python main.py
Step 2: Review the queue summary

The CLI will show which URLs are already scraped vs. pending:
Use default URL/data CSV paths for resume mode? [Y/n]: y

            Resume Queue Summary
┌────────┬──────────┬─────────────────┬─────────┐
│ Source │ URL Pool │ Already Scraped │ Pending │
├────────┼──────────┼─────────────────┼─────────┤
│ Jiji   │      156 │             143 │      13 │
└────────┴──────────┴─────────────────┴─────────┘
Only the 13 pending URLs will be scraped.
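Conceptually, the pending queue is just the URL pool minus the URLs already present in the data CSV. A hedged sketch of that set difference (not the project's actual implementation):

```python
def pending_urls(url_pool, scraped_urls):
    """Return URLs that still need scraping, preserving pool order."""
    seen = set(scraped_urls)
    return [u for u in url_pool if u not in seen]
```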
Step 3: Scrape completes

The spider queues only missing URLs and completes the scrape. Temporary queue files are automatically cleaned up.

Running Spiders Directly

You can bypass the interactive CLI and run spiders directly with scrapy:
scrapy crawl jiji_urls -a start_page=1
When running spiders directly, you’ll need to manually run clean.py for Jiji data cleaning. The interactive CLI handles this automatically.

Next Steps

Understand the Workflow

Learn about the two-phase workflow, resume mode, and data cleaning pipeline

Configure Settings

Customize spider behavior, pagination, concurrency, and Playwright options

Output Schema

Understand the structure of URL CSVs and listing data CSVs

Troubleshooting

Fix common issues like browser failures, empty fields, and resume errors
