Welcome to ScrapeAccraProperties

ScrapeAccraProperties is a Scrapy + Playwright project for collecting rental listings in Greater Accra from Jiji Ghana and Meqasa. It features an interactive CLI with a rich progress UI, JavaScript-rendered scraping, and intelligent resume capabilities.

Installation

Get started with Python 3.12+, install dependencies, and set up the Playwright browser

Quick Start

Run your first scrape in minutes with the interactive CLI

Workflow

Learn the two-phase workflow: collect URLs, then scrape listings

Configuration

Configure spiders, customize settings, and optimize performance

Key Features

Uses scrapy-playwright with Chromium to handle dynamically loaded content from modern property listing sites. Asset blocking is enabled for images, media, fonts, and stylesheets to improve performance.
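One way asset blocking can be wired up is through scrapy-playwright's PLAYWRIGHT_ABORT_REQUEST hook. This is a minimal sketch, not the project's actual settings.py; the handler strings are scrapy-playwright's standard registration:

```python
# Sketch of asset blocking with scrapy-playwright (assumed configuration;
# check the project's real settings.py). PLAYWRIGHT_ABORT_REQUEST receives
# each in-page Playwright request and returns True to abort it before fetch.

BLOCKED_RESOURCE_TYPES = {"image", "media", "font", "stylesheet"}

def should_abort_request(request):
    """Abort requests for heavy static assets to speed up page loads."""
    return request.resource_type in BLOCKED_RESOURCE_TYPES

# settings.py fragment: route requests through the Playwright handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
```

Blocking at the browser layer means the page's JavaScript still runs, but the bandwidth-heavy assets are never downloaded.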
Rich progress UI with per-spider summaries guides you through the entire workflow. Choose platforms, configure pagination, and monitor scraping progress all from an intuitive menu system.
Phase 1: Collect listing URLs from search/result pages
Phase 2: Visit each listing URL and extract structured data
This separation allows you to validate URLs before scraping and enables efficient resume operations.
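The handoff between the two phases can be sketched as reading the phase-1 CSV back in before phase 2 starts. This is an illustration, not the project's actual code; the `url` column name is an assumption:

```python
import csv

def load_listing_urls(path):
    """Phase 2 input: read the listing URLs collected in phase 1 from a CSV.
    Assumes a 'url' column header (an assumption; the real name may differ)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["url"] for row in csv.DictReader(f) if row.get("url")]
```

Because phase 1's output is a plain CSV, you can inspect or prune the URL list by hand before phase 2 ever touches the network.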
Listing data is written to CSV files incrementally as items are scraped. URL deduplication prevents duplicate entries, and all outputs are organized under the outputs/ directory.
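Incremental writing with deduplication can look roughly like this (a sketch under assumed field names, not the project's pipeline code):

```python
import csv
import os

def append_item(path, item, seen_urls, fieldnames):
    """Append one scraped item to a CSV, writing the header on first use
    and skipping URLs that were already written (deduplication).
    Returns True if the row was written, False if it was a duplicate."""
    if item["url"] in seen_urls:
        return False
    new_file = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow(item)
    seen_urls.add(item["url"])
    return True
```

Appending row by row means a crash mid-run loses at most the item in flight, which is what makes resume possible.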
Automatically compares URL CSVs to data CSVs and queues only missing URLs. Resume from where you left off without re-scraping existing listings.
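The comparison boils down to a set difference between the two CSVs. A minimal sketch, again assuming a `url` column in both files:

```python
import csv

def pending_urls(urls_csv, data_csv):
    """Return URLs present in the phase-1 URL CSV but absent from the data
    CSV, i.e. the listings still to scrape. A missing data CSV means
    nothing has been scraped yet, so every URL is pending."""
    def urls_in(path):
        try:
            with open(path, newline="", encoding="utf-8") as f:
                return {row["url"] for row in csv.DictReader(f) if row.get("url")}
        except FileNotFoundError:
            return set()
    return sorted(urls_in(urls_csv) - urls_in(data_csv))
```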
Jiji listings are automatically cleaned after scraping using clean.py, producing a standardized dataset at outputs/data/raw.csv.

Supported Platforms

Jiji Ghana

Scrapes rental listings with detailed property attributes including:
  • Title, location, and description
  • House type, bedrooms, and bathrooms
  • Price and amenities
  • Custom property attributes
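As a rough shape of the record these attributes produce (field names here are illustrative, not the project's exact item definition — Scrapy accepts plain dataclasses as items):

```python
from dataclasses import dataclass, field

@dataclass
class JijiListing:
    # Illustrative field names; the project's actual item may differ.
    title: str = ""
    location: str = ""
    description: str = ""
    house_type: str = ""
    bedrooms: str = ""
    bathrooms: str = ""
    price: str = ""
    amenities: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # custom key/value attrs
```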

Meqasa

Extracts comprehensive listing information:
  • Title, price, and rate type
  • Full description
  • Dynamic property details from listing tables
  • Flexible schema adapts to varying attributes
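A flexible schema over a details table usually means flattening (label, value) rows into a dict rather than fixing columns up front. A sketch of that idea (not the project's parser):

```python
def details_to_record(pairs):
    """Flatten (label, value) rows scraped from a listing's details table
    into a dict with normalized snake_case keys, so listings with varying
    attribute sets all fit one flexible schema."""
    record = {}
    for label, value in pairs:
        key = label.strip().lower().replace(" ", "_")
        record[key] = value.strip()
    return record
```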

Output Structure

All scraping outputs are automatically organized:
outputs/
├── urls/
│   ├── jiji_urls.csv
│   ├── meqasa_urls.csv
│   └── *_resume_queue.csv (temporary)
└── data/
    ├── jiji_data.csv
    ├── meqasa_data.csv
    └── raw.csv (cleaned Jiji data)
Responsible Use: Review and respect target site Terms of Service and robots.txt. Keep request volume and crawl frequency reasonable. Use scraped data in line with applicable laws and privacy obligations.
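In Scrapy terms, keeping request volume reasonable maps onto a handful of standard settings. The values below are illustrative starting points, not the project's configuration:

```python
# Illustrative Scrapy settings for polite crawling; tune to the target
# sites' terms and your needs. The setting names are standard Scrapy.
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,               # honor robots.txt
    "DOWNLOAD_DELAY": 2.0,                # seconds between requests per domain
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per site
    "AUTOTHROTTLE_ENABLED": True,         # back off when the server slows down
}
```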

Next Steps

Install Now

Set up Python dependencies and Playwright browser

Run Your First Scrape

Follow the quickstart tutorial to collect your first listings
