Overview
ScrapeAccraProperties uses four specialized spiders built on a common base class:
- JijiUrlSpider - Collects listing URLs from Jiji search pages
- JijiListingSpider - Extracts detailed data from Jiji listings
- MeqasaUrlSpider - Collects listing URLs from Meqasa search pages
- MeqasaListingSpider - Extracts detailed data from Meqasa listings
All four inherit from PropertyBaseSpider, which provides CSV management, progress tracking, and URL deduplication.
PropertyBaseSpider
Base class providing core functionality for all property spiders.
Location: property_bot/spiders/base_spider.py:55
Class Attributes
- Path to the output CSV file where scraped items will be saved
- URL_FIELD - Field name used for URL deduplication
- OUTPUT_FIELDS - Ordered tuple of field names for CSV output. If None, fields are dynamically determined from items.
Instance Attributes
- Total number of items scraped in the current session
- Total number of failed requests
- Set of URLs already scraped (loaded from the existing CSV on startup)
Core Methods
save_item()
- Serializes complex types (dict, list) to JSON
- Skips duplicate URLs based on URL_FIELD
- Creates the CSV with headers if the file doesn't exist
- Thread-safe via a CSV lock
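The save flow above can be sketched as follows. This is a dependency-free illustration of the described behavior, not the actual implementation in property_bot/spiders/base_spider.py; the class and attribute names mirror the docs.

```python
import csv
import json
import os
import threading


class CsvSaver:
    """Sketch of PropertyBaseSpider's CSV persistence (illustrative only)."""

    URL_FIELD = "url"  # field used for deduplication

    def __init__(self, output_file: str):
        self.output_file = output_file
        self.scraped_urls = set()          # URLs already written
        self._csv_lock = threading.Lock()  # guards concurrent CSV writes

    def save_item(self, item: dict) -> bool:
        url = item.get(self.URL_FIELD)
        if url in self.scraped_urls:
            return False  # skip duplicate URLs based on URL_FIELD
        # Serialize complex types (dict, list) to JSON strings
        row = {k: json.dumps(v) if isinstance(v, (dict, list)) else v
               for k, v in item.items()}
        with self._csv_lock:  # thread-safe CSV access
            is_new = not os.path.exists(self.output_file)
            with open(self.output_file, "a", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=list(item.keys()))
                if is_new:
                    writer.writeheader()  # create CSV with headers on first write
                writer.writerow(row)
        self.scraped_urls.add(url)
        return True
```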
update_ui()
- current_page - Current page number or item count
- total_pages - Total pages or items to process
errback_close_page()
Error callback that closes the Playwright page when a request fails, so browser pages are not leaked.
Private Methods
_serialize_for_csv()
- dict → JSON string
- list/tuple/set → JSON array string
- str → Normalized whitespace
- None → Empty string
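The conversion rules above can be sketched as a standalone helper (an illustration of the documented rules, not the spider's actual code):

```python
import json
import re


def serialize_for_csv(value):
    """Convert one item value into a CSV-safe string, per the rules above."""
    if value is None:
        return ""                                  # None → empty string
    if isinstance(value, dict):
        return json.dumps(value)                   # dict → JSON string
    if isinstance(value, (list, tuple, set)):
        seq = sorted(value) if isinstance(value, set) else list(value)
        return json.dumps(seq)                     # sequence → JSON array string
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value).strip()  # collapse/normalize whitespace
    return value                                   # numbers etc. pass through
```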
_normalize_item()
Applies _serialize_for_csv() to all values in an item dictionary.
_get_output_fieldnames()
Determines the CSV header fields. If OUTPUT_FIELDS is None, dynamically appends new fields as they appear.
JijiUrlSpider
Collects rental listing URLs from Jiji Ghana search result pages.
Spider Name: jiji_urls
Location: property_bot/spiders/jiji_urls.py:16
Configuration
- name - Spider identifier for Scrapy
- Output file: outputs/urls/jiji_urls.csv
- OUTPUT_FIELDS: ("url", "page", "fetch_date")
Constructor Arguments
- First page number to scrape
- max_pages - Maximum number of pages to scrape. Mutually exclusive with total_listing.
- total_listing - Automatically calculates max_pages based on listing count (20 listings/page). Overrides max_pages if provided.
Behavior
- Auto-detection mode: If neither max_pages nor total_listing is specified, the spider scrapes the first page and auto-detects the total result count from the breadcrumb
- URL format: https://jiji.com.gh/greater-accra/houses-apartments-for-rent?page={page_num}
- Listings per page: 20
- Wait strategy: Waits for the .b-advert-listing selector (15s timeout)
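The page math and URL construction above can be sketched as below; pages_for and page_urls are illustrative helper names, not the spider's actual methods.

```python
import math

LISTINGS_PER_PAGE = 20  # per the behavior notes above
URL_TEMPLATE = "https://jiji.com.gh/greater-accra/houses-apartments-for-rent?page={page_num}"


def pages_for(total_listing: int) -> int:
    """Derive max_pages from a target listing count, rounding up."""
    return math.ceil(total_listing / LISTINGS_PER_PAGE)


def page_urls(start_page: int, max_pages: int) -> list:
    """Build the search-page URLs the spider would request."""
    return [URL_TEMPLATE.format(page_num=p)
            for p in range(start_page, start_page + max_pages)]
```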
Methods
start_requests()
- If max_pages is specified → yields all page requests immediately
- Otherwise → yields a single detector request for the first page
parse()
Usage Examples
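Typical invocations, using Scrapy's standard -a flag to pass the constructor arguments documented above (a sketch, to be run from the project root):

```shell
# Scrape a fixed number of pages
scrapy crawl jiji_urls -a max_pages=5

# Derive the page count from a target listing count (20 listings/page)
scrapy crawl jiji_urls -a total_listing=200

# Auto-detection mode: scrape page 1 and read the total from the breadcrumb
scrapy crawl jiji_urls
```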
JijiListingSpider
Extracts structured rental data from individual Jiji listing pages.
Spider Name: jiji_listings
Location: property_bot/spiders/jiji_listing.py:13
Configuration
- name - Spider identifier
- Output file: outputs/data/jiji_data.csv
Constructor Arguments
Path to CSV containing URLs to scrape. Can be absolute or relative to project root.
Extracted Fields
| Field | Source | Example |
|---|---|---|
| url | Response URL | https://jiji.com.gh/... |
| fetch_date | From URL CSV or current date | 2026-03-03 |
| title | h1 div::text | 3 Bedroom Apartment |
| location | .b-advert-info-statistics--region::text | Accra Metropolitan, Greater Accra |
| house_type | Icon attributes or properties | Apartment |
| bedrooms | Icon attributes or properties | 3 Bedrooms |
| bathrooms | Icon attributes or properties | 2 Bathrooms |
| price | .qa-advert-price-view-value::text | GH₵ 3,500 |
| properties | .b-advert-attribute (dict) | {"Condition": "Newly Built", ...} |
| amenities | .b-advert-attributes__tag (list) | ["Wi-Fi", "24-hour Electricity", ...] |
| description | .qa-description-text::text | Full text description |
Methods
_load_urls()
Loads the input CSV into a list of {"url": ..., "fetch_date": ...} dictionaries.
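A minimal sketch of this loader, assuming the input CSV uses the url/page/fetch_date columns produced by the URL spider (load_urls is an illustrative name for the private method):

```python
import csv


def load_urls(urls_csv_path: str) -> list:
    """Read the URL CSV into {"url": ..., "fetch_date": ...} dicts."""
    with open(urls_csv_path, newline="", encoding="utf-8") as f:
        return [{"url": row["url"], "fetch_date": row.get("fetch_date", "")}
                for row in csv.DictReader(f)]
```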
parse()
Usage Example
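A typical two-step invocation (a sketch; urls_file is a hypothetical name for the URL-CSV argument documented above — check the spider's constructor for the actual name):

```shell
# First collect URLs, then scrape the listings behind them
scrapy crawl jiji_urls -a max_pages=5
scrapy crawl jiji_listings -a urls_file=outputs/urls/jiji_urls.csv
```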
MeqasaUrlSpider
Collects rental listing URLs from Meqasa search result pages.
Spider Name: meqasa_urls
Location: property_bot/spiders/meqasa_urls.py:16
Configuration
- name - Spider identifier
- Output file: outputs/urls/meqasa_urls.csv
- OUTPUT_FIELDS: ("url", "page", "fetch_date")
Constructor Arguments
- First page number to scrape
- total_pages - Total number of pages to scrape. If not specified, auto-detects from result count.
Behavior
- Auto-detection mode: If total_pages is not specified, scrapes the first page and reads the count from #headfiltercount
- URL format: https://meqasa.com/properties-for-rent-in-Greater%20Accra-region?w={page_num}
- Listings per page: 16
- Wait strategy: Waits for the .mqs-prop-dt-wrapper selector (15s timeout)
Methods
parse()
Usage Examples
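Typical invocations, passing the documented constructor argument with Scrapy's standard -a flag (a sketch):

```shell
# Fixed page count
scrapy crawl meqasa_urls -a total_pages=10

# Auto-detection mode: read the result count from #headfiltercount
scrapy crawl meqasa_urls
```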
MeqasaListingSpider
Extracts structured rental data from individual Meqasa listing pages.
Spider Name: meqasa_listings
Location: property_bot/spiders/meqasa_listing.py:12
Configuration
- name - Spider identifier
- Output file: outputs/data/meqasa_data.csv
Constructor Arguments
Path to CSV containing URLs to scrape
Extracted Fields
| Field | Source | Example |
|---|---|---|
| url | Response URL | https://meqasa.com/property/... |
| Title | h1::text | 3 Bedroom Furnished Apartment |
| Price | .price-wrapper > div:nth-child(1)::text | GHS 5,000 |
| Rate | .price-wrapper > div:nth-child(2)::text | per month |
| Description | .description p::text | Full description |
| fetch_date | Current date | 2026-03-03 |
| Bedrooms | From details table | 3 |
| Bathrooms | From details table | 2 |
| details | Full details table (dict) | {"Categories": "Apartment", ...} |
Methods
_get_detail()
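The docs don't show this method's body; a plausible sketch, assuming _get_detail() looks up one labeled value (e.g. "Bedrooms") in the parsed details table:

```python
def get_detail(details: dict, label: str, default: str = "") -> str:
    """Return the value for a label in the details table, tolerating
    case and whitespace drift in the scraped keys (illustrative sketch)."""
    wanted = label.strip().lower()
    for key, value in details.items():
        if key.strip().lower() == wanted:
            return value
    return default
```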
parse()
Usage Example
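A typical invocation (a sketch; urls_file is a hypothetical name for the URL-CSV argument — check the spider's constructor for the actual name):

```shell
scrapy crawl meqasa_listings -a urls_file=outputs/urls/meqasa_urls.csv
```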
Common Patterns
Playwright Integration
All spiders use scrapy-playwright for JavaScript rendering.
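A sketch of the request meta these spiders attach. In real code the page method would be a scrapy_playwright.page.PageMethod; a plain tuple stands in here to keep the sketch dependency-free.

```python
def playwright_meta(wait_selector: str, timeout_ms: int = 15_000) -> dict:
    """Build a scrapy-playwright meta dict: render the page in a browser,
    expose the page object to callbacks/errbacks, and wait for a selector."""
    return {
        "playwright": True,
        "playwright_include_page": True,  # enables errback_close_page()
        "playwright_page_methods": [
            # wait for the listing container before parsing (15s timeout)
            ("wait_for_selector", wait_selector, {"timeout": timeout_ms}),
        ],
    }
```

With scrapy-playwright installed, the tuple becomes PageMethod("wait_for_selector", wait_selector, timeout=timeout_ms), and playwright_include_page=True is what lets errback_close_page() close the page on failed requests.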