Bypass the interactive menu and run spiders directly using scrapy crawl commands. This is ideal for automation, scripting, and CI/CD pipelines.

URL Collection Spiders

URL spiders collect property listing URLs from search result pages and write them to CSV files.

Jiji URL Spider

scrapy crawl jiji_urls -a start_page=1

Parameters

start_page (integer, default: 1)
The search results page number to start from. Must be >= 1.

max_pages (integer, optional)
Maximum number of pages to scrape. When set, the spider scrapes exactly this many pages starting from start_page. Example: max_pages=5 scrapes 5 pages total. Cannot be used together with total_listing.

total_listing (integer, optional)
Expected total number of listings to collect. The spider calculates how many pages are needed based on Jiji's listings per page. Example: total_listing=200 might scrape ~9 pages (Jiji shows ~24 listings per page). Cannot be used together with max_pages.
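The page estimate for total_listing is a ceiling division. A minimal sketch of the arithmetic, assuming the ~24 listings per page mentioned above (the spider's own per-page figure may differ):

```shell
# Estimate how many pages the spider will request for a given
# total_listing value, assuming ~24 listings per results page.
TOTAL_LISTING=200
PER_PAGE=24
# Ceiling division: round up so the last partial page is included.
PAGES=$(( (TOTAL_LISTING + PER_PAGE - 1) / PER_PAGE ))
echo "total_listing=$TOTAL_LISTING -> about $PAGES pages"
```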

Output

Writes to: outputs/urls/jiji_urls.csv
url,page,fetch_date
https://jiji.com.gh/accra-metropolitan/houses-apartments-for-rent/...,1,2026-03-03

Meqasa URL Spider

scrapy crawl meqasa_urls -a start_page=1

Parameters

start_page (integer, default: 1)
The search results page number to start from. Must be >= 1.

total_pages (integer, optional)
Total number of pages to scrape. When set, the spider scrapes exactly this many pages starting from start_page. Example: total_pages=5 scrapes 5 pages total.

Output

Writes to: outputs/urls/meqasa_urls.csv
url,page,fetch_date
https://meqasa.com/properties-for-rent/ghana/greater-accra/...,1,2026-03-03
When no page control parameter is provided (max_pages, total_pages, or total_listing), spiders auto-detect the total pages by parsing the site’s pagination.

Listing Detail Spiders

Listing spiders read URLs from CSV files and scrape full property details.

Jiji Listing Spider

scrapy crawl jiji_listings -a csv_path=outputs/urls/jiji_urls.csv

Parameters

csv_path (string, default: outputs/urls/jiji_urls.csv)
Path to the CSV file containing URLs to scrape. Must have a url column. Can be absolute or relative to the project root.

Output

Writes to: outputs/data/jiji_data.csv
Key fields:
  • url
  • fetch_date
  • title
  • location
  • house_type
  • bedrooms
  • bathrooms
  • price
  • properties (serialized mapping)
  • amenities (serialized list)
  • description
After scraping completes, the interactive runner (main.py) automatically runs clean.py to produce a cleaned dataset at outputs/data/raw.csv. This cleaning step does not run when using direct commands.

Meqasa Listing Spider

scrapy crawl meqasa_listings -a csv_path=outputs/urls/meqasa_urls.csv

Parameters

csv_path (string, default: outputs/urls/meqasa_urls.csv)
Path to the CSV file containing URLs to scrape. Must have a url column. Can be absolute or relative to the project root.

Output

Writes to: outputs/data/meqasa_data.csv
Base fields:
  • url
  • Title
  • Price
  • Rate
  • Description
  • fetch_date
Additional columns are extracted from listing detail tables and vary by listing.
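Because the column set varies between runs, it can help to inspect the header before any downstream analysis. A minimal sketch (the file only exists after meqasa_listings has run):

```shell
# List the columns produced by the last crawl, one per line.
head -n 1 outputs/data/meqasa_data.csv | tr ',' '\n'
```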

Custom CSV Paths

You can specify custom paths for both input (URL CSVs) and output (data CSVs) by modifying the spider arguments.

Reading from Custom URL CSV

scrapy crawl jiji_listings -a csv_path=custom/input/my_urls.csv
The CSV must contain a url column. Other columns are ignored.
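Before pointing a spider at a custom file, you can verify the header up front. A sketch using the example path above:

```shell
# Check that the CSV header contains a url column (word match, so a
# column like "source_url" won't count as a false positive).
if head -n 1 custom/input/my_urls.csv | grep -qw url; then
  echo "url column found"
else
  echo "no url column -- the listing spider will reject this file" >&2
fi
```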

Example: Full Workflow

1. Collect Jiji URLs (first 10 pages)
   scrapy crawl jiji_urls -a start_page=1 -a max_pages=10
   Output: outputs/urls/jiji_urls.csv

2. Scrape Jiji listing details
   scrapy crawl jiji_listings -a csv_path=outputs/urls/jiji_urls.csv
   Output: outputs/data/jiji_data.csv

3. Collect Meqasa URLs (auto-detect pages)
   scrapy crawl meqasa_urls -a start_page=1
   Output: outputs/urls/meqasa_urls.csv

4. Scrape Meqasa listing details
   scrapy crawl meqasa_listings -a csv_path=outputs/urls/meqasa_urls.csv
   Output: outputs/data/meqasa_data.csv

Automation Example

Create a bash script to run the full workflow:
scrape.sh
#!/bin/bash

set -e

echo "Collecting URLs..."
scrapy crawl jiji_urls -a start_page=1 -a max_pages=20
scrapy crawl meqasa_urls -a start_page=1

echo "Scraping listing details..."
scrapy crawl jiji_listings -a csv_path=outputs/urls/jiji_urls.csv
scrapy crawl meqasa_listings -a csv_path=outputs/urls/meqasa_urls.csv

echo "Done!"
Run it:
chmod +x scrape.sh
./scrape.sh
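To run the workflow on a schedule, the script can be called from cron. The entry below is illustrative (the project path and timing are placeholders to adjust):

```
# crontab entry (config fragment, not an executable script):
# run the full workflow at 02:00 every day, appending output to a log.
0 2 * * * cd /path/to/project && ./scrape.sh >> scrape.log 2>&1
```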

Error Handling

If the URL CSV file doesn’t exist or is empty, the listing spider will exit with an error message.
Common errors:
No URLs found in outputs/urls/jiji_urls.csv
Solution: Run the URL collection spider first.
FileNotFoundError: outputs/urls/jiji_urls.csv
Solution: Verify the path is correct and the file exists.
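In scripts, both errors can be caught before the spider launches. A minimal pre-flight guard, assuming the default Jiji paths:

```shell
# Exit early with a hint instead of letting the spider fail mid-run.
CSV=outputs/urls/jiji_urls.csv
if [ ! -f "$CSV" ]; then
  echo "Missing $CSV -- run the URL spider first (scrapy crawl jiji_urls)" >&2
elif [ "$(wc -l < "$CSV")" -le 1 ]; then
  echo "$CSV has no URL rows -- re-run the URL spider" >&2
else
  echo "OK: $(( $(wc -l < "$CSV") - 1 )) URLs ready to scrape"
fi
```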

CSV Output Behavior

  • Incremental writes: Each scraped item is written immediately (item-by-item)
  • URL deduplication: The same URL is not added twice to URL CSVs
  • Append mode: Listing spiders append to existing data CSVs by default
  • Auto-created directories: outputs/urls/ and outputs/data/ are created automatically if missing
For completely fresh data, delete the existing CSV files before running the spiders.
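Since listing spiders append and URL CSVs are deduplicated, a fresh run starts by deleting the old files. A sketch using the default paths:

```shell
# Remove previous outputs; -f ignores files that don't exist yet.
rm -f outputs/urls/jiji_urls.csv outputs/urls/meqasa_urls.csv
rm -f outputs/data/jiji_data.csv outputs/data/meqasa_data.csv
```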
