
Installation

Get started with ScrapeAccraProperties by installing Python dependencies and setting up the Playwright browser for JavaScript-rendered scraping.

Prerequisites

Before installing, ensure you have:

Python 3.12+

The project requires Python 3.12 or higher

Chromium Browser

Playwright will install Chromium browser binaries
On Linux systems, you may need additional OS libraries for Playwright. The installation steps below include commands to install these dependencies.
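To confirm your interpreter meets the Python 3.12+ requirement before installing anything, a quick check like the following works (the helper name is illustrative, not part of the project):

```python
import sys

def meets_requirement(minimum=(3, 12)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    status = "OK" if meets_requirement() else "upgrade needed (3.12+ required)"
    print(f"Python {found}: {status}")
```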

Installation Steps

Step 1: Clone the repository

Clone the ScrapeAccraProperties repository to your local machine:
git clone https://github.com/yourusername/ScrapeAccraProperties.git
cd ScrapeAccraProperties
Step 2: Install Python dependencies

Install the project's Python dependencies with uv:
uv sync
uv is recommended for its fast dependency resolution and installation; a standard pip-based virtual environment also works.

Dependencies Installed

The following packages are installed from pyproject.toml:
  • scrapy >=2.14.1 - Web scraping framework
  • scrapy-playwright >=0.0.46 - Playwright integration for Scrapy
  • playwright >=1.58.0 - Browser automation library
  • rich >=14.3.3 - Rich terminal formatting
  • pandas >=3.0.1 - Data manipulation and CSV handling
  • notebook >=7.5.3 - Jupyter notebook support
  • pregex >=2.3.3 - Regex pattern builder
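Pieced together from the list above, the dependency table in pyproject.toml presumably looks roughly like this (project metadata omitted):

```toml
[project]
requires-python = ">=3.12"
dependencies = [
    "scrapy>=2.14.1",
    "scrapy-playwright>=0.0.46",
    "playwright>=1.58.0",
    "rich>=14.3.3",
    "pandas>=3.0.1",
    "notebook>=7.5.3",
    "pregex>=2.3.3",
]
```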
Step 3: Install Playwright browser

Install the Chromium browser binaries required for Playwright:
python -m playwright install chromium
On Linux systems, you may need additional OS libraries. Run this command to install them:
python -m playwright install-deps chromium
This installs system dependencies like graphics libraries, font rendering libraries, and other OS-level requirements for running Chromium in headless mode.
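As a quick smoke test that the Chromium binaries are usable, a short script like this can attempt a headless launch (the helper is illustrative, not part of the project):

```python
def chromium_works() -> bool:
    """Try a headless Chromium launch via Playwright; return False on any failure."""
    try:
        from playwright.sync_api import sync_playwright
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            browser.close()
        return True
    except Exception:
        return False

if __name__ == "__main__":
    if chromium_works():
        print("Chromium launched successfully")
    else:
        print("Launch failed: run `python -m playwright install chromium` "
              "(and `install-deps` on Linux)")
```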
Step 4: Verify installation

Verify that everything is set up correctly by running the interactive CLI:
python main.py
You should see the interactive menu:
╭─────────────────────── main.py ────────────────────────╮
│ Accra Property Scraper                                 │
│ - Interactive multi-spider runner                      │
│ - Listing resume mode queues only missing URLs         │
│ - CSV writes happen item-by-item during crawl          │
│ - Jiji listings are cleaned to outputs/data/raw.csv    │
╰────────────────────────────────────────────────────────╯

Choose action
  1. Collect listing URLs
  2. Scrape listing details
  3. Resume listing scrape (missing URLs only)
  4. Exit
Enter choice [1]:
If you see this menu, your installation is successful!

Troubleshooting

If you encounter browser launch errors, ensure Playwright's Chromium is installed:
python -m playwright install chromium
On Linux, errors about missing system libraries mean the OS-level dependencies are absent. Install them with:
python -m playwright install-deps chromium
This installs the graphics, font rendering, and other system libraries Chromium needs to run in headless mode.
Ensure you’re using Python 3.12 or higher:
python --version
If you have multiple Python versions installed, you may need to use python3.12 or python3 instead of python.
If using pip installation, ensure your virtual environment is activated:
source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate  # Windows
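If you don't have a virtual environment yet, creating one first avoids the activation error entirely (standard venv workflow, not project-specific):

```shell
# Create a project-local virtual environment and activate it
python3 -m venv .venv
source .venv/bin/activate            # Linux/macOS
# On Windows (PowerShell): .venv\Scripts\Activate.ps1
python -c "import sys; print(sys.prefix)"   # should point inside .venv
```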

Next Steps

Quick Start Tutorial

Now that you’ve installed ScrapeAccraProperties, follow the quickstart guide to scrape your first listings!
