Installation Steps
Clone the Repository
Clone the project repository to your local machine.
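A minimal sketch of the clone step, assuming plain `git clone` is all that is needed. `REPO_URL` is a hypothetical variable introduced here for illustration; the placeholder value is left as-is and the guard stops the command from running until it has been replaced.

```shell
# Placeholder from this guide: set it to the real repository URL first.
REPO_URL="<repository-url>"

# Refuse to clone while the placeholder is still in place.
case "$REPO_URL" in
  "<"*">") echo "Set REPO_URL to your repository URL first" >&2 ;;
  *)       git clone "$REPO_URL" ;;
esac
```

After a successful clone, change into the new project directory before continuing with the remaining steps.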
Replace <repository-url> with the actual URL of your Git repository.
Set Up Environment Variables
Create a .env file in the root directory to configure your PostgreSQL database, then add the required environment variables to the .env file.
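As a hedged illustration, the file could be created as shown here. `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` are the variables read by the official PostgreSQL Docker image; whether this project's Compose file expects exactly these names is an assumption. The user name and password are placeholder values, and the database name `datawarehouse` is taken from the prompt shown in the verification step.

```shell
# Assumed variable names and placeholder values: adjust to your project.
# The file is written only if .env does not exist, so existing settings survive.
if [ ! -e .env ]; then
  cat > .env <<'EOF'
POSTGRES_USER=postgres
POSTGRES_PASSWORD=change_me
POSTGRES_DB=datawarehouse
EOF
fi
```

Keep the .env file out of version control, since it holds the database password.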
Prepare the Datasets
Place your CSV files in the datasets/ directory. The exact filenames may vary based on your data sources. Ensure all required CSV files are present before proceeding.
The datasets will be automatically mounted into the Docker container at /datasets (read-only).
Start Docker Containers
Launch the PostgreSQL database using Docker Compose. This will:
- Pull the PostgreSQL 16 Alpine image
- Create and start the datawarehouse container
- Set up the database with your environment variables
- Mount the datasets directory
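The launch can be sketched as follows, assuming the Compose file sits in the project root and that `docker compose up -d` is the intended command. The `preflight` and `start_warehouse` helpers are hypothetical names introduced here; they check that the .env file and at least one CSV file from the earlier steps exist before starting anything.

```shell
# Succeeds only when the inputs from the earlier steps are in place.
preflight() {
  [ -f .env ] || { echo "missing .env" >&2; return 1; }
  [ -d datasets ] || { echo "missing datasets/ directory" >&2; return 1; }
  # At least one CSV file must exist somewhere under datasets/.
  [ -n "$(find datasets -type f -name '*.csv' | head -n 1)" ] \
    || { echo "no CSV files in datasets/" >&2; return 1; }
}

# Start the stack in the background only if the checks pass.
start_warehouse() {
  preflight && docker compose up -d
}
```

Run `start_warehouse` from the project root. The `-d` flag detaches the containers so the terminal stays free for the verification step.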
Verify Installation
Check that the container is running; it should appear in the list of running containers. Then connect to the database to verify it’s working.
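Both checks might look like the sketch here. The container name `datawarehouse` comes from this guide; the helper names `check_running` and `connect_db` are hypothetical, and the database user is assumed to be whatever `POSTGRES_USER` is set to, falling back to `postgres`.

```shell
CONTAINER="datawarehouse"   # container name used in this guide

# Print the container name and status if it is running.
# Empty output means no running container matches the name.
check_running() {
  docker ps --filter "name=$CONTAINER" --format '{{.Names}}: {{.Status}}'
}

# Open an interactive psql session inside the container.
# Assumption: the database user matches POSTGRES_USER (default: postgres).
connect_db() {
  docker exec -it "$CONTAINER" psql -U "${POSTGRES_USER:-postgres}" -d datawarehouse
}
```

Call `check_running` first, then `connect_db`. Type `\q` to leave the psql session.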
If you see the PostgreSQL prompt (datawarehouse=#), the installation was successful!
Post-Installation
Docker Setup
Learn more about Docker configuration and management
Data Architecture
Understand the Medallion Architecture used in this project
Troubleshooting
Port 5432 already in use
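If you have another PostgreSQL instance running locally, it may already be using port 5432. Stop that instance or map the container to a different host port.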
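To confirm the conflict before changing anything, a check along these lines can help. It is a sketch that assumes either `ss` (Linux) or `lsof` (macOS and others) is installed; `port_in_use` is a hypothetical helper name.

```shell
# Succeeds (exit 0) when something is listening on the given TCP port.
# Returns 2 when neither ss nor lsof is installed.
port_in_use() {
  if command -v ss >/dev/null 2>&1; then
    # Column 4 of `ss -ltn` is the local address:port of each listener.
    ss -ltn | awk '{print $4}' | grep -q ":$1\$"
  elif command -v lsof >/dev/null 2>&1; then
    lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
  else
    echo "neither ss nor lsof is available" >&2
    return 2
  fi
}

if port_in_use 5432; then
  echo "Port 5432 is already taken on this host"
fi
```

Without elevated privileges, `lsof` may not report sockets owned by other users, so a silent result on that code path is not conclusive.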
Permission denied for datasets
Ensure the datasets directory has proper read permissions.
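One way to grant read access is sketched here, on the assumption that making the CSV files world-readable is acceptable for your data. The helper names are hypothetical, and both default to the `datasets` directory when called without an argument.

```shell
# Make every file readable and every directory traversable for all users.
# Capital X adds execute only to directories and already-executable files.
fix_dataset_permissions() {
  chmod -R a+rX "${1:-datasets}"
}

# List anything under the directory that is still not world-readable.
unreadable_entries() {
  find "${1:-datasets}" ! -perm -004
}
```

Run `fix_dataset_permissions` from the project root, then `unreadable_entries`; empty output from the second command means every entry is readable.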
Container fails to start
Check the container logs for errors. Common issues include:
- Invalid environment variables in .env
- Insufficient Docker resources
- Corrupted Docker volumes (try docker compose down -v)
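The log check and the volume reset could be wrapped as in this sketch. The container name `datawarehouse` comes from this guide; `show_logs` and `reset_stack` are hypothetical helper names, and the default of 50 log lines is an arbitrary choice.

```shell
CONTAINER="datawarehouse"   # container name used in this guide

# Show the most recent log lines (50 unless another count is given).
show_logs() {
  docker logs --tail "${1:-50}" "$CONTAINER"
}

# Recreate the stack with fresh volumes.
# WARNING: `down -v` removes the volumes, so stored database data is deleted.
reset_stack() {
  docker compose down -v && docker compose up -d
}
```

Start with `show_logs`, and reach for `reset_stack` only when losing the current database contents is acceptable.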
Next Steps
Now that your environment is set up, you can:
- Explore the Docker Setup for container management
- Learn about the ETL Procedures for data transformation
- Review the Data Model for analytics