Follow these steps to get your SQL Data Warehouse project up and running.

Installation Steps

Step 1: Clone the Repository

Clone the project repository to your local machine:
git clone <repository-url>
cd data-warehouse-project
Replace <repository-url> with the actual URL of your Git repository.
The project structure will look like this:
data-warehouse-project/
├── datasets/          # Raw datasets (ERP and CRM data)
├── docs/              # Documentation and architecture diagrams
├── scripts/           # SQL scripts for ETL
│   ├── bronze/        # Raw data extraction scripts
│   ├── silver/        # Data cleaning scripts
│   └── gold/          # Analytical model scripts
├── tests/             # Test scripts
├── docker-compose.yml # Docker configuration
└── README.md          # Project overview
Step 2: Set Up Environment Variables

Create a .env file in the root directory to configure your PostgreSQL database:
touch .env
Add the following environment variables to the .env file:
.env
POSTGRES_USER=warehouse_admin
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=datawarehouse
Security Note: Never commit the .env file to version control. It should already be listed in .gitignore.
Choose a strong password for POSTGRES_PASSWORD to secure your database.
Step 3: Prepare the Datasets

Place your CSV files in the datasets/ directory:
datasets/
├── erp_sales.csv
├── erp_products.csv
├── crm_customers.csv
└── crm_interactions.csv
The exact filenames may vary based on your data sources. Ensure all required CSV files are present before proceeding.
The datasets will be automatically mounted into the Docker container at /datasets (read-only).
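If you want to catch a missing file before starting the stack, a short shell check works. This is a sketch using the four filenames from the listing above; adjust the list to match your actual data sources.

```shell
# check_datasets: ensure the expected CSV files exist before starting the stack.
# Filenames follow the example listing above -- edit them to match your sources.
check_datasets() {
  local dir="${1:-datasets}" f missing=0
  for f in erp_sales.csv erp_products.csv crm_customers.csv crm_interactions.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```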
Step 4: Start Docker Containers

Launch the PostgreSQL database using Docker Compose:
docker compose up -d
This command will:
  • Pull the PostgreSQL 16 Alpine image
  • Create and start the datawarehouse container
  • Set up the database with your environment variables
  • Mount the datasets directory
The -d flag runs containers in detached mode (in the background).
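The database may take a few seconds to accept connections after `docker compose up -d`. A small retry helper can wait for readiness before you run any scripts; the `pg_isready` usage shown in the comment assumes the container and user names from the earlier steps.

```shell
# wait_for: retry a command until it succeeds or the try limit is reached,
# sleeping one second between attempts.
wait_for() {
  local tries="$1"; shift
  local i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      echo "gave up after $tries attempts" >&2
      return 1
    fi
    sleep 1
  done
}

# Example: wait up to ~30s for PostgreSQL to accept connections
# (container and user names follow the steps above):
# wait_for 30 docker exec datawarehouse pg_isready -U warehouse_admin -d datawarehouse
```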
Step 5: Verify Installation

Check that the container is running:
docker ps
You should see output similar to:
CONTAINER ID   IMAGE                  STATUS         PORTS                    NAMES
abc123def456   postgres:16-alpine     Up 10 seconds  0.0.0.0:5432->5432/tcp   datawarehouse
Connect to the database to verify it’s working:
docker exec -it datawarehouse psql -U warehouse_admin -d datawarehouse
If you see the PostgreSQL prompt (datawarehouse=#), the installation was successful!
Step 6: Run Bronze Layer Scripts

Load raw data from CSV files into the Bronze layer:
docker exec -it datawarehouse psql -U warehouse_admin -d datawarehouse -f /scripts/bronze/load_data.sql
The Bronze layer stores raw data as-is from the source systems without any transformations.
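As a quick sanity check that the load actually inserted rows, you can query PostgreSQL's table statistics. This is a sketch that assumes the Bronze tables live in a schema named `bronze`; adjust the schema name to whatever load_data.sql actually uses.

```shell
# check_bronze_counts: print approximate live row counts for tables in the
# bronze schema. Assumes a schema named "bronze" -- adjust to match load_data.sql.
check_bronze_counts() {
  docker exec datawarehouse psql -U warehouse_admin -d datawarehouse -t -A -c \
    "SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE schemaname = 'bronze';"
}
```

A table showing zero rows usually means its CSV was missing or the COPY step failed; check the script output for errors in that case.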

Post-Installation

Docker Setup

Learn more about Docker configuration and management

Data Architecture

Understand the Medallion Architecture used in this project

Troubleshooting

If another PostgreSQL instance is already running locally, it will conflict with the container's use of port 5432:
# Stop local PostgreSQL service
sudo service postgresql stop

# Or change the port in docker-compose.yml
ports:
  - "5433:5432"  # Use port 5433 instead
Ensure the datasets directory has proper read permissions:
chmod -R 755 datasets/
Check the container logs for errors:
docker logs datawarehouse
Common issues include:
  • Invalid environment variables in .env
  • Insufficient Docker resources
  • Corrupted Docker volumes (try docker compose down -v)
If you encounter persistent issues, remove all containers and volumes, then start fresh. Note that the -v flag deletes the database volumes, so any data already loaded will be lost:
docker compose down -v
docker compose up -d

Next Steps

Now that your environment is set up, you can:
  1. Explore the Docker Setup for container management
  2. Learn about the ETL Procedures for data transformation
  3. Review the Data Model for analytics
