Aggregate Functions

What is Aggregate Functions?

Aggregate functions in PL/SQL are SQL functions that operate on sets of rows to return a single summary value. They perform calculations across multiple rows grouped by one or more columns, enabling powerful data analysis capabilities directly within database operations.

Unlike scalar functions that operate on single values, aggregate functions process entire result sets to provide statistical summaries, counts, and groupings.

Common aggregate functions include COUNT, SUM, AVG, MIN, and MAX, which are typically used in conjunction with GROUP BY and HAVING clauses for advanced data aggregation.

COUNT Function

The COUNT function returns the number of rows that match specified criteria. It can count all rows (COUNT(*)), distinct values (COUNT(DISTINCT column)), or non-null values in a specific column (COUNT(column)).

DECLARE
    v_total_count NUMBER;
    v_non_null_count NUMBER;
    v_distinct_count NUMBER;
BEGIN
    -- Count all rows in employees table
    SELECT COUNT(*) INTO v_total_count FROM employees;
    
    -- Count non-null values in department_id column
    SELECT COUNT(department_id) INTO v_non_null_count FROM employees;
    
    -- Count distinct department values
    SELECT COUNT(DISTINCT department_id) INTO v_distinct_count FROM employees;
    
    DBMS_OUTPUT.PUT_LINE('Total employees: ' || v_total_count);
    DBMS_OUTPUT.PUT_LINE('Employees with departments: ' || v_non_null_count);
    DBMS_OUTPUT.PUT_LINE('Unique departments: ' || v_distinct_count);
END;
/

SUM Function

The SUM function calculates the total sum of numeric values in a specified column. It ignores NULL values and returns NULL if all values are NULL or if no rows are selected.

DECLARE
    v_total_salary NUMBER;
    v_dept_salary NUMBER;
BEGIN
    -- Sum of all salaries in employees table
    SELECT SUM(salary) INTO v_total_salary FROM employees;
    
    -- Sum salaries by department using GROUP BY
    FOR dept_rec IN (
        SELECT department_id, SUM(salary) as dept_total
        FROM employees 
        WHERE department_id IS NOT NULL
        GROUP BY department_id
    ) LOOP
        DBMS_OUTPUT.PUT_LINE('Dept ' || dept_rec.department_id || 
                           ': $' || dept_rec.dept_total);
    END LOOP;
END;
/

AVG Function

The AVG function calculates the average value of numeric expressions, excluding NULL values from the calculation.

The result precision depends on the data type being averaged.

DECLARE
    v_avg_salary NUMBER;
    v_avg_commission NUMBER;
BEGIN
    -- Average salary across all employees
    SELECT AVG(salary) INTO v_avg_salary FROM employees;
    
    -- Average commission (NULLs are excluded from calculation)
    SELECT AVG(NVL(commission_pct, 0)) INTO v_avg_commission FROM employees;
    
    -- Average salary by job title with GROUP BY
    FOR job_rec IN (
        SELECT job_id, AVG(salary) as avg_salary, COUNT(*) as emp_count
        FROM employees 
        GROUP BY job_id
        HAVING COUNT(*) > 1  -- Only jobs with multiple employees
    ) LOOP
        DBMS_OUTPUT.PUT_LINE('Job ' || job_rec.job_id || 
                           ': Avg Salary $' || ROUND(job_rec.avg_salary, 2) ||
                           ' (' || job_rec.emp_count || ' employees)');
    END LOOP;
END;
/

MIN Function

The MIN function returns the minimum value from a set of values. It works with numeric, date, and character data types, and ignores NULL values.

DECLARE
    v_min_salary NUMBER;
    v_min_hire_date DATE;
    v_min_job_title VARCHAR2(50);
BEGIN
    -- Minimum salary in the company
    SELECT MIN(salary) INTO v_min_salary FROM employees;
    
    -- Earliest hire date
    SELECT MIN(hire_date) INTO v_min_hire_date FROM employees;
    
    -- Minimum job title alphabetically by department
    FOR dept_rec IN (
        SELECT department_id, MIN(job_id) as min_job
        FROM employees 
        WHERE department_id IS NOT NULL
        GROUP BY department_id
    ) LOOP
        DBMS_OUTPUT.PUT_LINE('Dept ' || dept_rec.department_id || 
                           ': First job alphabetically: ' || dept_rec.min_job);
    END LOOP;
END;
/

MAX Function

The MAX function returns the maximum value from a set of values. Like MIN, it handles various data types and excludes NULL values from consideration.

DECLARE
    v_max_salary NUMBER;
    v_max_commission NUMBER;
BEGIN
    -- Maximum salary
    SELECT MAX(salary) INTO v_max_salary FROM employees;
    
    -- Maximum commission percentage by department
    FOR dept_rec IN (
        SELECT department_id, 
               MAX(commission_pct) as max_commission,
               COUNT(*) as total_employees,
               COUNT(commission_pct) as commissioned_employees
        FROM employees 
        WHERE department_id IS NOT NULL
        GROUP BY department_id
        HAVING MAX(commission_pct) IS NOT NULL
    ) LOOP
        DBMS_OUTPUT.PUT_LINE('Dept ' || dept_rec.department_id || 
                           ': Max commission ' || dept_rec.max_commission ||
                           ' (' || dept_rec.commissioned_employees || 
                           '/' || dept_rec.total_employees || ' employees)');
    END LOOP;
END;
/

GROUP BY Clause

The GROUP BY clause groups rows that have the same values in specified columns into summary rows. It’s essential for combining aggregate functions with categorical data analysis.

GROUP BY enables powerful data summarization and reporting capabilities.

DECLARE
    CURSOR dept_summary_cur IS
        SELECT department_id, 
               COUNT(*) as emp_count,
               AVG(salary) as avg_salary,
               MIN(hire_date) as earliest_hire,
               MAX(salary) as max_salary
        FROM employees
        WHERE department_id IS NOT NULL
        GROUP BY department_id
        ORDER BY department_id;
BEGIN
    FOR dept_rec IN dept_summary_cur LOOP
        DBMS_OUTPUT.PUT_LINE(
            'Department ' || dept_rec.department_id || 
            ': ' || dept_rec.emp_count || ' employees | ' ||
            'Avg Salary: $' || ROUND(dept_rec.avg_salary, 2) ||
            ' | Max Salary: $' || dept_rec.max_salary ||
            ' | First Hire: ' || TO_CHAR(dept_rec.earliest_hire, 'DD-MON-YYYY')
        );
    END LOOP;
END;
/

HAVING Clause

The HAVING clause filters groups created by the GROUP BY clause based on aggregate conditions.

Unlike WHERE which filters rows before grouping, HAVING filters after aggregation.

DECLARE
    CURSOR high_volume_jobs_cur IS
        SELECT j.job_title,
               COUNT(e.employee_id) as employee_count,
               AVG(e.salary) as avg_salary,
               SUM(e.salary) as total_salary_budget
        FROM employees e
        JOIN jobs j ON e.job_id = j.job_id
        GROUP BY j.job_title
        HAVING COUNT(e.employee_id) > 5  -- Only jobs with more than 5 employees
           AND AVG(e.salary) > 5000      -- And average salary > 5000
        ORDER BY total_salary_budget DESC;
BEGIN
    DBMS_OUTPUT.PUT_LINE('High Volume/High Salary Jobs:');
    DBMS_OUTPUT.PUT_LINE('================================');
    
    FOR job_rec IN high_volume_jobs_cur LOOP
        DBMS_OUTPUT.PUT_LINE(
            RPAD(job_rec.job_title, 30) ||
            'Employees: ' || LPAD(job_rec.employee_count, 3) ||
            ' | Avg Salary: $' || LPAD(ROUND(job_rec.avg_salary), 6) ||
            ' | Total Budget: $' || LPAD(ROUND(job_rec.total_salary_budget), 10)
        );
    END LOOP;
END;
/

GROUPING SETS

GROUPING SETS allow multiple grouping combinations in a single query, providing a more efficient alternative to multiple UNION ALL queries.

GROUPING SETS enable hierarchical and multi-dimensional analysis efficiently.

DECLARE
    CURSOR multi_dimension_cur IS
        SELECT 
            department_id,
            job_id,
            COUNT(*) as emp_count,
            AVG(salary) as avg_salary,
            GROUPING(department_id) as dept_grouped,
            GROUPING(job_id) as job_grouped
        FROM employees
        WHERE department_id IS NOT NULL
        GROUP BY GROUPING SETS (
            (department_id, job_id),  -- Detail by dept and job
            (department_id),           -- Summary by department only
            (job_id),                  -- Summary by job only
            ()                         -- Grand total
        )
        ORDER BY GROUPING(department_id), department_id, 
                 GROUPING(job_id), job_id;
BEGIN
    FOR agg_rec IN multi_dimension_cur LOOP
        IF agg_rec.dept_grouped = 1 AND agg_rec.job_grouped = 1 THEN
            DBMS_OUTPUT.PUT_LINE('GRAND TOTAL: ' || agg_rec.emp_count || 
                               ' employees, Avg Salary: $' || ROUND(agg_rec.avg_salary, 2));
        ELSIF agg_rec.dept_grouped = 1 THEN
            DBMS_OUTPUT.PUT_LINE('JOB SUMMARY (' || agg_rec.job_id || '): ' || 
                               agg_rec.emp_count || ' employees');
        ELSIF agg_rec.job_grouped = 1 THEN
            DBMS_OUTPUT.PUT_LINE('DEPT SUMMARY (' || agg_rec.department_id || '): ' || 
                               agg_rec.emp_count || ' employees');
        ELSE
            DBMS_OUTPUT.PUT_LINE('DEPT ' || agg_rec.department_id || 
                               ' JOB ' || agg_rec.job_id || ': ' || 
                               agg_rec.emp_count || ' employees');
        END IF;
    END LOOP;
END;
/

Why is Aggregate Functions Important?

Performance Optimization

Single-Query Principle: Aggregate functions enable complex data summarization in a single database round-trip, eliminating the need for multiple individual queries and reducing network overhead.

Data Integrity

ACID Compliance: By performing calculations directly in the database, aggregates maintain transactional consistency and avoid phantom read issues that can occur with application-level calculations.

Scalability

Divide and Conquer Pattern: GROUP BY with HAVING clauses allows partitioning large datasets into manageable groups, enabling efficient processing of massive volumes of data through parallel execution.

Advanced Nuances

NULL Handling Variations

Different aggregate functions handle NULLs differently. Understanding these nuances is critical for accurate reporting.

-- Demonstration of NULL handling in aggregates
SELECT 
    COUNT(*) as total_rows,                    -- Includes NULLs: 100
    COUNT(commission_pct) as non_null_comms,   -- Excludes NULLs: 35
    AVG(commission_pct) as avg_non_null,      -- Avg of non-NULLs: 0.15
    AVG(NVL(commission_pct, 0)) as avg_all    -- Avg including NULLs as 0: 0.0525
FROM employees;

Window Function Integration

Advanced usage involves combining aggregates with window functions for running totals and moving averages:

-- Using aggregates as window functions
SELECT employee_id, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) as dept_avg_salary,
       SUM(salary) OVER (PARTITION BY department_id) as dept_total_salary,
       salary - AVG(salary) OVER (PARTITION BY department_id) as diff_from_avg
FROM employees;

How This Fits the Roadmap

Within the “Advanced SQL Features” section, aggregate functions serve as the foundation for sophisticated data analysis capabilities.

Prerequisites For:

Analytic functions (LEAD, LAG, RANK)
Window functions for running calculations
Pivoting operations for cross-tabular reporting

Unlocks:

Materialized views with aggregate summaries
Query rewrite capabilities for optimization
OLAP operations for business intelligence
Data warehousing techniques

Database Fundamentals

SQL Basics

Data Manipulation

PL/SQL Programming

What is Aggregate Functions?

COUNT Function

SUM Function

AVG Function

MIN Function

MAX Function

GROUP BY Clause

HAVING Clause

GROUPING SETS

Why is Aggregate Functions Important?

Performance Optimization

Data Integrity

Scalability

Advanced Nuances

NULL Handling Variations

Window Function Integration

How This Fits the Roadmap

Prerequisites For:

Unlocks:

Build docs developers (and LLMs) love

Database Fundamentals

SQL Basics

Data Manipulation

PL/SQL Programming

​What is Aggregate Functions?

​COUNT Function

​SUM Function

​AVG Function

​MIN Function

​MAX Function

​GROUP BY Clause

​HAVING Clause

​GROUPING SETS

​Why is Aggregate Functions Important?

​Performance Optimization

​Data Integrity

​Scalability

​Advanced Nuances

​NULL Handling Variations

​Window Function Integration

​How This Fits the Roadmap

​Prerequisites For:

​Unlocks:

Build docs developers (and LLMs) love

What is Aggregate Functions?

COUNT Function

SUM Function

AVG Function

MIN Function

MAX Function

GROUP BY Clause

HAVING Clause

GROUPING SETS

Why is Aggregate Functions Important?

Performance Optimization

Data Integrity

Scalability

Advanced Nuances

NULL Handling Variations

Window Function Integration

How This Fits the Roadmap

Prerequisites For:

Unlocks: