Task A - Hobby Frequency - CircleNet Analytics

Overview

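Task A analyzes the CircleNet Pages dataset and counts how often each hobby occurs. It comes in two variants: TaskA (simple) and TaskAOptimized, which adds a combiner, CsvUtils-based parsing, and an empty-hobby check.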

TaskA (Simple)

Package: circlenet.taskA
Class: TaskA
Source: src/main/java/circlenet/taskA/TaskA.java

Main Method

public static void main(String[] args) throws Exception

Command-Line Arguments

args[0] (string, required): Input path to the CircleNet Pages CSV file (e.g., /circlenet/pages/CircleNetPage.csv)

args[1] (string, required): Output path for results (e.g., /circlenet/output/taskA/simple)

Mapper: TaskAMapper

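Splits each input line on commas and, for rows with exactly five fields, emits a (hobby, 1) pair using the fifth field as the hobby. Rows with any other field count are skipped.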

public static class TaskAMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text hobby = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) 
            throws IOException, InterruptedException{
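        String line = value.toString();
        // Naive comma split; rows without exactly five fields are skipped.
        String[] fields = line.split(",");
        if(fields.length == 5){
            // The hobby is the fifth column (index 4).
            hobby.set(fields[4]);
            context.write(hobby,one);
        }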
    }
}
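
The field check above has two edge cases that are easy to miss, so here is a minimal plain-Java sketch of the same logic, runnable without Hadoop. The class name SimpleSplitDemo and the sample records are hypothetical; the real column layout of the Pages file is not shown on this page.

```java
import java.util.Optional;

// Plain-Java sketch of the field check in TaskAMapper (no Hadoop needed).
final class SimpleSplitDemo {

    // Mirrors the mapper: split on commas, accept only rows with exactly five fields.
    static Optional<String> extractHobby(String line) {
        String[] fields = line.split(",");
        if (fields.length == 5) {
            return Optional.of(fields[4]);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical records, used only to exercise the check.
        // Five fields: accepted.
        System.out.println(extractHobby("1,Alice,USA,17,Chess"));
        // String.split drops trailing empty strings, so this row has four fields: skipped.
        System.out.println(extractHobby("2,Bob,USA,23,"));
        // A quoted value containing a comma splits into six fields: skipped.
        System.out.println(extractHobby("3,\"Lee, Ann\",USA,9,Hiking"));
    }
}
```

With this approach, rows with an empty last column or an embedded comma are silently dropped rather than counted.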

Reducer: TaskAReducer

Sums the counts for each hobby.

public static class TaskAReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) 
            throws IOException, InterruptedException{
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
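
To make the data flow concrete, the following sketch imitates the map, shuffle, and reduce steps in memory with plain Java collections. It is an illustration under assumptions, not the job itself: HobbyCountDemo, shuffle, reduce, and run are hypothetical names, and the input list stands in for the hobby keys the mapper would emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the (hobby, 1) -> (hobby, total) flow.
final class HobbyCountDemo {

    // Shuffle step: group the 1s emitted by the mapper under their hobby key.
    static Map<String, List<Integer>> shuffle(List<String> mappedKeys) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : mappedKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce step: same loop as TaskAReducer, summing the values for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Runs shuffle and reduce over a list of hobby keys.
    static Map<String, Integer> run(List<String> mappedKeys) {
        Map<String, Integer> totals = new TreeMap<>();
        shuffle(mappedKeys).forEach((hobby, ones) -> totals.put(hobby, reduce(ones)));
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical hobby values.
        System.out.println(run(Arrays.asList("Chess", "Hiking", "Chess")));
    }
}
```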

Example Usage

hadoop jar $JAR circlenet.taskA.TaskA $PAGES $OUT/taskA/simple

TaskAOptimized

Package: circlenet.taskA
Class: TaskAOptimized
Source: src/main/java/circlenet/taskA/TaskAOptimized.java

Main Method

public static void main(String[] args) throws Exception

Command-Line Arguments

args[0] (string, required): Input path to the CircleNet Pages CSV file

args[1] (string, required): Output path for results (e.g., /circlenet/output/taskA/optimized)

Mapper: MapperA

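Splits each line with CsvUtils instead of a plain comma split, accepts rows with five or more fields, trims the hobby field, and skips records whose hobby is empty after trimming.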

public static class MapperA extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text hobby = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) 
            throws IOException, InterruptedException {
        String[] fields = CsvUtils.split(value.toString());
        if (fields.length >= 5) {
            // Trim once and emit only non-empty hobbies.
            String trimmed = fields[4].trim();
            if (!trimmed.isEmpty()) {
                hobby.set(trimmed);
                context.write(hobby, ONE);
            }
        }
    }
}
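
The source of CsvUtils is not shown on this page, so the sketch below is only a guess at the kind of quote-aware splitting it might perform, combined with the same length and empty-hobby checks as MapperA. QuoteAwareSplitDemo, its split method, and the sample records are hypothetical, and this simplified splitter does not handle escaped quotes ("").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for CsvUtils.split plus the MapperA checks.
final class QuoteAwareSplitDemo {

    // Splits on commas that are outside double quotes; quote characters are dropped.
    static String[] split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field is kept even when it is empty.
        fields.add(current.toString());
        return fields.toArray(new String[0]);
    }

    // Mirrors MapperA: at least five fields, hobby trimmed, empty hobbies skipped.
    static Optional<String> extractHobby(String line) {
        String[] fields = split(line);
        if (fields.length >= 5) {
            String trimmed = fields[4].trim();
            if (!trimmed.isEmpty()) {
                return Optional.of(trimmed);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Quoted comma no longer breaks the row; surrounding spaces are trimmed.
        System.out.println(extractHobby("3,\"Lee, Ann\",USA,9, Hiking "));
        // Empty hobby: five fields are found, but the record is skipped.
        System.out.println(extractHobby("2,Bob,USA,23,"));
    }
}
```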

Reducer: SumReducer

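Sums the counts for each hobby. Combiner: yes. SumReducer also serves as the combiner, which is valid because addition is associative and commutative and the class's input and output types are identical.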

public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable out = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) 
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        out.set(sum);
        context.write(key, out);
    }
}

Optimizations

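Uses SumReducer as a combiner, so counts are pre-aggregated on the map side and fewer records cross the network during the shuffle
Parses lines with CsvUtils instead of a plain String.split(",")
Trims the hobby field and skips records where it is empty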
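
As a rough illustration of why a combiner cuts shuffle volume without changing the result, the sketch below pre-aggregates two hypothetical input splits and then merges the partial sums. CombinerDemo and its methods are illustrative names, not part of the project. Hadoop may run a combiner zero, one, or several times per map task, so the savings in a real job can differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of map-side pre-aggregation.
final class CombinerDemo {

    // Combiner step: sums the implicit 1s for the hobbies seen in one input split.
    static Map<String, Integer> combine(List<String> hobbiesInSplit) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String hobby : hobbiesInSplit) {
            partial.merge(hobby, 1, Integer::sum);
        }
        return partial;
    }

    // Reduce step: adds the partial sums coming from every split.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((hobby, count) -> totals.merge(hobby, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical hobby values emitted by two map tasks.
        List<String> splitA = Arrays.asList("Chess", "Chess", "Hiking");
        List<String> splitB = Arrays.asList("Chess", "Hiking", "Hiking");

        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(combine(splitA));
        partials.add(combine(splitB));

        // Without a combiner every (hobby, 1) pair is shuffled; with one, only
        // a single (hobby, partialSum) pair per distinct hobby per split.
        int withoutCombiner = splitA.size() + splitB.size();
        int withCombiner = partials.get(0).size() + partials.get(1).size();
        System.out.println("records shuffled without combiner: " + withoutCombiner);
        System.out.println("records shuffled with combiner: " + withCombiner);
        System.out.println(reduce(partials));
    }
}
```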

Example Usage

hadoop jar $JAR circlenet.taskA.TaskAOptimized $PAGES $OUT/taskA/optimized
