Count the frequency of each favorite hobby on CircleNet
Task A analyzes the CircleNetPage dataset to report the frequency of each favorite hobby, demonstrating a classic MapReduce word count pattern with aggregation optimization.
The basic approach uses a standard MapReduce pattern without optimization.
Mapper (TaskA.java:17-31):
public static class TaskAMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text hobby = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
        String line = value.toString();
        String[] fields = line.split(",");
        if(fields.length == 5){
            hobby.set(fields[4]);
            context.write(hobby, one);
        }
    }
}
Reducer (TaskA.java:33-47):
public static class TaskAReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException{
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
The optimized approach adds a combiner and a more defensive mapper, which accepts rows with at least five fields, trims the hobby, and skips empty values.
Optimized mapper:
public static class MapperA extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text hobby = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = CsvUtils.split(value.toString());
        if (fields.length >= 5) {
            hobby.set(fields[4].trim());
            if (!fields[4].trim().isEmpty()) {
                context.write(hobby, ONE);
            }
        }
    }
}
Combiner/Reducer (TaskAOptimized.java:34-46):
public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable out = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        out.set(sum);
        context.write(key, out);
    }
}
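Why Combiners Work Here: The combiner can safely pre-aggregate because the reduce operation is integer addition, which is both associative and commutative. Summing partial counts on the map side (1+1+1 = 3) and then summing those partial sums in the reducer gives the same total as summing every 1 in the reducer. Because SumReducer consumes and emits the same types (Text, IntWritable), one class can serve as both combiner and reducer.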
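The equivalence described above can be checked without a cluster. The following is a minimal plain-Java sketch, not code from the repository: the class CombinerSketch, its methods, and the sample hobbies are all hypothetical, and each inner list stands in for the records seen by one map task.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: simulates map, combine, and reduce in memory.
class CombinerSketch {

    // Same logic as a summing reducer: add up all counts seen for each key.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // Map phase for one split: emit one (hobby, 1) pair per record.
    static List<Map.Entry<String, Integer>> mapSplit(List<String> hobbies) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String h : hobbies) {
            out.add(new AbstractMap.SimpleEntry<>(h, 1));
        }
        return out;
    }

    // No combiner: every (hobby, 1) pair reaches the reducer.
    static Map<String, Integer> countWithoutCombiner(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            shuffled.addAll(mapSplit(split));
        }
        return sumByKey(shuffled);
    }

    // Combiner: each split is summed locally, then the partial sums are reduced.
    static Map<String, Integer> countWithCombiner(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            shuffled.addAll(sumByKey(mapSplit(split)).entrySet());
        }
        return sumByKey(shuffled);
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("Reading", "Hiking", "Reading"),
                Arrays.asList("Hiking", "Reading"));
        // Both paths give Reading=3 and Hiking=2, so this prints true.
        System.out.println(countWithoutCombiner(splits).equals(countWithCombiner(splits)));
    }
}
```

The same check would fail for an operation that is not associative and commutative (an average of averages, for example), which is why a reducer cannot always be reused as a combiner.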
Optimization Benefits:
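Reduces shuffle I/O dramatically: instead of shipping all 200K (hobby, 1) records, each map task sends roughly one record per unique hobby it saw, provided the framework runs the combiner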
Combiner optimization is most effective when there’s high key duplication (many users share the same hobby). With only ~100 unique hobbies among 200K users, this task sees excellent combiner performance.
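To put rough numbers on the shuffle reduction, the sketch below counts records crossing the shuffle for synthetic data shaped like the figures above (200K records, 100 hobbies). It rests on several assumptions: the class ShuffleEstimate and its methods are hypothetical, the four map tasks are an arbitrary choice, hobbies are spread evenly, and the combiner is taken to run exactly once per map task. Hadoop treats the combiner as an optional optimization and may run it zero, one, or several times, so real counts will differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical back-of-the-envelope model; not a measurement of the real job.
class ShuffleEstimate {

    // Build evenly sized splits of synthetic hobby labels.
    // Assumes records is divisible by splitCount.
    static String[][] syntheticSplits(int records, int hobbies, int splitCount) {
        int perSplit = records / splitCount;
        String[][] splits = new String[splitCount][perSplit];
        for (int s = 0; s < splitCount; s++) {
            for (int j = 0; j < perSplit; j++) {
                splits[s][j] = "hobby" + ((s * perSplit + j) % hobbies);
            }
        }
        return splits;
    }

    // Without a combiner, every map output record is shuffled.
    static long shuffledWithoutCombiner(String[][] splits) {
        long total = 0;
        for (String[] split : splits) {
            total += split.length;
        }
        return total;
    }

    // With one combiner pass per split, each split ships one record per distinct key.
    static long shuffledWithCombiner(String[][] splits) {
        long total = 0;
        for (String[] split : splits) {
            Set<String> distinct = new HashSet<>(Arrays.asList(split));
            total += distinct.size();
        }
        return total;
    }

    public static void main(String[] args) {
        String[][] splits = syntheticSplits(200_000, 100, 4);
        System.out.println("without combiner: " + shuffledWithoutCombiner(splits)); // 200000
        System.out.println("with combiner:    " + shuffledWithCombiner(splits));    // 400
    }
}
```

Under these assumptions the shuffled record count is bounded by (unique hobbies × map tasks) rather than by the number of input records, which is why high key duplication is what makes the combiner pay off.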