
Overview

Task C filters CircleNet pages to find all users with a specific hobby (e.g., “PodcastBinging”).

TaskCSimple

Package: circlenet.taskC
Class: TaskCSimple
Source: src/main/java/circlenet/taskC/TaskCSimple.java

Main Method

public static void main(String[] args) throws Exception

Command-Line Arguments

args[0] (string, required): Input path to the CircleNet Pages CSV file
args[1] (string, required): Output path for results
args[2] (string, required): Target hobby to filter (e.g., “PodcastBinging”)
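
The three required arguments can be checked before a job is configured. The sketch below is an assumption about how a driver might do this, not code taken from TaskCSimple; the class name TaskCArgsSketch and the usage message are hypothetical. It rejects a blank hobby because MapperC falls back to an empty target string, which would match rows whose hobby field is empty.

```java
/** Sketch of argument handling for a three-argument driver; not taken from TaskCSimple. */
final class TaskCArgsSketch {
    final String inputPath;
    final String outputPath;
    final String hobby;

    private TaskCArgsSketch(String inputPath, String outputPath, String hobby) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
        this.hobby = hobby;
    }

    /** Parses args[0..2]; throws IllegalArgumentException on a wrong count or a blank hobby. */
    static TaskCArgsSketch parse(String[] args) {
        if (args.length != 3) {
            // Hypothetical usage text, shown for illustration only.
            throw new IllegalArgumentException(
                    "Usage: TaskCSimple <pagesCsv> <outputDir> <hobby>");
        }
        String hobby = args[2].trim();
        if (hobby.isEmpty()) {
            throw new IllegalArgumentException("hobby must not be blank");
        }
        return new TaskCArgsSketch(args[0], args[1], hobby);
    }

    public static void main(String[] args) {
        TaskCArgsSketch parsed = parse(args);
        System.out.println(parsed.inputPath + " -> " + parsed.outputPath
                + " (hobby=" + parsed.hobby + ")");
    }
}
```

Failing fast here avoids launching a cluster job that can only produce empty or misleading output.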

Mapper: MapperC

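Filters pages by hobby using a case-insensitive comparison. setup() reads the target hobby from the job configuration key task.c.hobby. map() compares it against field 4 of each CSV row and, on a match, emits field 1 as the key and field 2 as the value. Rows with fewer than five fields are skipped.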
public static class MapperC extends Mapper<LongWritable, Text, Text, Text> {
    private String targetHobby;
    // Reused across map() calls to avoid allocating a new Text per record.
    private final Text outKey = new Text();
    private final Text outVal = new Text();

    @Override
    protected void setup(Context context) {
        // Read the hobby once per task; falls back to "" if the key is unset.
        targetHobby = context.getConfiguration().get("task.c.hobby", "").trim();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] f = CsvUtils.split(value.toString());
        // Skip short rows; f[4] is the hobby column.
        if (f.length >= 5 && f[4].trim().equalsIgnoreCase(targetHobby)) {
            // Emit field 1 as the key and field 2 as the value.
            outKey.set(f[1].trim());
            outVal.set(f[2].trim());
            context.write(outKey, outVal);
        }
    }
}
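
The filter rule can be tried outside Hadoop. The following is a minimal sketch under stated assumptions: the split method is a hypothetical stand-in for CsvUtils.split that splits on plain commas (the real helper may handle quoting differently), and the sample rows are made up.

```java
/** Stand-alone sketch of the MapperC filter rule; no Hadoop dependency. */
final class HobbyFilterSketch {

    // Hypothetical stand-in for CsvUtils.split: a plain comma split.
    // The real helper may treat quotes or escapes differently.
    static String[] split(String line) {
        return line.split(",", -1);
    }

    /** Returns {field 1, field 2} if field 4 matches the hobby, or null otherwise. */
    static String[] filter(String line, String targetHobby) {
        String[] f = split(line);
        if (f.length >= 5 && f[4].trim().equalsIgnoreCase(targetHobby.trim())) {
            return new String[] { f[1].trim(), f[2].trim() };
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        String[] rows = {
            "1,alice,Engineer,42,podcastbinging ",
            "2,bob,Chef,7,Hiking",
            "3,carol,Pilot"
        };
        for (String row : rows) {
            String[] kv = filter(row, "PodcastBinging");
            if (kv != null) {
                System.out.println(kv[0] + "\t" + kv[1]);
            }
        }
    }
}
```

Only the first row matches: its hobby differs from the target in case and trailing whitespace alone, the second row has a different hobby, and the third has fewer than five fields.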

Reducer: PassReducer

Pass-through reducer that writes every value it receives under its original key, without aggregating or deduplicating.
public static class PassReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) 
            throws IOException, InterruptedException {
        for (Text v : values) {
            context.write(key, v);
        }
    }
}
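
The effect of a pass-through reducer on the final output can be approximated in plain Java. This sketch assumes the framework groups mapper output by key and presents keys in sorted order, and that output lines are written as key, tab, value (the default text output layout). The class PassReducerSketch is hypothetical, and a TreeMap of String keys only approximates Hadoop's byte-wise ordering of Text keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simulates shuffle-then-pass-through on in-memory (key, value) pairs. */
final class PassReducerSketch {

    /** Groups pairs by key in sorted key order, then re-emits each pair as "key\tvalue". */
    static List<String> run(List<String[]> mapperOutput) {
        // TreeMap stands in for the shuffle phase: keys sorted, values grouped.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapperOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // Pass-through: every value is written, duplicates included.
            for (String v : e.getValue()) {
                out.add(e.getKey() + "\t" + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] { "bob", "Chef" });
        pairs.add(new String[] { "alice", "Engineer" });
        pairs.add(new String[] { "bob", "Pilot" });
        for (String line : run(pairs)) {
            System.out.println(line);
        }
    }
}
```

The reducer adds no filtering of its own; relative to the mapper output it changes only the grouping and ordering of records.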

Configuration

The driver stores the target hobby in the Hadoop configuration under the key task.c.hobby, which MapperC.setup() reads on each task:
Configuration conf = new Configuration();
conf.set("task.c.hobby", args[2]);

Example Usage

hadoop jar $JAR circlenet.taskC.TaskCSimple $PAGES $OUT/taskC/simple PodcastBinging

Notes

  • An optimized version was implemented but showed no performance gain
  • The simple version is recommended for production use
