CircleNet Analytics implements eight MapReduce tasks that analyze social media data across three datasets: CircleNetPage (200K users), Follows (20M relationships), and ActivityLog (10M actions). Each task demonstrates a different MapReduce pattern or optimization technique, such as combiners, map-side joins, and map-only jobs.

Available Tasks

Task A: Hobby Frequency

Count the frequency of each favorite hobby on CircleNet

Task B: Popular Pages

Find the top 10 most accessed CircleNet pages

Task C: Hobby Filter

Filter users by a specific favorite hobby

Task D: Popularity Factor

Calculate follower count for each CircleNet page owner

Task E: Favorites Analysis

Analyze total actions and distinct pages accessed per user

Task F: Above Average

Identify users with more followers than average

Task G: Outdated Pages

Find users with no activity in the last 90 days

Task H: One-Way Follows

Detect same-region one-way follow relationships
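
As an illustration of the last task, here is a minimal plain-Python sketch of one way to detect one-way follows. It is not the project's implementation and uses no Hadoop API. It assumes that ID1 follows ID2 in a Follows record, that "same-region" means both users share a RegionCode, and that fields contain no embedded commas. The sample records and the names `map_follow`, `reduce_pair`, and `region_of` are hypothetical.

```python
from collections import defaultdict

def map_follow(line, region_of):
    """Emit (unordered pair, direction) for same-region follows."""
    _rel, id1, id2, _date, _desc = line.split(",")
    if id1 == id2 or id1 not in region_of or id2 not in region_of:
        return
    if region_of[id1] == region_of[id2]:
        # Both directions of a relationship share one key.
        yield tuple(sorted((id1, id2))), (id1, id2)

def reduce_pair(directions):
    """Keep the pair only if a single direction was observed."""
    unique = set(directions)
    if len(unique) == 1:
        yield next(iter(unique))

# Hypothetical data: user ID -> RegionCode, plus four Follows records.
region_of = {"1": "5", "2": "5", "3": "5", "4": "9"}
follows = [
    "100,1,2,2024-01-01,friend",   # mutual with the next record
    "101,2,1,2024-01-02,friend",
    "102,1,3,2024-01-03,fan",      # one-way, same region
    "103,4,1,2024-01-04,fan",      # one-way, different region
]

# Simulated shuffle: group mapper output by key.
groups = defaultdict(list)
for line in follows:
    for key, direction in map_follow(line, region_of):
        groups[key].append(direction)

one_way = sorted(d for dirs in groups.values() for d in reduce_pair(dirs))
print(one_way)
```

Keying on the sorted pair sends both directions of a relationship to the same reducer, so a single grouping step is enough to tell mutual follows from one-way ones.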

Optimization Techniques

Every task ships in a simple and an optimized version. The optimized versions draw on these techniques:
  • Combiners: Reduce shuffle I/O by pre-aggregating data at the mapper
  • Map-Side Joins: Load small datasets into memory for efficient joins
  • Map-Only Jobs: Skip reduce phase when possible to save I/O costs
  • Job Chaining: Minimize the number of sequential MapReduce jobs
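
The combiner idea from the list above can be sketched in plain Python, using hobby counting as the example. This is a simulation under stated assumptions, not Hadoop code and not the project's implementation. The two input splits, the sample records, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def map_hobby(line):
    """Emit (hobby, 1) for one CircleNetPage record."""
    fields = line.split(",")
    yield fields[4], 1  # FavoriteHobby is the fifth column

def combine(pairs):
    """Pre-aggregate one mapper's output, as a combiner would."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_counts(shuffled):
    """Sum all values per key after the shuffle."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, each handled by one mapper.
splits = [
    ["1,ann,Engineer,5,Chess", "2,bob,Chef,5,Chess", "3,cy,Nurse,7,Hiking"],
    ["4,di,Pilot,2,Chess", "5,ed,Clerk,2,Hiking"],
]

# Without a combiner, every (hobby, 1) pair crosses the shuffle.
without_combiner = [
    pair for split in splits for line in split for pair in map_hobby(line)
]

# With a combiner, each mapper sends at most one pair per hobby.
with_combiner = [
    pair
    for split in splits
    for pair in combine(p for line in split for p in map_hobby(line))
]

print(len(without_combiner), len(with_combiner))
print(reduce_counts(with_combiner))
```

Both paths reduce to the same totals; the combiner only shrinks the number of pairs that have to be shuffled. This works because summation is associative and commutative, which is the usual precondition for reusing reduce logic as a combiner.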

Dataset Structure

CircleNetPage (200,000 records):
ID,NickName,JobTitle,RegionCode,FavoriteHobby
Follows (20,000,000 records):
ColRel,ID1,ID2,DateOfRelation,Description
ActivityLog (10,000,000 records):
ActionId,ByWho,WhatPage,ActionType,ActionTime
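
Because CircleNetPage is far smaller than the other two datasets, it is the natural candidate for a map-side join. The plain-Python sketch below shows the idea using the schemas above: the page table is loaded into memory and each ActivityLog record is joined in the mapper, with no reduce phase. It assumes comma-separated fields with no embedded commas. The sample values, the time and action formats, and the function names are hypothetical.

```python
def load_pages(lines):
    """Build an in-memory lookup from page ID to (NickName, RegionCode)."""
    lookup = {}
    for line in lines:
        page_id, nickname, _job, region, _hobby = line.split(",")
        lookup[page_id] = (nickname, region)
    return lookup

def map_activity(line, pages):
    """Join one ActivityLog record against the in-memory page table."""
    _action_id, by_who, what_page, action_type, _time = line.split(",")
    owner = pages.get(what_page)
    if owner is not None:  # inner join: skip pages missing from the table
        yield by_who, what_page, owner[0], action_type

# Hypothetical sample records following the schemas above.
page_lines = [
    "1,ann,Engineer,5,Chess",
    "2,bob,Chef,5,Hiking",
]
activity_lines = [
    "9001,1,2,view,1000",
    "9002,2,1,comment,1001",
    "9003,1,7,view,1002",  # page 7 is not in the page table
]

pages = load_pages(page_lines)
joined = [row for line in activity_lines for row in map_activity(line, pages)]
for row in joined:
    print(row)
```

In an actual Hadoop job the small table would typically reach each mapper through the distributed cache and be loaded once in the mapper's setup step; the dictionary lookup here stands in for that.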

Running Tasks

All tasks follow this general pattern, where $JAR is the path to the built JAR and X is a placeholder for the task letter:
# Build the JAR
mvn clean package -DskipTests

# Run simple version
hadoop jar $JAR circlenet.taskX.TaskXSimple <inputs> <output>

# Run optimized version
hadoop jar $JAR circlenet.taskX.TaskXOptimized <inputs> <output>

See the individual task pages for specific commands and parameters.
