Analytics Tasks Overview

CircleNet Analytics implements eight MapReduce tasks that analyze social media data across three datasets: CircleNetPage (200K users), Follows (20M relationships), and ActivityLog (10M actions). Each task demonstrates a different MapReduce pattern or optimization technique, including combiners, map-side joins, and map-only jobs.

Available Tasks

Task A: Hobby Frequency

Count the frequency of each favorite hobby on CircleNet

Task B: Popular Pages

Find the top 10 most accessed CircleNet pages

Task C: Hobby Filter

Filter users by a specific favorite hobby

Task D: Popularity Factor

Calculate the follower count for each CircleNet page owner

Task E: Favorites Analysis

Analyze total actions and distinct pages accessed per user

Task F: Above Average

Identify users with more followers than average

Task G: Outdated Pages

Find users with no activity in the last 90 days

Task H: One-Way Follows

Detect same-region one-way follow relationships
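To make the map/shuffle/reduce split behind these tasks concrete, here is a minimal in-memory sketch of Task E (total actions and distinct pages per user). It is not the project's implementation, which runs on Hadoop. The function names and sample rows are hypothetical, and the sketch assumes the comma-separated ActivityLog layout shown under Dataset Structure.

```python
from collections import defaultdict

def map_activity(line):
    """Map: emit (ByWho, WhatPage) for one ActivityLog record."""
    # Only the second and third columns are needed; the rest is ignored.
    _action_id, by_who, what_page, _rest = line.split(",", 3)
    return by_who, what_page

def reduce_user(by_who, pages):
    """Reduce: total actions and number of distinct pages for one user."""
    pages = list(pages)
    return by_who, len(pages), len(set(pages))

def run(lines):
    """Drive map -> shuffle -> reduce in memory."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for line in lines:
        user, page = map_activity(line)
        groups[user].append(page)
    return [reduce_user(user, pages) for user, pages in sorted(groups.items())]

# Made-up sample rows: ActionId,ByWho,WhatPage,ActionType,ActionTime
sample_log = [
    "1,10,20,viewed,1000",
    "2,10,20,liked,1010",
    "3,10,21,viewed,1020",
    "4,11,20,viewed,1030",
]

summary = run(sample_log)
for user, total_actions, distinct_pages in summary:
    print(user, total_actions, distinct_pages)
```

In this sample, user 10 performs three actions on two distinct pages and user 11 performs one action on one page.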

Optimization Techniques

Every task comes in a simple and an optimized version. The optimized versions rely on techniques such as:

Combiners: Pre-aggregate map output on the mapper side to reduce shuffle I/O
Map-Side Joins: Load the small dataset into each mapper's memory so the join needs no reduce-side shuffle
Map-Only Jobs: Skip the reduce phase when possible to save I/O costs
Job Chaining: Keep the number of sequential MapReduce jobs to a minimum
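The effect of a combiner can be illustrated with a small in-memory simulation of the hobby count from Task A. This is a sketch under stated assumptions, not the project's Hadoop code: the two input splits and their rows are made up, the function names are hypothetical, and fields are assumed to contain no embedded commas.

```python
from collections import Counter, defaultdict

def map_hobbies(lines):
    """Map phase: emit (hobby, 1) for every CircleNetPage record."""
    for line in lines:
        fields = line.split(",")
        yield fields[4], 1  # FavoriteHobby is the fifth column

def combine(pairs):
    """Combiner: sum counts per hobby within a single mapper's output."""
    partial = Counter()
    for hobby, count in pairs:
        partial[hobby] += count
    return list(partial.items())

def reduce_counts(shuffled):
    """Reduce phase: sum all (possibly partial) counts per hobby."""
    totals = defaultdict(int)
    for hobby, count in shuffled:
        totals[hobby] += count
    return dict(totals)

# Two hypothetical input splits, one per mapper (made-up sample rows).
split_1 = ["1,ann,Engineer,7,Chess", "2,bob,Nurse,3,Chess", "3,cy,Chef,7,Hiking"]
split_2 = ["4,dee,Pilot,2,Chess", "5,eli,Clerk,3,Hiking"]

# Records that would cross the network during the shuffle.
without_combiner = [p for s in (split_1, split_2) for p in map_hobbies(s)]
with_combiner = [p for s in (split_1, split_2) for p in combine(map_hobbies(s))]

result_plain = reduce_counts(without_combiner)
result_combined = reduce_counts(with_combiner)

print(len(without_combiner), len(with_combiner))
print(result_combined)
```

Both paths produce the same totals, but the combined path shuffles one record per hobby per mapper instead of one record per input row. This works because summation is associative and commutative, which is the usual precondition for reusing reduce logic as a combiner.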

Dataset Structure

CircleNetPage (200,000 records):

ID,NickName,JobTitle,RegionCode,FavoriteHobby

Follows (20,000,000 records):

ColRel,ID1,ID2,DateOfRelation,Description

ActivityLog (10,000,000 records):

ActionId,ByWho,WhatPage,ActionType,ActionTime
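Given these layouts, a map-side join between the small CircleNetPage dataset and the much larger Follows dataset could be sketched as below. This is an illustration only. All fields are kept as strings because this page does not state their types, the sample rows and date format are made up, the function names are hypothetical, and the sketch assumes ID1 is the follower and ID2 the followed user, which this page does not confirm.

```python
from typing import NamedTuple

class Page(NamedTuple):
    id: str
    nickname: str
    job_title: str
    region_code: str
    favorite_hobby: str

class Follow(NamedTuple):
    col_rel: str
    id1: str  # assumed: the follower
    id2: str  # assumed: the followed user
    date_of_relation: str
    description: str

def parse_page(line):
    # maxsplit=4 keeps any commas inside the last field intact.
    return Page(*line.split(",", 4))

def parse_follow(line):
    return Follow(*line.split(",", 4))

def build_page_cache(page_lines):
    """Setup step: load the small dataset into a dict keyed by ID."""
    return {page.id: page for page in map(parse_page, page_lines)}

def map_follow(line, page_cache):
    """Map step: join one Follows record against the in-memory pages."""
    follow = parse_follow(line)
    follower = page_cache.get(follow.id1)
    followed = page_cache.get(follow.id2)
    if follower is None or followed is None:
        return None  # skip relationships that reference unknown users
    same_region = follower.region_code == followed.region_code
    return follower.nickname, followed.nickname, same_region

# Made-up sample rows.
pages = ["1,ann,Engineer,7,Chess", "2,bob,Nurse,7,Hiking", "3,cy,Chef,3,Chess"]
follows = [
    "100,1,2,2024-01-05,college friend",
    "101,3,1,2024-02-11,met at work, then online",
    "102,1,9,2024-03-01,unknown target",
]

cache = build_page_cache(pages)
joined = [row for row in (map_follow(line, cache) for line in follows) if row is not None]
print(joined)
```

Because every mapper holds the full lookup table, each Follows record is joined where it is read and no reduce phase is needed for the join itself. The approach only makes sense while the cached dataset fits comfortably in a mapper's memory.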

Running Tasks

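All tasks follow this general pattern, where X stands for the task letter (A through H) and $JAR for the path to the JAR built in the first step: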

# Build the JAR
mvn clean package -DskipTests

# Run simple version
hadoop jar $JAR circlenet.taskX.TaskXSimple <inputs> <output>

# Run optimized version
hadoop jar $JAR circlenet.taskX.TaskXOptimized <inputs> <output>

See individual task pages for specific commands and parameters.
