Overview

CircleNet Analytics is a big data analytics platform that processes and analyzes social media data at scale with Hadoop MapReduce. Built on Hadoop 2.7.7, it provides tools for analyzing user behavior, relationships, and engagement patterns across a social network dataset. The platform processes three core datasets:

CircleNetPage

User profiles with 200,000 entries including nicknames, job titles, regions, and favorite hobbies

Follows

20 million follow relationships tracking social connections and timestamps

ActivityLog

10 million user actions including page views, pokes, and interactions

Key features

  • MapReduce analytics: Run distributed analytics jobs across large datasets using Hadoop’s parallel processing capabilities
  • Optimized implementations: Compare simple and optimized versions of each MapReduce job; the optimized versions add combiners to cut the data shuffled between the map and reduce phases
  • Dockerized environment: Complete Hadoop cluster setup with HDFS, web UIs, and monitoring tools
  • Scalable design: Process millions of records efficiently with proper data partitioning and aggregation
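
To make the simple vs. optimized comparison concrete, here is a minimal local sketch in plain Python (not Hadoop code) of how a combiner shrinks the shuffle for a hobby-frequency count. The sample rows, the column order, and the function names are all assumptions for illustration; the project's actual jobs may be structured differently.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (page_id, nickname, favorite_hobby). The real
# CircleNetPage column order is not given here, so this layout is assumed.
# Each inner list stands in for the input split of one map task.
SPLITS = [
    [(1, "ana", "chess"), (2, "bo", "hiking"), (3, "cy", "chess")],
    [(4, "di", "chess"), (5, "ed", "hiking"), (6, "fay", "chess")],
]

def map_hobby(row):
    """Mapper: emit (hobby, 1) for one profile row."""
    yield row[2], 1

def combine(pairs):
    """Combiner: pre-sum counts within a single map task's output."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_counts(shuffled):
    """Reducer: sum every partial count received for each hobby."""
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

def run(splits, use_combiner):
    """Run all map tasks, optionally combine, then reduce.

    Returns the final counts and the number of records shuffled.
    """
    shuffled = []
    for split in splits:
        pairs = [kv for row in split for kv in map_hobby(row)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_counts(shuffled), len(shuffled)

simple_result, simple_shuffled = run(SPLITS, use_combiner=False)
optimized_result, optimized_shuffled = run(SPLITS, use_combiner=True)
```

Both runs produce the same totals, but the combined run sends fewer records to the reducer. Summation is safe to combine because it is associative and commutative; an operation such as a plain average would need a different intermediate representation.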

Get started

Quickstart

Run your first MapReduce job in 5 minutes

Setup guide

Complete Docker and Hadoop installation instructions

Dataset overview

Learn about the CircleNet data structure

Analytics tasks

Explore all 8 available analytics tasks

What you can analyze

CircleNet Analytics supports eight different analytics tasks:
  • Task A: Report the frequency of each favorite hobby on CircleNet
  • Task B: Find the 10 most popular CircleNetPages based on activity
  • Task C: Find all users whose hobby matches a specific interest
  • Task D: Compute the popularity factor (follower count) for each page owner
  • Task E: Determine user favorites by analyzing access patterns
  • Task F: Report owners more popular than the average user
  • Task G: Identify outdated pages with no activity in 90 days
  • Task H: Find users who follow someone in their region but aren’t followed back
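
As an illustration of how two of these tasks can build on each other, the sketch below counts followers per page (in the spirit of Task D) and then keeps owners above the average (in the spirit of Task F). It is plain Python rather than a Hadoop job, the sample data is made up, and it assumes the average is taken over all pages, including those with no followers; the project may define the average differently.

```python
from collections import Counter

# Hypothetical follow records as (follower_id, followed_id) pairs.
# The real Follows dataset also carries timestamps; they are omitted here.
FOLLOWS = [(1, 2), (3, 2), (4, 2), (1, 3), (2, 3), (2, 4)]
ALL_PAGE_IDS = [1, 2, 3, 4]

def stage_one_counts(follows):
    """Stage 1: count followers per followed page."""
    return Counter(followed for _, followed in follows)

def stage_two_above_average(counts, page_ids):
    """Stage 2: keep owners whose follower count exceeds the mean.

    Assumption: the mean is computed over every page, so pages with
    zero followers pull the average down.
    """
    average = sum(counts.values()) / len(page_ids)
    return sorted(p for p in page_ids if counts.get(p, 0) > average)

counts = stage_one_counts(FOLLOWS)
popular = stage_two_above_average(counts, ALL_PAGE_IDS)
```

In a real MapReduce pipeline the second stage needs the global average before it can filter, which is why a task like this typically runs as two chained jobs or passes the average to the second job through its configuration.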

Architecture

The platform runs on a containerized Hadoop cluster with the following components:
  • Hadoop HDFS: Distributed file system for storing datasets
  • MapReduce Engine: Parallel processing framework for analytics
  • NameNode Web UI: Monitor HDFS at http://localhost:3002
  • Job Tracker: Track MapReduce job progress and performance
All analytics tasks include both simple and optimized implementations, allowing you to compare performance and understand MapReduce optimization techniques.

Next steps

  1. Set up your environment: Follow the setup guide to configure Docker, Hadoop, and HDFS
  2. Load your data: Learn how to upload the CircleNet datasets to HDFS
  3. Run your first job: Complete the quickstart to analyze hobby frequencies
  4. Explore advanced tasks: Dive into complex analytics with joins and multi-stage MapReduce
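
For a feel of what a join-based task involves, the following plain-Python sketch approximates a task like Task H: it attaches each user's region to every follow record, keeps same-region follows, and reports followers whose follow is not returned. It assumes the region table fits in memory (a map-side join); the sample data, the names, and this join strategy are illustrative assumptions, and the project's own implementation may use a reduce-side join instead.

```python
from collections import defaultdict

# Hypothetical data; the column layouts are assumptions.
REGION_BY_PAGE = {1: "north", 2: "north", 3: "south", 4: "north"}
FOLLOWS = [(1, 2), (2, 1), (1, 4), (3, 1), (4, 2)]

def map_follow(follower, followed, regions):
    """Mapper with a map-side join: look up both regions in memory.

    Emits (unordered pair, follower) for same-region follows only.
    """
    if follower == followed:
        return  # ignore self-follows
    region = regions.get(follower)
    if region is not None and region == regions.get(followed):
        pair = (min(follower, followed), max(follower, followed))
        yield pair, follower

def reduce_pair(followers):
    """Reducer: if only one direction exists, emit that follower."""
    directions = set(followers)
    if len(directions) == 1:
        yield next(iter(directions))

def run(follows, regions):
    """Group mapper output by pair, then apply the reducer per group."""
    grouped = defaultdict(list)
    for follower, followed in follows:
        for key, value in map_follow(follower, followed, regions):
            grouped[key].append(value)
    result = set()
    for followers in grouped.values():
        result.update(reduce_pair(followers))
    return sorted(result)

unreciprocated = run(FOLLOWS, REGION_BY_PAGE)
```

Keying on the unordered pair brings both directions of a relationship to the same reducer, so reciprocity can be checked without a second pass over the follow records.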