The Follows dataset captures one-sided “follow” relationships between users in the CircleNet social network. Each record represents one user following another, including when the relationship was established and why.

Overview

  • Total Records: 20,000,000 relationships
  • File Format: CSV without headers
  • Primary Key: ColRel
  • Purpose: Track the social graph and user connections
Follow relationships are one-directional. A follow from ID1 to ID2 does not imply ID2 follows ID1 back.

Schema Definition

ColRel (integer, required)
Unique sequential identifier for each follow relationship.
  • Range: 1 to 20,000,000
  • Constraint: Must be unique
  • Purpose: Primary key for the relationship record
ID1 (integer, required)
The user ID of the person who is following.
  • Range: 1 to 200,000
  • Constraint: Must exist in CircleNetPage.ID
  • Role: The follower in the relationship
ID2 (integer, required)
The user ID of the person being followed.
  • Range: 1 to 200,000
  • Constraint: Must exist in CircleNetPage.ID
  • Constraint: Must be different from ID1 (users cannot follow themselves)
  • Role: The followee in the relationship
DateOfRelation (integer, required)
Timestamp indicating when the follow relationship started.
  • Range: 1 to 1,000,000
  • Format: Sequential integer representing a point in time
  • Purpose: Enables temporal analysis of relationship formation
Description (string, required)
Textual description of the relationship type or reason for following.
  • Length: 20-50 characters
  • Format: No commas allowed
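  • Example phrases (each is shorter than 20 characters and must be extended to meet the length constraint, as in the example records): “college friend”, “work colleague”, “same hobby interest”, “family member”, “met at conference”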
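
The field constraints above can be collected into a single check. The following is a minimal sketch, not part of any CircleNet tooling: the names `FollowRecord` and `field_violations` are hypothetical, and the bounds are copied from the schema definition.

```python
from dataclasses import dataclass

# Bounds taken from the schema definition above.
MAX_COL_REL = 20_000_000
MAX_USER_ID = 200_000
MAX_DATE = 1_000_000


@dataclass(frozen=True)
class FollowRecord:  # hypothetical container for one row
    col_rel: int
    id1: int
    id2: int
    date_of_relation: int
    description: str


def field_violations(rec: FollowRecord) -> list[str]:
    """Return human-readable schema violations (empty list if none)."""
    problems = []
    if not 1 <= rec.col_rel <= MAX_COL_REL:
        problems.append("ColRel out of range")
    if not 1 <= rec.id1 <= MAX_USER_ID:
        problems.append("ID1 out of range")
    if not 1 <= rec.id2 <= MAX_USER_ID:
        problems.append("ID2 out of range")
    if rec.id1 == rec.id2:
        problems.append("ID1 equals ID2")
    if not 1 <= rec.date_of_relation <= MAX_DATE:
        problems.append("DateOfRelation out of range")
    if not 20 <= len(rec.description) <= 50:
        problems.append("Description length outside 20-50")
    if "," in rec.description:
        problems.append("Description contains a comma")
    return problems
```

This check covers only per-record rules. Uniqueness of ColRel and existence of the IDs in CircleNetPage need access to the whole dataset.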

Example Records

1,1523,47892,245678,college friend from engineering
2,1523,89234,245680,work colleague on same team
3,47892,1523,567234,met at photography workshop
4,12345,67890,123456,family member cousin
5,67890,12340,789012,same hobby interest in hiking
The file does not include column headers. The order of values corresponds to: ColRel, ID1, ID2, DateOfRelation, Description.
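
Because the file has no header row, a reader has to supply the column names itself. The sketch below shows one way to do that with Python's standard `csv` module; the function name `read_follows` is hypothetical, and the sample rows are taken from the example records above.

```python
import csv
import io

# Column order as documented for the headerless file.
COLUMNS = ["ColRel", "ID1", "ID2", "DateOfRelation", "Description"]


def read_follows(lines):
    """Yield one dict per record, converting the four numeric columns to int."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS)
    for row in reader:
        for name in COLUMNS[:4]:
            row[name] = int(row[name])
        yield row


# In-memory stand-in for the real file, using two of the example records.
sample = io.StringIO(
    "1,1523,47892,245678,college friend from engineering\n"
    "2,1523,89234,245680,work colleague on same team\n"
)
records = list(read_follows(sample))
```

For the real dataset, the same function can be given an open file handle instead of the `StringIO` object. Since descriptions contain no commas, every row splits into exactly five fields.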

Relationship Characteristics

One-Directional Nature

Important: Follow relationships are NOT symmetric.
  • The record 100,1523,47892,245678,college friend from engineering means that user 1523 follows user 47892
  • It does NOT mean that user 47892 follows user 1523
  • A mutual follow requires two separate records, one in each direction
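
Since a mutual follow is stored as two records, detecting one means looking for the reversed pair. The following sketch does this in memory; the function name `mutual_pairs` is hypothetical, and the approach is meant for illustration on small samples rather than as the intended solution for any task.

```python
def mutual_pairs(edges):
    """Return the set of unordered user pairs that follow each other.

    `edges` is an iterable of (ID1, ID2) tuples, one per follow record.
    """
    seen = set(edges)
    return {
        (min(a, b), max(a, b))  # normalise so each pair is reported once
        for a, b in seen
        if (b, a) in seen
    }


edges = [(1523, 47892), (1523, 89234), (47892, 1523), (12345, 67890)]
pairs = mutual_pairs(edges)
```

Here only users 1523 and 47892 follow each other. Holding all 20,000,000 pairs in a Python set may need a considerable amount of memory, so a distributed or on-disk approach is likely more suitable at full scale.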

Self-Follow Prevention

# Valid
1,1523,47892,245678,college friend from engineering

# Invalid - ID1 and ID2 are the same
2,1523,1523,245680,following myself
The dataset generator must ensure ID1 ≠ ID2 for all records.
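
One way a generator could guarantee ID1 ≠ ID2 is to draw the second ID from the remaining users only, rather than retrying on collisions. This is a sketch under assumptions: the function name `random_follow_pair` is hypothetical, and the uniform choice of users is a simplification that ignores the follower distribution.

```python
import random

NUM_USERS = 200_000  # user ID range from the schema: 1 to 200,000


def random_follow_pair(rng, num_users=NUM_USERS):
    """Pick (ID1, ID2) uniformly with ID1 != ID2, without a retry loop."""
    id1 = rng.randint(1, num_users)
    # Draw from the num_users - 1 remaining IDs, then shift past id1.
    id2 = rng.randint(1, num_users - 1)
    if id2 >= id1:
        id2 += 1
    return id1, id2


rng = random.Random(42)  # fixed seed so runs are repeatable
pair = random_follow_pair(rng)
```

The shift keeps every other user equally likely as the followee and never produces a self-follow.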

Description Creativity

Relationship descriptions should be varied and realistic:
  • Social: “college friend”, “high school buddy”, “childhood neighbor”
  • Professional: “work colleague”, “industry contact”, “met at conference”
  • Interest-based: “same hobby interest”, “book club member”, “gaming partner”
  • Family: “family member cousin”, “sibling connection”, “distant relative”
  • Activity-based: “gym workout partner”, “yoga class mate”, “volunteer together”

Data Constraints

Referential Integrity

Both ID1 and ID2 must reference valid CircleNetPage IDs:
ID1, ID2 ∈ CircleNetPage.ID
ID1 ≠ ID2
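
The membership constraint can be checked by comparing the IDs used in Follows against the set of page IDs. The sketch below assumes that CircleNetPage.ID covers exactly 1 to 200,000, which matches the documented ID range but is not stated outright; `dangling_references` is a hypothetical name.

```python
def dangling_references(edges, page_ids):
    """Return the user IDs used in `edges` that are missing from `page_ids`."""
    referenced = set()
    for id1, id2 in edges:
        referenced.add(id1)
        referenced.add(id2)
    return referenced - set(page_ids)


page_ids = range(1, 200_001)  # assumed: CircleNetPage.ID covers 1..200,000
edges = [(1523, 47892), (67890, 200_001)]
missing = dangling_references(edges, page_ids)
```

In this sample, 200001 is reported because it lies outside the assumed page ID range. An empty result means every follower and followee refers to an existing page.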

Uniqueness

  • Each ColRel must be unique
  • The schema does not forbid the same (ID1, ID2) pair from appearing more than once under different ColRel values, but generators should avoid such duplicates
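
Repeated (ID1, ID2) pairs can be found by counting occurrences of each ordered pair. This is a small sketch with a hypothetical function name, `duplicate_pairs`, intended for spot checks on samples.

```python
from collections import Counter


def duplicate_pairs(edges):
    """Return {(ID1, ID2): count} for ordered pairs that occur more than once."""
    counts = Counter(edges)
    return {pair: n for pair, n in counts.items() if n > 1}


edges = [(1523, 47892), (47892, 1523), (1523, 47892)]
dups = duplicate_pairs(edges)
```

The pair (47892, 1523) is the reverse direction and therefore a distinct relationship, so only (1523, 47892) is flagged.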

Analytics Use Cases

This dataset enables:
  • Popularity Analysis: Count followers per user (Task D)
  • Follow-Back Detection: Identify mutual vs. one-way follows (Task H)
  • Regional Network Analysis: Analyze connections within RegionCode groups
  • Relationship Timeline: Track when connections formed
  • Average Popularity: Calculate mean followers for comparison (Task F)
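
The two popularity measures in the list reduce to counting how often each user appears as ID2. The sketch below illustrates the idea on a toy edge list; the function names are hypothetical, and the exact requirements of Tasks D and F are not specified on this page.

```python
from collections import Counter


def follower_counts(edges):
    """Count followers per user: each (ID1, ID2) adds one follower to ID2."""
    return Counter(id2 for _, id2 in edges)


def mean_followers(edges, num_users):
    """Mean followers over all users, including those with zero followers."""
    return len(edges) / num_users


edges = [(1, 3), (2, 3), (3, 1)]  # toy data: users 1 and 2 follow 3, 3 follows 1
counts = follower_counts(edges)
average = mean_followers(edges, num_users=3)
```

Dividing by the total number of users, rather than by the number of users who appear as ID2, keeps users without followers in the average.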

Scale Considerations

With 20 million relationships across 200,000 users:
  • On average, each user follows 100 others and has 100 followers (20,000,000 ÷ 200,000)
  • Some users (popular users) may have far more followers than the average
  • Others may have few or no followers
  • Follower counts should be realistic; social networks typically show a skewed, power-law-like distribution
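
A skewed follower distribution can be approximated by drawing followee IDs with Zipf-like weights. This is only a sketch: the exponent `alpha`, the choice to treat low IDs as the popular users, and the name `make_followee_sampler` are all assumptions, not part of the dataset specification.

```python
import random
from itertools import accumulate


def make_followee_sampler(num_users, alpha=1.0, seed=0):
    """Return a function that draws followee IDs with Zipf-like skew.

    User k is drawn with probability proportional to 1 / k**alpha, so low
    IDs act as the "popular" users in this sketch.
    """
    rng = random.Random(seed)
    ids = range(1, num_users + 1)
    # Precompute cumulative weights once so repeated sampling stays cheap.
    cum_weights = list(accumulate(1.0 / k**alpha for k in ids))

    def sample(n):
        return rng.choices(ids, cum_weights=cum_weights, k=n)

    return sample


sample_followees = make_followee_sampler(num_users=1000)
draws = sample_followees(20_000)
```

A generator built on this would still need to reject or re-draw any followee equal to the follower to keep ID1 ≠ ID2.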

Next Steps

ActivityLog Dataset

Track user actions and page views

Generate Datasets

Create the Follows dataset
