Overview
- Total Records: 20,000,000 relationships
- File Format: CSV without headers
- Primary Key: ColRel
- Purpose: Track the social graph and user connections
Follow relationships are one-directional. A follow from ID1 to ID2 does not imply ID2 follows ID1 back.
Schema Definition
ColRel
Unique sequential identifier for each follow relationship.
- Range: 1 to 20,000,000
- Constraint: Must be unique
- Purpose: Primary key for the relationship record
ID1
The user ID of the person who is following.
- Range: 1 to 200,000
- Constraint: Must exist in CircleNetPage.ID
- Role: The follower in the relationship
ID2
The user ID of the person being followed.
- Range: 1 to 200,000
- Constraint: Must exist in CircleNetPage.ID
- Constraint: Must be different from ID1 (users cannot follow themselves)
- Role: The followee in the relationship
DateOfRelation
Timestamp indicating when the follow relationship started.
- Range: 1 to 1,000,000
- Format: Sequential integer representing a point in time
- Purpose: Enables temporal analysis of relationship formation
Description
Textual description of the relationship type or reason for following.
- Length: 20-50 characters
- Format: No commas allowed
- Examples: “college friend”, “work colleague”, “same hobby interest”, “family member”, “met at conference”
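The field constraints above can be collected into a single per-record check. The following is a minimal sketch, not part of the dataset tooling: the function name `validate_follow` and the constant names are hypothetical, and the check assumes that CircleNetPage IDs cover the full range 1 to 200,000, so a range test stands in for a real lookup against CircleNetPage.ID.

```python
from typing import List

# Limits taken from the schema definition; the constant names are hypothetical.
MAX_RELATION_ID = 20_000_000  # ColRel range
MAX_USER_ID = 200_000         # ID1 / ID2 range (assumed to match CircleNetPage.ID)
MAX_DATE = 1_000_000          # DateOfRelation range


def validate_follow(col_rel: int, id1: int, id2: int,
                    date_of_relation: int, description: str) -> List[str]:
    """Return a list of constraint violations (empty if the record is valid)."""
    errors = []
    if not 1 <= col_rel <= MAX_RELATION_ID:
        errors.append("ColRel out of range")
    if not 1 <= id1 <= MAX_USER_ID:
        errors.append("ID1 out of range")
    if not 1 <= id2 <= MAX_USER_ID:
        errors.append("ID2 out of range")
    if id1 == id2:
        errors.append("ID1 equals ID2 (self-follow)")
    if not 1 <= date_of_relation <= MAX_DATE:
        errors.append("DateOfRelation out of range")
    if not 20 <= len(description) <= 50:
        errors.append("Description length outside 20-50 characters")
    if "," in description:
        errors.append("Description contains a comma")
    return errors
```

Uniqueness of ColRel cannot be checked one record at a time, so it is left out of this sketch.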
Example Records
The file does not include column headers. The order of values corresponds to: ColRel, ID1, ID2, DateOfRelation, Description.
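Because there is no header row, a reader has to assign the column names by position. A minimal sketch using Python's standard `csv` module is shown here; the `Follow` type and `read_follows` function are hypothetical names, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Follow(NamedTuple):
    col_rel: int
    id1: int
    id2: int
    date_of_relation: int
    description: str


def read_follows(stream: TextIO) -> Iterator[Follow]:
    """Yield one Follow per CSV row; there is no header row to skip."""
    for row in csv.reader(stream):
        col_rel, id1, id2, date_of_relation, description = row
        yield Follow(int(col_rel), int(id1), int(id2),
                     int(date_of_relation), description)


# Made-up sample rows for illustration only, not real dataset records.
sample = io.StringIO(
    "1,17,42,1500,college friend from dorm\n"
    "2,42,17,2300,work colleague on same team\n"
)
rows = list(read_follows(sample))
```

Since descriptions may not contain commas, every row splits into exactly five fields and the tuple unpacking fails loudly on a malformed line.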
Relationship Characteristics
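One-Directional Nature
A record (ID1, ID2) means only that ID1 follows ID2. A mutual follow requires a second record with the two IDs swapped.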
Self-Follow Prevention
The dataset generator must ensure ID1 ≠ ID2 for all records.
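One way a generator could guarantee this without retry loops is to draw ID2 from the remaining IDs and shift it past ID1. This is a sketch under assumptions: the function name `random_follow_pair` is hypothetical, and it draws users uniformly, which a realistic generator would likely replace with a skewed distribution.

```python
import random
from typing import Tuple


def random_follow_pair(rng: random.Random,
                       num_users: int = 200_000) -> Tuple[int, int]:
    """Draw (id1, id2) from 1..num_users such that id1 != id2."""
    id1 = rng.randint(1, num_users)
    # Pick one of the num_users - 1 other IDs, then shift past id1
    # so the result can never equal id1.
    id2 = rng.randint(1, num_users - 1)
    if id2 >= id1:
        id2 += 1
    return id1, id2
```

The shift keeps every other user equally likely as the followee and needs exactly two random draws per pair.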
Description Creativity
Relationship descriptions should be varied and realistic:
- Social: “college friend”, “high school buddy”, “childhood neighbor”
- Professional: “work colleague”, “industry contact”, “met at conference”
- Interest-based: “same hobby interest”, “book club member”, “gaming partner”
- Family: “family member cousin”, “sibling connection”, “distant relative”
- Activity-based: “gym workout partner”, “yoga class mate”, “volunteer together”
Data Constraints
Referential Integrity
Both ID1 and ID2 must reference valid CircleNetPage IDs.
Uniqueness
- Each ColRel must be unique
- The same (ID1, ID2) pair can theoretically appear multiple times with different ColRel values, though this should be avoided in practice
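These integrity and uniqueness rules can be checked in one pass over the records. The sketch below is illustrative only: `check_follows` is a hypothetical helper, it takes `(ColRel, ID1, ID2)` tuples, and it keeps everything in memory, which suits a sample more than the full 20,000,000 rows.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def check_follows(follows: Iterable[Tuple[int, int, int]],
                  valid_ids: Set[int]) -> Dict[str, List]:
    """Report duplicate ColRel values, dangling user IDs, and repeated pairs."""
    seen_keys: Set[int] = set()
    duplicate_keys: List[int] = []
    dangling: List[int] = []
    pair_counts: Counter = Counter()
    for col_rel, id1, id2 in follows:
        if col_rel in seen_keys:
            duplicate_keys.append(col_rel)
        seen_keys.add(col_rel)
        # Referential integrity: both endpoints must be known page IDs.
        if id1 not in valid_ids or id2 not in valid_ids:
            dangling.append(col_rel)
        pair_counts[(id1, id2)] += 1
    repeated_pairs = [pair for pair, n in pair_counts.items() if n > 1]
    return {
        "duplicate_keys": duplicate_keys,
        "dangling": dangling,
        "repeated_pairs": repeated_pairs,
    }
```

Note that (1, 2) and (2, 1) count as different pairs here, because follows are directional.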
Analytics Use Cases
This dataset enables:
- Popularity Analysis: Count followers per user (Task D)
- Follow-Back Detection: Identify mutual vs. one-way follows (Task H)
- Regional Network Analysis: Analyze connections within RegionCode groups
- Relationship Timeline: Track when connections formed
- Average Popularity: Calculate mean followers for comparison (Task F)
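Three of these analyses reduce to simple operations on the (ID1, ID2) pairs. The following sketch shows one possible in-memory formulation; the function names are hypothetical, and the exact definitions used by Tasks D, F, and H are not given here, so these are assumptions about what those tasks compute.

```python
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (ID1 follower, ID2 followee)


def follower_counts(pairs: List[Pair]) -> Counter:
    """Popularity: number of followers per followed user (count by ID2)."""
    return Counter(id2 for _, id2 in pairs)


def mutual_pairs(pairs: List[Pair]) -> Set[Pair]:
    """Follow-back detection: pairs where both directions are present."""
    edges = set(pairs)
    # Keep only (a, b) with a < b so each mutual pair is reported once.
    return {(a, b) for a, b in edges if a < b and (b, a) in edges}


def average_followers(pairs: List[Pair], num_users: int) -> float:
    """Mean followers per user, counting users with zero followers."""
    return len(pairs) / num_users
```

Dividing by the total user count rather than by the number of followed users keeps users with no followers in the average.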
Scale Considerations
With 20 million relationships across 200,000 users:
- Average of 100 outgoing follows per user, and therefore 100 followers per user on average
- Some users may have many followers (popular users)
- Others may have few or no followers
- Distribution should be realistic (power-law distribution typical in social networks)
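The average follows directly from the two totals, and a skewed follower distribution can be approximated by weighting followees unevenly. The sketch below uses Zipf-like weights as a stand-in for a power law; `sample_followees`, the exponent, and the assumption that lower IDs are more popular are all illustrative choices, not part of the dataset specification.

```python
import random
from typing import List

TOTAL_FOLLOWS = 20_000_000
TOTAL_USERS = 200_000

# 20,000,000 / 200,000 = 100 follows per user on average.
average_follows_per_user = TOTAL_FOLLOWS / TOTAL_USERS


def sample_followees(rng: random.Random, num_users: int,
                     num_samples: int, exponent: float = 1.0) -> List[int]:
    """Sample followee IDs with weight 1 / rank**exponent (Zipf-like).

    Assumption: user ID doubles as popularity rank, so ID 1 is most popular.
    """
    user_ids = list(range(1, num_users + 1))
    weights = [1.0 / (rank ** exponent) for rank in user_ids]
    return rng.choices(user_ids, weights=weights, k=num_samples)
```

A real generator would probably shuffle which IDs are popular so that popularity does not correlate with ID order.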
Next Steps
- ActivityLog Dataset: Track user actions and page views
- Generate Datasets: Create the Follows dataset