Overview
- Total Records: 10,000,000 actions
- File Format: CSV without headers
- Primary Key: ActionId
- Purpose: Track user engagement and page interaction patterns
Schema Definition
ActionId
Unique sequential identifier for each action.
- Range: 1 to 10,000,000
- Constraint: Must be unique
- Purpose: Primary key for the activity record
ByWho
The user ID of the person performing the action.
- Range: 1 to 200,000
- Constraint: Must exist in CircleNetPage.ID
- Role: The actor who initiated the activity
WhatPage
The user ID of the page that was accessed or interacted with.
- Range: 1 to 200,000
- Constraint: Must exist in CircleNetPage.ID
- Note: Can be the same as ByWho (users can access their own pages)
ActionType
Description of the action performed on the page.
- Length: 20-50 characters
- Format: No commas allowed
- Examples: “viewed user profile page”, “left a comment on recent post”, “poked user playfully”, “liked profile photo and banner”, “sent a friend request”
ActionTime
Timestamp of when the action occurred.
- Range: 1 to 1,000,000
- Format: Integer hour index (hour granularity)
- Purpose: Enables temporal analysis and inactivity detection
Example Records
The file does not include column headers. The order of values corresponds to: ActionId, ByWho, WhatPage, ActionType, ActionTime.
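Because the file has no header row, a reader has to map columns by position. The following is a minimal sketch, not a reference implementation, that parses rows in the documented order and checks each field against the schema rules above. The names `Action`, `parse_action`, `validate`, `read_actions` and `validate_file` are hypothetical. The optional `page_ids` set stands in for the CircleNetPage.ID column, which is assumed to be loaded separately.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Set, Tuple

MAX_ACTION_ID = 10_000_000
MAX_PAGE_ID = 200_000
MAX_ACTION_TIME = 1_000_000


@dataclass(frozen=True)
class Action:
    action_id: int
    by_who: int
    what_page: int
    action_type: str
    action_time: int


def parse_action(fields: List[str]) -> Action:
    """Map one headerless CSV row to named fields (documented column order)."""
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return Action(int(fields[0]), int(fields[1]), int(fields[2]), fields[3], int(fields[4]))


def validate(a: Action, page_ids: Optional[Set[int]] = None) -> List[str]:
    """Return the schema rules this record breaks (empty list if none)."""
    problems = []
    if not 1 <= a.action_id <= MAX_ACTION_ID:
        problems.append("ActionId out of range")
    for name, value in (("ByWho", a.by_who), ("WhatPage", a.what_page)):
        if not 1 <= value <= MAX_PAGE_ID:
            problems.append(f"{name} out of range")
        elif page_ids is not None and value not in page_ids:
            problems.append(f"{name} not in CircleNetPage.ID")
    if not 20 <= len(a.action_type) <= 50:
        problems.append("ActionType length not 20-50")
    if "," in a.action_type:  # only possible if the field was quoted
        problems.append("ActionType contains a comma")
    if not 1 <= a.action_time <= MAX_ACTION_TIME:
        problems.append("ActionTime out of range")
    return problems


def read_actions(lines: Iterable[str]) -> Iterator[Action]:
    """Parse an open file (opened with newline="") or any iterable of CSV lines."""
    for row in csv.reader(lines):
        yield parse_action(row)


def validate_file(lines: Iterable[str], page_ids: Optional[Set[int]] = None) -> List[Tuple[int, str]]:
    """Check every row, plus ActionId uniqueness; returns (row number, problem) pairs."""
    issues: List[Tuple[int, str]] = []
    seen_ids: Set[int] = set()
    for row_no, row in enumerate(csv.reader(lines), start=1):
        try:
            action = parse_action(row)
        except ValueError as exc:
            issues.append((row_no, str(exc)))
            continue
        issues.extend((row_no, p) for p in validate(action, page_ids))
        if action.action_id in seen_ids:
            issues.append((row_no, "duplicate ActionId"))
        seen_ids.add(action.action_id)
    return issues
```

The uniqueness check keeps every ActionId in memory. That is acceptable for a validation pass but worth keeping in mind at 10 million rows.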
Action Type Rules
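Valid Action Sequence
For each (ByWho, WhatPage) pair, the first action must be a view action. Comments, pokes, likes, and other interactions are valid only after the user has viewed that page.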
Action Type Categories
View Actions (must come first):
- “viewed user profile page”
- “viewed photos section”
- “viewed recent posts feed”
Interaction Actions (only after a view):
- “left a comment on recent post”
- “poked user playfully”
- “liked profile photo and banner”
- “sent a friend request”
- “shared post to own timeline”
- “reacted with emoji to status”
- “sent private message”
- “tagged in a photo comment”
Be creative with ActionType descriptions while maintaining realism. Think about actual social media interactions.
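The view-first rule can be checked after the fact with one pass over the log. The following is a hedged sketch under two assumptions. First, "first action" means the earliest ActionTime, with ActionId breaking ties between actions in the same hour. Second, view descriptions start with "viewed ", as in the examples above. Pass a different `is_view` if your descriptions vary. The function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (ActionId, ByWho, WhatPage, ActionType, ActionTime), in file column order.
Record = Tuple[int, int, int, str, int]


def default_is_view(action_type: str) -> bool:
    # Assumption: every view description begins with "viewed ".
    return action_type.startswith("viewed ")


def view_first_violations(
    records: Iterable[Record],
    is_view: Callable[[str], bool] = default_is_view,
) -> List[Tuple[int, int]]:
    """Return the (ByWho, WhatPage) pairs whose earliest action is not a view."""
    # Earliest action per pair, ordered by (ActionTime, ActionId).
    earliest: Dict[Tuple[int, int], Tuple[int, int, str]] = {}
    for action_id, by_who, what_page, action_type, action_time in records:
        key = (by_who, what_page)
        if key not in earliest or (action_time, action_id) < earliest[key][:2]:
            earliest[key] = (action_time, action_id, action_type)
    return sorted(pair for pair, (_, _, kind) in earliest.items() if not is_view(kind))
```

The function keeps one entry per distinct pair, so the input can be streamed from disk. Only the per-pair summary is held in memory.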
Data Characteristics
Self-Interaction
Unlike the Follows dataset, users CAN interact with their own pages (ByWho may equal WhatPage).
Temporal Granularity
- ActionTime represents hours (granularity of 1 hour)
- Range 1-1,000,000 represents approximately 114 years of hourly data
- Used for detecting inactive users (e.g., no activity in 90 days = 2,160 hours)
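Both hour figures follow from 24 hours per day and, assuming a 365-day year with leap days ignored, 8,760 hours per year:

```latex
\begin{aligned}
90\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} &= 2{,}160\ \text{h} \\
\frac{1{,}000{,}000\ \text{h}}{24 \times 365\ \tfrac{\text{h}}{\text{yr}}}
  &= \frac{1{,}000{,}000}{8{,}760}\ \text{yr} \approx 114.2\ \text{yr}
\end{aligned}
```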
Referential Integrity
Both ByWho and WhatPage must reference valid CircleNetPage IDs.
Analytics Use Cases
This dataset enables:
Popularity Metrics (Task B)
Count total accesses per page to find the 10 most popular CircleNetPages.
User Engagement (Task E)
For each user, calculate:
- Total number of actions performed
- Number of distinct pages accessed
- Identify users with “favorites” (frequent page visits)
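Tasks B and E both reduce to counting over (ByWho, WhatPage). The following is a minimal in-memory sketch, not a distributed implementation. It assumes every ActivityLog row targeting a page counts as an "access". It also treats a "favorite" as any page a user acted on at least `favorite_min_visits` times. That threshold, like the function names, is hypothetical, because the spec does not define it.

```python
import heapq
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (ActionId, ByWho, WhatPage, ActionType, ActionTime), in file column order.
Record = Tuple[int, int, int, str, int]


def top_pages(records: Iterable[Record], k: int = 10) -> List[Tuple[int, int]]:
    """Task B: the k most-accessed pages as (WhatPage, access count)."""
    counts = Counter(what_page for _, _, what_page, _, _ in records)
    # Highest count first; equal counts ordered by smaller page ID for determinism.
    return heapq.nsmallest(k, counts.items(), key=lambda item: (-item[1], item[0]))


def engagement(
    records: Iterable[Record], favorite_min_visits: int = 5
) -> Dict[int, Tuple[int, int, List[int]]]:
    """Task E: per acting user, (total actions, distinct pages, favorite pages)."""
    per_user: Dict[int, Counter] = defaultdict(Counter)
    for _, by_who, what_page, _, _ in records:
        per_user[by_who][what_page] += 1
    result: Dict[int, Tuple[int, int, List[int]]] = {}
    for user, pages in per_user.items():
        favorites = sorted(page for page, n in pages.items() if n >= favorite_min_visits)
        result[user] = (sum(pages.values()), len(pages), favorites)
    return result
```

Users who never appear as ByWho are absent from the `engagement` result. Fill them with `(0, 0, [])` if the report must list every user.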
Inactivity Detection (Task G)
Identify users with no ActivityLog entries in the last 90 days (2,160 hours).
Activity Patterns
- Peak usage times
- Interaction type distributions
- User engagement levels
- Page visit frequency
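The inactivity check in Task G needs a reference "now", which the spec leaves open. The following is a hedged sketch that assumes "now" is the latest ActionTime in the log unless one is passed in. It also assumes a user is active only if they have an action with ActionTime strictly greater than now - 2,160. Users who never appear as ByWho are reported as inactive. The function name and parameters are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# (ActionId, ByWho, WhatPage, ActionType, ActionTime), in file column order.
Record = Tuple[int, int, int, str, int]

INACTIVITY_WINDOW = 90 * 24  # 2,160 hours


def inactive_users(
    records: Iterable[Record],
    n_users: int = 200_000,
    now: Optional[int] = None,
    window: int = INACTIVITY_WINDOW,
) -> List[int]:
    """Task G: user IDs 1..n_users with no action as ByWho in the last `window` hours."""
    last_seen = [0] * (n_users + 1)  # index = user ID; 0 = never acted (ActionTime starts at 1)
    latest = 0
    for _, by_who, _, _, action_time in records:
        if action_time > last_seen[by_who]:
            last_seen[by_who] = action_time
        if action_time > latest:
            latest = action_time
    if now is None:
        now = latest  # assumption: "now" is the latest ActionTime in the log
    cutoff = now - window  # an action is recent only if ActionTime > cutoff
    return [u for u in range(1, n_users + 1) if last_seen[u] == 0 or last_seen[u] <= cutoff]
```

A flat list indexed by user ID keeps one integer per user (200,001 slots), so the records themselves can be streamed.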
Scale Considerations
With 10 million actions across 200,000 users:
- Average of 50 actions per user (as actor)
- Average of 50 actions per page (as target)
- Distribution will vary (active users generate more actions)
- Popular pages receive more views and interactions
Generation Requirements
When generating ActivityLog data:
- Assign ActionIds sequentially from 1 to 10,000,000
- Randomize ByWho and WhatPage within valid range (1-200,000)
- Ensure view-first rule: For each (ByWho, WhatPage) pair, first action must be a view
- Randomize ActionTime within range (1-1,000,000)
- Create realistic ActionType descriptions (20-50 chars, no commas)
- Vary action types to simulate realistic usage patterns
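One way to satisfy these requirements together is to stream rows while remembering when each (ByWho, WhatPage) pair was first viewed. The following is a minimal sketch, not the project's generator. The first row for a new pair is always a view at a random hour. Any later row for that pair may be any type, at an hour no earlier than that first view. With ActionId breaking same-hour ties, the view is therefore first both by ActionId and by ActionTime. With 200,000 pages, uniformly random pairs almost never repeat, which would make nearly every row a view. The hypothetical `revisit_prob` parameter therefore re-selects an already-seen pair some of the time, so that interaction types actually appear. The function names and the sample description lists are also illustrative.

```python
import csv
import random
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple

# Sample descriptions from the category list above (each 20-50 chars, no commas).
VIEW_TYPES = ("viewed photos section", "viewed recent posts feed")
OTHER_TYPES = (
    "left a comment on recent post",
    "poked user playfully",
    "liked profile photo and banner",
    "shared post to own timeline",
    "reacted with emoji to status",
    "sent private message",
    "tagged in a photo comment",
)
ALL_TYPES = VIEW_TYPES + OTHER_TYPES

Row = Tuple[int, int, int, str, int]  # (ActionId, ByWho, WhatPage, ActionType, ActionTime)


def generate_activity_log(
    n_actions: int = 10_000_000,
    n_pages: int = 200_000,
    max_time: int = 1_000_000,
    revisit_prob: float = 0.6,
    seed: Optional[int] = None,
) -> Iterator[Row]:
    rng = random.Random(seed)
    first_view: Dict[Tuple[int, int], int] = {}  # pair -> ActionTime of its first view
    seen_pairs: List[Tuple[int, int]] = []
    for action_id in range(1, n_actions + 1):
        if seen_pairs and rng.random() < revisit_prob:
            pair = rng.choice(seen_pairs)  # revisit a pair that already has a view
        else:
            pair = (rng.randint(1, n_pages), rng.randint(1, n_pages))  # self-pairs allowed
        t0 = first_view.get(pair)
        if t0 is None:
            # First action for this pair: it must be a view.
            action_type = rng.choice(VIEW_TYPES)
            action_time = rng.randint(1, max_time)
            first_view[pair] = action_time
            seen_pairs.append(pair)
        else:
            # Later actions: any type, never earlier than the pair's first view.
            action_type = rng.choice(ALL_TYPES)
            action_time = rng.randint(t0, max_time)
        yield action_id, pair[0], pair[1], action_type, action_time


def write_csv(rows: Iterable[Row], fh: TextIO) -> None:
    """Write rows in the documented column order, with no header line."""
    csv.writer(fh, lineterminator="\n").writerows(rows)
```

A full run might look like `with open("activitylog.csv", "w", newline="") as fh: write_csv(generate_activity_log(seed=42), fh)`. Two trade-offs are worth noting. First, `first_view` and `seen_pairs` hold one entry per distinct pair, so the full 10-million-row run builds large in-memory structures. Second, drawing later times from `[t0, max_time]` skews follow-up actions toward later hours. Replacing the uniform `randint` for ByWho and WhatPage with a weighted choice is one way to model that active users generate more actions.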