The Three Datasets
The CircleNet Analytics system is built on three core datasets:
- CircleNetPage - User profile information for 200,000 users
- Follows - One-sided follow relationships with 20 million records
- ActivityLog - User actions and page accesses with 10 million records
Dataset Relationships
These datasets are designed to work together for comprehensive social network analytics.
How They Connect
- CircleNetPage serves as the master user directory with unique IDs (1-200,000)
- Follows records use `ID1` and `ID2` to reference CircleNetPage users
  - `ID1` follows `ID2` (one-directional relationship)
  - The relationship is not symmetric: ID1 → ID2 is different from ID2 → ID1
- ActivityLog tracks when users interact with pages
  - `ByWho` references the user performing the action
  - `WhatPage` references the CircleNetPage being accessed
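The linkage described above can be sketched in a few lines of Python. This is only an illustration: it assumes follow records are held in memory as `(ID1, ID2)` tuples, the helper names (`is_valid_user`, `is_valid_follow`, `is_mutual`) are hypothetical, and the sample edges are made-up toy values rather than rows from the real files.

```python
# Sketch only: a follow record is modeled as (ID1, ID2), meaning ID1 follows ID2.
MAX_USER_ID = 200_000  # CircleNetPage IDs run from 1 to 200,000


def is_valid_user(user_id: int) -> bool:
    """Return True if user_id falls inside the CircleNetPage ID range."""
    return 1 <= user_id <= MAX_USER_ID


def is_valid_follow(id1: int, id2: int) -> bool:
    """Both ends of a follow record must reference a CircleNetPage user."""
    return is_valid_user(id1) and is_valid_user(id2)


def is_mutual(id1: int, id2: int, edges: set) -> bool:
    """True only if both directions are present; one edge does not imply the other."""
    return (id1, id2) in edges and (id2, id1) in edges


# Toy edges (hypothetical): 1 and 2 follow each other, 3 follows 1 one-way.
follows = {(1, 2), (2, 1), (3, 1)}

print(is_valid_follow(1, 200_000))  # True: both IDs are in range
print(is_valid_follow(0, 5))        # False: 0 is outside 1-200,000
print(is_mutual(1, 2, follows))     # True
print(is_mutual(3, 1, follows))     # False: 1 does not follow 3 back
```

The same range check would apply to `ByWho` and `WhatPage` in ActivityLog, since both reference CircleNetPage IDs.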
Scale Information
| Dataset | Records | Purpose |
|---|---|---|
| CircleNetPage | 200,000 | User profiles and demographics |
| Follows | 20,000,000 | Social graph relationships |
| ActivityLog | 10,000,000 | User interaction history |
This scale is designed to test big data processing capabilities while remaining manageable for educational and development purposes.
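To put these counts in perspective, the per-user averages follow directly from the table. The short calculation below uses only the record counts listed above; the variable names are illustrative.

```python
# Record counts taken from the Scale Information table.
USERS = 200_000
FOLLOW_RECORDS = 20_000_000
ACTIVITY_RECORDS = 10_000_000

# Average number of records per user in each dataset.
avg_follows_per_user = FOLLOW_RECORDS / USERS
avg_activity_per_user = ACTIVITY_RECORDS / USERS

print(avg_follows_per_user)   # 100.0
print(avg_activity_per_user)  # 50.0
```

These are averages only; the actual distribution of follows and activity across users is not specified here.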
Data Format
All datasets are stored as CSV files without headers:
- Values are comma-separated
- No column names in the files
- Column position determines the attribute
- String values do not contain commas
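Because the files carry no header row, a reader has to supply column names by position. The sketch below shows one way to do that with Python's standard `csv` module. The two-column layout in `ASSUMED_FOLLOWS_COLUMNS` is an assumption made for illustration; the real Follows file may carry more columns, so check the Follows schema page for the actual order. The helper name `read_headerless` is also hypothetical.

```python
import csv
import io


def read_headerless(stream, columns):
    """Yield one dict per CSV row, naming fields by their position."""
    for row in csv.reader(stream):
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}: {row}")
        yield dict(zip(columns, row))


# Assumed layout for illustration only; not confirmed by this page.
ASSUMED_FOLLOWS_COLUMNS = ["ID1", "ID2"]

# In-memory stand-in for a headerless CSV file (toy values).
sample = io.StringIO("1,2\n2,1\n3,1\n")
rows = list(read_headerless(sample, ASSUMED_FOLLOWS_COLUMNS))

print(rows[0])  # {'ID1': '1', 'ID2': '2'}
```

Values come back as strings, so numeric columns such as IDs need an explicit `int(...)` conversion before any arithmetic or range checks.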
Use Cases
These datasets enable analytics such as:
- User popularity rankings
- Hobby-based user segmentation
- Follow-back analysis
- Activity patterns and engagement metrics
- Regional network analysis
- Inactive user identification
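A few of these analyses can be sketched on toy data with the standard library alone. The example below makes several assumptions that this page does not confirm: popularity is measured by follower count, a user counts as inactive when they never appear as `ByWho`, and all sample values are made up. At the full dataset scale a distributed engine would likely be more appropriate than in-memory Python.

```python
from collections import Counter

# Toy data (hypothetical values). Follows: (ID1, ID2) means ID1 follows ID2.
follows = [(1, 2), (2, 1), (3, 1), (4, 1), (4, 2)]
# ActivityLog reduced to (ByWho, WhatPage) pairs for this sketch.
activity = [(1, 2), (3, 1), (3, 2)]
all_users = {1, 2, 3, 4, 5}

# Popularity ranking: count how many times each user appears as the followed side.
follower_counts = Counter(id2 for _, id2 in follows)
ranking = follower_counts.most_common()

# Follow-back analysis: keep each pair once where both directions exist.
edges = set(follows)
mutual = sorted((a, b) for a, b in edges if a < b and (b, a) in edges)

# Inactive users: users who never performed an action.
active = {by_who for by_who, _ in activity}
inactive = sorted(all_users - active)

print(ranking)   # [(1, 3), (2, 2)]
print(mutual)    # [(1, 2)]
print(inactive)  # [2, 4, 5]
```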
Next Steps
- CircleNetPage - User profile schema
- Follows - Relationship schema
- ActivityLog - Activity tracking schema