Overview
The Historia Para Gandules dataset follows a structured schema with 8 fields per video post. Data is collected from Instagram and stored in CSV format with UTF-8 encoding.
Schema Structure
The dataset is exported as a CSV file with one header column per field, as specified below.
Field Specifications
Temporal Fields
Publication timestamp - Date and time when the video was posted to Instagram
- Format: YYYY-MM-DD HH:MM:SS
- Example: 2024-01-15 14:30:00
- Source: post.date.strftime('%Y-%m-%d %H:%M:%S')
- Timezone: UTC (from Instagram API)
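Because the stored string carries no timezone marker, it helps to attach UTC explicitly when reading it back. The following is a minimal sketch using only the Python standard library; the function name `parse_timestamp` is a hypothetical helper, not part of the scraper.

```python
from datetime import datetime, timezone

# Format string documented for the publication timestamp field.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(raw: str) -> datetime:
    """Parse a stored timestamp and mark it as UTC, as the schema documents."""
    naive = datetime.strptime(raw, TIMESTAMP_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


ts = parse_timestamp("2024-01-15 14:30:00")
print(ts.isoformat())  # 2024-01-15T14:30:00+00:00
```

A malformed value raises `ValueError`, which doubles as a simple format check for this field.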
Content Fields
Caption/description text - The text content accompanying the video
- Max length: Variable (Instagram allows up to 2,200 characters)
- Default value: "Sin texto" if no caption provided
- Encoding: UTF-8
- May contain: Hashtags, mentions, emojis, line breaks
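Since captions may contain hashtags and mentions, a small sketch of pulling them out can be useful for later analysis. The regular expressions below are an approximation chosen for illustration, not Instagram's official tokenization rules, and `extract_tags` and the sample caption are hypothetical.

```python
import re

NO_CAPTION = "Sin texto"  # documented default for posts without a caption

# Approximate patterns (assumption): word characters after '#',
# and letters, digits, underscores or dots after '@'.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@([A-Za-z0-9_.]+)")


def extract_tags(caption: str):
    """Return (hashtags, mentions) found in a caption, without the '#'/'@'."""
    if caption == NO_CAPTION:
        return [], []
    hashtags = HASHTAG_RE.findall(caption)
    # Drop a trailing dot that is sentence punctuation rather than part of the handle.
    mentions = [m.rstrip(".") for m in MENTION_RE.findall(caption)]
    return hashtags, mentions


# Made-up caption, for illustration only.
sample = "La caída de Roma\n#historia #imperio_romano gracias @museo.prado."
print(extract_tags(sample))
```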
Direct video file URL - CDN link to the actual video file
- Format: Full HTTPS URL to Instagram’s CDN
- Default value: "Sin URL" if unavailable
- Expiration: URLs may expire after a period of time
- Usage: Download or stream the video content
Instagram post permalink - Permanent link to the Instagram post
- Format: https://www.instagram.com/p/{shortcode}/
- Example: https://www.instagram.com/p/ABC123xyz/
- Uniqueness: Unique identifier for each post
- Permanence: Stable link unless post is deleted
Use this field as the primary key to identify unique posts and avoid duplicates.
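As a rough illustration of using the permalink as a primary key, the sketch below extracts the shortcode from the documented URL format and drops repeated posts. The helper names and the `"permalink"` dictionary key are hypothetical stand-ins; substitute the actual CSV header for this field.

```python
import re
from typing import Iterable, List, Optional

# Matches the documented format https://www.instagram.com/p/{shortcode}/
PERMALINK_RE = re.compile(r"^https://www\.instagram\.com/p/([^/]+)/$")


def shortcode(permalink: str) -> Optional[str]:
    """Return the shortcode from a permalink, or None if the format differs."""
    match = PERMALINK_RE.match(permalink)
    return match.group(1) if match else None


def deduplicate(rows: Iterable[dict], key: str) -> List[dict]:
    """Keep the first row seen for each distinct value of row[key]."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] in seen:
            continue
        seen.add(row[key])
        unique.append(row)
    return unique


rows = [
    {"permalink": "https://www.instagram.com/p/ABC123xyz/"},
    {"permalink": "https://www.instagram.com/p/ABC123xyz/"},  # duplicate scrape
]
print(len(deduplicate(rows, key="permalink")))  # 1
```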
Engagement Fields
Like count - Number of likes the video has received
- Range: 0 or greater (no upper bound)
- Default value: 0 if unavailable
- Type: Non-negative integer
- Note: Count is as of scraping time and may increase afterward
Comment count - Number of comments on the video
- Range: 0 or greater (no upper bound)
- Default value: 0 if unavailable
- Type: Non-negative integer
- Note: Count reflects total comments at scraping time
- Limitation: Individual comment text is not collected
View count - Total number of times the video has been viewed
- Range: 0 or greater (no upper bound) when available
- Default value: "No disponible" if Instagram doesn’t provide the data
- Type: Integer (or string for unavailable cases)
- Accuracy: Determined by Instagram’s own view-counting methodology
View counts may not be available for all posts depending on Instagram’s API restrictions or account settings.
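Because this column mixes integers with the string "No disponible", it has to be parsed before any numeric work. A minimal sketch, assuming the raw CSV cell arrives as a string; `parse_views` is a hypothetical helper name.

```python
from typing import Optional

VIEWS_UNAVAILABLE = "No disponible"  # documented default for missing view counts


def parse_views(raw: str) -> Optional[int]:
    """Convert a raw view-count cell to an int, or None when unavailable."""
    raw = raw.strip()
    if raw == VIEWS_UNAVAILABLE:
        return None
    value = int(raw)  # raises ValueError on anything that is not an integer
    if value < 0:
        raise ValueError(f"view count must be non-negative, got {value}")
    return value


print(parse_views("15230"))          # 15230
print(parse_views("No disponible"))  # None
```

Returning `None` rather than 0 keeps "no data" distinct from "zero views", which matters when averaging.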
Technical Fields
Video duration - Length of the video in seconds
- Unit: Seconds (with decimal precision)
- Example: 45.5 for a 45.5-second video
- Default value: "No disponible" if unavailable
- Type: Float (or string for unavailable cases)
- Typical range: 0 to 90 seconds (Instagram Reels limit)
Complete Example
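Here’s a complete record with all fields populated:
Data Types Reference
- Publication timestamp: String (YYYY-MM-DD HH:MM:SS)
- Caption/description text: String
- Direct video file URL: String
- Instagram post permalink: String
- Like count: Non-negative integer
- Comment count: Non-negative integer
- View count: Integer, or the string "No disponible"
- Video duration: Float (seconds), or the string "No disponible"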
Data Validation
When working with the collected data, validate each record against the formats, types, and ranges given in the field specifications above.
Missing Data Handling
Some fields may contain default values when data is unavailable:
Texto del reel: 'Sin texto'
- Reason: Post was published without a caption
- Handling: Treat as empty string or null in analysis
- Frequency: Rare (most posts have captions)
URL del video: 'Sin URL'
- Reason: Video URL is not accessible (rare)
- Handling: Skip video download, use only metadata
- Frequency: Very rare
Visualizaciones: 'No disponible'
- Reason: Instagram API doesn’t provide view count for this post
- Handling: Exclude from view-based analysis or impute based on likes
- Frequency: Can occur for older posts or due to API changes
Duración del video: 'No disponible'
- Reason: Video duration metadata is missing
- Handling: Download video to calculate duration locally
- Frequency: Rare
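The handling rules above can be sketched as a single normalization pass that replaces each documented default with `None`, so later code can test for missing data uniformly. This is an illustrative sketch, not the project's own pipeline: the column names are the four shown above, while `normalize_missing`, `rows_with_views`, and the sample row are hypothetical.

```python
from typing import Iterable, List

# Documented default values, keyed by the column names listed above.
SENTINELS = {
    "Texto del reel": "Sin texto",
    "URL del video": "Sin URL",
    "Visualizaciones": "No disponible",
    "Duración del video": "No disponible",
}


def normalize_missing(row: dict) -> dict:
    """Return a copy of row with documented default values replaced by None."""
    cleaned = dict(row)
    for column, sentinel in SENTINELS.items():
        if cleaned.get(column) == sentinel:
            cleaned[column] = None
    return cleaned


def rows_with_views(rows: Iterable[dict]) -> List[dict]:
    """Keep only rows usable for view-based analysis."""
    return [r for r in rows if r.get("Visualizaciones") is not None]


# Made-up row, for illustration only.
raw = {
    "Texto del reel": "Sin texto",
    "URL del video": "Sin URL",
    "Visualizaciones": "No disponible",
    "Duración del video": "45.5",
}
print(normalize_missing(raw))
```

Rows whose `URL del video` is `None` after normalization can then be skipped at download time while their metadata is still kept.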
Data Processing Pipeline
Typical workflow for working with the collected data:
Example Data Transformations
- Calculate Engagement Rate
- Convert to JSON
- Time Series Analysis
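The three transformations listed above might look like the following standard-library sketch. Several things here are assumptions: the records are already parsed into Python types, the English keys (`published_at`, `likes`, `comments`, `views`) are hypothetical stand-ins for the CSV headers, the sample values are made up, and engagement rate is taken as (likes + comments) / views, which is one common definition rather than necessarily the one this project uses.

```python
import json
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional

# Made-up, already-parsed records with hypothetical keys.
records = [
    {"published_at": datetime(2024, 1, 15, 14, 30), "likes": 1200, "comments": 45, "views": 20000},
    {"published_at": datetime(2024, 1, 28, 9, 0), "likes": 800, "comments": 20, "views": None},
    {"published_at": datetime(2024, 2, 3, 18, 15), "likes": 1500, "comments": 60, "views": 30000},
]


def engagement_rate(record: dict) -> Optional[float]:
    """(likes + comments) / views; None when views are missing or zero."""
    views = record["views"]
    if not views:
        return None
    return (record["likes"] + record["comments"]) / views


def to_json(rows: List[dict]) -> str:
    """Serialize records to JSON, writing timestamps as ISO 8601 strings."""
    serializable = [{**r, "published_at": r["published_at"].isoformat()} for r in rows]
    return json.dumps(serializable, ensure_ascii=False, indent=2)


def likes_per_month(rows: List[dict]) -> Dict[str, int]:
    """Total likes grouped by publication month (YYYY-MM), in date order."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["published_at"].strftime("%Y-%m")] += r["likes"]
    return dict(sorted(totals.items()))


print(engagement_rate(records[0]))
print(likes_per_month(records))  # {'2024-01': 2000, '2024-02': 1500}
```

`ensure_ascii=False` keeps accented characters and emojis readable in the JSON output instead of escaping them.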
Schema Evolution
As the project evolves, the schema may be extended with additional fields:
Potential Future Fields
- Geolocation data (if posts are tagged)
- Mentioned historical periods
- Topics/categories (manual or AI tagging)
- Sentiment analysis scores
Backward Compatibility
- New fields will be added as optional columns
- Existing fields will maintain their format
- Legacy CSV files will remain importable
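One way a reader could honor these guarantees is to require the existing columns and treat any newer column as optional. The sketch below assumes nothing about the real loader: `read_rows` is a hypothetical helper, and `Temas` is a made-up example of a future column.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def read_rows(csv_file: Iterable[str], required: Sequence[str],
              optional: Sequence[str] = ()) -> Iterator[dict]:
    """Yield records with all required columns; optional ones default to None."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for row in reader:
        record = {c: row[c] for c in required}
        for c in optional:
            record[c] = row.get(c)  # None for legacy files without this column
        yield record


# A legacy file without the hypothetical future column "Temas".
legacy = io.StringIO("Texto del reel,Visualizaciones\nHola,100\n")
print(list(read_rows(legacy, ["Texto del reel", "Visualizaciones"], ["Temas"])))
```

For files on disk, opening with `open(path, newline="", encoding="utf-8")` matches the documented UTF-8 encoding and the `csv` module's newline handling.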
Next Steps
- Scraping Guide: Learn how to collect data using the scraper
- Data Sources: Understand where the data comes from