Primary Data Source
Historia Para Gandules collects all data from a single Instagram account: @historiaparagandules
The official Historia Para Gandules Instagram account containing historical educational content about Puerto Rico.
Account Overview
The @historiaparagandules Instagram account is the sole data source for this project. It publishes educational content focused on Puerto Rican history and culture.
Content Focus
Historical Education
Educational videos explaining historical events, figures, and cultural aspects of Puerto Rico.
Video Format
Content is primarily delivered through Instagram Reels - short-form vertical videos optimized for mobile viewing.
Accessible Language
Content uses colloquial and accessible language to make history engaging for a broad audience (“para gandules” = “for lazy people” in a playful, approachable way).
Content Types
The scraper specifically targets video posts (reels) from the account:
Video Posts (Reels)
Collected: Short-form videos containing historical narratives. The scraper filters for post.is_video == True to capture only video content.
Image Posts
Not Collected: Static image posts are filtered out. The current implementation focuses exclusively on video content.
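A minimal sketch of the video-only filter, assuming post objects expose an is_video attribute as Instaloader's Post class does. The helper name filter_video_posts is hypothetical and is not taken from the project's scraper.

```python
from typing import Any, Iterable, List


def filter_video_posts(posts: Iterable[Any]) -> List[Any]:
    """Keep only video posts (reels); static image posts are dropped.

    Hypothetical helper: works with any object that has a boolean
    is_video attribute, such as instaloader.Post.
    """
    # Same effect as checking post.is_video == True for each post.
    return [post for post in posts if post.is_video]
```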
Why Video-Only?
The project focuses on video content because:
- Rich engagement data: Videos provide more metrics (views, duration)
- Primary content format: Reels are the main content type for the account
- Geospatial visualization: Video content is more suitable for the interactive timeline visualization
- Consistent structure: Videos have standardized metadata fields
Data Collection Method
Data is collected using the Instaloader Python library, a third-party tool that retrieves publicly visible profile and post data from Instagram. The scraper only accesses publicly available data; no authentication or login is required.
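The collection step can be outlined with two documented Instaloader calls: Profile.from_username looks up the account and Profile.get_posts yields its posts. This is a sketch, not the project's actual scraper; the function name iter_profile_posts is hypothetical, and if Instagram rejects anonymous requests the loop will fail rather than fall back to a login.

```python
from typing import Any, Iterator


def iter_profile_posts(username: str = "historiaparagandules") -> Iterator[Any]:
    """Yield every post of a public Instagram profile without logging in.

    Hypothetical helper; requires the third-party package "instaloader".
    """
    import instaloader  # imported lazily so this module loads without it

    # No loader.login() call: only publicly visible data is read.
    loader = instaloader.Instaloader(quiet=True)
    profile = instaloader.Profile.from_username(loader.context, username)
    # Network requests happen only as the caller iterates this generator.
    yield from profile.get_posts()
```

Usage: videos = [p for p in iter_profile_posts() if p.is_video] performs the full scrape and keeps only reels.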
Available Metrics
For each video post, the scraper collects:
Temporal Data
- Publication date and time: When the content was posted
Content Data
- Caption/text: The description or narrative accompanying the video
- Video URL: Direct link to the video file
- Post URL: Permalink to the Instagram post
Engagement Metrics
- Likes: Number of likes received
- Comments: Number of comments
- Views: Total video view count
Technical Metadata
- Duration: Video length in seconds
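The metrics above map onto attributes of Instaloader's Post class (date_utc, caption, video_url, shortcode, likes, comments, video_view_count, video_duration). A minimal sketch of flattening one post into a record, assuming those attributes; the dictionary keys are hypothetical and may differ from the project's real schema. Missing view counts fall back to the "No disponible" placeholder mentioned under Data Limitations.

```python
from typing import Any, Dict

NOT_AVAILABLE = "No disponible"  # placeholder when Instagram omits a value


def post_to_record(post: Any) -> Dict[str, Any]:
    """Flatten one video post into the metrics listed above.

    Hypothetical helper; key names are illustrative only.
    """
    views = post.video_view_count  # may be None
    return {
        # Temporal data
        "date": post.date_utc.isoformat(),
        # Content data
        "caption": post.caption or "",
        "video_url": post.video_url,
        "post_url": f"https://www.instagram.com/p/{post.shortcode}/",
        # Engagement metrics (counts only; comment text is not stored)
        "likes": post.likes,
        "comments": post.comments,
        "views": views if views is not None else NOT_AVAILABLE,
        # Technical metadata
        "duration_s": post.video_duration,
    }
```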
Data Freshness
The scraping script collects all historical posts from the account’s inception. To update the dataset, run the script again.
Incremental Updates
The scraper iterates through all posts each time. For large accounts, consider implementing date-based filtering to collect only new posts.
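One way to implement the date-based filtering suggested above, assuming posts arrive newest-first as Profile.get_posts normally returns them. Pinned posts can appear at the top out of order, so the sketch tolerates a few consecutive older posts before stopping; the default of three assumes Instagram's limit of three pinned posts. The names posts_since and max_old are hypothetical.

```python
from datetime import datetime
from typing import Any, Iterable, Iterator


def posts_since(posts: Iterable[Any], since: datetime, max_old: int = 3) -> Iterator[Any]:
    """Yield posts published after `since`, stopping early.

    Assumes newest-first order. Up to `max_old` consecutive older posts
    (e.g. pinned posts) are skipped; one more older post ends iteration,
    so the rest of the account history is never requested.
    `since` must match the tz-awareness of post.date_utc.
    """
    old_streak = 0
    for post in posts:
        if post.date_utc > since:
            old_streak = 0
            yield post
        else:
            old_streak += 1
            if old_streak > max_old:
                break
```

Instaloader's own --fast-update option follows a similar idea, stopping at the first post that was already downloaded.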
Data Limitations
Current Limitations
- Public data only: Only publicly visible information is collected
- No historical edits: Caption edits after publication are not tracked
- Deleted posts: Posts deleted from Instagram will not appear in subsequent scrapes
- Rate limiting: Instagram may throttle frequent scraping requests
- View counts: May appear as “No disponible” (“not available”) for older posts or when Instagram does not return a count
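Because the view count can be the string “No disponible”, analysis code should not treat that column as purely numeric. A minimal sketch, assuming the placeholder is stored verbatim: convert it to None so missing values are excluded from averages instead of being counted as zero. The helper views_as_int is hypothetical.

```python
from typing import Optional, Union

NOT_AVAILABLE = "No disponible"


def views_as_int(value: Union[int, str, None]) -> Optional[int]:
    """Return a view count as int, or None when it is missing.

    Hypothetical helper: accepts ints, numeric strings (e.g. from CSV),
    None, or the "No disponible" placeholder.
    """
    if value is None or value == NOT_AVAILABLE:
        return None
    return int(value)
```

Usage: known = [v for v in map(views_as_int, column) if v is not None] gives only real counts to average.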
Future Data Sources
Potential expansions for the data collection:
Additional Accounts
Incorporate related Puerto Rican historical education accounts
Cross-Platform
Expand to TikTok, YouTube, or other platforms where similar content exists
Manual Curation
Add manually curated historical data not available on social media
Geolocation Tags
Extract location data from posts that include geotags
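For the geolocation idea, Instaloader's Post.location returns a PostLocation named tuple (with name, lat and lng fields) or None. A minimal sketch, assuming that interface; extract_geotag is hypothetical, and recent Instaloader versions may return None for every post unless the session is logged in, which would conflict with the current no-login approach.

```python
from typing import Any, Optional, Tuple

Geotag = Tuple[str, Optional[float], Optional[float]]


def extract_geotag(post: Any) -> Optional[Geotag]:
    """Return (place name, latitude, longitude) for a geotagged post.

    Hypothetical helper; returns None when the post has no geotag or the
    location is not accessible. lat/lng can themselves be None.
    """
    location = post.location
    if location is None:
        return None
    return (location.name, location.lat, location.lng)
```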
Data Ethics
This project collects only publicly available data and respects Instagram’s terms of service. The data is used for educational and research purposes.
Ethical Considerations
- Public content: Only publicly accessible posts are scraped
- No personal data: User comments and personal information are not collected
- Attribution: Content is attributed to the original creator
- Non-commercial: Data is used for educational visualization purposes
- Rate limiting: Scraping is performed responsibly to avoid overloading Instagram’s servers
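Instaloader already throttles its own requests through its RateController class, so extra delays are optional. If the scraper is extended to several accounts or to extra requests per post, a fixed pause between items is one simple way to stay conservative. A minimal sketch; throttled and its 2-second default are hypothetical, and the sleep parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], delay_s: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items one at a time, pausing delay_s seconds between them.

    Hypothetical helper; the default delay is an arbitrary example value.
    """
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_s)  # pause before every item except the first
        yield item
```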
Next Steps
Scraping Guide
Learn how to run the Instagram scraper
Data Schema
Explore the complete data structure and field specifications