This page provides a detailed statistical analysis of the Historia Para Gandules dataset, examining correlations between metrics, distribution patterns, and statistical significance of findings.
The high standard deviation relative to mean values indicates significant variability in engagement, with some videos achieving viral status while others maintain steady baseline performance.
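One way to put a number on "high standard deviation relative to mean" is the coefficient of variation (standard deviation divided by mean). The sketch below is illustrative only: the helper name `coefficient_of_variation` is hypothetical, and the small DataFrame holds made-up values standing in for the real dataset, reusing the `Likes` and `Visualizaciones` column names that appear on this page.

```python
import pandas as pd


def coefficient_of_variation(series: pd.Series) -> float:
    """Return the sample standard deviation divided by the mean."""
    return series.std() / series.mean()


# Made-up values for illustration only, NOT the real dataset.
sample = pd.DataFrame({
    "Likes": [400, 600, 800, 900, 1000, 5000],
    "Visualizaciones": [9000, 12000, 15000, 16000, 20000, 90000],
})

for column in sample.columns:
    cv = coefficient_of_variation(sample[column])
    print(f"{column}: CV = {cv:.2f}")
```

A value near or above 1 means the spread is as large as the average itself, which is the pattern described above. On the real data, the same helper could be called as `coefficient_of_variation(df['Likes'])`.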
```python
import matplotlib.pyplot as plt

# Assumes `df` is the dataset already loaded as a pandas DataFrame

# Analyze likes distribution
print(f"Mean Likes: {df['Likes'].mean():.2f}")
print(f"Median Likes: {df['Likes'].median():.2f}")
print("Skew: Median < Mean indicates right-skewed distribution")

# Visualize distribution
fig, ax = plt.subplots(figsize=(12, 6))
ax.hist(df['Likes'], bins=30, edgecolor='black', alpha=0.7)
ax.axvline(df['Likes'].mean(), color='red', linestyle='--', label='Mean')
ax.axvline(df['Likes'].median(), color='green', linestyle='--', label='Median')
ax.set_xlabel('Likes')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Likes across Videos')
ax.legend()
plt.show()
```
Key Finding: The distribution is right-skewed, meaning most videos cluster around the median (828 likes) with a few high-performing outliers pulling the mean higher (1,316 likes).
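The mean-versus-median comparison can be complemented with a direct skewness estimate. This is a minimal sketch, assuming pandas is available: `skew_summary` is a hypothetical helper, and the values are invented for illustration rather than taken from the dataset. `Series.skew()` returns the sample skewness, where a positive value indicates a longer right tail.

```python
import pandas as pd


def skew_summary(series: pd.Series) -> dict:
    """Collect the mean, median, and sample skewness of a numeric series."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "skewness": series.skew(),  # > 0 suggests a right-skewed distribution
    }


# Invented values for illustration only, NOT the real dataset:
# most entries sit close together, one outlier is far larger.
likes = pd.Series([400, 600, 800, 900, 1000, 5000], name="Likes")

summary = skew_summary(likes)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

With the real data, `skew_summary(df['Likes'])` should report a mean above the median and a positive skewness if the distribution is right-skewed as described.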
```python
# Analyze video duration
print(f"Average Duration: {df['Duración del video (s)'].mean():.2f} seconds")
print(f"Median Duration: {df['Duración del video (s)'].median():.2f} seconds")
print(f"Range: {df['Duración del video (s)'].min():.0f}s - {df['Duración del video (s)'].max():.0f}s")
```
```python
# Correlation between duration and engagement
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Duration vs Likes
ax1.scatter(df['Duración del video (s)'], df['Likes'], alpha=0.6)
ax1.set_xlabel('Duration (seconds)')
ax1.set_ylabel('Likes')
ax1.set_title('Duration vs Likes')

# Duration vs Views
ax2.scatter(df['Duración del video (s)'], df['Visualizaciones'], alpha=0.6, color='orange')
ax2.set_xlabel('Duration (seconds)')
ax2.set_ylabel('Views')
ax2.set_title('Duration vs Views')

plt.tight_layout()
plt.show()
```
Duration shows only a weak correlation with engagement (likes and views), suggesting that factors other than video length, such as content quality and topic relevance, matter more for this educational content.
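The scatter plots give a visual impression. A numeric check could look like the sketch below, which computes Pearson and Spearman coefficients and a simple permutation-based p-value. Everything here is an assumption for illustration: the helpers `correlation_table` and `permutation_p_value` are hypothetical names, the permutation test is one possible choice of significance test rather than the method used for this page, and the DataFrame is randomly generated stand-in data that only reuses the column names shown above.

```python
import numpy as np
import pandas as pd


def correlation_table(df: pd.DataFrame, x: str, ys: list) -> pd.DataFrame:
    """Pearson and Spearman correlation of column x against each column in ys."""
    rows = {}
    for y in ys:
        pair = df[[x, y]].dropna()
        rows[y] = {
            "pearson": pair.corr(method="pearson").loc[x, y],
            "spearman": pair.corr(method="spearman").loc[x, y],
        }
    return pd.DataFrame(rows).T


def permutation_p_value(x, y, n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real association with x.
        if abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Randomly generated stand-in data, NOT the real dataset.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Duración del video (s)": rng.integers(300, 3600, size=50),
    "Likes": rng.integers(200, 5000, size=50),
    "Visualizaciones": rng.integers(5000, 100000, size=50),
})

print(correlation_table(demo, "Duración del video (s)", ["Likes", "Visualizaciones"]))
print(permutation_p_value(demo["Duración del video (s)"], demo["Likes"]))
```

Coefficients close to 0 together with a large p-value would be consistent with the weak relationship described above. Spearman is included because it is less sensitive to the viral outliers than Pearson.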