
Overview

TikTok Miner’s Instagram integration uses Apify’s Instagram scraper to collect public profile data, post metrics, and engagement analytics. Because it reads publicly visible pages, it does not require access to the official Instagram API.

How It Works

The Instagram integration is built around the ActorManager class and its specialized scraping methods. It covers four stages:
  1. Profile Scraping: Collects account information, follower/following counts, bio, and verification status
  2. Post Analysis: Retrieves recent posts with likes, comments, and engagement data
  3. Discovery: Searches for creators by username, hashtag, or location
  4. Metrics Calculation: Computes engagement rates and audience quality scores
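The exact engagement-rate formula is not stated here. A common convention, assumed in this sketch, is average interactions per post (likes plus comments) divided by follower count, expressed as a percentage; it agrees with the 30-day values in the Pipeline Discovery example. `calculateEngagementRate` is a hypothetical helper, not a documented TikTok Miner function.

```typescript
// Hypothetical helper: engagement rate as a percentage of followers.
// Assumes (averageLikes + averageComments) / followerCount * 100,
// a common convention that may differ from TikTok Miner's own formula.
function calculateEngagementRate(
  averageLikes: number,
  averageComments: number,
  followerCount: number,
): number {
  // Avoid division by zero for empty or brand-new accounts.
  if (followerCount <= 0) return 0;
  const rate = ((averageLikes + averageComments) / followerCount) * 100;
  // Round to two decimal places for storage in a Float column.
  return Math.round(rate * 100) / 100;
}

console.log(calculateEngagementRate(5000, 100, 50000)); // 10.2
```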

Architecture

ActorManager.scrapeInstagramProfile() → ApifyClient → Apify API

InstagramTransformer → UnifiedCreatorData → InstagramMetrics (Database)

Key Files:
  • Actor Manager: lib/apify/actor-manager.ts
  • Transformers: lib/apify/transformers.ts
  • Configuration: lib/apify/config.ts
  • Database Schema: prisma/schema.prisma (InstagramMetrics model)

Configuration

Environment Variables

# Apify Configuration
APIFY_API_KEY=your_apify_api_key_here
APIFY_BASE_URL=https://api.apify.com
APIFY_DEFAULT_TIMEOUT_SECS=300
APIFY_MAX_RETRIES=3

# Instagram Scrapers
APIFY_INSTAGRAM_SCRAPER_ID=apify/instagram-scraper
APIFY_INSTAGRAM_POST_SCRAPER_ID=apify/instagram-post-scraper
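Reading these variables in one place lets a missing key or malformed number fail at startup instead of mid-run. The sketch below assumes the variable names and defaults listed above; `loadApifyConfig` and the `ApifyConfig` shape are hypothetical and are not taken from lib/apify/config.ts. The environment is passed in as a plain object so it can be tested without `process.env`.

```typescript
// Hypothetical config shape mirroring the environment variables above.
interface ApifyConfig {
  apiKey: string;
  baseUrl: string;
  defaultTimeoutSecs: number;
  maxRetries: number;
  instagramScraperId: string;
  instagramPostScraperId: string;
}

type Env = Record<string, string | undefined>;

// Parse an integer >= min, falling back to a default when the variable is unset.
function intFromEnv(env: Env, key: string, fallback: number, min: number): number {
  const raw = env[key];
  if (raw === undefined || raw === '') return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${key} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

function loadApifyConfig(env: Env): ApifyConfig {
  const apiKey = env.APIFY_API_KEY;
  if (!apiKey) throw new Error('APIFY_API_KEY is required');
  return {
    apiKey,
    baseUrl: env.APIFY_BASE_URL ?? 'https://api.apify.com',
    defaultTimeoutSecs: intFromEnv(env, 'APIFY_DEFAULT_TIMEOUT_SECS', 300, 1),
    maxRetries: intFromEnv(env, 'APIFY_MAX_RETRIES', 3, 0),
    instagramScraperId: env.APIFY_INSTAGRAM_SCRAPER_ID ?? 'apify/instagram-scraper',
    instagramPostScraperId:
      env.APIFY_INSTAGRAM_POST_SCRAPER_ID ?? 'apify/instagram-post-scraper',
  };
}
```

In application code this would be called as `loadApifyConfig(process.env)`.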

Service Setup

import { ActorManager } from '@/lib/apify/actor-manager';

const actorManager = new ActorManager({
  apiKey: process.env.APIFY_API_KEY!,
  baseUrl: process.env.APIFY_BASE_URL,
  maxRetries: 3,
  requestTimeoutMs: 120000,
});

Data Collection

Profile Metrics

The InstagramMetrics table stores:

| Field | Type | Description |
|---|---|---|
| accountId | String | Instagram account ID |
| username | String | Instagram username (unique) |
| fullName | String | Display name |
| followerCount | Int | Total followers |
| followingCount | Int | Total following |
| mediaCount | Int | Total posts |
| averageLikes | Float | Average likes per post |
| averageComments | Float | Average comments per post |
| engagementRate | Float | Calculated engagement rate (%) |
| isVerified | Boolean | Blue checkmark status |
| isBusinessAccount | Boolean | Business account flag |
| businessCategory | String | Business category (if applicable) |
| reach | Int | Reach metric (business accounts) |
| impressions | Int | Impressions (business accounts) |
| profileViews | Int | Profile views (business accounts) |
| websiteClicks | Int | Website clicks (business accounts) |

Data Model

interface InstagramData {
  accountId: string;
  username: string;
  fullName?: string;
  profileUrl: string;
  bio?: string;
  website?: string;
  isVerified: boolean;
  isBusinessAccount?: boolean;
  businessCategory?: string;
  followerCount: number;
  followingCount: number;
  mediaCount: number;
  averageLikes?: number;
  averageComments?: number;
  engagementRate?: number;
  reach?: number;
  impressions?: number;
  profileViews?: number;
  websiteClicks?: number;
}
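Raw dataset items use different field names than `InstagramData` (the usage example below reads `followersCount` and `postsCount`, while the model uses `followerCount` and `mediaCount`). This sketch shows one way to map between them. The `RawInstagramProfile` shape, in particular `followsCount`, `biography`, `externalUrl` and `businessCategoryName`, is an assumption for illustration: check real dataset items before relying on it. It is not the actual `InstagramTransformer`.

```typescript
// Subset of the InstagramData interface above.
interface InstagramData {
  accountId: string;
  username: string;
  fullName?: string;
  profileUrl: string;
  bio?: string;
  website?: string;
  isVerified: boolean;
  isBusinessAccount?: boolean;
  businessCategory?: string;
  followerCount: number;
  followingCount: number;
  mediaCount: number;
}

// Hypothetical raw item shape; field names are assumptions.
interface RawInstagramProfile {
  id: string;
  username: string;
  fullName?: string;
  biography?: string;
  externalUrl?: string;
  isVerified?: boolean;
  isBusinessAccount?: boolean;
  businessCategoryName?: string;
  followersCount?: number;
  followsCount?: number;
  postsCount?: number;
}

function toInstagramData(raw: RawInstagramProfile): InstagramData {
  return {
    accountId: raw.id,
    username: raw.username,
    fullName: raw.fullName,
    // Same URL pattern as the directUrls entry in the scraper input.
    profileUrl: `https://www.instagram.com/${raw.username}/`,
    bio: raw.biography,
    website: raw.externalUrl,
    isVerified: raw.isVerified ?? false,
    isBusinessAccount: raw.isBusinessAccount,
    businessCategory: raw.businessCategoryName,
    // Missing counts default to 0 so the required numeric fields are always set.
    followerCount: raw.followersCount ?? 0,
    followingCount: raw.followsCount ?? 0,
    mediaCount: raw.postsCount ?? 0,
  };
}
```

Defaulting missing counts to 0 keeps the record storable. A stricter variant would reject the item instead, so that gaps surface in data-quality metrics rather than as zeros.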

Usage Examples

import { ActorManager } from '@/lib/apify/actor-manager';

const actorManager = new ActorManager({
  apiKey: process.env.APIFY_API_KEY!,
});

// Scrape a profile and its most recent posts
const result = await actorManager.scrapeInstagramProfile('instagram', {
  resultsLimit: 30,  // Number of posts to analyze
});

// Read the run's dataset; a single-profile scrape yields one item
const profiles = await actorManager.getRunDataset(
  result.datasetId,
  { limit: 1 }
);

const profile = profiles[0];
if (!profile) {
  // An empty dataset means the profile was not found or is private
  console.log('No profile data returned');
} else {
  console.log({
    username: profile.username,
    followers: profile.followersCount,
    posts: profile.postsCount,
    verified: profile.isVerified,
  });
}

Rate Limits & Quotas

Apify Limits

  • Free Tier: $5 in free credits monthly
  • Personal Plan: $49/month with more compute units
  • Default Timeout: 300 seconds per run
  • Memory: 512 MB default (configurable up to 32 GB)
  • Concurrent Runs: Based on subscription tier
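Because concurrent runs are capped by the subscription tier, bulk scrapes should bound how many runs are in flight at once. A minimal sketch, assuming a caller-supplied task function (for example one wrapping `actorManager.scrapeInstagramProfile`); `mapWithConcurrency` is a hypothetical helper, and the right limit depends on your plan.

```typescript
// Run `task` over every item with at most `limit` tasks in flight.
// Results keep the order of `items`; if any task rejects, the returned promise rejects.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  // JavaScript is single-threaded, so `next++` cannot race.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

For example, `await mapWithConcurrency(usernames, 2, (u) => actorManager.scrapeInstagramProfile(u))` keeps at most two runs active.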

Instagram Scraper Configuration

const username = 'instagram';  // Handle of the profile to scrape

const input = {
  directUrls: [`https://www.instagram.com/${username}/`],
  resultsType: 'details',
  resultsLimit: 30,      // Limit posts to reduce cost
  searchType: 'user',
  searchLimit: 1,
  addParentData: true,
};

const runOptions = {
  timeoutSecs: 300,
  memoryMbytes: 512,
};
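Wrapping the input in a function keeps the profile URL format and the cost-limiting defaults in one place. A minimal sketch; `buildProfileScrapeInput` and `InstagramScraperInput` are hypothetical and cover only the fields shown above, not the scraper's full input schema.

```typescript
// Only the fields used in the example above; the scraper accepts more.
interface InstagramScraperInput {
  directUrls: string[];
  resultsType: 'details';
  resultsLimit: number;
  searchType: 'user';
  searchLimit: number;
  addParentData: boolean;
}

function buildProfileScrapeInput(username: string, resultsLimit = 30): InstagramScraperInput {
  // Accept "@handle" as well as "handle".
  const handle = username.trim().replace(/^@/, '');
  if (handle === '') throw new Error('username is required');
  if (!Number.isInteger(resultsLimit) || resultsLimit < 1) {
    throw new Error('resultsLimit must be a positive integer');
  }
  return {
    directUrls: [`https://www.instagram.com/${handle}/`],
    resultsType: 'details',
    resultsLimit, // fewer posts per run means lower cost
    searchType: 'user',
    searchLimit: 1,
    addParentData: true,
  };
}
```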

Cost Optimization

Reduce Apify costs by:
  • Limiting resultsLimit to only necessary posts
  • Using shorter timeouts for simple profiles
  • Caching results in your database
  • Running bulk operations during off-peak hours
  • Monitoring usage with ApifyRunMetrics
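
// Track costs in database (assumes an initialized PrismaClient instance named `prisma`)
const durationMs = 45000;  // Run duration in milliseconds
const finishedAt = new Date();

const metrics = await prisma.apifyRunMetrics.create({
  data: {
    actorId: 'apify/instagram-scraper',
    platform: 'instagram',
    status: 'SUCCEEDED',
    // Derive startedAt so it agrees with the recorded duration
    startedAt: new Date(finishedAt.getTime() - durationMs),
    finishedAt,
    duration: durationMs,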
    datasetItemCount: 1,
    costUsd: 0.03,
  },
});
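Caching is the simplest of the cost optimizations listed above: if a profile was scraped recently, reuse that result instead of starting a new run. A minimal in-memory sketch; `TtlCache` is hypothetical, and in production the database would normally play this role. The injectable clock exists only to make expiry testable.

```typescript
// Minimal time-to-live cache, e.g. keyed by username.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop it so the next lookup triggers a fresh scrape.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or load, store and return a new one.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await load();
    this.set(key, value);
    return value;
  }
}
```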

Post Scraping

Scrape Posts by Hashtag

import { ActorManager } from '@/lib/apify/actor-manager';

const postScraperConfig = {
  actorId: 'apify/instagram-post-scraper',
  defaultInput: {
    search: '#techstartup',
    searchType: 'hashtag',
    searchLimit: 100,
    resultsLimit: 500,
    enableComments: true,
    commentsLimit: 50,
    enableLikes: true,
    likesLimit: 100,
  },
};

// This would require implementing a post scraping method
// or using the Apify client directly

Error Handling

const username = 'instagram';  // Handle of the profile to scrape

try {
  const result = await actorManager.scrapeInstagramProfile(username);
  const profiles = await actorManager.getRunDataset(result.datasetId);

  if (!profiles || profiles.length === 0) {
    console.log('Profile not found or is private');
  }
} catch (error) {
  // With `strict` enabled, catch variables are `unknown`: narrow before reading .message
  const message = error instanceof Error ? error.message : String(error);

  if (message.includes('Actor run failed')) {
    console.error('Scraper encountered an error');
  } else if (message.includes('timeout')) {
    console.error('Scraper timed out');
  } else {
    console.error('Unexpected error:', error);
  }
}
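Timeouts are often worth retrying, whereas an empty dataset (a private or missing profile) is not. ActorManager already takes a `maxRetries` option, but how it applies it is not described here, so this sketch is an optional outer layer around the whole scrape-and-read sequence. `withRetry`, `isTimeout` and the backoff delays are illustrative assumptions.

```typescript
interface RetryOptions {
  maxRetries: number;   // extra attempts after the first one
  baseDelayMs: number;  // delay before the first retry
  isRetryable: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= opts.maxRetries || !opts.isRetryable(error)) throw error;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
}

// Treat timeouts as retryable, mirroring the message check above.
const isTimeout = (error: unknown) =>
  error instanceof Error && error.message.includes('timeout');
```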

Data Validation

The InstagramTransformer validates and normalizes raw scraper output before it is used:

import { InstagramTransformer } from '@/lib/apify/transformers';

// apifyProfile: a raw item returned by actorManager.getRunDataset()
const transformer = new InstagramTransformer();
const result = transformer.transform(apifyProfile);

if (!result.validation.isValid) {
  console.error('Validation errors:', result.validation.errors);
  console.warn('Warnings:', result.validation.warnings);
} else {
  // Data is safe to use
  const unifiedData = result.data;
}

Validation includes:
  • Username sanitization
  • Bio HTML stripping
  • URL normalization
  • Numeric type validation
  • Required field checks
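The checks listed above can be approximated with a few small pure functions. This is a sketch, not the actual InstagramTransformer code: the username rule (letters, digits, periods and underscores, at most 30 characters) assumes Instagram's usual username format, and the regex tag stripping is adequate for plain-text bios but is not a security-grade HTML sanitizer.

```typescript
// Lowercase, strip a leading "@", and check Instagram's usual username charset.
function sanitizeUsername(input: string): string | null {
  const handle = input.trim().replace(/^@/, '').toLowerCase();
  return /^[a-z0-9._]{1,30}$/.test(handle) ? handle : null;
}

// Remove HTML tags and collapse whitespace. Not a security sanitizer.
function stripHtml(bio: string): string {
  return bio.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Add a scheme if missing and return a normalized absolute URL, or null.
function normalizeUrl(raw: string): string | null {
  const trimmed = raw.trim();
  if (trimmed === '') return null;
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  try {
    return new URL(withScheme).toString();
  } catch {
    return null;
  }
}

// Accept non-negative integers only (follower, following and post counts).
function isValidCount(value: unknown): value is number {
  return typeof value === 'number' && Number.isInteger(value) && value >= 0;
}
```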

Pipeline Discovery

Store discovered profiles for later processing:

const profile = await prisma.instagramProfile.create({
  data: {
    username: 'techcreator',
    fullName: 'Tech Creator',
    bio: 'Creating tech content',
    followerCount: 50000,
    followingCount: 1000,
    postsCount: 250,
    isVerified: true,
    isBusinessAccount: true,
    // 30-day metrics
    posts30d: 12,
    likesTotal30d: 60000,
    likesAvg30d: 5000,
    commentsTotal30d: 1200,
    commentsAvg30d: 100,
    engagementRate30d: 10.2,
    discoveryKeywords: ['tech', 'startup'],
  },
});
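The 30-day fields in this example are consistent with each other: 60,000 likes over 12 posts averages 5,000, 1,200 comments averages 100, and (5,000 + 100) / 50,000 followers gives 10.2%. The sketch below derives them from a list of recent posts, assuming each post carries a timestamp, a like count and a comment count; `RecentPost` and `summarize30d` are hypothetical.

```typescript
interface RecentPost {
  timestamp: Date;
  likes: number;
  comments: number;
}

interface ThirtyDayMetrics {
  posts30d: number;
  likesTotal30d: number;
  likesAvg30d: number;
  commentsTotal30d: number;
  commentsAvg30d: number;
  engagementRate30d: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function summarize30d(posts: RecentPost[], followerCount: number, now: Date = new Date()): ThirtyDayMetrics {
  const cutoff = now.getTime() - 30 * DAY_MS;
  // Keep only posts published within the last 30 days.
  const recent = posts.filter((p) => p.timestamp.getTime() >= cutoff);
  const posts30d = recent.length;
  const likesTotal30d = recent.reduce((sum, p) => sum + p.likes, 0);
  const commentsTotal30d = recent.reduce((sum, p) => sum + p.comments, 0);
  const likesAvg30d = posts30d > 0 ? likesTotal30d / posts30d : 0;
  const commentsAvg30d = posts30d > 0 ? commentsTotal30d / posts30d : 0;
  // Engagement rate as a percentage of followers, rounded to 2 decimals.
  const engagementRate30d =
    followerCount > 0
      ? Math.round(((likesAvg30d + commentsAvg30d) / followerCount) * 10000) / 100
      : 0;
  return { posts30d, likesTotal30d, likesAvg30d, commentsTotal30d, commentsAvg30d, engagementRate30d };
}
```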

Monitoring

Track Data Quality

const quality = await prisma.apifyDataQualityMetrics.create({
  data: {
    platform: 'instagram',
    totalItemsProcessed: 150,
    validItemsCount: 145,
    invalidItemsCount: 5,
    validationErrors: [
      { error: 'Missing follower count', count: 3 },
      { error: 'Invalid username format', count: 2 },
    ],
  },
});
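The `validationErrors` array groups identical messages with a count. A sketch of building that summary from per-item validation results, assuming each item yields an `isValid` flag and a list of error strings; `ItemValidation` and `summarizeValidation` are hypothetical.

```typescript
interface ItemValidation {
  isValid: boolean;
  errors: string[];
}

interface QualitySummary {
  totalItemsProcessed: number;
  validItemsCount: number;
  invalidItemsCount: number;
  validationErrors: { error: string; count: number }[];
}

function summarizeValidation(results: ItemValidation[]): QualitySummary {
  // Count how often each distinct error message occurs.
  const counts = new Map<string, number>();
  for (const result of results) {
    for (const error of result.errors) {
      counts.set(error, (counts.get(error) ?? 0) + 1);
    }
  }
  const validItemsCount = results.filter((r) => r.isValid).length;
  return {
    totalItemsProcessed: results.length,
    validItemsCount,
    invalidItemsCount: results.length - validItemsCount,
    // Most frequent errors first.
    validationErrors: Array.from(counts, ([error, count]) => ({ error, count }))
      .sort((a, b) => b.count - a.count),
  };
}
```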

Set Up Alerts

const alert = await prisma.apifyAlert.create({
  data: {
    platform: 'instagram',
    severity: 'MEDIUM',
    alertType: 'DATA_QUALITY',
    message: 'Instagram data quality below threshold',
    conditions: { threshold: 0.95, actual: 0.93 },
  },
});
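The alert's `conditions` compare a valid-item ratio with a threshold. A sketch of that check, assuming the ratio is `validItemsCount / totalItemsProcessed` and an alert is raised only when it falls below the threshold; `buildQualityAlert` is hypothetical and returns the `data` payload rather than writing to the database. With the counts from the quality example (145 of 150, about 0.97) no alert would fire.

```typescript
interface QualityAlertData {
  platform: string;
  severity: 'MEDIUM';
  alertType: 'DATA_QUALITY';
  message: string;
  conditions: { threshold: number; actual: number };
}

function buildQualityAlert(
  platform: string,
  validItemsCount: number,
  totalItemsProcessed: number,
  threshold = 0.95,
): QualityAlertData | null {
  if (totalItemsProcessed === 0) return null; // nothing to judge yet
  const ratio = validItemsCount / totalItemsProcessed;
  // Compare the unrounded ratio so values just under the threshold still alert.
  if (ratio >= threshold) return null;
  return {
    platform,
    severity: 'MEDIUM',
    alertType: 'DATA_QUALITY',
    message: `${platform} data quality below threshold`,
    // Store a rounded value for readability.
    conditions: { threshold, actual: Math.round(ratio * 100) / 100 },
  };
}
```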

Best Practices

Instagram Scraping Guidelines:
  • Respect rate limits to avoid IP bans
  • Use Apify’s proxy rotation (enabled by default)
  • Cache results to minimize redundant requests
  • Only scrape public profiles (private accounts return limited data)
  • Monitor for changes in Instagram’s structure

Limitations

  • Private Accounts: Cannot scrape private profiles
  • Stories: Story data is not available via scraping
  • Reels Insights: Limited reel metrics without Business API
  • Real-time Data: Slight delay compared to official API
  • Business Insights: Partial data; full insights require Graph API
