The LinkedIn integration pulls guest career history, profile data, and professional summary from LinkedIn using RapidAPI. This data powers career timeline generation and enriches episode metadata.

Overview

Provider: RapidAPI LinkedIn Scraper
Data retrieved:
  • Guest name and headline
  • Professional summary
  • Career history (positions, companies, dates, descriptions)
  • LinkedIn URL
Caching: Profile data is cached in Sanity for 24 hours to minimize API calls

Setup

1. Get RapidAPI key

  1. Sign up at https://rapidapi.com
  2. Subscribe to a LinkedIn scraper API
  3. Copy your API key from the dashboard

2. Add to environment

Local development (.dev.vars):
RAPIDAPI_KEY=your-rapidapi-key-here
Production:
wrangler secret put RAPIDAPI_KEY

3. Test connection

Scrape a guest LinkedIn profile from the Episode page to verify the integration works.

API endpoint

Scrape profile

Endpoint: POST /api/linkedin/profile
Request:
{
  "linkedinUrl": "https://www.linkedin.com/in/chris-pacifico/"
}
Response:
{
  "profile": {
    "name": "Chris Pacifico",
    "headline": "CTO at TechCorp | Former VP Engineering",
    "summary": "Experienced technology leader with 15+ years...",
    "careerHistory": [
      {
        "_key": "pos-0-1234567890",
        "title": "CTO",
        "company": "TechCorp",
        "startDate": "Jan 2020",
        "endDate": null,
        "description": "Leading technology strategy..."
      },
      {
        "_key": "pos-1-1234567891",
        "title": "VP Engineering",
        "company": "StartupX",
        "startDate": "Mar 2017",
        "endDate": "Dec 2019",
        "description": "Built engineering team from 5 to 50..."
      }
    ],
    "scrapedAt": "2024-03-15T10:30:00Z",
    "linkedinUrl": "https://www.linkedin.com/in/chris-pacifico/"
  }
}
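
Before using the response in application code, it can help to narrow the parsed JSON to a known shape. The following is a minimal sketch, assuming the payload matches the example above; the names ProfileResponse, CareerHistoryItem, isProfileResponse, and currentPositions are hypothetical and not part of the Pulse Content codebase.

```typescript
// Hypothetical types mirroring the example response above.
interface CareerHistoryItem {
  _key: string
  title: string
  company: string
  startDate: string
  endDate: string | null // null marks a current position in the example
  description?: string
}

interface ProfileResponse {
  profile: {
    name: string
    headline: string
    summary: string
    careerHistory: CareerHistoryItem[]
    scrapedAt: string
    linkedinUrl: string
  }
}

// Shallow runtime check that narrows unknown JSON to ProfileResponse.
// It only verifies a few fields; it is not full schema validation.
function isProfileResponse(value: unknown): value is ProfileResponse {
  if (typeof value !== 'object' || value === null) return false
  const profile = (value as { profile?: unknown }).profile
  if (typeof profile !== 'object' || profile === null) return false
  const p = profile as Record<string, unknown>
  return (
    typeof p.name === 'string' &&
    typeof p.linkedinUrl === 'string' &&
    Array.isArray(p.careerHistory)
  )
}

// Positions with a null endDate are treated as the guest's current roles.
function currentPositions(res: ProfileResponse): CareerHistoryItem[] {
  return res.profile.careerHistory.filter((item) => item.endDate === null)
}
```

A caller would typically run isProfileResponse on the result of response.json() and surface an error if the check fails.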

Usage in Pulse Content

Scrape guest profile

import { scrapeLinkedInProfile } from '@/services/linkedin'

const profile = await scrapeLinkedInProfile(
  'https://www.linkedin.com/in/chris-pacifico/'
)

console.log(profile.name) // "Chris Pacifico"
console.log(profile.headline) // "CTO at TechCorp | Former VP Engineering"
console.log(profile.positions.length) // Number of career positions

With Sanity caching

import {
  scrapeLinkedInProfile,
  fetchGuestProfileByUrl,
  upsertGuestProfile,
  linkedInToCareerHistory, // used below; assumed to be exported from the same module
} from '@/services/linkedin'

// Check Sanity cache first
const cached = await fetchGuestProfileByUrl(linkedinUrl)

// Scrape with cache awareness
const profile = await scrapeLinkedInProfile(linkedinUrl, {
  cachedProfile: cached,
  forceRefresh: false, // Use cache if fresh
})

// Store in Sanity for future use
await upsertGuestProfile({
  linkedinUrl: profile.linkedinUrl,
  name: profile.name,
  headline: profile.headline,
  summary: profile.summary,
  careerHistory: linkedInToCareerHistory(profile),
})

Force refresh

// Skip cache and scrape fresh data
const profile = await scrapeLinkedInProfile(linkedinUrl, {
  forceRefresh: true,
})

Caching strategy

Three-tier cache

1. In-memory cache

First check: Session-level cache (cleared on page refresh)
Fastest: No network request
Duration: Until page refresh

2. Sanity cache

Second check: Database-persisted cache
Fast: Single Sanity query
Duration: 24 hours from scrapedAt timestamp

3. RapidAPI scrape

Last resort: Fresh scrape from LinkedIn
Slow: External API call (3-10 seconds)
Cost: Consumes RapidAPI quota
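
The lookup order described above can be sketched as a single function. This is an illustration under stated assumptions, not the actual scrapeLinkedInProfile implementation: the CachedProfile and ProfileSources types, the getProfile function, and the injected fetchFromSanity and scrape callbacks are all hypothetical.

```typescript
interface CachedProfile {
  linkedinUrl: string
  name: string
  scrapedAt: string // ISO 8601 timestamp
}

// Hypothetical dependencies, injected so the sketch stays self-contained.
interface ProfileSources {
  fetchFromSanity: (url: string) => Promise<CachedProfile | null>
  scrape: (url: string) => Promise<CachedProfile>
}

type Tier = 'memory' | 'sanity' | 'scrape'

const ONE_DAY_MS = 24 * 60 * 60 * 1000
const memoryCache = new Map<string, CachedProfile>()

// An unparseable timestamp yields NaN, which compares as not fresh.
function isFresh(scrapedAt: string, now: number = Date.now()): boolean {
  return now - new Date(scrapedAt).getTime() < ONE_DAY_MS
}

async function getProfile(
  url: string,
  sources: ProfileSources,
  forceRefresh = false
): Promise<{ profile: CachedProfile; tier: Tier }> {
  if (!forceRefresh) {
    // Tier 1: in-memory cache, no network request.
    const inMemory = memoryCache.get(url)
    if (inMemory) return { profile: inMemory, tier: 'memory' }

    // Tier 2: Sanity cache, valid for 24 hours from scrapedAt.
    const stored = await sources.fetchFromSanity(url)
    if (stored && isFresh(stored.scrapedAt)) {
      memoryCache.set(url, stored)
      return { profile: stored, tier: 'sanity' }
    }
  }

  // Tier 3: fresh scrape, which consumes API quota.
  const scraped = await sources.scrape(url)
  memoryCache.set(url, scraped)
  return { profile: scraped, tier: 'scrape' }
}
```

Returning the tier alongside the profile is a design choice made here for illustration; it makes it easy to log how often each tier is hit.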

Cache freshness check

function isCacheFresh(scrapedAt?: string): boolean {
  if (!scrapedAt) return false
  
  const scrapedTime = new Date(scrapedAt).getTime()
  const now = Date.now()
  const oneDayMs = 24 * 60 * 60 * 1000
  
  return (now - scrapedTime) < oneDayMs
}

Clear cache

import { clearProfileCache } from '@/services/linkedin'

// Clear specific profile
clearProfileCache('https://www.linkedin.com/in/chris-pacifico/')

// Clear all cached profiles
clearProfileCache()

LinkedIn URL validation

Supported formats

import { isValidLinkedInUrl, normalizeLinkedInUrl } from '@/services/linkedin'

// Valid URLs
isValidLinkedInUrl('https://www.linkedin.com/in/chris-pacifico/') // true
isValidLinkedInUrl('linkedin.com/in/chris-pacifico') // true
isValidLinkedInUrl('www.linkedin.com/in/chris-pacifico/') // true

// Invalid URLs
isValidLinkedInUrl('https://twitter.com/chrispac') // false
isValidLinkedInUrl('https://linkedin.com/company/techcorp') // false

// Normalization
normalizeLinkedInUrl('linkedin.com/in/Chris-Pacifico')
// Returns: 'https://www.linkedin.com/in/chris-pacifico/'
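
To make the accepted formats concrete, here is a minimal sketch of how validation and normalization along these lines could work. It is not the source of isValidLinkedInUrl or normalizeLinkedInUrl; the names isValidProfileUrl and normalizeProfileUrl and the regular expression are assumptions chosen to reproduce the examples above.

```typescript
// Optional scheme, optional "www.", then linkedin.com/in/<slug>,
// with an optional trailing slash and optional query string or fragment.
const PROFILE_URL =
  /^(?:https?:\/\/)?(?:www\.)?linkedin\.com\/in\/([A-Za-z0-9_%-]+)\/?(?:[?#].*)?$/i

function isValidProfileUrl(input: string): boolean {
  return PROFILE_URL.test(input.trim())
}

// Returns the canonical lowercase form, or null when the input is not a profile URL.
function normalizeProfileUrl(input: string): string | null {
  const slug = PROFILE_URL.exec(input.trim())?.[1]
  if (!slug) return null
  return `https://www.linkedin.com/in/${slug.toLowerCase()}/`
}
```

Company pages are rejected because the pattern requires the /in/ path segment.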

Extract from transcript

Transcript headers often include LinkedIn URLs:
385-Chris Pacifico
Host: Phil Howard
Guest: Chris Pacifico
https://www.linkedin.com/in/chris-pacifico/
Pulse Content automatically extracts and normalizes these URLs during episode creation.
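
As a rough illustration of that extraction step, the sketch below scans the first few lines of a transcript for a profile URL. The function name extractLinkedInUrl, the maxHeaderLines parameter, and its default of 10 are assumptions for this example and do not describe the actual Pulse Content implementation.

```typescript
// Finds a linkedin.com/in/<slug> URL anywhere within a line of text.
const LINKEDIN_IN_TEXT = /(?:https?:\/\/)?(?:www\.)?linkedin\.com\/in\/([A-Za-z0-9_%-]+)/i

// Scans only the transcript header and returns the first profile URL
// in normalized form, or null when none is found.
function extractLinkedInUrl(transcript: string, maxHeaderLines = 10): string | null {
  const headerLines = transcript.split(/\r?\n/).slice(0, maxHeaderLines)
  for (const line of headerLines) {
    const slug = LINKEDIN_IN_TEXT.exec(line)?.[1]
    if (slug) return `https://www.linkedin.com/in/${slug.toLowerCase()}/`
  }
  return null
}
```

Limiting the scan to the header avoids picking up URLs that a guest happens to mention later in the conversation.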

Career timeline generation

Generate prompt

import { generateCareerTimelinePrompt } from '@/services/linkedin'

const prompt = generateCareerTimelinePrompt(profile)

// Prompt includes:
// - Guest name and current title
// - Up to 6 career positions (most recent first)
// - Visual style guidelines (isometric 3D, YBH brand colors)
// - Layout instructions (horizontal timeline, numbered milestones)
// - Quality standards (no logos, no facial features, professional tone)

Generate image

import { scrapeLinkedInProfile, generateCareerTimelinePrompt } from '@/services/linkedin'
import { createTask, waitForTask } from '@/services/kieai'

// Scrape profile
const profile = await scrapeLinkedInProfile(linkedinUrl)

// Generate prompt
const prompt = generateCareerTimelinePrompt(profile)

// Create Kie.ai task
const { taskId } = await createTask({
  prompt,
  aspectRatio: '16:9',
  resolution: '2K',
})

// Wait for completion
const result = await waitForTask(taskId)

if (result.state === 'success') {
  console.log('Career timeline URL:', result.imageUrl)
}

Data structure

LinkedInProfile

interface LinkedInProfile {
  name: string
  headline: string
  summary: string
  positions: Array<{
    title: string
    company: string
    startDate: string
    endDate?: string
    description?: string
  }>
  scrapedAt?: string
  linkedinUrl?: string
}

CareerPosition (Sanity)

interface CareerPosition {
  _key: string // Unique key for Sanity array items
  title: string
  company: string
  startDate: string // "Jan 2020"
  endDate?: string // "Dec 2022" or undefined for current
  description?: string
}
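
The caching example earlier calls linkedInToCareerHistory to convert scraped positions into this Sanity shape. Its source is not shown here, so the following is only a sketch of what such a mapping could look like. The function name toCareerHistory is hypothetical, and the _key format is modeled on the example API response ("pos-0-1234567890", "pos-1-1234567891").

```typescript
interface Position {
  title: string
  company: string
  startDate: string
  endDate?: string
  description?: string
}

interface CareerPosition extends Position {
  _key: string // Unique key for Sanity array items
}

// Assumption: keys combine the array index with a timestamp-based suffix,
// which keeps them unique within a single array.
function toCareerHistory(
  positions: Position[],
  timestamp: number = Date.now()
): CareerPosition[] {
  return positions.map((position, index) => ({
    _key: `pos-${index}-${timestamp + index}`,
    ...position,
  }))
}
```

Sanity requires each object in an array to carry a unique _key, which is why the mapping adds one rather than storing positions as-is.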

GuestProfile (Sanity schema)

interface GuestProfile {
  _id: string
  _type: 'guestProfile'
  name: string
  linkedinUrl: string
  headline: string
  summary: string
  careerHistory: CareerPosition[]
  scrapedAt: string // ISO 8601 timestamp
}

Link to episode

import { upsertGuestProfile } from '@/services/linkedin'
import { linkGuestToEpisode } from '@/services/sanity'

// After creating or finding guest profile
const guestProfile = await upsertGuestProfile({ ... })

// Link to episode
await linkGuestToEpisode(episodeId, guestProfile._id)

// Episode now has guestRef field pointing to guest

Error handling

Profile not found

try {
  const profile = await scrapeLinkedInProfile(linkedinUrl)
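} catch (error) {
  // In strict TypeScript `error` is `unknown`, so narrow it before reading `message`
  if (error instanceof Error && error.message.includes('404')) {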
    console.error('LinkedIn profile not found')
    // Show error to user: Invalid LinkedIn URL
  }
}

Rate limit exceeded

try {
  const profile = await scrapeLinkedInProfile(linkedinUrl)
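} catch (error) {
  // In strict TypeScript `error` is `unknown`, so narrow it before reading `message`
  if (error instanceof Error && error.message.includes('429')) {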
    console.error('RapidAPI rate limit exceeded')
    // Show error: Too many requests, try again later
    // Or: Upgrade RapidAPI plan
  }
}
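
For the "try again later" case, one option is to retry automatically with exponential backoff. The sketch below is an assumption about how that could be done and is not part of the documented service: withRetry and isRateLimitError are hypothetical helpers, and detecting the limit by looking for "429" in the error message simply mirrors the check above.

```typescript
function isRateLimitError(error: unknown): boolean {
  return error instanceof Error && error.message.includes('429')
}

// Retries only rate-limit failures, doubling the wait after each attempt.
// Any other error, or the final failed attempt, is rethrown to the caller.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      if (!isRateLimitError(error) || attempt >= maxAttempts) throw error
      const delayMs = baseDelayMs * 2 ** (attempt - 1)
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```

Usage would look like withRetry(() => scrapeLinkedInProfile(linkedinUrl)). Keep maxAttempts low, since each retry may still count against the RapidAPI quota.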

Invalid LinkedIn URL

import { isValidLinkedInUrl } from '@/services/linkedin'

if (!isValidLinkedInUrl(linkedinUrl)) {
  throw new Error('Invalid LinkedIn URL. Must be in format: linkedin.com/in/username')
}

Network timeout

// Set timeout for scrape request
const controller = new AbortController()
const timeoutId = setTimeout(() => controller.abort(), 15000) // 15 seconds

try {
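  const response = await fetch('/api/linkedin/profile', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // declare the JSON body
    signal: controller.signal,
    body: JSON.stringify({ linkedinUrl }),
  })
} catch (error) {
  // Narrow `error` (typed `unknown` in strict TypeScript) before reading `name`
  if (error instanceof Error && error.name === 'AbortError') {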
    console.error('LinkedIn scrape timed out')
  }
} finally {
  clearTimeout(timeoutId)
}

Cost management

RapidAPI pricing

LinkedIn scraper APIs typically charge per request:
  • Free tier: 10-50 requests per month
  • Paid tiers: 500-5000+ requests per month

Reduce API calls

1. Use Sanity cache

Always check Sanity for an existing profile before scraping:
const cached = await fetchGuestProfileByUrl(linkedinUrl)
if (cached && isCacheFresh(cached.scrapedAt)) {
  return cached // No API call needed
}

2. Batch scrapes

Scrape multiple guest profiles during off-peak hours rather than on-demand.

3. Manual entry fallback

Allow the team to manually enter career history if the API quota is exhausted.

4. Monitor usage

Track RapidAPI usage in the dashboard and set alerts for quota limits.
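
The batch-scrape idea can be sketched as a sequential loop that deduplicates URLs, skips profiles that are already cached, and pauses between requests. Everything here is hypothetical: batchScrape, the BatchDeps callbacks, and the 2-second default delay are assumptions for illustration, not documented behavior.

```typescript
// Hypothetical dependencies, injected so the sketch is self-contained.
interface BatchDeps {
  isCached: (url: string) => Promise<boolean> // true when a fresh Sanity copy exists
  scrape: (url: string) => Promise<void>
}

interface BatchResult {
  scraped: string[]
  skipped: string[]
  failed: string[]
}

async function batchScrape(
  urls: string[],
  deps: BatchDeps,
  delayMs = 2000
): Promise<BatchResult> {
  const result: BatchResult = { scraped: [], skipped: [], failed: [] }

  // Deduplicate so the same profile is never scraped twice in one run.
  for (const url of Array.from(new Set(urls))) {
    if (await deps.isCached(url)) {
      result.skipped.push(url)
      continue
    }
    try {
      await deps.scrape(url)
      result.scraped.push(url)
    } catch {
      // One failed profile should not abort the rest of the batch.
      result.failed.push(url)
    }
    // Pause after every attempted scrape to stay under rate limits.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
  }
  return result
}
```

Running the requests one at a time is slower than parallel scraping, but it keeps quota consumption predictable.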

Best practices

  • Always normalize URLs: Use normalizeLinkedInUrl() before querying or caching
  • Check cache first: Avoid unnecessary API calls by checking Sanity cache
  • Store in Sanity: Always upsert profile after successful scrape
  • Link to episodes: Connect guest profiles to episodes via guestRef
  • Handle errors gracefully: Show clear error messages for invalid URLs or rate limits
  • Refresh manually: Provide “Refresh LinkedIn Data” button for users to force refresh
  • Validate before scraping: Use isValidLinkedInUrl() to catch invalid URLs early
  • Generate timelines async: Scrape profile and generate career timeline in background

Troubleshooting

API key invalid

Error: API error: 401
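Solution: Verify RAPIDAPI_KEY is set correctly. If the key is exported in your shell:
echo $RAPIDAPI_KEY
For local development, check the value in .dev.vars. In production, confirm the secret exists with wrangler secret list.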

Profile data incomplete

Causes:
  • Guest has minimal LinkedIn profile
  • Privacy settings restrict scraping
  • Profile URL is incorrect
Solutions:
  • Ask guest to update LinkedIn profile
  • Manually enter career history
  • Verify LinkedIn URL is correct

Scrape takes too long

Expected time: 3-10 seconds
If longer:
  • RapidAPI service may be slow
  • Network connectivity issues
  • Increase timeout to 15-30 seconds

Career history dates inconsistent

Issue: LinkedIn returns dates in various formats (“Jan 2020”, “2020”, “Present”)
Solution: Normalize dates in UI:
function formatDateRange(start: string, end?: string): string {
  if (!end) return `${start} - Present`
  return `${start} - ${end}`
}
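
When positions also need to be ordered (for example, most recent first), the mixed formats can be mapped to a sortable number. This is a minimal sketch that assumes only the three formats listed above plus full English month names; toSortKey and sortMostRecentFirst are hypothetical helpers.

```typescript
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

// Maps "Jan 2020", "2020", or "Present" to a comparable number
// (year * 12 + month index). Returns null for unrecognized input.
function toSortKey(value: string): number | null {
  const text = value.trim().toLowerCase()
  if (text === 'present') return Number.POSITIVE_INFINITY

  const match = /^(?:([a-z]{3})[a-z]*\.?\s+)?(\d{4})$/.exec(text)
  if (!match) return null

  // A year-only value is treated as January of that year.
  const monthIndex = match[1] ? MONTHS.indexOf(match[1]) : 0
  if (monthIndex < 0) return null
  return Number(match[2]) * 12 + monthIndex
}

// Returns a sorted copy; positions with unparseable dates go last.
function sortMostRecentFirst<T extends { startDate: string }>(positions: T[]): T[] {
  const key = (p: T): number => toSortKey(p.startDate) ?? Number.NEGATIVE_INFINITY
  return positions.slice().sort((a, b) => {
    const ka = key(a)
    const kb = key(b)
    return ka === kb ? 0 : kb > ka ? 1 : -1
  })
}
```

The explicit comparison avoids subtracting infinite keys, which would produce NaN.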
