When you use --profile, JobSpy automatically tracks which jobs you’ve already seen. On subsequent runs, only new jobs are returned. This eliminates duplicate work and focuses your attention on fresh opportunities.
How It Works
Deduplication uses two strategies:
- URL rolling window: Job URLs seen within the last 7 days are filtered out.
- Date watermark: Jobs with a date_posted on or before the last run’s most recent date are skipped.
Dedup state is stored in the state section of jobspy.json and updated automatically after each run.
Enabling Deduplication
Dedup is automatic when using --profile. Without --profile, no dedup tracking occurs (every run is independent).
First Run
The first run of a profile returns all matching jobs and saves state.
Subsequent Runs
Run the same profile again hours or days later to get only the jobs that are new since the last run.
URL Rolling Window
Every job URL is recorded with a seenAt timestamp (ISO date, e.g., "2026-03-05"). URLs seen within the last 7 days are filtered out on the next run.
After 7 days, the URL is pruned from state and can be seen again if it reappears in search results.
Example: on a run dated 2026-03-10, an entry with a seenAt of 2026-03-03 (7 days old) will be pruned automatically.
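The window check can be sketched in Python as follows. This is an illustration under stated assumptions, not JobSpy's actual code: the function names are hypothetical, each job is assumed to carry a job_url field, and an entry that is exactly 7 days old is treated as outside the window, in line with the 2026-03-03 example.

```python
from datetime import date

WINDOW_DAYS = 7  # window length stated in this page


def in_window(seen_at: str, today: date, window_days: int = WINDOW_DAYS) -> bool:
    """True if an ISO date such as "2026-03-05" is less than window_days old."""
    age_days = (today - date.fromisoformat(seen_at)).days
    return age_days < window_days  # assumption: exactly 7 days old is expired


def filter_by_url(jobs: list[dict], seen_urls: list[dict], today: date) -> list[dict]:
    """Drop jobs whose job_url was recorded within the rolling window."""
    recent = {e["url"] for e in seen_urls if in_window(e["seenAt"], today)}
    return [job for job in jobs if job["job_url"] not in recent]


# Illustrative data only.
seen_urls = [
    {"url": "https://linkedin.com/jobs/view/123", "seenAt": "2026-03-03"},
    {"url": "https://linkedin.com/jobs/view/456", "seenAt": "2026-03-05"},
]
jobs = [
    {"job_url": "https://linkedin.com/jobs/view/123"},
    {"job_url": "https://linkedin.com/jobs/view/456"},
    {"job_url": "https://linkedin.com/jobs/view/789"},
]
fresh = filter_by_url(jobs, seen_urls, today=date(2026, 3, 10))
print([job["job_url"] for job in fresh])
```

In this sketch the 456 URL is filtered because it was seen 5 days earlier, while the 123 URL passes because its entry has aged out of the window.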
Date Watermark
Each provider tracks the most recent date_posted value seen, stored as lastSeenDate. On later runs, jobs with date_posted <= lastSeenDate are skipped. This catches jobs that reappear in search results with the same URL but an older date.
Note: Some sites (e.g., Bayt) do not provide date_posted. For these, only the URL rolling window is used.
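A minimal Python sketch of the watermark comparison, assuming jobs are dictionaries with an optional date_posted ISO date string. The function name is hypothetical, and passing undated jobs through is an interpretation of the note about sites that provide no date_posted.

```python
from typing import Optional


def filter_by_watermark(jobs: list[dict], last_seen_date: Optional[str]) -> list[dict]:
    """Skip jobs posted on or before the watermark; keep jobs without a date."""
    if last_seen_date is None:  # no previous run recorded, nothing to compare
        return list(jobs)
    kept = []
    for job in jobs:
        posted = job.get("date_posted")
        # ISO dates (YYYY-MM-DD) order correctly when compared as strings.
        if posted is None or posted > last_seen_date:
            kept.append(job)
    return kept


# Illustrative data only.
jobs = [
    {"job_url": "https://linkedin.com/jobs/view/123", "date_posted": "2026-03-04"},
    {"job_url": "https://linkedin.com/jobs/view/456", "date_posted": "2026-03-05"},
    {"job_url": "https://linkedin.com/jobs/view/789", "date_posted": "2026-03-06"},
    {"job_url": "https://example.com/jobs/1", "date_posted": None},
]
kept = filter_by_watermark(jobs, last_seen_date="2026-03-05")
print([job["job_url"] for job in kept])
```

Only the job posted on 2026-03-06 and the undated job survive here; the undated job would still be subject to the URL rolling window.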
Skipping Dedup (--all)
To temporarily bypass dedup filtering while still updating state, use --all. Every matching job is returned and recorded in state, so the next run (without --all) will filter them out.
Use case: Re-fetch all jobs for a one-time export without losing your dedup history.
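The difference between filtering and state updates can be sketched as below. This is a simplified illustration, not JobSpy's implementation: apply_dedup and show_all are hypothetical names, and the seen set ignores timestamps and the date watermark for brevity.

```python
def apply_dedup(jobs: list[dict], seen: set[str], show_all: bool = False) -> list[dict]:
    """Return the jobs to show, and record every scraped URL in `seen`."""
    if show_all:
        visible = list(jobs)  # --all: bypass filtering
    else:
        visible = [job for job in jobs if job["job_url"] not in seen]
    # State is updated in both modes, so a later normal run filters these jobs.
    seen.update(job["job_url"] for job in jobs)
    return visible


seen: set[str] = set()
jobs = [
    {"job_url": "https://linkedin.com/jobs/view/123"},
    {"job_url": "https://linkedin.com/jobs/view/456"},
]
export_run = apply_dedup(jobs, seen, show_all=True)   # both jobs returned
normal_run = apply_dedup(jobs, seen, show_all=False)  # nothing new
print(len(export_run), len(normal_run))
```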
Viewing Profile State
To check when each profile was last run, look at its lastRunAt value in the state section of jobspy.json.
State File Structure
The state section of jobspy.json stores per-profile, per-provider dedup data.
State Fields
| Field | Type | Description |
|---|---|---|
| version | number | State format version (currently 1) |
| profiles | object | Map of profile names to profile state |
| lastRunAt | string (ISO timestamp) | When the profile was last executed |
| providers | object | Map of site names to provider state |
| lastSeenDate | string (ISO date) | Most recent date_posted from last run |
| seenUrls | array | List of { url, seenAt } objects within the 7-day window |
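To make the nesting concrete, the fields in the table can be assembled into a Python dictionary and printed as JSON. The nesting is inferred from the field descriptions, and the profile name, site key, and all values are illustrative placeholders rather than real output.

```python
import json

state = {
    "version": 1,
    "profiles": {
        "frontend": {  # profile name (illustrative)
            "lastRunAt": "2026-03-05T09:00:00Z",
            "providers": {
                "linkedin": {  # site name (illustrative key)
                    "lastSeenDate": "2026-03-05",
                    "seenUrls": [
                        {
                            "url": "https://linkedin.com/jobs/view/123",
                            "seenAt": "2026-03-05",
                        },
                    ],
                },
            },
        },
    },
}

# The state lives under the "state" key of jobspy.json.
print(json.dumps({"state": state}, indent=2))
```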
Ad-Hoc Profiles
You don’t need a config entry in jobspy.json to use dedup. Running --profile with any name creates state tracking. For example, running with --profile my-search creates a my-search profile in the state section, even though there’s no matching entry in config.profiles.
Subsequent runs with the same name will use the saved state.
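The create-on-first-use behavior resembles a "get or create" lookup. The sketch below is an assumption about how such a lookup could work, with a hypothetical profile_state helper and an assumed shape for a newly created entry.

```python
def profile_state(state: dict, name: str) -> dict:
    """Return the state entry for a profile, creating an empty one if missing."""
    profiles = state.setdefault("profiles", {})
    # Assumed shape of a fresh entry: never run, no providers yet.
    return profiles.setdefault(name, {"lastRunAt": None, "providers": {}})


state = {"version": 1, "profiles": {}}
first = profile_state(state, "my-search")   # entry is created
second = profile_state(state, "my-search")  # same entry is reused
print(sorted(state["profiles"]))
```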
State Pruning
URLs older than 7 days are automatically removed from seenUrls on each run. This keeps the state file size manageable.
If a job URL is pruned and reappears in search results 8+ days later, it will be treated as new.
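Pruning can be sketched as a filter over seenUrls. The helper name is hypothetical, and the boundary (an entry exactly 7 days old is dropped) is an assumption taken from the 2026-03-03 example on this page.

```python
from datetime import date, timedelta


def prune_seen_urls(seen_urls: list[dict], today: date, window_days: int = 7) -> list[dict]:
    """Keep only entries whose seenAt is newer than today minus window_days."""
    cutoff = today - timedelta(days=window_days)
    return [e for e in seen_urls if date.fromisoformat(e["seenAt"]) > cutoff]


# Illustrative data only.
seen_urls = [
    {"url": "https://linkedin.com/jobs/view/123", "seenAt": "2026-03-03"},
    {"url": "https://linkedin.com/jobs/view/456", "seenAt": "2026-03-04"},
]
pruned = prune_seen_urls(seen_urls, today=date(2026, 3, 10))
print([e["url"] for e in pruned])
```

After pruning, the 123 URL is no longer in state, so it would be treated as new if it showed up again.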
When Dedup Doesn’t Apply
Dedup is not active when:
- Running without --profile
- Using --describe or --id (single job fetching)
- Using --init or --list-profiles (utility commands)
Example: Daily Job Scraping
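Setup: define the search as a profile in jobspy.json, then run that profile once a day with --profile. Each run returns only the jobs that are new since the previous run.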
Resetting State
To clear dedup history for a profile, delete its entry from the state.profiles object in jobspy.json. For example, after removing the frontend entry, the next run with --profile frontend will treat all jobs as new.
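The manual deletion can also be scripted. The helper below is a hypothetical convenience, not part of JobSpy; it assumes jobspy.json is plain JSON with the state under state.profiles, and it rewrites the file with standard indentation, which may change its formatting.

```python
import json
from pathlib import Path


def reset_profile(config_path: Path, profile: str) -> bool:
    """Remove one profile's dedup state; return True if an entry was deleted."""
    data = json.loads(config_path.read_text(encoding="utf-8"))
    profiles = data.get("state", {}).get("profiles", {})
    if profile not in profiles:
        return False  # nothing to reset, file left untouched
    del profiles[profile]
    config_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return True
```

Usage would look like reset_profile(Path("jobspy.json"), "frontend"); keeping a backup of the file first is a sensible precaution.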
Alternatively, create a new profile with a different name; it starts with no saved state.
Dedup Accuracy
Dedup is based on:
- Exact URL matching: https://linkedin.com/jobs/view/123 vs https://linkedin.com/jobs/view/456
- Date comparison: 2026-03-05 vs 2026-03-04
URLs with query parameters (e.g., ?utm_source=...) are stored as-is. Minor URL variations may cause duplicates.
Limitation: Jobs that change URLs (e.g., reposted with a new ID) will be treated as new.
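The effect of exact matching is easy to demonstrate. In this small sketch, is_duplicate is a hypothetical helper and the utm_source value is a made-up placeholder.

```python
def is_duplicate(job_url: str, seen_urls: set[str]) -> bool:
    """Exact string comparison, with no URL normalization."""
    return job_url in seen_urls


seen = {"https://linkedin.com/jobs/view/123"}

same = is_duplicate("https://linkedin.com/jobs/view/123", seen)
variant = is_duplicate("https://linkedin.com/jobs/view/123?utm_source=newsletter", seen)
print(same, variant)  # True False
```

The second URL points at the same posting, but the extra query string makes it a different string, so it would be reported as a new job.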
Multi-Site Dedup
Each provider (site) maintains its own state. A job seen on LinkedIn does not prevent the same job from appearing in Indeed results ([...] job_url is identical).
Dedup and --offset
Using --offset for pagination does not affect dedup. Jobs are filtered after scraping, so offset only controls which page of results to fetch.
Example:
Related Pages
- Config Profiles: Define reusable search profiles with jobspy.json
- Commands: Complete CLI flag reference
