The file source reads log data from files on the local filesystem. It supports file globbing, rotation, checkpointing, and multiline aggregation.
Configuration
Parameters
Array of file patterns to include. Globbing is supported.
Array of file patterns to exclude. Takes precedence over include.
Where to start reading when a new file is discovered. Options:
- beginning: Read from the start of the file
- end: Read only new data appended to the file
The event field name for the file path. Set to empty string to disable.
Directory for storing file checkpoint positions. Defaults to the global data_dir.
Maximum size of a line before it is discarded (in bytes).
Ignore files with a modification date older than this many seconds.
Overrides the field name for the hostname. Defaults to global log_schema.host_key.
If set, adds the byte offset within the file to each event.
Delay between file discovery calls (in milliseconds).
Configuration for file identification strategy.
Checksum strategy:
Device and inode strategy:
Multiline aggregation configuration.
Character encoding configuration for non-UTF-8 files.
Prioritize draining oldest files before reading from newer ones.
Seconds to wait after reaching EOF before removing the file from tracking.
String sequence used to separate lines.
Output Schema
The file source produces log events with the following fields:

| Field | Type | Description |
|---|---|---|
| message | string | The log line content |
| file | string | Full path to the source file |
| host | string | Hostname where Vector is running |
| offset | integer | Byte offset in the file (if offset_key is set) |
| timestamp | timestamp | When the event was ingested |
| source_type | string | Always “file” |
Examples
Basic File Tailing
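A minimal sketch of a tailing configuration in TOML. The source name app_logs and the path are hypothetical, and read_from is assumed to be the option name for the "where to start reading" parameter described under Parameters.

```toml
# Hypothetical source name and path; adjust to your environment.
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
# Assumed option name: start at the end so only newly appended lines are read.
read_from = "end"
```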
Multiple Patterns with Exclusions
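A sketch combining several include patterns with an exclusion. The source name and paths are hypothetical, and exclude is assumed to be the option name for the exclusion parameter. Because exclusions take precedence, a file matching both lists is skipped.

```toml
# Hypothetical source name and paths.
[sources.web_logs]
type = "file"
include = ["/var/log/nginx/*.log", "/var/log/app/**/*.log"]
# Assumed option name: matching files are skipped even if an include pattern matches.
exclude = ["/var/log/app/**/debug.log"]
```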
Java Stack Trace Aggregation
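One possible way to fold a Java stack trace into a single event. The source name and path are hypothetical, the multiline option names (start_pattern, mode, condition_pattern, timeout_ms) are assumptions not stated elsewhere on this page, and the patterns assume a typical JVM trace layout.

```toml
# Hypothetical source name and path.
[sources.java_logs]
type = "file"
include = ["/var/log/java/*.log"]

[sources.java_logs.multiline]
# A new event starts on a line that does not begin with whitespace.
start_pattern = '^[^\s]'
# Keep appending lines for as long as they match condition_pattern.
mode = "continue_through"
# Continuation lines: indented "at ..." and "... N more" frames, plus "Caused by:" lines.
condition_pattern = '^(\s+(at|\.\.\.)\s|Caused by:)'
# Flush a partial event if no further line arrives within this many milliseconds.
timeout_ms = 1000
```

Single-quoted TOML strings are literal, so the backslashes in the patterns reach the regex engine unchanged.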
Custom Fields and Checkpointing
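A sketch that renames the path and host fields, enables the byte offset field, and stores checkpoints in a dedicated directory. The source name, field names, and directory are hypothetical. file_key and host_key are assumed option names; offset_key and data_dir appear elsewhere on this page.

```toml
# Hypothetical source name, field names, and directory.
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
# Assumed option name: store the file path under "path" instead of the default field.
file_key = "path"
# Assumed option name: store the hostname under "hostname".
host_key = "hostname"
# Add the byte offset of each line to the event under "offset".
offset_key = "offset"
# Keep this source's checkpoints here rather than in the global data_dir.
data_dir = "/var/lib/vector/app_logs"
```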
Non-UTF-8 Encoding
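A sketch for reading files that are not UTF-8. The source name and path are hypothetical, and encoding.charset is assumed to be the option that names the input character set.

```toml
# Hypothetical source name and path.
[sources.legacy_logs]
type = "file"
include = ["/var/log/legacy/*.log"]

[sources.legacy_logs.encoding]
# Assumed option name: the character set of the input files.
charset = "utf-16le"
```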
How It Works
File Discovery
The file source periodically scans for files matching the include patterns. The scan frequency is controlled by glob_minimum_cooldown_ms.
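As a rough illustration, the discovery interval can be raised when new files appear rarely. The source name, path, and the 5000 ms value are hypothetical.

```toml
# Hypothetical source name and path.
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
# Wait 5000 ms between file discovery scans (illustrative value).
glob_minimum_cooldown_ms = 5000
```

A longer delay reduces scanning overhead at the cost of noticing new files later.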
Checkpointing
File positions are checkpointed to disk, allowing Vector to resume from where it left off after a restart. Checkpoints are stored in the data_dir.
File Rotation
The source handles file rotation gracefully:
- Files are identified using the configured fingerprinting strategy
- When a file is rotated, the source continues reading the old file until EOF
- The new file is automatically discovered and tailed
Fingerprinting
Two strategies are available:
- Checksum (default): Uses the first N lines to compute a checksum
- Device and Inode: Uses filesystem metadata (Unix only)
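A sketch of selecting a strategy explicitly. The source name and path are hypothetical, and the option names (fingerprint, strategy, lines) and the value device_and_inode are assumptions, since this page does not list them.

```toml
# Hypothetical source name and path.
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]

[sources.app_logs.fingerprint]
# Assumed option names: checksum over the first line of each file.
strategy = "checksum"
lines = 1
# Assumed alternative (Unix only): strategy = "device_and_inode"
```

Checksum fingerprints can collide when many files begin with identical lines, such as a shared header, so raising the line count may be worth considering in that case.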
Performance
- Handles millions of events per second with proper tuning
- Memory usage scales with the number of files being watched
- Checkpoint frequency is a trade-off between performance and durability
Best Practices
- Use specific glob patterns to minimize file discovery overhead
- Set ignore_older_secs to avoid processing old rotated files
- Configure appropriate max_line_bytes to prevent memory issues
- Use oldest_first = true when backfilling historical data
- Enable remove_after_secs to prevent unbounded state growth
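The practices above could be combined along these lines. The source name, path, and every numeric value are hypothetical starting points rather than recommendations.

```toml
# Hypothetical source name and path.
[sources.app_logs]
type = "file"
# A narrow pattern keeps each discovery scan cheap.
include = ["/var/log/app/service-*.log"]
# Skip files not modified in the last 24 hours (86400 seconds).
ignore_older_secs = 86400
# Discard any line larger than 100 KiB (102400 bytes).
max_line_bytes = 102400
# Drain the oldest files first, useful when backfilling.
oldest_first = true
# Wait 1 hour after EOF. Confirm this option's exact effect in the
# reference before enabling it.
remove_after_secs = 3600
```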