Data enrichment adds contextual information to your events, making them more valuable for analysis and troubleshooting. Vector provides powerful enrichment capabilities through enrichment tables and VRL functions.

What is Data Enrichment?

Enrichment enhances raw event data by adding contextual information from external sources. Common enrichment use cases include:
  • GeoIP lookups: Add geographic information based on IP addresses
  • Service mapping: Add service metadata based on identifiers
  • User information: Enrich events with user profiles
  • Asset inventory: Add device or infrastructure details
  • Cost allocation: Add billing or organization tags
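Conceptually, enrichment is a keyed merge: look up context by a field on the event and attach the matching fields. This plain-Python sketch (hypothetical service table, not Vector code) shows the idea:

```python
# Hypothetical lookup table keyed by service_id
services = {
    "srv-001": {"service_name": "web-frontend", "team": "platform"},
}

def enrich(event, table):
    # Merge the matching context fields into the event (if any)
    extra = table.get(event.get("service_id"), {})
    return {**event, **extra}

enriched = enrich({"message": "request failed", "service_id": "srv-001"}, services)
# enriched now also carries service_name and team
```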

Enrichment Tables Overview

Vector supports multiple enrichment table types:
  • File tables: CSV or other structured files
  • GeoIP tables: MaxMind GeoIP databases
  • MMDB tables: Generic MaxMind database format
Enrichment tables are defined globally and accessed from VRL scripts.

Setting Up Enrichment Tables

File-Based Enrichment

File-based enrichment tables load data from CSV or structured files.
1. Create your enrichment data file

Create a CSV file with your enrichment data:
# services.csv
service_id,service_name,team,cost_center
srv-001,web-frontend,platform,CC-1001
srv-002,api-gateway,backend,CC-1002
srv-003,user-service,backend,CC-1002
srv-004,payment-service,finance,CC-1003
2. Configure the enrichment table

Add the table to your Vector configuration:
# Global enrichment table configuration
[enrichment_tables.services]
  type = "file"
  file.path = "/etc/vector/services.csv"
  file.encoding.type = "csv"
  
  # Optional: Define schema
  schema.service_id = "string"
  schema.service_name = "string"
  schema.team = "string"
  schema.cost_center = "string"
3. Use the table in VRL

Query the table from your remap transform:
# Enrich event with service information
service_data = get_enrichment_table_record!(
  "services",
  { "service_id": .service_id }
)

.service_name = service_data.service_name
.team = service_data.team
.cost_center = service_data.cost_center

GeoIP Enrichment

GeoIP enrichment adds geographic information based on IP addresses.
1. Download GeoIP database

Obtain a MaxMind GeoIP database (GeoLite2 or commercial). The free GeoLite2 databases require a MaxMind account and license key:
# Download GeoLite2-City.mmdb from your MaxMind account (or keep it
# current with MaxMind's geoipupdate tool), then place it where
# Vector can read it:
cp GeoLite2-City.mmdb /etc/vector/GeoLite2-City.mmdb
2. Configure GeoIP table

[enrichment_tables.geoip]
  type = "geoip"
  path = "/etc/vector/GeoLite2-City.mmdb"
3. Enrich with geographic data

# Lookup IP address
geo = get_enrichment_table_record!(
  "geoip",
  { "ip": .client_ip }
)

# Add geographic fields
.geo.city = geo.city_name
.geo.country = geo.country_name
.geo.country_code = geo.country_code
.geo.continent = geo.continent_code
.geo.latitude = geo.latitude
.geo.longitude = geo.longitude
.geo.timezone = geo.timezone
.geo.postal_code = geo.postal_code

Enrichment Table Functions

VRL provides two functions for querying enrichment tables:

get_enrichment_table_record

Returns the single record that matches the condition. The call errors if no row matches, or if more than one row matches:
# Get single record
user_info = get_enrichment_table_record!(
  "users",
  { "user_id": .user_id }
)

.user_name = user_info.name
.user_email = user_info.email
Error handling:
# Handle missing records
user_info, err = get_enrichment_table_record(
  "users",
  { "user_id": .user_id }
)

if err == null {
  .user_name = user_info.name
} else {
  .user_name = "unknown"
  log("User not found: " + string!(.user_id), level: "warn")
}

find_enrichment_table_records

Returns all records that match the condition:
# Find multiple records
matching_services = find_enrichment_table_records!(
  "services",
  { "team": .team_name }
)

# Extract service names
.team_services = map_values(matching_services) -> |value| {
  value.service_name
}

.service_count = length(matching_services)
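The one-record vs. many-records distinction can be sketched outside Vector. In this plain-Python analogy (hypothetical rows), the multi-record variant returns every match, while the single-record variant fails unless exactly one row matches:

```python
rows = [
    {"service_id": "srv-002", "service_name": "api-gateway", "team": "backend"},
    {"service_id": "srv-003", "service_name": "user-service", "team": "backend"},
]

def find_records(table, condition):
    # All rows whose fields equal the condition's values
    return [r for r in table if all(r.get(k) == v for k, v in condition.items())]

def get_record(table, condition):
    # Exactly one row must match, mirroring the single-record lookup
    matches = find_records(table, condition)
    if len(matches) != 1:
        raise LookupError(f"expected 1 match, found {len(matches)}")
    return matches[0]

backend = find_records(rows, {"team": "backend"})      # two rows
gateway = get_record(rows, {"service_id": "srv-002"})  # one row
```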

Advanced Enrichment Patterns

Multi-Field Matching

Match on multiple fields for precise lookups:
# services_by_environment.csv
# service_id,environment,endpoint,port
# srv-001,production,prod-api.example.com,443
# srv-001,staging,stage-api.example.com,443

# Lookup with multiple conditions
service = get_enrichment_table_record!(
  "services",
  {
    "service_id": .service_id,
    "environment": .environment
  }
)

.endpoint = service.endpoint
.port = service.port

Date Range Enrichment

Enrich based on time ranges. Date columns must be declared as dates in the table schema (for example, schema.sale_date = "date|%F"), and the range condition nests "from" and "to" under the column name; a row matches when its column value falls within that range:
# pricing_history.csv
# product_id,sale_date,price
# prod-001,2023-03-14,99.99
# prod-001,2023-08-02,89.99

# Find this product's prices recorded in the second half of 2023
pricing = find_enrichment_table_records!(
  "pricing",
  {
    "product_id": .product_id,
    "sale_date": {
      "from": t'2023-07-01T00:00:00Z',
      "to": t'2023-12-31T23:59:59Z'
    }
  }
)

if length(pricing) > 0 {
  .price = pricing[0].price
}

Subnet Matching

File enrichment tables support exact and date-range matches only, so a CIDR subnet column cannot be matched against a plain IP address through a table lookup. For a small, fixed set of network zones, test subnets directly in VRL with ip_cidr_contains:
# Zone data expressed in VRL, because CIDR matching is not
# supported by file enrichment tables
if ip_cidr_contains!("10.0.0.0/8", .client_ip) {
  .security_zone = "internal"
  .security_level = "high"
} else if ip_cidr_contains!("192.168.0.0/16", .client_ip) {
  .security_zone = "office"
  .security_level = "medium"
}

Nested Enrichment

Perform multiple enrichment lookups:
# First lookup: get service info
service = get_enrichment_table_record!(
  "services",
  { "service_id": .service_id }
)

.service_name = service.service_name
.team_id = service.team_id

# Second lookup: get team info based on first result
team = get_enrichment_table_record!(
  "teams",
  { "team_id": .team_id }
)

.team_name = team.name
.team_email = team.email
.cost_center = team.cost_center

Practical Enrichment Recipes

Recipe 1: Complete GeoIP Enrichment

1. Configure GeoIP table

[enrichment_tables.geoip_city]
  type = "geoip"
  path = "/etc/vector/GeoLite2-City.mmdb"

[enrichment_tables.geoip_asn]
  type = "geoip"
  path = "/etc/vector/GeoLite2-ASN.mmdb"
2. Create enrichment transform

[transforms.enrich_geoip]
  type = "remap"
  inputs = ["parse_logs"]
  source = '''
    # City-level enrichment
    city_data, err = get_enrichment_table_record(
      "geoip_city",
      { "ip": .client_ip }
    )
    
    if err == null {
      .geo.city = city_data.city_name
      .geo.country = city_data.country_name
      .geo.country_code = city_data.country_code
      .geo.region = city_data.region_name
      .geo.latitude = city_data.latitude
      .geo.longitude = city_data.longitude
      .geo.timezone = city_data.timezone
    }
    
    # ASN enrichment
    asn_data, err = get_enrichment_table_record(
      "geoip_asn",
      { "ip": .client_ip }
    )
    
    if err == null {
      .asn.number = asn_data.autonomous_system_number
      .asn.organization = asn_data.autonomous_system_organization
    }
  '''
3. Add conditional routing

[transforms.route_by_country]
  type = "route"
  inputs = ["enrich_geoip"]
  
  route.us_traffic.type = "vrl"
  route.us_traffic.source = '.geo.country_code == "US"'
  
  route.eu_traffic.type = "vrl"
  route.eu_traffic.source = '''
    includes([
      "GB", "DE", "FR", "IT", "ES", "NL", "BE", "AT", "SE", "DK"
    ], .geo.country_code)
  '''

Recipe 2: Service Inventory Enrichment

# service_inventory.csv
service_id,service_name,team,version,cost_center,on_call_team
srv-web-01,web-frontend,platform-team,v2.3.1,CC-1001,platform-oncall
srv-api-01,api-gateway,backend-team,v1.8.0,CC-1002,backend-oncall

[enrichment_tables.services]
  type = "file"
  file.path = "/etc/vector/service_inventory.csv"
  file.encoding.type = "csv"

[transforms.enrich_services]
  type = "remap"
  inputs = ["parsed_logs"]
  source = '''
    # Lookup service details
    service, err = get_enrichment_table_record(
      "services",
      { "service_id": .service_id }
    )
    
    if err != null {
      log("Service not found: " + string!(.service_id), level: "warn")
      .service_name = "unknown"
      .team = "unknown"
    } else {
      .service_name = service.service_name
      .team = service.team
      .service_version = service.version
      .cost_center = service.cost_center
      .on_call_team = service.on_call_team
      
      # Add alerting info for critical services
      if .severity == "critical" {
        .alert_team = service.on_call_team
        .escalation_required = true
      }
    }
  '''

Recipe 3: User Enrichment with Privacy

# User enrichment with PII handling
user, err = get_enrichment_table_record(
  "users",
  { "user_id": .user_id }
)

if err == null {
  # Add non-PII user information
  .user.account_type = user.account_type
  .user.subscription_tier = user.tier
  .user.region = user.region
  .user.created_date = user.created_at
  
  # Hash PII for correlation without exposure
  .user.email_hash = sha2(string!(user.email), variant: "SHA-256")
  .user.name_hash = sha2(string!(user.full_name), variant: "SHA-256")
  
  # Don't include actual PII
  # .user.email = user.email        # NO!
  # .user.name = user.full_name     # NO!
}
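The hashing step relies on digests being deterministic: the same input always yields the same digest, so hashed values can still be joined across events without exposing the original. A plain-Python illustration (hypothetical email address):

```python
import hashlib

def pii_hash(value: str) -> str:
    # SHA-256 digest as lowercase hex, stable for a given input
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

a = pii_hash("user@example.com")
b = pii_hash("user@example.com")
# a == b: events can be correlated, but neither reveals the address
```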

Recipe 4: Cost Allocation Tags

# Enrich with cost allocation information
resource, err = get_enrichment_table_record(
  "resources",
  { "resource_id": .resource_id }
)

if err == null {
  # Add billing tags
  .billing.cost_center = resource.cost_center
  .billing.project = resource.project_name
  .billing.environment = resource.environment
  .billing.business_unit = resource.business_unit
  
  # Calculate estimated cost if metrics available
  if exists(.usage_hours) {
    .billing.estimated_cost = to_float!(resource.hourly_rate) * to_float!(.usage_hours)
  }
}

Dynamic Enrichment Table Updates

Enrichment table files are read when Vector starts and re-read whenever Vector reloads its configuration, so they can be refreshed without a full restart:
# Trigger a configuration (and enrichment table) reload
kill -SIGHUP $(pgrep -x vector)

# Or let Vector reload automatically when configuration files change
vector --config /etc/vector/vector.toml --watch-config

Performance Considerations

Indexing

You do not declare indexes by hand: when Vector loads the configuration, it inspects the conditions in your VRL lookup calls and builds an index for each combination of fields queried, keeping exact-match lookups fast. Query on consistent field combinations so those indexes can be reused:
# This condition shape produces a single index on ["service_id"]
service = get_enrichment_table_record!(
  "services",
  { "service_id": .service_id }
)

Caching Strategy

VRL is stateless across events, so a lookup result cannot be cached from one event to the next; enrichment tables are already held in memory, which keeps individual lookups cheap. Within a single script, avoid querying the same table twice for the same key by storing the result in a variable:
# Look up once, reuse the result for every derived field
service_data = get_enrichment_table_record!(
  "services",
  { "service_id": .service_id }
)

.service_name = service_data.service_name
.team = service_data.team
.cost_center = service_data.cost_center

Handling Large Tables

1. Use appropriate indexes

Index fields that you query frequently to speed up lookups.
2. Filter data

Only load necessary rows in your enrichment table. Remove historical or irrelevant data.
3. Consider memory limits

Large enrichment tables consume memory. Monitor Vector’s memory usage and adjust table size accordingly.
4. Use disk buffers

If enrichment increases processing time, use disk buffers to prevent backpressure:
[sinks.my_sink.buffer]
  type = "disk"
  max_size = 1073741824  # 1 GB
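As a rough capacity check, this plain-Python sketch counts the rows and bytes of a (synthetic) enrichment CSV; the working assumption is that resident memory lands at a small multiple of the on-disk size once indexes are built:

```python
import csv
import io

# Synthetic 1,000-row table standing in for a real CSV file
data = "service_id,service_name\n" + "\n".join(
    f"srv-{i:03d},service-{i}" for i in range(1000)
)

row_count = sum(1 for _ in csv.DictReader(io.StringIO(data)))
size_bytes = len(data.encode("utf-8"))
# row_count is 1000; size_bytes approximates the on-disk footprint
```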

Troubleshooting Enrichment

Debugging Missing Enrichment

# Add debug logging
service, err = get_enrichment_table_record(
  "services",
  { "service_id": .service_id }
)

if err != null {
  log("Enrichment failed for service_id: " + string!(.service_id), level: "debug")
  log("Error: " + string!(err), level: "debug")
  
  # Track enrichment failures
  .enrichment_failed = true
  .enrichment_error = string!(err)
} else {
  .enrichment_succeeded = true
}

Validating Enrichment Tables

# Test that the configuration, including enrichment tables, loads
vector validate /etc/vector/vector.toml

# Watch startup logs to confirm each table loads
vector --config /etc/vector/vector.toml -v

Common Issues

1. Table not found error

Cause: Enrichment table name mismatch.
Solution: Verify that the table name in the configuration matches the name used in the VRL function call.
2. No matching records

Cause: The query condition doesn't match any table rows.
Solution: Check field names and values, and verify the data exists in the table.
3. Multiple records found

Cause: Using get_enrichment_table_record when multiple rows match.
Solution: Use find_enrichment_table_records, or add more specific conditions.

Best Practices

  1. Keep tables updated: Regularly refresh enrichment data to ensure accuracy
  2. Use appropriate indexes: Index fields that you query frequently
  3. Handle missing data gracefully: Always check for errors and provide defaults
  4. Minimize table size: Only include necessary data and fields
  5. Version your data: Track changes to enrichment tables
  6. Test thoroughly: Validate enrichment logic with test data
  7. Monitor performance: Track enrichment latency and failures
  8. Document mappings: Maintain documentation of enrichment sources and fields
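A quick offline check supports both "handle missing data gracefully" and "test thoroughly": compare the IDs your events actually use against the table before deploying. A plain-Python sketch with hypothetical data:

```python
import csv
import io

# Hypothetical table contents and observed event IDs
table_csv = "service_id,service_name\nsrv-001,web-frontend\n"
event_ids = ["srv-001", "srv-999"]

known = {row["service_id"] for row in csv.DictReader(io.StringIO(table_csv))}
missing = sorted(set(event_ids) - known)
# missing IDs are the ones that will hit your "unknown" fallback branch
```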

Next Steps

Enrichment is powerful when combined with other Vector features:
  • Routing: Use enriched data to route events to different destinations
  • Filtering: Filter based on enriched fields
  • Aggregation: Group and analyze events using enrichment tags
  • Alerting: Trigger alerts based on enriched context
With enrichment tables, you can transform raw events into rich, contextual data that provides deep insights into your systems and applications.
