
Overview

JobTimer is a utility class that wraps Hadoop MapReduce job execution with automatic timing and performance logging. It records job execution times to both console output and a CSV file for later analysis.

Package: circlenet.common
Source: /workspace/source/src/main/java/circlenet/common/JobTimer.java:9

Methods

run()

Executes a MapReduce job with timing and logging.
public static boolean run(Job job, String label) throws Exception

Parameter  Type                             Required  Description
job        org.apache.hadoop.mapreduce.Job  required  The configured Hadoop MapReduce job to execute
label      String                           required  A descriptive label for this job (used in logging and CSV output)

Returns: boolean - true if the job completed successfully, false otherwise
Throws: Exception - any exception thrown during job execution

Behavior

  1. Records start time
  2. Calls job.waitForCompletion(true) to execute the job
  3. Records end time and calculates duration
  4. Prints timing to console
  5. Appends timing data to CSV file
  6. Returns job success status

Console Output

TaskA time_ms=45230 time_s=45.23
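Judging from the sample line, the seconds value is simply the millisecond count divided by 1000. A minimal sketch of that formatting (the helper name `formatTiming` is illustrative only, not part of JobTimer's actual API):

```java
public class TimingFormatSketch {
    // Builds a console line like "TaskA time_ms=45230 time_s=45.23".
    // Concatenating a double uses Double.toString, which prints the
    // shortest round-trip form and matches the sample output above.
    static String formatTiming(String label, long elapsedMs) {
        return label + " time_ms=" + elapsedMs + " time_s=" + (elapsedMs / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(formatTiming("TaskA", 45230L));
    }
}
```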

Example Usage

public static void main(String[] args) throws Exception {
    long totalStart = System.currentTimeMillis();
    
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "TaskA - Hobby Frequency");
    
    job.setJarByClass(TaskA.class);
    job.setMapperClass(TaskAMapper.class);
    job.setReducerClass(TaskAReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    boolean success = JobTimer.run(job, "TaskA");
    JobTimer.total("TaskA", totalStart);
    
    System.exit(success ? 0 : 1);
}
From TaskA.java:71-72 - single job execution.

total()

Records total execution time for a multi-job workflow.
public static void total(String label, long totalStartMs)

Parameter     Type    Required  Description
label         String  required  A descriptive label for the overall workflow
totalStartMs  long    required  The start time in milliseconds (from System.currentTimeMillis())

Returns: void

Behavior

  1. Calculates elapsed time from totalStartMs to now
  2. Prints total time to console
  3. Appends total timing to CSV with phase="total" and success=true

Console Output

TaskBSimple total_time_ms=152847 total_time_s=152.847
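The total line follows the same pattern as run()'s, with total_ prefixes. A sketch of the formatting; the explicit nowMs parameter is an illustration for testability, since the real total() presumably reads the clock itself:

```java
public class TotalFormatSketch {
    // Builds a line like "TaskBSimple total_time_ms=152847 total_time_s=152.847".
    static String formatTotal(String label, long totalStartMs, long nowMs) {
        long elapsed = nowMs - totalStartMs;
        return label + " total_time_ms=" + elapsed + " total_time_s=" + (elapsed / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(formatTotal("TaskBSimple", 0L, 152847L));
    }
}
```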

Example Usage

public static void main(String[] args) throws Exception {
    long totalStart = System.currentTimeMillis();
    
    // Job 1
    Job job1 = Job.getInstance(conf, "TaskB Simple - Count Access");
    // ... configure job1 ...
    if (!JobTimer.run(job1, "TaskBSimple-job1-count")) {
        JobTimer.total("TaskBSimple", totalStart);
        System.exit(1);
    }
    
    // Job 2
    Job job2 = Job.getInstance(conf, "TaskB Simple - Join");
    // ... configure job2 ...
    if (!JobTimer.run(job2, "TaskBSimple-job2-join")) {
        JobTimer.total("TaskBSimple", totalStart);
        System.exit(1);
    }
    
    // Job 3
    Job job3 = Job.getInstance(conf, "TaskB Simple - Top10");
    // ... configure job3 ...
    boolean ok = JobTimer.run(job3, "TaskBSimple-job3-top10");
    JobTimer.total("TaskBSimple", totalStart);
    
    System.exit(ok ? 0 : 1);
}
From TaskBSimple.java:149-182 - multi-job workflow with total time tracking.

CSV Output Format

File Location

The CSV file path is determined by:
  1. Environment variable CIRCLENET_TIMING_FILE (if set)
  2. Default: task_times.csv in current directory
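The lookup order above can be sketched as follows; `resolvePath` is a hypothetical helper name, not JobTimer's actual internal method:

```java
public class TimingFileSketch {
    // Resolves the CSV path: the environment variable wins when set,
    // otherwise fall back to the default name in the current directory.
    static String resolvePath(String envValue) {
        return (envValue == null || envValue.isEmpty()) ? "task_times.csv" : envValue;
    }

    public static void main(String[] args) {
        String path = resolvePath(System.getenv("CIRCLENET_TIMING_FILE"));
        System.out.println("timing file: " + path);
    }
}
```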

CSV Schema

timestamp,label,phase,time_ms,time_s,success
2026-03-01T14:23:45.123,TaskA,job,45230,45.23,1
2026-03-01T14:23:45.456,TaskA,total,45250,45.25,1
2026-03-01T14:25:12.789,TaskBSimple-job1-count,job,42100,42.1,1
2026-03-01T14:26:05.234,TaskBSimple-job2-join,job,63450,63.45,1
2026-03-01T14:27:13.567,TaskBSimple-job3-top10,job,47297,47.297,1
2026-03-01T14:27:13.590,TaskBSimple,total,152847,152.847,1

Field Descriptions

Field      Type          Description
timestamp  ISO DateTime  When the measurement was recorded
label      String        Job or workflow label
phase      String        Either "job" or "total"
time_ms    Long          Execution time in milliseconds
time_s     Double        Execution time in seconds (decimal)
success    Int           1 for success, 0 for failure

CSV Features

  • Header Row: Automatically written when creating new file
  • Append Mode: New entries appended without overwriting
  • Thread Safety: Synchronized write method prevents race conditions
  • Error Handling: Write failures print to stderr but don’t stop job execution
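A sketch of how an append with these four properties (header on create, append mode, synchronized writes, non-fatal errors) might look; the method name and exact column formatting are assumptions, not JobTimer's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;

public class CsvAppendSketch {
    private static final Object LOCK = new Object();

    // Appends one timing row, writing the header first if the file is new.
    static void appendRow(Path csv, String label, String phase,
                          long timeMs, boolean success) {
        String row = LocalDateTime.now() + "," + label + "," + phase + ","
                + timeMs + "," + (timeMs / 1000.0) + "," + (success ? 1 : 0) + "\n";
        synchronized (LOCK) {
            try {
                if (Files.notExists(csv)) {
                    Files.writeString(csv,
                            "timestamp,label,phase,time_ms,time_s,success\n",
                            StandardOpenOption.CREATE);
                }
                Files.writeString(csv, row, StandardOpenOption.APPEND);
            } catch (IOException e) {
                // Non-fatal: report to stderr and let the job continue.
                System.err.println("timing CSV write failed: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("task_times", ".csv");
        Files.delete(csv); // start from a missing file so the header is written
        appendRow(csv, "TaskA", "job", 45230, true);
        appendRow(csv, "TaskA", "total", 45250, true);
        System.out.println(Files.readAllLines(csv).size() + " lines"); // header + 2 rows
    }
}
```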

Usage Patterns

Single Job Timing

long totalStart = System.currentTimeMillis();
Job job = Job.getInstance(conf, "My Job");
// Configure job...
boolean success = JobTimer.run(job, "MyJob");
JobTimer.total("MyJob", totalStart);
System.exit(success ? 0 : 1);

Multi-Job Pipeline

long totalStart = System.currentTimeMillis();

// Run jobs sequentially
if (!JobTimer.run(job1, "Pipeline-step1")) {
    JobTimer.total("Pipeline", totalStart);
    System.exit(1);
}

if (!JobTimer.run(job2, "Pipeline-step2")) {
    JobTimer.total("Pipeline", totalStart);
    System.exit(1);
}

boolean ok = JobTimer.run(job3, "Pipeline-step3");
JobTimer.total("Pipeline", totalStart);
System.exit(ok ? 0 : 1);

Custom CSV Location

Set environment variable before running:
export CIRCLENET_TIMING_FILE=/data/metrics/job_performance.csv
hadoop jar circlenet.jar circlenet.taskA.TaskA input output

Analyzing Performance

The CSV output can be analyzed using standard tools:
# View all job times
cat task_times.csv | column -t -s,

# Calculate average job time
awk -F, 'NR>1 && $3=="job" {sum+=$4; count++} END {print sum/count}' task_times.csv

# Find slowest jobs (skip the header row so it doesn't appear in the results)
tail -n +2 task_times.csv | sort -t, -k4 -nr | head -n 10

Design Notes

  • Utility Class: Constructor is private; all methods are static
  • Thread Safety: CSV append is synchronized to prevent concurrent write issues
  • Non-Blocking Errors: CSV write failures don’t crash the job
  • Console + File: Dual output ensures visibility during execution and historical analysis
  • ISO Timestamps: Uses LocalDateTime.toString() for standard date format
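LocalDateTime.toString() produces an ISO-8601 local date-time with no zone offset, which matches the timestamp column shown above. A quick check that the format round-trips through the standard parser:

```java
import java.time.LocalDateTime;

public class TimestampSketch {
    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        String ts = now.toString(); // e.g. 2026-03-01T14:23:45.123
        // The string parses back to an equal LocalDateTime.
        System.out.println(LocalDateTime.parse(ts).equals(now));
    }
}
```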
