Overview

CsvUtils is a utility class that provides helper methods for parsing CSV and tab-delimited data, as well as safe string-to-integer conversion. It is commonly used in Mapper classes to parse input data fields.
Package: circlenet.common
Source: /workspace/source/src/main/java/circlenet/common/CsvUtils.java:3

Methods

split()

Splits a CSV line into an array of string fields.
public static String[] split(String line)
line (String, required): The CSV line to split.
Returns: String[] - Array of field values (empty strings for empty fields).
Implementation: Uses String.split(",", -1) to preserve empty trailing fields.
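
To make the effect of the -1 limit concrete, here is a minimal sketch that compares Java's default split with the call described above. The class and the helper splitKeepingEmpties are hypothetical names used only for illustration; the helper mirrors the documented behavior of split() and is not the actual CsvUtils source.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Hypothetical stand-in that mirrors the documented behavior of CsvUtils.split().
    static String[] splitKeepingEmpties(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "42,alice,,";

        // The default limit (0) silently drops trailing empty strings.
        System.out.println(Arrays.toString(line.split(",")));            // [42, alice]

        // A limit of -1 keeps them, so the field count matches the column count.
        System.out.println(Arrays.toString(splitKeepingEmpties(line))); // [42, alice, , ]
    }
}
```

Because this is a plain split on commas, a quoted value that itself contains a comma (for example "Smith, John") would still be broken into two fields. If the input can contain quoted fields, a dedicated CSV parser is likely a better fit.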

Example Usage

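public class AccessCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Assumed field declarations (not shown in the original excerpt).
    private static final IntWritable ONE = new IntWritable(1);
    private final Text pageId = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] f = CsvUtils.split(value.toString());
        // Skip malformed lines that have fewer than three fields.
        if (f.length >= 3) {
            pageId.set(f[2].trim());
            context.write(pageId, ONE);
        }
    }
}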
From TaskBSimple.java:29 - parsing activity log entries.

splitTab()

Splits a tab-delimited line into an array of string fields.
public static String[] splitTab(String line)
line (String, required): The tab-delimited line to split.
Returns: String[] - Array of field values (empty strings for empty fields).
Implementation: Uses String.split("\\t", -1) to preserve empty trailing fields.

Example Usage

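public class CountMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Assumed field declarations (not shown in the original excerpt).
    private final Text outKey = new Text();
    private final Text outVal = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] f = CsvUtils.splitTab(value.toString());
        if (f.length >= 2) {
            outKey.set(f[0].trim());
            // Tag the value with "C," so the reducer can recognize count records.
            outVal.set("C," + f[1].trim());
            context.write(outKey, outVal);
        }
    }
}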
From TaskBSimple.java:72 - parsing intermediate MapReduce output (tab-separated key-value pairs).

toInt()

Safely converts a string to an integer with a fallback value on parse failure.
public static int toInt(String value, int fallback)
value (String, required): The string value to parse.
fallback (int, required): The default value to return if parsing fails.
Returns: int - Parsed integer value, or fallback if parsing fails.
Error Handling: Catches all exceptions during parsing and returns the fallback value. Automatically trims whitespace before parsing.
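
The error-handling contract above can be sketched as follows. This is an assumption about how such a method might be written (trim, then Integer.parseInt inside a catch-all), not a copy of the CsvUtils source; the class name ToIntSketch is hypothetical.

```java
final class ToIntSketch {
    private ToIntSketch() {}

    // Assumed implementation: trim, parse, and fall back on any exception.
    static int toInt(String value, int fallback) {
        try {
            return Integer.parseInt(value.trim());
        } catch (Exception e) {
            // Covers NumberFormatException (empty or malformed input) and the
            // NullPointerException raised by calling trim() on null.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(toInt(" 17 ", 0)); // 17
        System.out.println(toInt("abc", -1)); // -1
        System.out.println(toInt(null, -1));  // -1
        System.out.println(toInt("", 0));     // 0
    }
}
```

One consequence of this design is that a caller cannot tell a failed parse from a genuine value equal to the fallback. Choosing a fallback outside the expected range (such as -1 for counts) is one way to keep the two cases apart.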

Example Usage

public class JoinReducer extends Reducer<Text, Text, NullWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) 
            throws IOException, InterruptedException {
        int count = 0;
        for (Text t : values) {
            String[] p = CsvUtils.split(t.toString());
            if (p.length >= 2 && "C".equals(p[0])) {
                count = CsvUtils.toInt(p[1], 0);
            }
        }
        // Use count...
    }
}
From TaskBSimple.java:95 - safely parsing count values during join operation.

Usage Patterns

Parsing CSV Input Data

String[] fields = CsvUtils.split(value.toString());
if (fields.length == 5) {
    String userId = fields[0];
    String name = fields[1];
    int age = CsvUtils.toInt(fields[2], 0);
    String email = fields[3];
    String hobby = fields[4];
}

Parsing MapReduce Intermediate Output

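By default, Hadoop's TextOutputFormat writes each key-value pair as one line, with the key and value separated by a tab:
String[] fields = CsvUtils.splitTab(value.toString());
if (fields.length >= 2) {
    // Named recordKey/recordValue to avoid clashing with the map() parameters key and value.
    String recordKey = fields[0];
    String recordValue = fields[1];
}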
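
The mapper and reducer examples earlier on this page combine both delimiters: a tab-separated line is split first, the value is tagged with "C,", and the reducer later splits that tagged value on commas. The sketch below walks through that round trip without any Hadoop dependencies. TaggedValueDemo, tagCount, and readCount are hypothetical names, and the parsing is inlined with standard library calls rather than calling CsvUtils.

```java
final class TaggedValueDemo {
    // Stage 1: split a tab-separated line and tag the value, as the CountMapper example does.
    // Returns null for a malformed line with fewer than two fields.
    static String tagCount(String line) {
        String[] f = line.split("\\t", -1);
        if (f.length < 2) {
            return null;
        }
        return "C," + f[1].trim();
    }

    // Stage 2: recover the count from a tagged value, as the JoinReducer example does.
    static int readCount(String tagged, int fallback) {
        String[] p = tagged.split(",", -1);
        if (p.length >= 2 && "C".equals(p[0])) {
            try {
                return Integer.parseInt(p[1].trim());
            } catch (NumberFormatException e) {
                return fallback;
            }
        }
        return fallback; // not a count record
    }

    public static void main(String[] args) {
        String tagged = tagCount("page7\t12");
        System.out.println(tagged);               // C,12
        System.out.println(readCount(tagged, 0)); // 12
    }
}
```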

Combining Split and ToInt

String[] f = CsvUtils.split(line);
if (f.length >= 4) {
    int count = CsvUtils.toInt(f[0], 0);
    long keyRank = (count * 1000000L) + seq++;
    // Process ranking...
}
From TaskBSimple.java:119 - parsing and ranking top results.
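
The keyRank expression packs the count and a sequence number into a single long. Below is a worked sketch of that arithmetic, under the assumption (not confirmed by the excerpt) that the composite value is used as a sortable map key so that entries with equal counts do not overwrite each other. RankKeyDemo and rankKey are hypothetical names.

```java
import java.util.TreeMap;

final class RankKeyDemo {
    // Same arithmetic as the excerpt: the count occupies the high digits and the
    // sequence number the low six, so keys stay ordered by count while seq < 1,000,000.
    static long rankKey(int count, long seq) {
        return (count * 1_000_000L) + seq;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> top = new TreeMap<>();
        long seq = 0;
        top.put(rankKey(3, seq++), "pageA"); // key 3000000
        top.put(rankKey(3, seq++), "pageB"); // key 3000001: same count, distinct key
        top.put(rankKey(7, seq++), "pageC"); // key 7000002

        System.out.println(top.size());                 // 3
        System.out.println(top.lastEntry().getValue()); // pageC
    }
}
```

The ordering guarantee holds only while the sequence number stays below 1,000,000; beyond that, a large sequence value could push an entry past one with a higher count.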

Design Notes

  • Utility Class: Constructor is private; all methods are static
  • Preserve Empty Fields: Both split methods use -1 limit to preserve trailing empty fields
  • Null Safety: toInt() returns the fallback for any exception, including null, empty, or malformed input
  • Performance: Each method is a thin wrapper over Java standard library calls, so it adds negligible overhead
