Reversing (big) text file in Java

Posted on January 8, 2019

The task

When interviewing technical people, the most challenging thing is to discover the candidate’s creativity. This post presents an idea for such a task: reversing the order of lines in a text file, e.g., a random one generated by the following code.

package io.reverse;

import org.apache.commons.lang3.RandomStringUtils;

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

public class WriteRandomTextFile {
    public static final String DIRECTORY = "spool";
    public static final String FILE = "big.txt";
    private static final Random random = new Random();

    public static void main(String[] args) throws IOException {
        try (var writer = Files.newBufferedWriter(Paths.get(DIRECTORY, FILE))) {
            for (int i = 0; i < 100_000 - 1; i++)
                writeRandomString(writer);
        }
    }

    private static void writeRandomString(BufferedWriter writer)
            throws IOException {
        String str = RandomStringUtils.randomAlphabetic(random.nextInt(255));
        writer.write(str);
        writer.newLine();
    }
}

A good candidate is expected to elaborate on some details, first of all, whether or not an upfront estimate of the file’s size exists and how much memory is available for the process.

Take one

For reasonably small files as well as for testing purposes, we can start off with the simplest possible solution. Let’s read the whole source file into memory and write data in reverse order. Here a candidate could earn some additional points by showing knowledge of how to iterate over a collection in reverse direction.

package io.reverse;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/**
 * starts working from ~ -Xmx20m on JDK11
 */
public class ReverseInMemory {

    public static final String DIRECTORY = "spool";
    public static final String SOURCE = "big.txt";
    public static final String TARGET = "reverse-memory.txt";

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(DIRECTORY, SOURCE));
        var iter = lines.listIterator(lines.size());
        try (var writer = Files.newBufferedWriter(Paths.get(DIRECTORY, TARGET))) {
            while (iter.hasPrevious()) {
                writer.write(iter.previous());
                writer.newLine();
            }
        }
    }
}

This works, but takes a lot of memory and is obviously not suitable for large files. And here the game begins. From my experience, most people tend to find a way to read a source file in reverse order, and this is the trap. It turns out there is no easy way to do this. All reading facilities, including java.util.Scanner (which additionally swallows exceptions silently while reading a file), work only in forward mode and not the other way around.

Take two

Wait, we have java.io.RandomAccessFile to the rescue! But it doesn’t provide an easy way to seek between newline separators. We can read or even map to memory a fragment of a fixed size, but we don’t know the length of the longest line in the file… the algorithm becomes difficult to comprehend, let alone implement.

By far the most favorite mental pattern of mine when locked in such dead-ends is inversion. It works quite often, and this case is no exception. If we are not able to read a source file in reverse order—relax and don’t worry—let’s read it in a straightforward, effective way but reverse the way we write the target file instead. It turns out to be a surprisingly easy thing to do. Having the next line from the source file, we know its length and can calculate the position to perform writing, starting from the end of the target file.

NB: This code doesn’t take into account charsets and line separators.

package io.reverse;

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * works with -Xmx2m on JDK11
 */
public class ReverseWithRAF {

    private static final String SOURCE = "spool/big.txt";
    private static final String TARGET = "spool/reverse-raf.txt";

    public static void main(String[] args) throws IOException {
        var in = new File(SOURCE);

        try (var out = new RandomAccessFile(new File(TARGET), "rw");
             var reader = Files.newBufferedReader(Paths.get(SOURCE))) {
            long pos = in.length();
            out.setLength(pos);

            String str;
            while ((str = reader.readLine()) != null) {
                int len = str.length();
                pos -= (len + 1);
                out.seek(pos);
                out.writeBytes(str + "\n");
            }
        }
    }
}

To validate results, one can use the diff utility with the --report-identical-files option:

$ diff -s reverse-memory.txt reverse-raf.txt
Files reverse-memory.txt and reverse-raf.txt are identical