Introduction to File Input and Output

(with illustrations from The Java Tutorial)

We have seen in this class how objects communicate with each other. Programs often need to communicate with the outside world as well. The means of communication are input (such as a keyboard) and output (such as the computer screen). Programs can also communicate through stored data, such as files.

I/O Streams

A stream is a communication channel that a program has with the outside world. It is used to transfer data items in succession.

An Input/Output (I/O) Stream represents an input source or an output destination. A stream can represent many different kinds of sources and destinations, including disk files, devices, other programs, and memory arrays.

Streams support many different kinds of data, including simple bytes, primitive data types, localized characters, and objects. Some streams simply pass on data; others manipulate and transform the data in useful ways.

No matter how they work internally, all streams present the same simple model to programs that use them: A stream is a sequence of data.

Reading information into a program.

A program uses an input stream to read data from a source, one item at a time:

[Figure: reading information into a program]

Writing information from a program.

A program uses an output stream to write data to a destination, one item at a time:

[Figure: writing information from a program]

The data source and data destination pictured above can be anything that holds, generates, or consumes data. Obviously this includes disk files, but a source or destination can also be another program, a peripheral device, a network socket, or an array.

The Java IO API

The java.io package contains many classes that your programs can use to read and write data. Most of the classes implement sequential access streams. The sequential access streams can be divided into two groups: those that read and write bytes and those that read and write Unicode characters. Each sequential access stream has a speciality, such as reading from or writing to a file, filtering data as it is read or written, or serializing an object.

Types of Streams

Byte Streams: Byte streams perform input and output of 8-bit bytes. They read and write data one byte at a time. Byte streams are the lowest level of I/O, so if you are reading or writing character data, the better approach is to use character streams. Other stream types are built on top of byte streams.
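The notes give no byte stream example, so here is a minimal sketch of what one might look like. It assumes the standard FileInputStream and FileOutputStream classes; the class name ByteCopy, the method copyBytes, and the output file name xanadu_bytes.txt are made up for illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteCopy {

    // Copies every byte from in to out and returns how many bytes were copied.
    static long copyBytes(InputStream in, OutputStream out) throws IOException {
        long count = 0;
        int b; // read() returns a value from 0 to 255, or -1 at end of stream
        while ((b = in.read()) != -1) {
            out.write(b);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Both file names are placeholders for this sketch.
        // The try-with-resources statement (Java 7+) closes both streams.
        try (InputStream in = new FileInputStream("xanadu.txt");
             OutputStream out = new FileOutputStream("xanadu_bytes.txt")) {
            System.out.println(copyBytes(in, out) + " bytes copied");
        }
    }
}
```

Because copyBytes accepts any InputStream and OutputStream, the same loop would work for a source or destination other than a file, such as an in-memory byte array.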

Character Streams: All character stream classes are descended from Reader and Writer. We are going to use FileReader and FileWriter, the Reader and Writer subclasses that perform file I/O. We will look at two example programs that use character streams: one that reads and writes one character at a time, and one that reads and writes one line at a time.

For sample input, we'll use the example file xanadu.txt, which contains the following verse:

In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.

Example 1: Reading and Writing One Character at a Time

Here is an example of a program that copies a file one character at a time. The IOTest class has a copyCharacters method which creates an instance of FileReader and passes the name of the file to be read to the FileReader constructor. It then creates an instance of FileWriter and passes the name of the file to be written to the FileWriter constructor.

It then copies the contents of the xanadu.txt file to the xanadu_output.txt file, character by character.

It uses an int variable, "c", to hold each character as it is read and then written. The character value occupies the low 16 bits of the int. Because read() returns an int rather than a char, it can return -1 to indicate that it has reached the end of the stream.

Notice that there is a block of code preceded by the keyword try and another block of code preceded by the keyword finally. This ensures that the code in the finally block gets executed even if the code within the try block causes an exception to be thrown (e.g. if you try to open a file for which you don't have permission). Don't worry if you don't understand it completely; you can copy and paste our examples into your programs.

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;


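public class IOTest {

    public static void copyCharacters() throws IOException {
        // Start as null so the finally block can tell which streams were opened.
        FileReader inputStream = null;
        FileWriter outputStream = null;

        try {
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("xanadu_output.txt");

            // read() returns the next character as an int, or -1 at end of stream.
            int c;
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            // Close whichever streams were opened, even if an exception was thrown.
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}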

Always Close Streams

Closing a stream when it's no longer needed is very important. That is why copyCharacters uses a finally block to guarantee that both streams will be closed even if an error occurs. This practice helps avoid serious resource leaks.

One possible error is that copyCharacters was unable to open one or both files. When that happens, the stream variable corresponding to the file never changes from its initial null value. That's why copyCharacters makes sure that each stream variable contains an object reference before invoking close.
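Since Java 7, the try-with-resources statement offers a shorter way to get the same guarantee. The sketch below is one possible rewrite of the character copy, not part of the original example; the class name CopyWithResources and the two file name parameters are assumptions made for illustration.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class CopyWithResources {

    // Copies a text file character by character. The parameters are
    // illustrative and do not appear in the IOTest example.
    static void copyCharacters(String from, String to) throws IOException {
        // Resources declared in the parentheses are closed automatically,
        // in reverse order, whether the block exits normally or with an exception.
        try (FileReader in = new FileReader(from);
             FileWriter out = new FileWriter(to)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        copyCharacters("xanadu.txt", "xanadu_output.txt");
    }
}
```

With this form there is no need for null checks: a resource that was never opened is simply never closed.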

Example 2: Reading and Writing One Line at a Time

Character I/O is usually processed in units longer than single characters. One common unit is the line: a string of characters with a line terminator at the end. A line terminator can be a carriage-return/line-feed sequence ("\r\n"), a single carriage-return ("\r"), or a single line-feed ("\n"). Supporting all possible line terminators allows programs to read text files created on any of the widely used operating systems.
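To see the three line terminators handled in one place, here is a small sketch that reads from an in-memory string instead of a file. The class name LineTerminators and the method readLines are made up for this illustration; BufferedReader.readLine and StringReader are standard java.io classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineTerminators {

    // Splits text into lines using BufferedReader.readLine.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // the terminator itself is not included
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // One string that mixes all three terminator styles.
        System.out.println(readLines("one\r\ntwo\rthree\nfour"));
        // prints [one, two, three, four]
    }
}
```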

Let's modify the copyCharacters example to use line-oriented I/O. To do this, we have to use two classes we haven't seen before, BufferedReader and PrintWriter.

The copyLines method invokes BufferedReader.readLine and PrintWriter.println to do input and output one line at a time.

import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.io.IOException;

    // Add this method to the IOTest class from Example 1.
    public static void copyLines() throws IOException {
        BufferedReader inputStream = null;
        PrintWriter outputStream = null;

        try {
            inputStream =
                new BufferedReader(new FileReader("xanadu.txt"));
            outputStream =
                new PrintWriter(new FileWriter("characteroutput.txt"));

            // readLine() returns the next line without its terminator,
            // or null at end of stream.
            String line;
            while ((line = inputStream.readLine()) != null) {
                outputStream.println(line);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }

Invoking readLine returns a line of text. This line is then output using PrintWriter's println method, which appends the line terminator for the current operating system. This might not be the same line terminator that was used in the input file.
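The following sketch makes the appended terminator visible by writing into a StringWriter instead of a file. The class name LineSeparatorDemo and the method printOneLine are hypothetical; System.lineSeparator() is the standard call that reports the terminator of the current operating system.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class LineSeparatorDemo {

    // Writes one line with println and returns exactly what was written.
    static String printOneLine(String line) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println(line);
        out.close(); // flush everything into the StringWriter
        return buffer.toString();
    }

    public static void main(String[] args) {
        String written = printOneLine("Down to a sunless sea.");
        // True on every platform: println appended System.lineSeparator().
        System.out.println(written.endsWith(System.lineSeparator()));
        // The length is 2 where the terminator is "\r\n" and 1 where it is "\n".
        System.out.println(System.lineSeparator().length());
    }
}
```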

Scanning

The Scanner class has been available since Java 5 (JDK 1.5). It provides methods to convert text into appropriate Java types (integer, float, etc.). A scanner splits text into a succession of tokens (text representations of data values) according to specific rules.

For example, in the String object "ab*cd 12.34 253", "ab*cd" is a String token, "12.34" is a double token and "253" is an integer token.

Objects of type Scanner are useful for breaking down formatted input into tokens and translating individual tokens according to their data type.
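As a rough illustration of the "ab*cd 12.34 253" example, the sketch below scans that string and labels each token with the type Scanner can read it as. The class name TokenTypes and the method classify are invented for this sketch; it also assumes a US locale so that "." is treated as the decimal separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class TokenTypes {

    // Labels each whitespace-separated token with the most specific type
    // that Scanner can read it as.
    static List<String> classify(String text) {
        List<String> labels = new ArrayList<>();
        Scanner s = new Scanner(text);
        s.useLocale(Locale.US); // assumption: "." is the decimal separator
        try {
            while (s.hasNext()) {
                // Test int first, because "253" is also a valid double.
                if (s.hasNextInt()) {
                    labels.add("int " + s.nextInt());
                } else if (s.hasNextDouble()) {
                    labels.add("double " + s.nextDouble());
                } else {
                    labels.add("String " + s.next());
                }
            }
        } finally {
            s.close();
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(classify("ab*cd 12.34 253"));
        // prints [String ab*cd, double 12.34, int 253]
    }
}
```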

Selected Constructors:

Selected Methods:

See the Scanner API documentation for more methods.

Scanner Example 1

By default, a scanner uses white space to separate tokens. (White space characters include blanks, tabs, and line terminators. For the full list, refer to the documentation for Character.isWhitespace.) To see how scanning works, let's look at ScanXan, a program that reads the individual words in xanadu.txt and prints them out, one per line.

import java.io.*;
import java.util.Scanner;


public class ScanXan {
    public static void main(String[] args) throws IOException {
        Scanner s = null;
        try {
            s = new Scanner(new BufferedReader(new FileReader("xanadu.txt")));

            while (s.hasNext()) {
                System.out.println(s.next());
            }
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}

Notice that ScanXan invokes Scanner's close method when it is done with the scanner object. Even though a scanner is not a stream, you need to close it to indicate that you're done with its underlying stream.

The output of ScanXan looks like this:

In
Xanadu
did
Kubla
Khan
A
stately
pleasure-dome
...

To use a different token separator, invoke useDelimiter(), specifying a regular expression. For example, suppose you wanted the token separator to be a comma, optionally followed by white space. You would invoke s.useDelimiter(",\\s*");
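Here is a small sketch of that delimiter in use on an in-memory string. The class name DelimiterDemo, the method splitOnCommas, and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {

    // Returns the tokens of text, separated by a comma plus optional white space.
    static List<String> splitOnCommas(String text) {
        List<String> tokens = new ArrayList<>();
        Scanner s = new Scanner(text);
        s.useDelimiter(",\\s*");
        try {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        } finally {
            s.close();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : splitOnCommas("Alph, sacred river,caverns,   sea")) {
            System.out.println(token);
        }
        // prints four lines: Alph / sacred river / caverns / sea
    }
}
```

Note that "sacred river" stays a single token: once the delimiter is changed, white space by itself no longer separates tokens.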

Scanner Example 2: Translating Individual Tokens

The ScanXan example treats all input tokens as simple String values. Scanner also supports tokens for all of the Java language's primitive types (except for char), as well as BigInteger and BigDecimal. Also, numeric values can use thousands separators. Thus, in a US locale, Scanner correctly reads the string "32,767" as representing an integer value.
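A brief sketch of this behavior, reading an int, a double, and a BigInteger that all contain thousands separators. The class name GroupedNumbers, the method readGrouped, and the sample values other than "32,767" are made up for illustration, and the locale is set to US explicitly.

```java
import java.math.BigInteger;
import java.util.Locale;
import java.util.Scanner;

class GroupedNumbers {

    // Reads an int, a double, and a BigInteger, in that order, from text
    // and returns them joined by single spaces.
    static String readGrouped(String text) {
        Scanner s = new Scanner(text);
        s.useLocale(Locale.US); // "," groups thousands, "." marks decimals
        try {
            int i = s.nextInt();
            double d = s.nextDouble();
            BigInteger big = s.nextBigInteger();
            return i + " " + d + " " + big;
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(readGrouped("32,767 1,000,000.1 123,456,789,012"));
        // prints 32767 1000000.1 123456789012
    }
}
```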

The ScanSum example reads a list of double values and adds them up.

import java.io.FileReader;
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Scanner;
import java.util.Locale;


public class ScanSum {
    public static void main(String[] args) throws IOException {
        Scanner s = null;
        double sum = 0;
        try {
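            s = new Scanner(
                    new BufferedReader(new FileReader("usnumbers.txt")));
            // Parse numbers with US conventions: "," groups thousands, "." marks decimals.
            s.useLocale(Locale.US);

            while (s.hasNext()) {
                if (s.hasNextDouble()) {
                    sum += s.nextDouble();
                } else {
                    s.next(); // skip any token that is not a number
                }
            }
        } finally {
            // s is still null if the file could not be opened.
            if (s != null) {
                s.close();
            }
        }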

        System.out.println(sum);
    }
}

And here's the sample input file, usnumbers.txt:

8.5
32,767
3.14159
1,000,000.1

The output string is "1032778.74159".