Introduction to Sequential Files

Introduction

Files are repositories of data that reside on backing storage (hard disk, magnetic tape or CD-ROM). Nowadays, files are used to store a variety of different types of information, such as programs, documents, spreadsheets, videos, sounds, pictures and record-based data.

Although COBOL can be used to process these other kinds of data file, it is generally used only to process record-based files.

In this and subsequent file-oriented tutorials, we examine how COBOL may be used to process record-based files.

There are essentially two types of record-based file organization:

Serial Files (COBOL calls these Sequential Files)
Direct Access Files

In a Serial File, the records are organized and accessed serially.

In a Direct Access File, the records are organized in a manner that allows direct access to a particular record without having to read any of the preceding records.

In this tutorial, you will discover how COBOL may be used to process serial files.

By the end of this unit you should:

Understand concepts and terminology like file, record, field and record buffer.
Be able to write the file and record declarations for a Sequential File.
Understand how the READ verb works
Be able to use the READ, WRITE, OPEN and CLOSE verbs to process a Sequential File.

Introduction to record-based files

COBOL is generally used in situations where the volume of data to be processed is large. These systems are sometimes referred to as "data intensive" systems. Generally, large volumes arise not because the data is inherently voluminous but because the same items of information have been recorded about a great many instances of the same object. Record-based files are used to record this information.

In record-based files:

We use the term file to describe a collection of one or more occurrences (instances) of a record type (template).
We use the term record to describe a collection of fields that record information about an object.
We use the term field to describe an item of information recorded about an object (e.g. StudentName, DateOfBirth).

It is important to distinguish between a record occurrence (i.e. the values of a record) and the record type or template (i.e. the structure of the record).

Each record occurrence in a file will have different values, but every record in the file will have the same structure.

For instance, in the student details file, illustrated below, the occurrences of the student records are actual values in the file. The record type/template describes the structure of each record occurrence.

Student details file showing record occurrences mapped to the record type template

Before a computer can do any processing on a piece of data, the data must be loaded into main memory (RAM). The CPU can only address data that is in RAM.

A record-based file may consist of hundreds of thousands, millions or even tens of millions of records, and may require gigabytes of storage. Files of this size cannot be processed by loading the whole file into memory in one go. Instead, files are processed by reading the records into memory, one at a time.

To store the record read into memory and to allow access to its individual fields, the programmer must declare the record structure (see the diagram above) in the program. The computer uses this description of the record (the record template) to set aside enough memory to store one instance of the record. The memory allocated for storing a record is usually called a "record buffer".

A record buffer is capable of storing the data recorded for only one instance of the record. To process a file a program must read the records one at a time into the record buffer. The record buffer is the only connection between the program and the records in the file.

Diagram showing a record being read from a sequential file into the record buffer in RAM

If a program processes more than one file, a record buffer must be defined for each file.

To process all the records in an INPUT file, we must ensure that each record instance is copied (read) from the file into the record buffer when required.

To create an OUTPUT file containing data records, we must ensure that each record is placed in the record buffer and then transferred (written) to the file.

To transfer a record from an input file to an output file we must read the record into the input record buffer, transfer it to the output record buffer and then write the data to the output file from the output record buffer. This type of data transfer between 'buffers' is quite common in COBOL programs.
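The read, move and write sequence described above can be sketched as a small GnuCOBOL program that uses the OPEN, READ, WRITE and CLOSE verbs covered later in this tutorial. It is a minimal sketch, assuming free-format source and LINE SEQUENTIAL files; the file names InRecs.dat and OutRecs.dat and the data names InRec, OutRec and EndOfInFile are hypothetical. Each file has its own record buffer, and the program first creates a small input file so that it can be run on its own.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. BufferCopy.
*> Sketch: copy every record from an input file to an output file
*> by moving it from the input buffer to the output buffer.
*> File names and data names here are hypothetical.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT InFile  ASSIGN TO "InRecs.dat"
           ORGANIZATION IS LINE SEQUENTIAL.
    SELECT OutFile ASSIGN TO "OutRecs.dat"
           ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
*> Record buffer for the input file
FD InFile.
01 InRec.
   88 EndOfInFile   VALUE HIGH-VALUES.
   02 InData        PIC X(30).
*> Separate record buffer for the output file
FD OutFile.
01 OutRec           PIC X(30).

PROCEDURE DIVISION.
Begin.
*> Create a small input file so the sketch is self-contained
    OPEN OUTPUT InFile
    MOVE "1234567SMITH   JA19800101LM51M" TO InRec
    WRITE InRec
    MOVE "7654321JONES   PB19810215LM60F" TO InRec
    WRITE InRec
    CLOSE InFile

*> Read into InRec, move to OutRec, write OutRec
    OPEN INPUT InFile
    OPEN OUTPUT OutFile
    READ InFile
        AT END SET EndOfInFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfInFile
        MOVE InRec TO OutRec
        WRITE OutRec
        READ InFile
            AT END SET EndOfInFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE InFile OutFile
    STOP RUN.
```

After a run, OutRecs.dat should contain the same two records as InRecs.dat.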

Declaring Records and Files

Suppose we want to create a file to hold information about the students in the University. What kind of information do we need to store about each student?

One thing we need to store is the student's name. Each student is assigned an identification number, so we need to store that as well. We also need to store the date of birth and the code of the course the student is taking. Finally, we are going to store the student's gender. These items are summarized below:

Student Id
Student Name
Date of birth
Course Code
Gender

To create a record buffer large enough to store one instance of a record, containing the information described above, we must decide on the type and size of each of the fields.

The student identity number is 7 digits in size so we need to declare the data-item to hold it as PIC 9(7).
To store the student name, we will assume that we require only 10 characters. So we can declare a data-item to hold it as PIC X(10).
The date of birth is 8 digits long so we declare it as PIC 9(8).
The course code is 4 characters long so we declare it as PIC X(4).
Finally, the gender is only one character so we declare it as PIC X.

The fields described above are individual data items, but we must collect them together into a record structure as follows:

01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName       PIC X(10).
   02 DateOfBirth       PIC 9(8).
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

The record description above is correct as far as it goes. It reserves the correct amount of storage for the record buffer. But it does not allow us to access all the individual parts of the record that we might require.

For instance, the name is actually made up of the student's surname and initials, while the date of birth consists of 4 digits for the year, 2 digits for the month and 2 digits for the day.

To allow us to access these fields individually, we need to declare the record as follows:

01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 YOBirth        PIC 9(4).
      03 MOBirth        PIC 99.
      03 DOBirth        PIC 99.
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

In this description, StudentName is a group item consisting of Surname and Initials, and DateOfBirth consists of YOBirth, MOBirth and DOBirth.
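To see how group items and elementary items give different views of the same storage, here is a minimal sketch, assuming GnuCOBOL free-format source. It declares the StudentRec layout in WORKING-STORAGE (so no file is needed) and fills it with a hypothetical 30-character sample value.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. GroupItems.
*> Sketch: the StudentRec layout from the text, declared in
*> WORKING-STORAGE so that no file is needed.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 YOBirth        PIC 9(4).
      03 MOBirth        PIC 99.
      03 DOBirth        PIC 99.
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

PROCEDURE DIVISION.
Begin.
*> A 30-character (hypothetical) value fills every field at once
    MOVE "1234567SMITH   JA19800101LM51M" TO StudentRec
*> Group items give access to the whole name or date...
    DISPLAY "Name: " StudentName
    DISPLAY "Date: " DateOfBirth
*> ...while elementary items give access to their parts
    DISPLAY "Surname: " Surname " Initials: " Initials
    DISPLAY "Born: " DOBirth "/" MOBirth "/" YOBirth
    STOP RUN.
```

The group items StudentName and DateOfBirth display the whole name and date, while Surname, Initials, YOBirth, MOBirth and DOBirth display their parts.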

The record type/template/buffer of every file used in a program must be described in the FILE SECTION by means of an FD (file description) entry. The FD entry consists of the letters FD and an internal name that the programmer assigns to the file.

So the full file description for the students file might be:

DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 YOBirth        PIC 9(4).
      03 MOBirth        PIC 99.
      03 DOBirth        PIC 99.
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

Note that we have assigned the name StudentFile as the internal file name. The actual name of the file on disk is Students.dat.

How can we connect the name we use internally (StudentFile) with the actual name of the file on disk (Students.dat)?

The internal file name used in a file's FD entry is connected to an external file (on disk, tape or CD-ROM) by means of the SELECT and ASSIGN clause. The SELECT and ASSIGN clause is an entry in the FILE-CONTROL paragraph in the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
   SELECT StudentFile
          ASSIGN TO "Students.dat".
DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
   02 StudentId         PIC 9(7).
   02 StudentName.
      03 Surname        PIC X(8).
      03 Initials       PIC XX.
   02 DateOfBirth.
      03 YOBirth        PIC 9(4).
      03 MOBirth        PIC 99.
      03 DOBirth        PIC 99.
   02 CourseCode        PIC X(4).
   02 Gender            PIC X.

SELECT and ASSIGN syntax for Sequential files

SELECT InternalFileName ASSIGN TO ExternalFileReference
       [ORGANIZATION IS {LINE | RECORD} SEQUENTIAL].

The GnuCOBOL compiler recognizes two kinds of Sequential File organization:

LINE SEQUENTIAL
RECORD SEQUENTIAL

In LINE SEQUENTIAL files, each record is followed by an end-of-line marker (the carriage return and line feed characters on Windows; a line feed alone on Unix-like systems). These are the kind of files produced by a text editor such as Notepad.

In RECORD SEQUENTIAL files, the file is simply a stream of bytes with no record delimiters; only the fact that we know the size of each record allows us to retrieve the records. Files that are not record-based can also be processed by defining them as RECORD SEQUENTIAL.
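The difference between the two organizations can be seen by writing the same data both ways. This is a minimal sketch, assuming GnuCOBOL; the file names Demo.txt and Demo.bin and the sample records are hypothetical.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. OrgCompare.
*> Sketch: write the same two records using each organization.
*> File names and sample data are hypothetical.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT LineFile ASSIGN TO "Demo.txt"
           ORGANIZATION IS LINE SEQUENTIAL.
    SELECT RecFile  ASSIGN TO "Demo.bin"
           ORGANIZATION IS RECORD SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD LineFile.
01 LineRec          PIC X(30).
FD RecFile.
01 RecRec           PIC X(30).

WORKING-STORAGE SECTION.
01 Sample1 PIC X(30) VALUE "1234567SMITH   JA19800101LM51M".
01 Sample2 PIC X(30) VALUE "7654321JONES   PB19810215LM60F".

PROCEDURE DIVISION.
Begin.
    OPEN OUTPUT LineFile RecFile
*> LINE SEQUENTIAL: each record is followed by an end-of-line marker
    MOVE Sample1 TO LineRec
    WRITE LineRec
    MOVE Sample2 TO LineRec
    WRITE LineRec
*> RECORD SEQUENTIAL: records are written back to back, no delimiters
    MOVE Sample1 TO RecRec
    WRITE RecRec
    MOVE Sample2 TO RecRec
    WRITE RecRec
    CLOSE LineFile RecFile
    STOP RUN.
```

Opened in a text editor, Demo.txt would be expected to show two lines, whereas Demo.bin would hold the 60 characters of the two records run together with no line breaks.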

The ExternalFileReference can be a simple file name or a full or partial file specification. If a simple file name is used, the file is assumed to be in the drive and directory where the program is running, but we may choose to include the full path to the file. For instance, we could associate StudentFile with an actual file using statements like:

SELECT StudentFile
       ASSIGN TO "D:\Cobol\ExampleProgs\Students.dat".

SELECT StudentFile
       ASSIGN TO "A:\Students.dat".

The SELECT and ASSIGN clause allows us to assign a meaningful internal name to an actual file on a storage device. This makes our programs more readable and easier to maintain: if the location of the file, or the medium on which it is held, changes, the only change we need to make to the program is to the entry in the SELECT and ASSIGN clause.

COBOL file handling verbs

Sequential files are uncomplicated. To write programs that process Sequential Files, you need to know only four new verbs: OPEN, CLOSE, READ and WRITE.

The OPEN verb

OPEN {INPUT | OUTPUT | EXTEND} InternalFileName...

Before your program can access the data in an input file or place data in an output file, you must make the file available to the program by OPENing it.

When you open a file, you have to indicate how you intend to use it (e.g. INPUT, OUTPUT, EXTEND) so that the system can manage the file correctly. Opening a file does not transfer any data to the record buffer; it simply provides access.

OPEN notes

When a file is opened for INPUT or EXTEND, the file must exist or the OPEN will fail.

When a file is opened for INPUT, the Next Record Pointer is positioned at the beginning of the file.

When the file is opened for EXTEND, the Next Record Pointer is positioned after the last record in the file. This allows records to be appended to the file.

When a file is opened for OUTPUT, it is created if it does not exist and is overwritten if it already exists.
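The three open modes can be sketched in one program, assuming GnuCOBOL; the file name Modes.dat and the data names are hypothetical. The file is created with OUTPUT, a record is appended with EXTEND, and the file is then read back from the beginning with INPUT.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. OpenModes.
*> Sketch: OUTPUT creates/overwrites, EXTEND appends, INPUT reads
*> from the start. File name and data names are hypothetical.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT ModeFile ASSIGN TO "Modes.dat"
           ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD ModeFile.
01 ModeRec.
   88 EndOfModeFile VALUE HIGH-VALUES.
   02 ModeText      PIC X(20).

PROCEDURE DIVISION.
Begin.
*> OUTPUT: the file is created (or emptied if it already exists)
    OPEN OUTPUT ModeFile
    MOVE "written by OUTPUT" TO ModeText
    WRITE ModeRec
    CLOSE ModeFile

*> EXTEND: the Next Record Pointer is placed after the last record
    OPEN EXTEND ModeFile
    MOVE "appended by EXTEND" TO ModeText
    WRITE ModeRec
    CLOSE ModeFile

*> INPUT: the Next Record Pointer is placed at the first record
    OPEN INPUT ModeFile
    READ ModeFile
        AT END SET EndOfModeFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfModeFile
        DISPLAY ModeText
        READ ModeFile
            AT END SET EndOfModeFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE ModeFile
    STOP RUN.
```

Because OPEN OUTPUT overwrites an existing file, running the program more than once should still leave exactly two records in Modes.dat.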

The CLOSE verb

CLOSE InternalFileName...

You must ensure that, before terminating, your program closes all the files it has opened. Failure to do so may result in some data not being written to the file or users being prevented from accessing the file.

The READ verb

READ InternalFileName [INTO Identifier]
     AT END StatementBlock
END-READ

Once the system has opened a file and made it available to the program, it is the programmer's responsibility to process it correctly. To process all the records in the file, we have to transfer them, one record at a time, from the file to the file's record buffer. The READ verb is provided for this purpose.

The READ copies a record occurrence/instance from the file and places it in the record buffer.

READ notes

When the READ attempts to read a record from the file and encounters the end of file marker, the AT END clause is triggered and the StatementBlock following AT END is executed.

Using the INTO Identifier clause causes the data to be read into the record buffer and then copied from there to the Identifier in one operation. When this option is used, there are two copies of the data: one in the record buffer and one in the Identifier. Using this clause is equivalent to executing a READ and then moving the contents of the record buffer to the Identifier.
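A minimal sketch of READ...INTO, assuming GnuCOBOL; the file name Into.dat and the WORKING-STORAGE names WSStudent, WSStudentId and WSStudentName are hypothetical. After each successful READ, the same data is available both in the record buffer StudentRec and in WSStudent. Here the end-of-file condition name is attached to the WORKING-STORAGE copy, since that is where the program works with the data.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. ReadIntoDemo.
*> Sketch: READ ... INTO leaves one copy of the record in the
*> record buffer and another in WORKING-STORAGE.
*> File name and WS data names are hypothetical.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "Into.dat"
           ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec       PIC X(30).

WORKING-STORAGE SECTION.
01 WSStudent.
   88 EndOfStudentFile VALUE HIGH-VALUES.
   02 WSStudentId   PIC 9(7).
   02 WSStudentName PIC X(10).
   02 FILLER        PIC X(13).

PROCEDURE DIVISION.
Begin.
*> Create a one-record file so the sketch is self-contained
    OPEN OUTPUT StudentFile
    MOVE "1234567SMITH   JA19800101LM51M" TO StudentRec
    WRITE StudentRec
    CLOSE StudentFile

    OPEN INPUT StudentFile
    READ StudentFile INTO WSStudent
        AT END SET EndOfStudentFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfStudentFile
*> Both copies hold the same data after a successful READ ... INTO
        DISPLAY "Buffer : " StudentRec
        DISPLAY "Copy   : " WSStudentId " " WSStudentName
        READ StudentFile INTO WSStudent
            AT END SET EndOfStudentFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE StudentFile
    STOP RUN.
```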

How the READ works

When a record is read, it is copied from the file on backing storage into the record buffer in RAM. When an attempt to READ detects the end of file, the AT END clause is triggered and the condition name EndOfFile is set to true. Because the condition name is attached to the record itself, as shown below, setting it to true fills the whole record with HIGH-VALUES.

FD StudentFile.
01 StudentRec.
   88 EndOfFile     VALUE HIGH-VALUES.

   02 StudentId     PIC 9(7).
       etc

The WRITE verb

WRITE RecordName [FROM Identifier]

The WRITE verb is used to copy data from the record buffer (RAM) to the file on backing storage (disk, tape or CD-ROM).

To WRITE data to a file, we must move the data to the record buffer (declared in the FD entry) and then WRITE the contents of the record buffer to the file.

When WRITE...FROM is used, the data contained in the Identifier is copied into the record buffer and then written to the file. WRITE...FROM is the equivalent of a MOVE Identifier TO RecordName statement followed by a WRITE RecordName statement.
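The equivalence between WRITE...FROM and a MOVE followed by a WRITE can be sketched as follows, assuming GnuCOBOL; the file name From.dat and the data name WSStudent are hypothetical. Both records should end up in the file in the same form.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. WriteFromDemo.
*> Sketch: two equivalent ways of writing a record.
*> File name and WS data name are hypothetical.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "From.dat"
           ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec       PIC X(30).

WORKING-STORAGE SECTION.
01 WSStudent        PIC X(30).

PROCEDURE DIVISION.
Begin.
    OPEN OUTPUT StudentFile

*> Version 1: MOVE to the record buffer, then WRITE the record
    MOVE "1234567SMITH   JA19800101LM51M" TO WSStudent
    MOVE WSStudent TO StudentRec
    WRITE StudentRec

*> Version 2: WRITE ... FROM does the MOVE and the WRITE together
    MOVE "7654321JONES   PB19810215LM60F" TO WSStudent
    WRITE StudentRec FROM WSStudent

    CLOSE StudentFile
    STOP RUN.
```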

Read a file, Write a record

If you were paying close attention to the syntax above, you probably noticed that while we READ a file, we must WRITE a record.

We read a file but write a record because a file can contain a number of different types of record. For instance, if we want to update the students file, we might have a file of transaction records that contains Insertion records and Deletion records. While an Insertion record would contain all the student record fields, a Deletion record needs only the StudentId.

When we read a record from the transaction file, we don't know which type of record will be supplied, so we must READ FileName. It is the programmer's responsibility to discover what type of record has been supplied.

When we write a record to a file, we have to specify which of the record types we want to write, so we must WRITE RecordName.
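One way to handle several record types in one file is to declare more than one 01 record under the same FD; the records share a single record buffer, so each is simply a different view of the same storage. The sketch below assumes GnuCOBOL; the file name Trans.dat, the record and data names, and the type codes "I" (insertion) and "D" (deletion) are hypothetical. It READs the file and then examines the type code to decide which record description applies, and it WRITEs each record type by name.

```cobol
       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID. TransTypes.
*> Sketch: one file, two record types sharing one record buffer.
*> File name, record names and type codes are hypothetical.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT TransFile ASSIGN TO "Trans.dat"
           ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD TransFile.
01 InsertionRec.
   88 EndOfTransFile    VALUE HIGH-VALUES.
   02 TransType         PIC X.
      88 IsInsertion    VALUE "I".
      88 IsDeletion     VALUE "D".
   02 InsStudentDetails PIC X(30).
*> Second view of the same buffer
01 DeletionRec.
   02 DelType           PIC X.
   02 DelStudentId      PIC 9(7).

PROCEDURE DIVISION.
Begin.
*> Create a small transaction file: one insertion, one deletion
    OPEN OUTPUT TransFile
    MOVE "I" TO TransType
    MOVE "1234567SMITH   JA19800101LM51M" TO InsStudentDetails
    WRITE InsertionRec
*> Clear the shared buffer before building the deletion record
    MOVE SPACES TO InsertionRec
    MOVE "D" TO DelType
    MOVE 7654321 TO DelStudentId
    WRITE DeletionRec
    CLOSE TransFile

*> READ names the file; the program then inspects the type code
    OPEN INPUT TransFile
    READ TransFile
        AT END SET EndOfTransFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfTransFile
        EVALUATE TRUE
            WHEN IsInsertion
                DISPLAY "Insert: " InsStudentDetails
            WHEN IsDeletion
                DISPLAY "Delete: " DelStudentId
            WHEN OTHER
                DISPLAY "Unknown record type: " TransType
        END-EVALUATE
        READ TransFile
            AT END SET EndOfTransFile TO TRUE
        END-READ
    END-PERFORM
    CLOSE TransFile
    STOP RUN.
```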

Example Program

The example program below demonstrates the items discussed above. The program gets records from the user and writes them to a file. It then reads the file and displays part of each record.

       >>SOURCE FORMAT IS FREE
IDENTIFICATION DIVISION.
PROGRAM-ID.  SeqWriteRead.
AUTHOR.  Michael Coughlan.
*> Example program showing how to create a sequential file
*> using the ACCEPT and the WRITE verbs and then read and
*> display its records using the READ and DISPLAY.
*> Note: In this version of COBOL, pressing the Carriage Return (CR)
*> without entering any data results in StudentRec
*> being filled with spaces.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT StudentFile ASSIGN TO "Students.dat"
                ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD StudentFile.
01 StudentRec.
   88  EndOfStudentFile  VALUE HIGH-VALUES.
   02  StudentId         PIC 9(7).
   02  StudentName.
       03 Surname        PIC X(8).
       03 Initials       PIC XX.
   02  DateOfBirth.
       03 YOBirth        PIC 9(4).
       03 MOBirth        PIC 9(2).
       03 DOBirth        PIC 9(2).
   02  CourseCode        PIC X(4).
   02  Gender            PIC X.

PROCEDURE DIVISION.
Begin.
    OPEN OUTPUT StudentFile
    DISPLAY "Enter student details using template below."
    DISPLAY "Enter no data to end"

    PERFORM GetStudentRecord
    PERFORM UNTIL StudentRec = SPACES
       WRITE StudentRec
       PERFORM GetStudentRecord
    END-PERFORM
    CLOSE StudentFile

    OPEN INPUT StudentFile
    READ StudentFile
         AT END SET EndOfStudentFile TO TRUE
    END-READ
    PERFORM UNTIL EndOfStudentFile
       DISPLAY StudentId SPACE StudentName SPACE CourseCode
       READ StudentFile
            AT END SET EndOfStudentFile TO TRUE
       END-READ
    END-PERFORM
    CLOSE StudentFile
    STOP RUN.

GetStudentRecord.
    DISPLAY "NNNNNNNSSSSSSSSIIYYYYMMDDCCCCG"
    ACCEPT  StudentRec.
