Fortran-DVM Compiler
Detailed design
* 30 June 2000 *


Contents

1 Compiler role
2 Command line format
3 The general scheme of compiler

3.1 Parsing
3.2 Transforming parse tree
3.3 Generating code in Fortran 77
3.4 Generating code in HPF

4 Basic data structures

4.1 Parse tree
4.2 Symbol and Type table

5 Detailed description of compiler modules

5.1 Translating Fortran DVM constructs (module dvm.cpp)

5.1.1 Distributed array creation and remapping
5.1.2 Distributed array referencing
5.1.3 Parallel loop

5.2 Translating input/output statements (module io.cpp)
5.3 Restructuring parse tree (module stmt.cpp)
5.4 Translating HPF-DVM constructs (module hpf.cpp)

5.4.1 Processing distributed array references in HPF-DVM
5.4.2 INDEPENDENT loop


This report gives a detailed description of the Fortran-DVM (FDVM) compiler implementation. It covers the basic data structures, the control scheme, and the functions of the compiler modules.

1 Compiler role

The Fortran DVM (FDVM) language is an extension of Fortran 77 for parallel programming. The extension is implemented as special comments (directives) that annotate a sequential Fortran 77 program.

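The input to the compiler is source code in the Fortran DVM or HPF-DVM language. Depending on the options given (see section 2), the compiler produces a sequential program, a parallel program, an HPF1 program, or an HPF2 program.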

2 Command line format

The FDVM compiler command line has the following format:

dvm fdv [ <options> ] <file-name>

The source program is read from the input file <file-name>.fdv or <file-name>.hpf.

Here <options> are the compiler options:

-o file              Place output in the file file;
-s                   Produce a sequential program;
-p                   Produce a parallel program;
-hpf1                Produce an HPF1 program;
-hpf2                Produce an HPF2 program;
-v                   Display the invocations of compiler phases and the version number;
-w                   Display all warning messages;
-Idir                Add the directory dir to the list of directories searched for include files;
-bindk               Specify the compatibility of data types between Fortran and C; k is an integer giving the compatibility table number;
-dleveld[:fr-list]   Produce additional code for program debugging; leveld specifies the debug level, fr-list is a list of fragment numbers;
-elevele[:fr-list]   Produce additional code for program performance analysis; levele specifies the level of performance debugging.
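
The shape of the -d and -e options can be illustrated with a short C++17 sketch that splits such an argument into its parts. This is not the compiler's own option handling: the names LevelOption and parse_level_option are hypothetical, the level is assumed to be a decimal number, and the fr-list text is kept as a raw string because its exact syntax is not specified here.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical holder for a parsed -d or -e option (not from the compiler sources).
struct LevelOption {
    char kind;              // 'd' (debugging) or 'e' (performance analysis)
    int level;              // numeric level following the option letter (assumed decimal)
    std::string fragments;  // raw fr-list text after ':', empty if absent
};

// Split "-d<level>[:fr-list]" or "-e<level>[:fr-list]" into its parts.
// Returns std::nullopt when the argument does not have that shape.
std::optional<LevelOption> parse_level_option(const std::string& arg) {
    if (arg.size() < 3 || arg[0] != '-' || (arg[1] != 'd' && arg[1] != 'e')) {
        return std::nullopt;
    }
    // Consume the run of digits that forms the level.
    std::size_t pos = 2;
    while (pos < arg.size() && std::isdigit(static_cast<unsigned char>(arg[pos]))) {
        ++pos;
    }
    if (pos == 2) {
        return std::nullopt;  // no level given
    }
    // Anything after the level must start with ':'.
    if (pos < arg.size() && arg[pos] != ':') {
        return std::nullopt;
    }
    LevelOption opt;
    opt.kind = arg[1];
    opt.level = std::stoi(arg.substr(2, pos - 2));
    opt.fragments = (pos < arg.size()) ? arg.substr(pos + 1) : std::string();
    return opt;
}
```

Under these assumptions, an argument made of -d, a level, a colon, and a fragment list yields the level as an integer and the list as unparsed text, while an unrelated option such as -v is rejected.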

3 The general scheme of compiler

The Sage++ system is used as the tool for building the FDVM compiler.

Sage++ is an object-oriented toolkit for building program transformation systems for the Fortran 77, Fortran 90, C and C++ languages. It is designed as an open C++ class library that provides the user with a set of parsers, a structured parse tree, and symbol and type tables. The heart of the system is a set of functions for restructuring the parse tree and a mechanism (called unparsing) for generating new source code from the restructured internal form.

The FDVM compiler consists of four components, described in sections 3.1 to 3.4.

3.1 Parsing

The Fortran parser of Sage++, which is based on the GNU Bison version of YACC, is extended to accept the language extensions (DVM directives). It consists of the following modules:

ftn.gram - grammar rules for Fortran
fdvm.gram - grammar rules for Fortran DVM
lexfdvm.c - lexical analyzer
tag - variant tag list
tokens - lexeme list
gram1.tab.c - parser generated by Bison
cftn.c - main routine (calls parser, opens and closes the files that are needed)
init.c - initialization routines
stat.c - routines for creating the internal form of statements (bif nodes of the parse tree)
errors.c - printing error messages
sym.c - Symbol table routines
types.c - routines to handle the variable declarations
lists.c - routines to build the lists
misc.c - miscellaneous help routines
hash.c - hash table routines

The parser reads the source file, checks the concrete syntax, constructs a parse tree, and writes its internal representation to a .dep file.

3.2 Transforming parse tree

The second phase of compilation analyzes and restructures the internal representation of the FDVM program. Each DVM directive is replaced by a sequence of Lib-DVM function calls. New source code is then generated from the restructured internal form.

The back-end of the compiler is written in C++ using the Sage++ class library.
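
The replacement of a directive by a sequence of calls can be pictured with a minimal, self-contained C++ sketch. It does not use Sage++: the Stmt record, the functions expand_directive and lower_directives, and the call names runtime_call_1 and runtime_call_2 are all invented for illustration and do not correspond to actual Lib-DVM functions or compiler routines.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical, simplified statement record; the real compiler works on
// Sage++ parse tree nodes rather than on text.
struct Stmt {
    bool is_directive;  // true for a DVM directive, false for ordinary Fortran
    std::string text;   // source text of the statement
};

// Placeholder expansion of one directive into a sequence of call statements.
// The call names are invented and are NOT real Lib-DVM functions.
std::vector<Stmt> expand_directive(const Stmt& directive) {
    return {
        {false, "call runtime_call_1()  ! from: " + directive.text},
        {false, "call runtime_call_2()"},
    };
}

// Replace every directive in the statement list by its expansion, in place.
void lower_directives(std::list<Stmt>& program) {
    for (auto it = program.begin(); it != program.end();) {
        if (!it->is_directive) {
            ++it;
            continue;
        }
        std::vector<Stmt> calls = expand_directive(*it);
        program.insert(it, calls.begin(), calls.end());  // calls go before the directive
        it = program.erase(it);                          // drop the directive, continue after it
    }
}
```

Ordinary statements keep their relative order, and each directive disappears from the list once its calls have been inserted in its place.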

The Sage++ library is organized as a class hierarchy that provides access to the parse tree, symbol table and type table for each file in an application project. There are five basic families of classes in the library: Project and Files, Statements, Expressions, Symbols, and Types.

Project and Files correspond to source files. Statements correspond to the basic source statements in Fortran 77 and DVM directives. Expressions are contained within statements. Symbols are the basic user defined identifiers. Types are associated with each identifier and expression.

The file libSage++.h contains all the class definitions.

The translator comprises seven modules:

dvm.cpp - analyzing and translating FDVM constructs
funcall.cpp - generating Lib-DVM library calls
stmt.cpp - restructuring parse tree
io.cpp - translating I/O statements
debug.cpp - support of debugging mode
help.cpp - miscellaneous help routines
hpf.cpp - translating HPF-DVM constructs

3.3 Generating code in Fortran 77

New source code is generated from the restructured internal form by the unparse( ) member function of the File class in the Sage++ class library.

3.4 Generating code in HPF

When the source FDVM program is converted to an HPF program, the following routines and tables are used for unparsing:

unparse_hpf.c - routines for generating HPF code
low_hpf.c - low-level routines for unparsing
unparse.hpf - table driving the generation of HPF2 code
unparse1.hpf - table driving the generation of HPF1 code

4 Basic data structures

The definitions of data structures of internal representation are contained in the files:

4.1 Parse tree

The structures of the parse tree nodes for a statement and an expression are given in Fig. 4.1 and Fig. 4.2, respectively. Fig. 4.4 illustrates a fragment of the parse tree.

4.2 Symbol and Type table

Fig. 4.3 presents the Type and Symbol table entries.

variant tag
identification tag
index
global line number
local line number
declaration specifier
pointer to the label
pointer to the next statement node
pointer to the source filename
pointer to the control parent node
property list
list of nodes (list of procedures)
pointer to the comment
symbol table entry
L-value expr tree
R-value expr tree
spare expr tree
do-label (used by do)
null
null
null
null

Fig. 4.1. Parse tree node representing a statement (bif node).

variant tag
identification tag
pointer to the next node (by allocation order)
pointer to the Type table element
constant value
pointer to the Symbol table element
L-value expr tree
R-value expr tree

Fig. 4.2. Parse tree node representing an expression (low level node).

Type table entry               Symbol table entry
-----------------------------  --------------------------------
variant tag                    variant tag
identification tag             identification tag
length                         identifier
spare field                    Hash table entry
spare field                    special list
use-definition chain           special list
base type entry (for array)    special list
ranges (for array)             next Symbol table entry
                               Type table entry
                               Scope
                               use-definition chain
                               attributes (mask)
                               do-variable flag
                               parser used
                               pointer to value (for constants)
                               special fields

Fig. 4.3. Type and Symbol table entries.

Fig. 4.4. Internal representation of the statement a = b + c.
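
Since the figure itself is not reproduced here, the following C++ sketch shows one plausible shape for the internal representation of a = b + c, using only a few of the fields listed in Fig. 4.1 and Fig. 4.2 (variant tag, Symbol table pointer, L-value and R-value expression trees). All type and function names (Variant, Symbol, ExprNode, StmtNode, var_ref, add, assign, unparse) are hypothetical simplifications, not the actual Sage++ structures.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical variant tags for the three node kinds needed here.
enum class Variant { Assign, Add, VarRef };

// Reduced Symbol table entry: only the identifier is kept.
struct Symbol {
    std::string identifier;
};

// Reduced expression node (compare Fig. 4.2).
struct ExprNode {
    Variant variant = Variant::VarRef;
    const Symbol* symbol = nullptr;  // pointer to the Symbol table element
    std::unique_ptr<ExprNode> lhs;   // L-value expr tree (left operand)
    std::unique_ptr<ExprNode> rhs;   // R-value expr tree (right operand)
};

// Reduced statement node (compare Fig. 4.1).
struct StmtNode {
    Variant variant = Variant::Assign;
    std::unique_ptr<ExprNode> lhs;   // L-value expr tree (assignment target)
    std::unique_ptr<ExprNode> rhs;   // R-value expr tree (assigned expression)
};

// Leaf node referring to a variable through its Symbol table entry.
std::unique_ptr<ExprNode> var_ref(const Symbol& s) {
    auto n = std::make_unique<ExprNode>();
    n->variant = Variant::VarRef;
    n->symbol = &s;
    return n;
}

// Binary addition node with two operand subtrees.
std::unique_ptr<ExprNode> add(std::unique_ptr<ExprNode> l, std::unique_ptr<ExprNode> r) {
    assert(l && r);
    auto n = std::make_unique<ExprNode>();
    n->variant = Variant::Add;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Assignment statement node holding target and value trees.
StmtNode assign(std::unique_ptr<ExprNode> target, std::unique_ptr<ExprNode> value) {
    assert(target && value);
    StmtNode s;
    s.variant = Variant::Assign;
    s.lhs = std::move(target);
    s.rhs = std::move(value);
    return s;
}

// Regenerate source text from an expression tree.
std::string unparse(const ExprNode& e) {
    switch (e.variant) {
        case Variant::VarRef: return e.symbol->identifier;
        case Variant::Add:    return unparse(*e.lhs) + " + " + unparse(*e.rhs);
        default:              return "?";
    }
}

// Regenerate source text from an assignment statement node.
std::string unparse(const StmtNode& s) {
    return unparse(*s.lhs) + " = " + unparse(*s.rhs);
}
```

In this sketch the statement node owns two expression trees: a variable reference to a on the left, and an addition node on the right whose operands are variable references to b and c. Walking the tree with unparse reproduces the source text, which mirrors in miniature the unparsing step described in section 3.3.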

5 Detailed description of compiler modules