DVM Debugger
Preliminary design
* October 1999 *

- last edited 02.10.00 -


Contents

1 The Functions of DVM debugger

1.1 The “Dynamic control of DVM-directives” method
1.2 Kinds of detected errors
1.3 The “Comparing execution results” method

2 The Content of DVM debugger
3 Approach and Principle for DVM debugger Implementation

3.1 The “Dynamic control of DVM-directives” method

3.1.1 Checking initialization of variables and elements of distributed arrays
3.1.2 Checking access to elements of distributed arrays
3.1.3 Checking private and read-only variables
3.1.4 Checking reduction variables
3.1.5 Checking usage of remote access buffer

3.2 The “Comparing execution results” method

3.2.1 Trace accumulation
3.2.2 Trace comparing
3.2.3 Checking reduction operations


1 The Functions of DVM debugger

The DVM debugger is used for debugging DVM-programs (written in the Fortran-DVM or C-DVM languages). DVM-programs are debugged in the following stages. First, the program is debugged on a workstation as a sequential program using ordinary debugging methods and tools. Then the program is executed on the same workstation in a special mode that checks DVM-directives. Finally, the program may be executed on a parallel computer in a special mode in which intermediate results of its execution are compared with reference results (for example, the results of its sequential execution).

A DVM-program can contain errors of different kinds. The DVM-debugger is used to detect errors that do not appear during sequential execution. In general, the following classes of errors can be considered:

The compiler detects errors of the first class.

Only simple errors of the second class, such as a wrong parameter type, can be detected by static analysis. To detect errors of this class, each Lib-DVM function checks that DVM-directives appear in the correct order and have correct parameters. These checks are done dynamically, during program execution. Some of them cause considerable overhead and are therefore performed only on special request.

Errors of the third class cannot be detected without a special compilation mode and considerable overhead in the parallel program. An unspecified data dependence in a loop is an example of such an error. To detect it, all data modifications and uses on different processors must be recorded, and the cases in which a variable is modified on one processor and used on another must be identified. This detection can be performed more efficiently on a single processor by emulating the parallel execution of the program.

The DVM-debugger is intended to detect errors of the third class. It is based on the following two methods.

The first method, dynamic control of DVM-directives, verifies the correctness of the program parallelization specified by DVM-directives. It is based on analysis of the sequence of Lib-DVM function calls and of references to variables during parallel program execution simulated on a single processor.

The second method is based on comparing the execution traces of the parallel and sequential runs of the program. It localizes the program point and the moment at which the results begin to differ. Unlike the first method, it can be used when only part of the program is compiled in the special debug mode.

The parallel program tracing facilities included in the DVM-debugger may also be useful for detecting errors of the fourth class. In addition, the system tracing facilities (tracing of DVM support library calls) are intended to detect these errors.

1.1 The “Dynamic control of DVM-directives” method

Dynamic control is based on simulating the parallel execution of a DVM-program on a single processor. This method can slow down program execution significantly and requires a large amount of additional memory. It is therefore applicable only to a program run with specially chosen test data of limited size.

1.2 Kinds of detected errors

Dynamic control detects the following kinds of errors:

  1. Undeclared cross-iteration data dependencies in a parallel loop or task region.
  2. Use of uninitialized variables.
  3. Modification of non-distributed variables not specified as reduction or private.
  4. Use of reduction variables after an asynchronous reduction has been started but before it has completed.
  5. Access to out-of-bounds elements of a distributed array.
  6. Undeclared access to non-local elements of a distributed array.
  7. Writing to shadow edges of a distributed array.
  8. Reading shadow elements before their update has completed.

1.3 The “Comparing execution results” method

Dynamic control is intended primarily for checking the correctness of DVM-directives. It covers only the parts of a DVM-program that are compiled in the special debug mode. However, the program can contain calls to procedures written in ordinary sequential languages (including assembler). The execution of such procedures is not controlled and can cause incorrect program execution. In addition, the correctness of reduction operation descriptions is not checked. Finally, the program can have errors, unrelated to its parallelization, that appear only during parallel execution.

Another control method is aimed at detecting such errors. It is based on accumulating trace results under different execution conditions and comparing them. With this method the program can be executed in two modes. In the first mode, trace results (intermediate variable values) are accumulated and written to a file as the reference trace results. In the second mode, execution results are compared with the reference ones. The comparison takes into account that the values of some variables may differ (for example, reduction variables inside a parallel construction).

2 The Content of DVM debugger

The DVM-debugger consists of two parts: the dynamic control system and the trace comparison system.

Both systems use the following base subsystems: tables for storing large volumes of uniform information, hash tables for fast data lookup, and a diagnostic output module.

The dynamic control system consists of the following components:

The trace comparison system consists of the following components:

3 Approach and Principle for DVM debugger Implementation

3.1 The “Dynamic control of DVM-directives” method

All data that can be used in a DVM-program are divided into the following classes:

The control algorithms are based on these data classes.

3.1.1 Checking initialization of variables and elements of distributed arrays

This check is performed for all classes of data. Each variable has an initialization flag (an array has an array of flags). When a variable is modified, its flag is set. When the variable is read, the flag is checked.
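
The flag scheme above can be illustrated with a minimal Python sketch. The names `CheckedArray` and `UninitializedRead` are hypothetical and are not part of Lib-DVM; the sketch only shows one flag per element being set on write and checked on read.

```python
class UninitializedRead(Exception):
    """Raised when an element is read before it has been written."""


class CheckedArray:
    """Array wrapper (hypothetical) that keeps one initialization flag per element."""

    def __init__(self, size):
        self._values = [None] * size
        self._initialized = [False] * size  # the "array of flags"

    def write(self, index, value):
        self._values[index] = value
        self._initialized[index] = True  # a modification sets the flag

    def read(self, index):
        if not self._initialized[index]:  # a read checks the flag
            raise UninitializedRead(f"element {index} read before initialization")
        return self._values[index]
```

A scalar variable corresponds to an array of size one in this sketch.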

3.1.2 Checking access to elements of distributed arrays

Each element of a created distributed array is associated with a structure that describes the type of access to the element (read or write) and the last loop iteration or parallel task in which this access occurred. On entry to a parallel construction, all these structures are initialized to a default status meaning that the element has not been used yet. When an element is used inside a parallel loop or task, the iteration or task number and the access type are stored for this element. If an element is modified in one iteration or task and then used in another iteration or task, an undeclared data dependence error is detected.
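
A rough Python sketch of this bookkeeping follows. The class and method names are hypothetical. The sketch also assumes that a write record is kept in preference to later reads, so that a read from any other iteration is still caught; the source does not specify this detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRecord:
    access: Optional[str] = None     # None means "element not used yet"
    iteration: Optional[int] = None  # last iteration (or task) that accessed it


class DependenceChecker:
    """Tracks per-element accesses inside one parallel construction (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.errors = []  # (element, writing iteration, reading iteration)
        self.enter_parallel_construct()

    def enter_parallel_construct(self):
        # All structures are reset to the default "not used yet" status.
        self.records = [AccessRecord() for _ in range(self.size)]

    def write(self, index, iteration):
        self.records[index] = AccessRecord("write", iteration)

    def read(self, index, iteration):
        rec = self.records[index]
        if rec.access == "write":
            if rec.iteration != iteration:
                # Modified in one iteration, used in another.
                self.errors.append((index, rec.iteration, iteration))
            return  # assumption: keep the write record
        self.records[index] = AccessRecord("read", iteration)
```

For a loop body such as `a[i] = a[i-1] + 1`, every iteration after the first reads an element written by the previous iteration, so each such read is recorded as an undeclared dependence.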

Note: Iterations of a parallel loop can depend on each other through the use of reduction variables inside the loop. Such a dependence does not prevent parallel execution of the loop, but the programmer should declare it when specifying the parallel loop. If the program has no such declaration, the variable is treated as a private variable and a private variable usage error is detected.

When the program accesses a distributed array inside a parallel construction, the local part of the array is calculated.

If the local part of the array does not contain the accessed element, an error of access to a non-local part is reported. If the accessed element lies in a shadow edge, the renewal state of the shadow edge is checked.

A shadow edge of a distributed array is in one of three states: the renewal isn’t performed, the renewal has been started, or the renewal has been finished. The program can access shadow edges only for reading and only after the renewal has been completed.

The state “renewal isn’t performed” is set when exported elements are modified. This makes it possible to detect incorrect access to shadow elements before the next edge renewal is performed. An error is reported when exported elements are modified before the renewal of the edges has been completed.
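
The three states and the rules around them can be read as a small state machine. The Python sketch below is only an illustration: the names `EdgeState` and `ShadowEdge` are hypothetical, and the initial state is assumed to be "renewal isn't performed".

```python
from enum import Enum


class EdgeState(Enum):
    NOT_RENEWED = "renewal isn't performed"
    STARTED = "renewal has been started"
    FINISHED = "renewal has been finished"


class ShadowEdge:
    """State machine for one shadow edge (hypothetical sketch)."""

    def __init__(self):
        self.state = EdgeState.NOT_RENEWED  # assumed initial state
        self.errors = []

    def start_renewal(self):
        self.state = EdgeState.STARTED

    def finish_renewal(self):
        self.state = EdgeState.FINISHED

    def read_shadow(self):
        # Reading is allowed only after the renewal has been completed.
        if self.state is not EdgeState.FINISHED:
            self.errors.append("shadow element read before renewal completed")

    def write_shadow(self):
        # Shadow edges may be accessed for reading only.
        self.errors.append("write to shadow edge")

    def modify_exported(self):
        # Modifying exported elements while a renewal is in flight is an error.
        if self.state is EdgeState.STARTED:
            self.errors.append("exported element modified during renewal")
        # In any case the shadow copy is now stale.
        self.state = EdgeState.NOT_RENEWED
```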

3.1.3 Checking private and read-only variables

The type of a variable is determined when its first use inside a parallel construction is detected. If the variable is neither a reduction variable nor a distributed array, its type is determined as follows. If the variable is used for writing, it is registered as private. Otherwise it is registered as read-only.

Each private variable has an initialization flag. The flag is set if the variable has been initialized inside the current loop iteration or task, and is cleared otherwise. When the variable is modified, the flag is set. When the variable is read, the flag is checked. The private variable flags are cleared on exit from the parallel loop.

Access type checking is performed for read-only variables. If the program attempts to write to a read-only variable, an error is detected.
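
The classification and the two checks can be sketched as follows. This is an illustrative Python model with hypothetical names (`VariableChecker` and its methods). It assumes that variables are identified by name and that reduction variables and distributed arrays have already been filtered out.

```python
class VariableChecker:
    """Classifies variables on first use in a parallel construction (hypothetical)."""

    def __init__(self):
        self.kind = {}            # name -> "private" or "read-only"
        self.initialized = set()  # private variables set in the current iteration
        self.errors = []

    def begin_iteration(self):
        # A private variable must be initialized again in every iteration or task.
        self.initialized.clear()

    def exit_loop(self):
        # Private variable flags are cleared on exit from the parallel loop.
        self.initialized.clear()

    def write(self, name):
        kind = self.kind.setdefault(name, "private")  # first use is a write
        if kind == "read-only":
            self.errors.append(f"write to read-only variable {name}")
            return
        self.initialized.add(name)

    def read(self, name):
        kind = self.kind.setdefault(name, "read-only")  # first use is a read
        if kind == "private" and name not in self.initialized:
            self.errors.append(f"private variable {name} read before initialization")
```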

3.1.4 Checking reduction variables

Reduction variables are registered when they are included in a reduction group. A reduction variable has a flag that can assume the following states:

The state of a reduction variable is checked whenever the variable is accessed. If a reduction variable is used after leaving a parallel construction but before the asynchronous reduction has completed, an error is reported.

Note: Incorrect specification of a reduction function (for example, specifying MIN instead of MAX) is not detected by this method. Use the trace comparison method to detect such errors.

3.1.5 Checking usage of remote access buffer

The remote data buffer is used for preloading the non-local elements required by each processor. The same checks are performed for a remote data buffer as for a distributed array, with a few exceptions:

3.2 The “Comparing execution results” method

3.2.1 Trace accumulation

The following trace information is accumulated in a trace file:

Each record in the trace file has a reference to a line of the source program.

Since this method incurs considerable overhead, means of controlling the level of trace detail are provided.

The content of a special file (the trace configuration file) determines the level of trace detail. This file contains a description of all program loops and task regions.

To decrease trace size the following accumulation levels are provided:

Furthermore, for each loop you can specify an iteration range, and for each task region a task range, for which information about variable usage will be accumulated.

3.2.2 Trace comparing

In comparison mode the trace file is read into memory before program startup. The in-memory trace structure is built by the same functions that are used in accumulation mode. As a result, the trace structure in memory is the same after accumulating a trace and after reading one.

Then, during program execution, the events that occur are compared with the reference ones.
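
The two modes can be sketched with a single recorder class, shown below in Python. Everything here is an assumption made for illustration: the `Tracer` name, the record layout `(source line, variable name, value)`, and the JSON-lines text format. The real trace file format is not described in this document.

```python
import json


class Tracer:
    """Accumulates trace records, or compares them with a reference (hypothetical)."""

    def __init__(self, reference=None):
        self.reference = reference  # None selects accumulation mode
        self.records = []
        self.mismatches = []        # (position, expected record, actual record)

    def event(self, line, name, value):
        actual = (line, name, value)
        position = len(self.records)
        self.records.append(actual)
        if self.reference is None:
            return
        # Comparison mode: check the event against the reference record.
        expected = self.reference[position] if position < len(self.reference) else None
        if expected != actual:
            self.mismatches.append((position, expected, actual))

    def dump(self):
        # One JSON array per line; an assumed format, not the real trace file.
        return "\n".join(json.dumps(list(r)) for r in self.records)

    @staticmethod
    def load(text):
        # Rebuild the same in-memory structure that accumulation produces.
        return [tuple(json.loads(row)) for row in text.splitlines()]
```

The first entry of `mismatches` gives the position and source line at which the results begin to differ. The sketch compares values exactly; a practical comparison of floating-point values would probably need a tolerance.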

Parallel program execution has the following peculiarities:

3.2.3 Checking reduction operations

Accumulation of reduction variables is handled specially. The values of reduction variables inside a parallel construction are not compared with the reference ones. Comparison is performed only for the reduction result.

There are two ways to calculate a reduction operation. The first is the standard way of performing a reduction. Program statements inside an iteration or task perform all computations of the reduction variable on their own processor. The final result of the reduction across processors is computed by Lib-DVM. If the program runs on a single processor, the reduction is computed by program statements alone.

The second way emulates the execution of each iteration and each parallel task on a separate processor. At the beginning of each iteration or task, the initial value is assigned to the reduction variable. The initial value is saved before the parallel construction begins. At the end of each iteration or task, the reduction is performed by Lib-DVM according to the specified reduction function.

To check a reduction, the program execution results are compared for the two ways described above. If the program specifies the reduction function correctly, the reduction results are the same for both ways. Otherwise an error reporting the difference in reduction results is issued to the user.
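
The comparison of the two ways can be sketched in Python as follows. All function names are hypothetical. Here `body` stands for the program statements that update the reduction variable, and `declared` stands for the reduction function given in the directive. The sketch assumes that combining the saved initial value more than once does not change the result, which holds for MIN and MAX; an operation such as SUM would need its initial value handled separately.

```python
from functools import reduce


def standard_reduction(initial, data, body):
    """First way on a single processor: program statements alone compute the result."""
    value = initial
    for item in data:
        value = body(value, item)
    return value


def emulated_reduction(initial, data, body, declared):
    """Second way: every iteration runs as if on its own processor."""
    partials = []
    for item in data:
        # Each iteration starts from the saved initial value.
        partials.append(body(initial, item))
    # The declared reduction function combines the per-iteration results.
    return reduce(declared, partials, initial)


def check_reduction(initial, data, body, declared):
    """Return True when both ways agree, i.e. the declaration looks correct."""
    first = standard_reduction(initial, data, body)
    second = emulated_reduction(initial, data, body, declared)
    return first == second
```

For example, if the loop body computes a maximum but the directive declares MIN, `check_reduction(0, [3, 7, 2], max, min)` returns `False`, while declaring MAX makes both ways agree.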