Debugging DVM-programs
- last edited 07.10.02 -
Contents
1 Introduction
2 What is DVM-program?
3 Error kinds in DVM-program
4 Dynamic control of DVM-directives
5 Comparison of execution results
6 System trace accumulation tools and its examining
7 DVM-system tuning
8 Options of converters
8.1 Converter options controlling program execution modes
8.2 Options of converters for dynamic debugger
8.3 Options of converters for performance analysis
9 DVM-system commands
10 Methods of debugging DVM-programs
10.1 Debugging ordinary sequential program
10.2 Obtaining debug versions of DVM-program for sequential and parallel execution
10.3 Program execution in mode of dynamic control of DVM-directives
10.4 Accumulation of DVM-program reference trace file
10.5 Comparing reference trace with results of parallel program execution on single processor
10.6 Comparing parallel execution trace with reference one
10.7 Accumulating parallel program trace
10.8 Parallel execution with real data
10.9 Estimating trace size
10.10 Controlling size of trace file
10.11 Program startup with non-standard parameter set
11.1 The parameters of dynamic control
11.2 Parameters of trace accumulation and comparison
11.3 Parameters of standard data streams redirection and Run-Time System informational messages output control
11.4 Parameters controlling system tracing
11.4.1 Enabling and disabling tracing
11.4.2 Specifying opened (enabled) trace streams
11.4.3 Specifying trace modes
11.4.4 Controlling form of accumulated information
11.4.5 Controlling internal self-checking Run-Time System mechanisms, functioning during trace accumulation
11.4.6 Controlling output of additional information when tracing some Run-Time System functions in extended mode
12 Diagnostics messages of dynamic debugger
13 Structure of trace configuration file
14 Execution trace structure
15 Error messages of Run-Time System
15.1 Start and completion errors
15.2 Errors of the type "... is not a DVM object"
15.3 Errors of the type "[the object] is not a/the …"
15.4 Errors of the type "… is not a subsystem of the current/parental PS"
15.5 Errors of the type "…was not created by the current subtask" and "…was not started by the current subtask"
15.6 Errors of the type "…has not been aligned/mapped" and "…has already been aligned/mapped"; "…does not exist" and "…already exists"
15.7 Errors of the type "… has already been started/inserted …", "… has not been started/completed" and "… the reduction group is empty"
15.8 Index and value errors
15.9 Other semantic errors
15.10 Memory allocation and the number of objects errors
15.11 Errors of low level message passing
16 Structure of system trace file
1 Introduction
The C-DVM and Fortran-DVM (F-DVM below) languages are intended for developing portable and efficient parallel computational applications for computers of different architectures. They are extensions of the standard C and Fortran 77 languages. A parallel program is an ordinary sequential program extended with DVM-directives that define its parallel execution. DVM-directives are transparent to ordinary compilers, so such compilers process a DVM-program as an ordinary sequential program.
The following approach is used to debug DVM-programs.
First, the program is debugged on a workstation as a sequential program (with DVM-directives ignored) using ordinary debugging methods and tools. Then the program is executed on the workstation in the special mode of dynamic control of DVM-directives, which allows verifying the correctness and completeness of the DVM-directives. At the next step the program can be executed on a parallel computer or a cluster of workstations (or on the workstation with simulation of parallel execution) in the mode that compares its intermediate results with reference ones obtained, for example, from its sequential execution. In addition, tools for accumulating trace information are provided to localize errors during parallel program execution.
2 What is DVM-program?
A DVM-program consists of one or several source files in the C-DVM or F-DVM language, with the extension .cdv or .fdv respectively.
A ready-to-run program (executable file) is obtained in three steps:
3 Error kinds in DVM-program
A DVM-program can contain errors of different kinds. The errors can be divided into several classes according to their effect on program robustness, how easy they are to detect, and so on.
Generally, the following five classes of errors can be distinguished in a DVM-program:
The errors of the first class are detected by:
The errors of the second class are detected while converting C-DVM or F-DVM programs (see sections 8 and 9). A list of diagnostics messages is presented in the C-DVM and F-DVM compiler (converter) user guides respectively.
The errors of the third class are detected by Run-Time System when the program is executed in parallel mode. Lib-DVM functions check the correctness of DVM-directives order and passed parameters (the errors, detected by Lib-DVM functions, are considered in section 15).
The errors of the fourth class are detected by DVM-debugger when:
The errors of the fifth class can be detected by:
4 Dynamic control of DVM-directives
Dynamic control of DVM-directives allows verifying that the program has been parallelized correctly by DVM-directives. The dynamic control is based on simulating parallel execution of the DVM-program during its sequential execution on a single processor.
For dynamic control, the program should be compiled in the mode that produces a debug version of the parallel program (see sections 8 and 9).
However, this method substantially slows down program execution and requires considerable additional memory. It is therefore recommended to debug the program on test data.
The dynamic control allows detecting the following kinds of errors:
5 Comparison of execution results
The absence of dynamic control errors does not guarantee correct execution of the parallel program, for the following reasons:
To find such errors, the mode of accumulation and comparison of execution traces is used. It allows localizing the program point and the moment at which the results begin to differ.
When execution is traced, information is accumulated about all reads and updates of variables, entry to each loop iteration, entry to and exit from each parallel loop, entry to each parallel task, and entry to and exit from each task region.
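The idea of localizing the first point of difference can be pictured with a small sketch. It is not the DVM debugger's algorithm and does not use the real trace format (see section 14 for that); the record tuples, the function name first_divergence and the rel_tol tolerance are assumptions made purely for illustration.

```python
import math

def first_divergence(reference, current, rel_tol=0.0):
    """Return the index of the first differing record, or None if the traces agree.

    Each record is a hypothetical (event_kind, variable_name, value) tuple.
    """
    for i, (ref, cur) in enumerate(zip(reference, current)):
        ref_kind, ref_name, ref_val = ref
        cur_kind, cur_name, cur_val = cur
        same_event = (ref_kind, ref_name) == (cur_kind, cur_name)
        same_value = math.isclose(ref_val, cur_val, rel_tol=rel_tol, abs_tol=0.0)
        if not (same_event and same_value):
            return i
    # One trace is a proper prefix of the other: they differ where the shorter ends.
    if len(reference) != len(current):
        return min(len(reference), len(current))
    return None

reference = [("write", "a[1]", 1.0), ("write", "a[2]", 2.0), ("write", "s", 3.0)]
current = [("write", "a[1]", 1.0), ("write", "a[2]", 2.5), ("write", "s", 3.5)]
print(first_divergence(reference, current))  # prints 1
```

Only the first differing record is reported, because every later difference may be a consequence of it.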
Tracing, like dynamic control, results in considerable overhead. It is therefore recommended to debug the program first on test data and only then on real data. However, when the program is executed with real data, it is not always possible to accumulate the full trace because of its large volume. In that case it is necessary to estimate the trace size both for the whole program and for its parts.
The level of detail (and therefore the size) of the trace can be controlled by:
6 System trace accumulation tools and its examining
A system trace (the trace of Run-Time System function calls) allows tracing a sequence of function calls, their parameters and execution times. There are two main ways of the system trace accumulation (see section 11.4):
The level of trace detail is controlled by parameters (see section 11.4). Note that to trace some frequently used Run-Time System functions, the library must be compiled in a special mode, specified by computation variables. The full list of traced events is given in the file events.def in the DVM-system directory dvm_sys/rts/src. In this file each event number is associated with an event name.
All Run-Time System informational messages and all messages about errors, detected by Run-Time System, are also traced.
System trace files have text format (see section 16) and can be examined using ordinary editors and visualizers.
7 DVM-system tuning
DVM-system is tuned to a user environment in two steps:
The following environment variables that can be modified by a user are defined in the DVM-command startup file:
dvmdir | - | full name (with the path) of DVM-system directory (it is tuned automatically when the system is installed); |
dvmpar | - | directory and extensions of base parameter files of DVM-system; |
usrpar | - | directories and names (with extensions) of DVM-system parameter files, in which a user can modify base parameter set (a variable can have several values, separated by spaces); |
optcconv | - | options of C-DVM converter; |
optfconv | - | options of F-DVM converter; |
optccomp | - | invocation and options of standard C-compiler; |
optfcomp | - | invocation and options of standard Fortran-compiler; |
optclink | - | C program linker options; |
optflink | - | Fortran program linker options; |
dvmlib | - | libraries of Run-Time System; |
usrlib | - | libraries, used by the user program; |
Pred_sys | - | name of configuration file, describing target machine (for predictor); |
Pred_vis | - | name of html-file (predictor results) visualizer; |
Doc_vis | - | name of documentation visualizer; |
dvmout | - | enable (on) or disable (off) message output of: |
dvmoutfile | - | file name to output the user task messages (if it is not specified, the messages will be output to the screen); |
dvmlog | - | enable (1) or disable (0) the user session protocol (if dvmoutfile is not specified, the protocol is output into the file dvm.log); |
dvmshow | - | enable (1) or disable (0) output of all executed DVM-commands to the screen; |
dvmsave | - | enable (1) or disable (0) keeping intermediate files (of conversion, compilation and so on). |
In the working directory (where a task is started) the user can keep several DVM-command startup files with different names, containing different values of the environment variables. The name of the required file is then used as the prefix when starting dvm-commands (see section 9).
8 Options of converters
C-DVM and F-DVM converter options control:
A brief description of the main converter options is given below. A full description of all options is given in the C-DVM and Fortran-DVM user's guides.
8.1 Converter options controlling program execution modes
-p | - | obtaining parallel version of the program (by default): all DVM-directives are processed. |
-s | - | obtaining sequential version of the program: only DVM-directives required for execution tracing and performance analysis are processed. The data processing is not changed, which allows debugging such a program using ordinary tools. |
-o<file> | - | name of target .c or .f file. |
-v | - | output of version number, source file name and so on. |
8.2 Options of converters for dynamic debugger
-d1 | - | tracing only distributed array updates. |
-d2 | - | tracing distributed array reading and updates. |
-d3 | - | tracing all data updates. |
-d4 | - | tracing all data reading and updates. |
8.3 Options of converters for performance analysis
-e1 | - | all parallel loops and nesting sequential loops are intervals. |
-e2 | - | all statement sequences, declared (INTERVAL) by the user, are intervals. |
-e3 | = | e1 + e2. |
-e4 | = | e3 + all sequential loops are intervals. |
9 DVM-system commands
DVM-system commands have the following form:
dvm <DVM-command_name> [<command_parameters>] <DVM-program_name>
where:
dvm | - | prefix (name of DVM-command startup file); |
<command_parameters> | - | parameters, specific for different commands, such as options of the converters or the compilers, processor matrix and so on; |
<DVM-program_name> | - | name (without extension) of the file, containing source program. |
DVM-commands are divided into base and derived commands.
The base commands to perform different actions, required for debugging DVM-programs, are presented below.
dvm cdv [<C-DVM-converter_options>] <C-DVM-program_name>
dvm fdv [<F-DVM-converter_options>] <F-DVM-program_name>
Processing result: the files <DVM-program_name>.c or <DVM-program_name>.f.
First, converter options are taken from environment variables (see section 7); then options from the command line are added. If no converter options are specified, the commands produce a working parallel version of the program.
dvm cc [<C-compiler_options>] <C-DVM-program_name>
dvm f77 [<F-compiler_options>] <F-DVM-program_name>
Processing result: ready-for-run program (executable file <DVM-program_name>)
At first, the converter options are taken from environment variables (see section 7), then options from the command line are added.
dvm run [processor_matrix] [<cluster_options>] <DVM-program_name>
processor_matrix - sizes of the matrix of virtual processors, for example, 2 3 or 5 1 3.
cluster_options - options specified only when parallel programs are started on a workstation cluster with UNIX-like operating systems. They are:
-mf <machine_list> | - | <machine_list> is the name of a file containing the list of machine names in the workstation cluster; the total number of processors in the processor matrix cannot exceed the number of machine names in the list; |
-m | - | name $dvmdir/user/machinelist is used as the name of the file with the machine name list; |
-cp | - | copying the executable file into the shared directory (defined in the file dvmwork and accessible to all workstations); |
-h | - | help. |
If options -mf and -m are not specified, the program is executed on the local workstation.
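The restriction on the machine list can be checked by hand before startup. The following minimal sketch is not part of DVM-system; the function names and the machine names are hypothetical. It multiplies the sizes of the processor matrix and compares the product with the number of machine names.

```python
from math import prod

def processors_needed(matrix):
    """Total number of virtual processors in a processor matrix such as (5, 1, 3)."""
    return prod(matrix)

def fits_machine_list(matrix, machine_names):
    """True if the matrix needs no more processors than there are machine names."""
    return processors_needed(matrix) <= len(machine_names)

# Hypothetical machine list with six entries.
machines = ["node1", "node2", "node3", "node4", "node5", "node6"]

print(processors_needed((5, 1, 3)))          # prints 15
print(fits_machine_list((2, 3), machines))   # prints True
print(fits_machine_list((5, 1, 3), machines))  # prints False
```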
Processing result: the execution of the program version, prepared by the user.
A derived command is a sequence of base commands that simplifies some user actions, for example, when debugging DVM-programs (see section 10). The following derived commands exist:
dvm c [<C-DVM-converter_options>] <C-DVM-program_name>
dvm f [<F-DVM-converter_options>] <F-DVM-program_name>
Processing result: ready-for-run program (executable file <DVM-program_name>).
First, the converter options are taken from environment variables (see section 7); then the options from the command line are added. If no converter options are specified, the commands produce a working parallel version of the program.
dvm csdeb [<C-DVM-converter_options>] <C-DVM-program_name>
dvm fsdeb [<F-DVM-converter_options>] <F-DVM-program_name>
Processing result: ready-for-run program (executable file <DVM-program_name>_s)
First, converter options are taken from environment variables (see section 7); then the options from the command line are added. If no converter options are specified, the commands produce a sequential program version with option -d4 (see section 8).
dvm cpdeb [<C-DVM-converter_options>] <C-DVM-program_name>
dvm fpdeb [<F-DVM-converter_options>] <F-DVM-program_name>
Processing result: ready-for-run program (executable file <DVM-program_name>_p).
First, the converter options are taken from environment variables (see section 7); then the options from the command line are added. If no converter options are specified, the commands produce a parallel program version with option -d4 (see section 8).
dvm err <DVM-program_name>
Processing result: errors, detected in DVM-directives (if they exist).
dvm trc <DVM-program_name>
Processing result: a file, containing accumulated trace or error messages, if trace accumulation errors were detected.
dvm red <DVM-program_name>
Processing result: errors, detected during trace comparing (if they exist).
dvm dif [processor_matrix] [<cluster_options>] <DVM-program_name>
Processing result: errors detected during trace comparing (if they exist).
dvm ptrc [processor_matrix] [<cluster_options>] <DVM-program_name>
Processing result: the files with accumulated trace or error messages, if trace accumulation errors were detected.
dvm size <DVM-program_name>
Processing result: trace configuration file (see sections 10.10 and 13).
dvm pa [sts <output_file_name>] [[[<ch1> <ch2> <ch3>] <level>] <numbers>]
where:
ch1 | = | y/n – output of main characteristics; |
ch2 | = | y/n – output of comparative characteristics; |
ch3 | = | y/n – output characteristics per processors; |
level | - | enclosure level; |
numbers | - | list of the processor numbers for which to output characteristics. |
Processing result: performance analysis characteristics.
If the options in the first pair of square brackets are omitted, the characteristics are printed to the screen. The command of the form
dvm pa -h
outputs the list of its own options to the screen.
dvm_runpred <DVM-program_name>
Processing result: file <DVM-program_name>.ptr, containing accumulated trace.
dvm pred [processor_matrix] <DVM-program_name>
The command execution is controlled by the following environment variables:
Pred_sys - name of the configuration file describing the target machine;
Pred_vis - name of the html-file visualizer.
The file <DVM-program_name>.ptr with the trace, obtained by the dvm_runpred command, must exist for the predictor to operate.
Processing result: <DVM-program_name>.ptd directory, containing html files. If an html-file visualizer is specified in the command startup file, it is started automatically.
dvm doc [documentation_type]
where documentation_type can be:
ur | - | user documentation in Russian; |
ue | - | user documentation in English; |
sr | - | system documentation in Russian; |
se | - | system documentation in English. |
The command execution is controlled by the environment variable:
Doc_vis - name of documentation visualizer.
dvm ctest [processor_matrix] <DVM-program_name>
dvm ftest [processor_matrix] <DVM-program_name>
Processing result: concatenation of the processing results of the commands joined in the combined execution command.
Note. If a program uses libraries of sequential programs, the library names must be listed in the environment variable usrlib of the dvm-command startup file (see section 7).
10 Methods of debugging DVM-programs
It is recommended to debug programs first on test data and then on real data, in the following sequence.
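As a quick orientation, the dvm-commands used in sections 10.2 to 10.8 can be listed in the order in which they are normally applied. The sketch below only builds the command lines as strings and executes nothing; the program name jacobi, the 2 2 processor matrix and the function name debug_sequence are assumptions chosen for illustration.

```python
def debug_sequence(program, language="c", matrix=(2, 2)):
    """Return the dvm command lines of sections 10.2-10.8 in their usual order.

    language is "c" for C-DVM or "f" for F-DVM.
    """
    if language not in ("c", "f"):
        raise ValueError("language must be 'c' or 'f'")
    dims = " ".join(str(n) for n in matrix)
    return [
        f"dvm {language}sdeb {program}",   # 10.2 debug version for sequential execution
        f"dvm {language}pdeb {program}",   # 10.2 debug version for parallel execution
        f"dvm err {program}",              # 10.3 dynamic control of DVM-directives
        f"dvm trc {program}",              # 10.4 reference trace accumulation
        f"dvm red {program}",              # 10.5 check of reduction descriptions
        f"dvm dif {dims} {program}",       # 10.6 comparison with the reference trace
        f"dvm ptrc {dims} {program}",      # 10.7 per-processor trace (only if needed)
        f"dvm {language} {program}",       # 10.8 working parallel version
        f"dvm run {dims} {program}",       # 10.8 execution with real data
    ]

for command in debug_sequence("jacobi"):
    print(command)
```

Section 10.1 is absent from the list because that step uses ordinary compilers and debuggers rather than dvm-commands.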
10.1 Debugging ordinary sequential program
First, the DVM-program is debugged on a workstation as an ordinary sequential program in C or Fortran 77 using ordinary compilers and debuggers. DVM-directives are ignored.
10.2 Obtaining debug versions of DVM-program for sequential and parallel execution
To obtain the version for sequential execution the following commands are used:
dvm csdeb [C-DVM-converter_options] <C-DVM-program_name>
dvm fsdeb [F-DVM-converter_options] <F-DVM-program_name>
To obtain version for parallel execution the following commands are used:
dvm cpdeb [C-DVM-converter_options] <C-DVM-program_name>
dvm fpdeb [F-DVM-converter_options] <F-DVM-program_name>
10.3 Program execution in mode of dynamic control of DVM-directives
To execute DVM-program in this mode, the following command is used:
dvm err <DVM-program_name>
The program startup is controlled (by default) by dynamic control parameters from base file usrdebug (see sections 7 and 11.1), corrected by the following parameters from the file deb_err.par (both files are placed in subdirectory \PAR of DVM-system directory, specified in environment variable dvmpar):
EnableDynControl = 1; | - | enable dynamic control; |
EnableTrace = 0; | - | disable accumulation of execution trace. |
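The way a correction file overrides the base parameters can be sketched as a simple layered merge. This is only an illustration: the parser assumes nothing more than the "name = value;" form shown above, the contents given for usrdebug are hypothetical, and the real Run-Time System parameter syntax is richer than this.

```python
def parse_params(text):
    """Parse lines of the form 'Name = value;' into a dictionary of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        name, value = line.rstrip(";").split("=", 1)
        params[name.strip()] = value.strip()
    return params

def corrected(base_text, *correction_texts):
    """Apply correction files to the base set in order; later values win."""
    result = parse_params(base_text)
    for text in correction_texts:
        result.update(parse_params(text))
    return result

# Hypothetical base values; only the parameter names come from this document.
usrdebug = """
EnableDynControl = 0;
EnableTrace = 0;
TraceOptions.TraceLevel = 2;
"""
deb_err = """
EnableDynControl = 1;
EnableTrace = 0;
"""

print(corrected(usrdebug, deb_err))
```

Parameters that the correction file does not mention keep their base values.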
If wrong DVM-directives are found, a diagnostic reporting the existence of dynamic control errors is output to the stderr stream. This stream can be directed either to the screen or into a file (see sections 7 and 11.3).
Diagnostics about the error type, the line of the source text and the iteration numbers of all enclosing loops can also be output either to the screen or into a file (see section 11.1). The structure and list of dynamic control error messages are given in section 12.1.
The absence of dynamic control errors does not guarantee correct execution of the parallel program. Therefore, debugging should be continued using the trace accumulation and trace comparison commands.
10.4 Accumulation of DVM-program reference trace file
The following command is used for this purpose:
dvm trc <DVM-program_name>
By default, startup is controlled by the parameters of the execution trace accumulation from the base file usrdebug (see section 11.2), corrected by the following parameters from the file deb_trc.par:
EnableDynControl = 0; | - | disable dynamic control; |
EnableTrace = 1; | - | enable trace accumulation; |
TraceOptions.TraceMode = 1; | - | trace accumulation mode. |
If trace accumulation errors are found, a diagnostic reporting the existence of such errors is output to the stderr stream. This stream can be directed either to the screen or into a file (see sections 7 and 11.3).
Diagnostics about the error type, the line of the source text and the iteration numbers of all enclosing loops can also be output either to the screen or into a file (see section 11.2). The structure of the accumulated trace is presented in section 14.
The structure and list of trace accumulation error messages are given in section 12.2.
10.5 Comparing reference trace with results of parallel program execution on single processor
When the reference trace is compared with the trace of parallel execution of the program on a single processor, the correctness of the reduction operation descriptions is checked. This is done by means of a special mode of parallel execution of the program on a single processor. In this mode the reduction variables are calculated according to the reduction operation descriptions given by the programmer, by emulating the execution of each iteration on a separate processor. At the beginning of each iteration, the reduction variable is assigned its initial value, that is, the value it had on entering the loop. At the end of the iteration, Run-Time System is invoked to calculate the final value of the reduction according to the reduction function specified by the user. If the user specifies the reduction function incorrectly, differences should appear between the traces obtained in the different modes of reduction computation.
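Why a wrong reduction description shows up in this mode can be seen from a small model. The sketch below is an assumption-laden illustration, not the Run-Time System algorithm: the loop body computes a maximum, every iteration restarts from the value held at loop entry, and the per-iteration results are simply combined with the operation the user declared.

```python
from functools import reduce

def sequential(values, initial):
    """Ordinary sequential execution: the loop body itself accumulates the result."""
    m = initial
    for v in values:
        m = max(m, v)          # loop body
    return m

def emulated(values, initial, declared_op):
    """Emulation of one iteration per processor (values must be non-empty)."""
    partials = []
    for v in values:
        m = initial            # each iteration starts from the loop-entry value
        m = max(m, v)          # the same loop body
        partials.append(m)
    # The declared reduction operation combines the per-iteration results.
    return reduce(declared_op, partials)

values = [3, 7, 5]
print(sequential(values, 0))       # prints 7
print(emulated(values, 0, max))    # prints 7, correct description (MAX)
print(emulated(values, 0, min))    # prints 3, wrong description (MIN)
```

With the correct description both modes give the same value; with the wrong one the results differ, which is the kind of difference the trace comparison reports.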
For trace comparing the following command is used:
dvm red <DVM-program_name>
By default, startup is controlled by the trace comparison parameters from base file usrdebug (see section 11.2), corrected by the following parameters from the file deb_red.par:
EnableDynControl = 0; | - | disable dynamic control; |
EnableTrace = 1; | - | enable trace accumulation; |
TraceOptions.TraceMode = 3; | - | trace comparison mode; |
ManualReductCalc = 1; | - | computation of reduction variables according to the user specifications. |
If trace comparison errors are found, a diagnostic reporting the existence of such errors is output to the stderr stream. This stream can be directed either to the screen or into a file (see sections 7 and 11.3).
Diagnostics about the error type, the line of the source text and the iteration numbers of all enclosing loops can also be output either to the screen or into a file (see section 11.2).
The structure and list of trace comparison error messages are given in section 12.2.
10.6 Comparing parallel execution trace with reference one
The parallel program is started in the mode that emulates a multiprocessor system on a workstation and compares the execution trace with the reference one. The following command is used:
dvm dif N1 [N2 [N3]] [<cluster_options>] <DVM-program_name>
where N1, N2, N3 - sizes of processor matrix (1 1 1 by default).
By default, startup is controlled by the parameters of trace comparison from the base file usrdebug (see section 11.2), corrected by the following parameters from the file deb_dif.par:
EnableDynControl = 0; | - | disable dynamic control; |
EnableTrace = 1; | - | enable trace accumulation; |
TraceOptions.TraceMode = 3; | - | trace comparison mode; |
ManualReductCalc = 0; | - | computation of reduction variables according to standard algorithm. |
The reduction variables are calculated in the standard way. On each processor, all computations of a reduction variable are performed by the statements of the iterations executed on that processor. Run-Time System calculates the final result of the reduction operation from the partial results obtained on all the processors. If the program is executed on a single processor, the reduction is calculated by the program statements only.
If trace comparison errors are found, a diagnostic reporting the existence of such errors is output to the stderr stream. This stream can be directed either to the screen or into a file (see sections 7 and 11.3).
Diagnostics about the error type, the line of the source text and the iteration numbers of all enclosing loops can also be directed either to the screen or into a file (see section 11.2).
The structure and list of trace comparison error messages are given in section 12.2.
If there are no differences in the traces, the program can be executed with real data (see section 10.8).
If differences are detected but the error in the program cannot be found using the reference trace and the trace comparison diagnostics, the user can accumulate a trace on each processor while executing the parallel version of the program on the required processor matrix (see section 10.7).
If error situations occur on some processor during parallel program execution (or during its emulation on one workstation), or differences between the reference and current traces are detected, the program can hang. If program execution is then terminated by CTRL-C, the standard output streams directed into files can be lost. In this case the stderr stream should not be directed into a file.
The point of a hang or abnormal program termination can be found by enabling the system trace before the program is started (see section 11.4). The last records in the system trace allow determining the program point where the crash occurred.
10.7 Accumulating parallel program trace
The following command is used:
dvm ptrc N1 [N2 [N3]] [<cluster_options>] <DVM-program_name>
where N1, N2, N3 - sizes of processor matrix (1 1 1 by default).
By default, startup is controlled by the parameters of the user trace accumulation from the base file usrdebug (see section 11.2), corrected by the following parameters from the file deb_trc.par:
EnableDynControl = 0; | - | disable dynamic control; |
EnableTrace = 1; | - | enable trace accumulation; |
TraceOptions.TraceMode = 1; | - | trace accumulation mode. |
If trace accumulation errors are found, only a diagnostic reporting the existence of such errors is output to the stderr stream. This stream can be directed either to the screen or into a file (see sections 7 and 11.3).
Diagnostics about the error type, the line of the source text and the iteration numbers of all enclosing loops can also be directed either to the screen or into a file (see section 11.2). The trace is accumulated on each processor in a separate file, for example, with names 0.trd, 1.trd, 2.trd and so on.
The structure of accumulated trace files is presented in section 14.
10.8 Parallel execution with real data
If no differences are detected at the previous steps, the program can be considered to work correctly on the test parameters. The user can now proceed to parallel execution of the program on a workstation cluster with real parameters.
The following commands are used:
for compilation:
dvm c [C-DVM-converter_options] <DVM-program_name>
dvm f [F-DVM-converter_options] <DVM-program_name>
for execution:
dvm run [N1 [N2 [N3]]] [<cluster_options>] <DVM-program_name>
where N1, N2, N3 - sizes of processor matrix (1 1 1 by default).
By default, startup is controlled by the parameters from the sets, specified in environment variables dvmpar and usrpar.
If the results of program execution with real parameters do not satisfy the user, sequential and parallel debug versions of the program can be obtained again in order to trace the program with real data. It is necessary, however, to take into account that:
10.9 Estimating trace size
The following command is used:
dvm size <DVM-program_name>
By default, startup is controlled by the parameters of the user trace accumulation from the base file usrdebug (see section 11.2), corrected by the following parameters from the file deb_size.par:
EnableDynControl = 0; | - | disable dynamic control; |
EnableTrace = 1; | - | enable trace accumulation; |
TraceOptions.TraceMode = 0; | - | mode of trace configuration file generation. |
The command creates a so-called trace configuration file, which contains, in particular, predicted trace sizes that take into account the specified DVM-converter options (see section 8) and trace accumulation levels (see section 11.2).
In fact, the trace size is controlled only by two parameters from the base parameter set, TraceOptions.TraceLevel and TraceOptions.WriteEmptyIter, and by the trace configuration file (described below).
10.10 Controlling size of trace file
The created trace configuration file may be modified by the user to decrease the trace size. The user may set the mode of selective trace accumulation, or completely or partially cancel trace accumulation for some (or all) loops or parallel task regions. Then the command dvm size should be executed again to estimate the size of the trace. If the results are not acceptable, the process should be repeated.
The trace configuration file contains:
The trace entity is a program loop or task region. Information for each program loop and region contains:
The header of executable construct contains:
The parameters, controlling executable construct (that can be modified by a user) are:
Changing the parameters that control an executable construct affects the calculated trace size of that construct and therefore the size of the full trace, the number of trace lines, and the number of traced loop iterations and traced tasks.
One of the following trace accumulation levels can be specified for the whole program, for each loop and for each task region:
The traced iterations and tasks are specified in the following way:
( <dimension> : [<first iteration>] , [<last iteration>] , [<iteration step>] )
<dimension> | - | loop dimension (numbered from 0) to which the restrictions apply. This parameter is obligatory. It must always be zero for task regions. |
<first iteration> | - | first traced iteration or task number. If the parameter is omitted, the iterations or tasks are traced from the first one. |
<last iteration> | - | last traced iteration or task number. If the parameter is omitted, the iterations or tasks are traced up to the last one inclusive. |
<iteration step> | - | step of iteration or task tracing. By default, the step is equal to 1. |
Iteration ranges examples:
(0:2,10,) | - | tracing iterations from 2 to 10 inclusive; |
(0:,10,) | - | tracing iterations with numbers up to 10 inclusive; |
(0:4,,) | - | tracing iterations starting from number 4; |
(0:,,3) | - | tracing iterations with step 3. |
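The meaning of these range specifications can be made concrete with a short sketch. It is not the parser used by Run-Time System. It assumes that both commas are always written (as in the four examples), that loop iterations are numbered from 1 unless loop_first says otherwise, and that the step is counted from the first traced iteration; the names parse_range and is_traced are hypothetical.

```python
import re

# (dimension : [first] , [last] , [step]) with optional spaces around the fields.
_SPEC = re.compile(r"^\(\s*(\d+)\s*:\s*(\d*)\s*,\s*(\d*)\s*,\s*(\d*)\s*\)$")

def parse_range(spec):
    """Return (dimension, first, last, step); omitted first/last become None."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"malformed range specification: {spec!r}")
    dim, first, last, step = match.groups()
    step_value = int(step) if step else 1      # the step defaults to 1
    if step_value < 1:
        raise ValueError("iteration step must be positive")
    return (int(dim),
            int(first) if first else None,
            int(last) if last else None,
            step_value)

def is_traced(iteration, spec, loop_first=1):
    """True if the given iteration number falls into the traced range."""
    _, first, last, step = parse_range(spec)
    start = loop_first if first is None else first
    if iteration < start:
        return False
    if last is not None and iteration > last:
        return False
    return (iteration - start) % step == 0

print(parse_range("(0:2,10,)"))       # prints (0, 2, 10, 1)
print(is_traced(10, "(0:2,10,)"))     # prints True
print(is_traced(11, "(0:2,10,)"))     # prints False
print(is_traced(4, "(0:,,3)"))        # prints True (iterations 1, 4, 7, ...)
```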
10.11 Program startup with non-standard parameter set
When a user starts programs using the derived dvm-commands described in this chapter, but with non-standard parameters (using his own parameter sets), he should take into account the sequence in which Run-Time System corrects the parameters:
The result of this sequence of parameter corrections is written to the file current.par in the user's current directory; Run-Time System uses this file to execute any dvm-command.