Porting Cobol Programs

Using a Transformational Approach


Julio Cesar Sampaio do Prado Leite

Marcelo Sant'Anna

Departamento de Informática

Pontifícia Universidade Católica do Rio de Janeiro

R. Marquês de São Vicente 225

22453-900 - Rio de Janeiro, Brasil

e-mail: draco@inf.puc-rio.br

Antonio Francisco do Prado

Departamento de Computação

Universidade Federal de São Carlos

Via Washington Luiz, km 235

04499-610 - São Carlos, Brasil

e-mail: draco@power.ufscar.br

Abstract

Transformation systems have been applied in several areas of software engineering. This article describes the use of the transformational machine Draco-PUC in porting Cobol programs. We describe the porting strategy of Cobol programs to C/C++ and give examples of such strategy applied to a radar application and also to a payroll system. Although targeting an object oriented language, we do not claim to generate code that strictly follows the OO guidelines. Special attention is given to the knowledge structure we have been using to help guide the transformation process as well as to help in the process of design recovery.

1. Introduction

Reverse engineering is a research area of ever-increasing importance, and one in which cooperation between research work and practical work is essential. More and more, society is demanding new systems or, at least, changes on old systems. The code legacy that we have to deal with presents a challenge for software engineering. First, we have to fully understand what a legacy system does, and second, we have to change it according to new demands from our clients. Although there are already some commercial available tools to help the task of reverse engineering a system, a still bigger effort is needed to face the complexity of the existing systems, not only because of their size, but also due to their intrinsic complexity. Recent work on reverse engineering can be found at [1, 2, 3].

We view reverse engineering as an important support to re-engineering, and this is how we believe maintenance should be treated. In our research [4] we have been using a general schema where the re-engineering process is divided in phases called: recover, specify, re-design and re-implement. Our process is geared towards the idea of reusability, reusing as much as possible during the process of re-engineering.

Porting occurs in the context of adaptive maintenance, that is, the functionality stays the same, but there is a need to alter the supporting platform (hardware or software). In such cases, most of the work is done in what we call the re-implementation phase. Using the Draco Paradigm [5], most of the re-implementation is done by transformations that do not need application domain knowledge. Once we have the knowledge from the source language and the target language, we are able to specify the mapping between them and to apply this mapping to any program written in the source language.

Our work started with very little regard to design recovery and has now evolved to a program knowledge base that helps in guiding the transformations as well as supporting re-design decisions. With respect to re-design, this task is specifically limited to decisions at the implementation level.

The focus of our contribution is narrowed to our specific case of interest: to implement and use complex transformations and show the flexibility of the Draco-PUC transformation system through the experiment of porting Cobol programs to C++. Draco-PUC is currently considerably more powerful than the original one implemented by Neighbors and used in [6]. Using the experience described in this paper, we have proposed and partially validated a knowledge base structure which helps guiding the transformational porting and also helps in the design recover of the ported applications.

This paper aims to show how to handle porting in a structured semi-automated manner and how to link transformations with recovered program abstractions. Our working results are of an ongoing research project, but the examples we have used make us believe on the power and flexibility of our proposal.

Our first sections deal with transformation systems. Section 2 reviews some available transformation systems and Section 3 details the transformation system used in the Draco-PUC machine. Section 4 deals with the general strategy used for porting Cobol programs to C/C++. Section 5 uses examples to illustrate the strategy and explains in more detail the knowledge base we are proposing. We conclude, Section 6, by summarizing our contribution, relating it to similar work and lying down the base for our future research.


2. Transformation Systems

Transformation systems have been proposed, for some time now, as the backbone of revolutionary software production [7]. Although the promises are still to be fulfilled, there are already localized examples of success and a significant amount of experience in the area of reverse engineering.

A transformation system is a system that manipulates programs or specifications in order to change their descriptions. Transformations can be either semantic-preserving or not. Generally a transformation system transforms a program A into program B by applying a well-defined set of transformations that should maintain the original semantics of A in B. These transformations have to be carefully written and must be proved to guarantee the formality of the manipulation. Below we describe some existing transformation systems.

Refine [8] is a commercial system produced by Reasoning Systems (US), based on the research developed at Kestrel [9]. The central point in Refine is a library of Common Lisp functions that encapsulates the very basis of a transformation system. The actual emphasis of Reasoning is the use of Refine as a reverse engineering tool, and there are already packages using Refine that support languages such as Cobol, Fortran, C, and Ada.

Popart [10] is a system with a definition language for parsing and re-writing rules. Lisp is the target language and the re-rewriting rules can be integrated with it. The architecture of Popart and its integration with Lisp provide a powerful way of manipulating software descriptions.

Tampr [11] is also a Lisp system, but it has ported itself to Fortran. It uses the language Poly for defining parsers and transformations. Most of the application of Tampr has been in porting Lisp to Fortran, and the resulting code is reported to be very efficient. There are also reports on work with Pascal and C.

TXL [12] has a very elegant transformation description language and very good documentation. Its availability on the Internet has widespread its use and several successful applications have been reported.

3. Draco-PUC

Draco-PUC [13, 14] is a software machine being developed at PUC-Rio with the objective of developing, testing and putting into practice the ideas of the Draco paradigm [5] for domain-oriented software construction. Our project started from the original Draco machine and evolved using a re-engineering approach. The core of the Draco-PUC machine is its powerful transformation engine, which was the basis for our porting strategy. The next section gives an introduction to Draco-PUC architecture as well as provide a very simple example of how the transformations work.

3.1 Draco-PUC Transformational Framework

Below we summarize some characteristics of Draco-PUC machine that are relevant with respect to the transformational paradigm.

Transformations are applied over Draco specifications, or programs, which are descriptions written in domain languages. The grammar for a Domain language is specified in an extended-BNF style syntax and generated by the Draco-PUC parser generator. Transformations can map descriptions in one language into descriptions in the same language as well they can change the representation language, mapping descriptions in one language to descriptions in other languages.

Figure 1: Transformation Engine Functioning

Figure 1 illustrates the Draco-PUC transformation engine. First of all the input program is analyzed. This step generates a Dast. Using a context filter, which makes it possible the selection of a region of the Dast, the selected part is submitted to the transformation engine. This selected part is called a program locale. The basic control, given a set of transforms, looks for a match by navigating the Dast (Draco Abstract Syntax Tree) in a left to right bottom-up fashion. Once several options of transformation exist, a rule filter selects among the several options the one to be applied. After the transformation is applied the new partial Dast is again selected for transformation using the same set of candidate transformation rules, unless there is a change in the control strategy. Sets of transformations can be chained in order to accomplish an specific aimed goal. Once transformations are finished, the Dast is prettyprinted .

A transformation rule is essentially composed of a recognition pattern, called lhs (left-hand-side) and a replacement pattern called rhs (right-hand-side). An example of a simple transformation is shown below in Figure 2.


input MULTIPLY M BY C GIVING E




transformation rule lhs: MULTIPLY Var1 BY Var2 GIVING Var3

rhs: COMPUTE Var3 = Var1 * Var2




output COMPUTE E = M * E


Figure 2: Example of a Transformation

Besides the usual lhs and rhs, one transformation in Draco-PUC may trigger events or alter the basic flow of control. This is possible by using the idea of control points with associated code (currently, only C++ code is allowed). This code has access to the pattern matcher variables as well as to the pointers that define selected program parts.

Figure 3 shows the general transformation framework. Left-to-right, the three levels of detail on transformer organization can be seen.

First (Figure 3a), we provide a means of encapsulating a set of transformations into a block, which has three parts. The first part is for the declaration of variables to be used at the transformations control points. The second part carries procedures that have to be activated before a set of transformations is considered. The third part is the transformer itself formed by a sequence of sets of transformations. The fourth part (END) finalizes the process.

Second (Figure 3b), we provide the structure for a set of transformations. Additionally to the list of transformations, each set has an initialization part, which are actions taken before the first transformation is applied, and an END part in which actions are performed after the last transformation is applied. We also have directives to describe the control strategy for that specific set of transformations. The definition of this control strategy is performed in the initialization part by choosing a triggering method, an application method and a search navigation method for the set.

Third (Figure 3c), we show the transformation structure itself. In addition to the lhs and rhs of a conventional production rule we have 5 possible control points: pre-match -- activated each time the lhs is tested against the program; match-constraint -- activated when the pattern matcher matches a variable in the lhs pattern to a part of the program; post-match -- activated after a successful match between the program and the lhs; pre-apply -- activated immediately before the program part is substituted by the rhs; post-apply -- activated after the program has been modified.

The transformation engine uses the concept of workspace to store the Dast information it is processing. The use of workspaces makes it possible the application of different policies during transformation application. For instance, if the policy is using in-place transformations, then the main workspace will hold the evolving Dast. On the other hand if it is necessary to keep a Dast unchanged, new workspaces could be created by the transformers. Also, when it is necessary to keep a Dast configuration for some time, before changing it, new workspaces could be created by the transformers.

Figure 3: Transformation Framework
  1. Example

In order to provide an understanding of how Draco-PUC works we present a classic example: symbolic differentiation implemented through transformations. This example shows how the computation of the derivative of (2+X)*(X+1) with respect to X is worked out in Draco-PUC.

Our example domain is called Math, encompassing natural numbers, variables and simple expressions, which handles addition, multiplication and differentiation operators. Parenthesized expressions are also allowed.

The grammar for the Math domain is presented in Figure 4. A syntactic type named exp is defined for expressions and two lexical types - NAT and VAR - are defined for naturals and variables. The lexical type IGNORE defines which characters have null value for the lexical scanner. Two left association rules are defined for the addition and multiplication operators; the order they are presented indicate the highest precedence of the addition operator over the multiplication operator. In the grammar descriptions, syntactic types begin with a lower case character whereas lexical types begin with an upper case character. The set of grammar rules are surrounded by double-percent signs (%%). Association and precedence rules are annotated in the beginning of the grammar description using the modes %left for left association, %right for right association and %nonassoc for non-associative lexical types. Single lexemes are surrounded by single quotes (').

Figure 5 shows the set of equations used in the example and Figure 6 shows the derivation history for the example itself. Figure 7 lists the description of equations DeriveConstant and DeriveMultiplication using Draco-PUC transformation language.


%left '+'


%left '*'

%%

exp : NAT

| VAR

| exp '+' exp

| exp '*' exp

| 'd' exp '/' 'd' VAR

| '(' exp ')'

;

NAT : [0-9]+

;

VAR : [XY]

;

IGNORE : [ \t\n]

;

%%

Figure 4: Grammar for the Math Domain

(DeriveConstant) dC/dV = 0


(DeriveEqualVars) dV/dV = 1

(DeriveDifferentVars) dV1/dV2 = 0

(DeriveAddition) d(E1+E2)/dV = dE1/dV + dE2/dV

(DeriveMultiplication) d(E1*E2)/dV = dE1/dV * E2 + E1 * dE2/dV

(SimplifyExpPlusZero) E+0 = E

(SimplifyZeroPlusExp) 0+E = E

(SimplifyExpMultOne) E*1 = E

(SimplifyOneMultExp) 1*E = E

C=constant; V,V1,V2=variables; E,E1,E2=expressions

Figure 5: Set of Equations for Symbolic Differentiation

Input: d((2+X)*(X+Y))/dX



=> d((2+X)*(X+Y))/dX


(DeriveMultiplication) => d(2+X)/dX*(X+Y) + (2+X)*d(X+Y)/dX

(DeriveAddition) => d2/dX + dX/dX*(X+Y) + (2+X)*d(X+Y)/dX

(DeriveConstant) => 0 + dX/dX *(X+Y)+(2+X)*d(X+Y)/dX

(DeriveEqualVars) => 0+1 *(X+Y)+(2+X)*d(X+Y)/dX

(SimplifyZeroPlusExp) => 1*(X+Y)+(2+X)*d(X+Y)/dX

(SimplifyOneMultExp) => (X+Y) + (2+X)*d(X+Y)/dX

(DeriveAddition) => (X+Y) + (2+X)*dX/dX + dY/dX

(DeriveEqualVars) => (X+Y) + (2+X)*1 + dY/dX

(DeriveDifferentVars) => (X+Y) + (2+X)*1 + 0


(SimplifyExpPlusZero) => (X+Y) + (2+X)*1



(SimplifyExpMultOne) => (X+Y)+(2+X)





Output: (X+Y)+(2+X)

Figure 6: Derivation History for d((2+x)*(x+y))/dX

Transformer SymbolicDifferentiation


Set Of Transforms Derive

Control

Trigger is automatic.

Search is bottom-up.

Application is exhaustive.

//

// dC/dV = 0

//

Transform DeriveConstant

Lhs: {{dast math.exp

d [[NAT C]] / d [[VAR V]]

}}

Rhs: {{dast math.exp

0

}}

//

// d(E1*E2)/dV = dE1/dV*E2 + E1*dE2/dV

//

Transform DeriveMultiplication

Lhs: {{dast math.exp

d ( [[exp E1]] * [[exp E2]] ) / d [[VAR V]]

}}

Rhs: {{dast math.exp

d [[exp E1]] / d [[VAR V]] * [[exp E2]]

+

[[exp E1]] * d [[exp E2]] / d [[VAR V]]

}}

Figure 7: DeriveConstant and DeriveMultiplication Equations

as Draco-PUC Transformation Rules

A transformer file has a header composed by the keyword Transformer followed by its name, followed by the several elements depicted in Figure 3a. Figure 7 describes an excerpt of the transformer SymbolicDifferentiation.

A set of transformation rules in Draco-PUC is described by the keyword Set Of Transforms (see Figure 3b). In Figure 7 Derive we show two transformations from the equations presented in Figure 5. The Control section describes Derive as a set which is automatically tried (Trigger is automatic) when its parent transformer is invoked and which will search patterns on a bottom-up manner (Search is bottom-up), applying transformations until no one more can be applied (Application is exhaustive).

A transformation rule in Draco-PUC is described by the keyword Transform (see Figure 3c). Transformations can be described using the Dast notation or using a domain specific syntax. The latter option is the most desirable. In order to use the domain syntax the transformation writer must make use of a domain escape from the Draco-PUC transformation language to the domain specific language. The domain escape is represented by the construct {{dast math.exp ... }} (see Figure 7). This construct indicates that in the places where the transformation language syntax expects a Dast description, we are describing an expression in the Math domain (math.exp). Every domain syntax is extended with constructs describing pattern variables, which uses the notation: [[type, varname]], where type must be a valid domain type and varname must be a symbol beginning with a letter (an identifier). In Figure 7, C is a pattern variable of type NAT and V is a pattern variable of type VAR.

4. From Cobol to C/C++

The general model we have developed for porting Cobol programs can be summarized by Figure 8. Our strategy has basically three steps (Cobol structuring, conversion and C/C++ structuring) and is supported by a program knowledge base. The following sub-sections detail each of these steps and the program knowledge base.

Figure 8: Cobol to C++ Porting Strategy

4.1 Cobol Structuring

At this step the programs are transformed according to structured programming principles. Data flow and control flow are analyzed and the program is restructured in such a way that sets of paragraphs can be grouped in procedures. The call graph between procedures is used to help clustering procedures and the data flow analysis is used to help determining coupling measures which may lead to the definition of modules with separate compilation. When coupling is provided by memory variables the integration of these modules is done by the linkage section. In the case of coupling trough files or data base, a script language, such as JCL (job control language) [15], must be used.

The presence of this step on our porting strategy is endorsed by the common sense among reegineers, who assert that the majority of industrial Cobol programs must be first restructured prior to reengineering [16].

4.2 Conversion

There are four basic steps for the conversion:

After that, for each Cobol module a definition file (.h) and an implementation file with the C++ program are defined. We may also use auxiliary C++ libraries which emulate some of Cobol basic types. This run-time library, named CobLib, may be linked either statically or dynamically to the generated modules. The library encapsulates the Cobol types decimal and string as well as some I/O related aspects, such as file access, screen management and report writing.


4.3 C/C++ Structuring

The code resulting from the steps described above is not the usual native C/C++ code. The structuring process objective is to transform this program to make it easier to be read by C++ programmers. Using data dependence information, the global variables are distributed over functions and input and output parameters are allocated to each function. By performing these transformations we expand the structuring strategies used at the Cobol Structuring step and make it possible the identification of local variables, functions and abstract data types.

4.4 The Program Knowledge Base

In order to apply our conversion strategy, three aspects about a program must be available: control flow, data structure and data dependence information. We decided to compose a simple but functional program model which was able to encompass these three types of information. The result was the program model depicted in Figure 9. The dashed boxes in the model represent views over the data schema for programs.

Figure 9: Program Knowledge Base Schema

The control flow view offers two kinds of information: intra-procedural and inter-procedural control flow information. Intra-procedural control flow information can tell which are the basic blocks in a program and how these basic blocks dominate each other. Inter-procedural control flow information stores the calling relationship among functions and procedures.

The data dependence view provides information on which statements declare, manipulate and use variables in the program. This very basic kind of information can assist rather complex program understanding methods such as detection of functional, object-oriented or parallel aspects of a program.

The data structure view provides a hierarchical decomposition of data types so that they can be understood both in its high-level aspects as well as on its atomic parts.

In addition to these views, we also have a program lexicon. The program lexicon contains all the names that characterize an instance of the meta-model (Figure 9) for a given system or program under study. Special extraction and encapsulation transformations produce the lexicon from the source code being ported. Not only we keep each word used, but also all the other words it relates to. It can be visualized in a hypertext form and should be an important link to reverse engineering information derived from non-code sources [17].

5. Using Draco-PUC to Implement the Porting

In order to implement a porting strategy in Draco we must go through three basic steps: the construction of source and target domain parsers and prettyprinters, the construction of auxiliary libraries which help bridging the semantic gap between domains and the construction of the transformers themselves, which encapsulate the overall strategy showed in Figure 8. As also seen in Figure 8, besides the three basic steps of our conversion model, we need a program knowledge base in order to apply the porting transformations. Below we describe in detail all the aspects of the porting strategy.

5.1 Preparing Domain Parsers and Prettyprinters

In order to apply our strategy we had to define the C++ and the Cobol domains. Since they are executable domains we have defined a grammar for each language, and also the prettyprinter for each language. For the Cobol grammar we have used the ANSI 85 [18] specification and for C++ we have used Stroustrup's proposal [19]. The implementation of these grammars was made easier since the Draco-PUC parser generator has a backtracking mechanism, which makes possible the use of the whole class of context-free grammars.

5.2 Building Auxiliary Libraries

Besides the parsers and prettyprinters we also have developed a C++ run-time library named CobLib. The library encapsulates the Cobol types decimal and string as well as some I/O related aspects, such as file access, screen management and report writing. The library is used throughout the transformations as a source for classes that will be used as the basic components of the ported code. For example, the class CblString will be used in the transformations every time it is necessary to port Cobol commands that uses strings. In that case the CblString will encapsulate methods that will mirror the Cobol commands.

  1. Conversion

The automatic support for the conversion strategy is based on a three step process. Each one defines a different set of transformations. Below we detail these steps.

5.3.1 CobStructurer

This transformer does a control flow analysis and divides the program in blocks. This division is done according to the dependence among the blocks as determined by PERFORM calls and basic block dominance. In order to do this we have annotated program parts with an auxiliary data structure. So, a set of transformations was built in order to annotate all the statements with a data structure that defines the dependencies with respect to control flow. As such, after applying the transformations we have a dependency graph for the control flow. This graph is stored as part of the program knowledge base.

The construction of the auxiliary data structure is performed by transformations without a rhs, in which the control point post-match activates the actions that fill in the data structure. After the analysis, the structuring itself takes place. The set of transformations that perform this step is based on clustering heuristics in order to form program blocks. One of such heuristics is the finding of common input and output points in the flow of control.

5.3.2 CobC

The CobC transformer is composed of two main sets of transformations and several other auxiliary sets. The first one analyzes the data division using the data structure view of the program knowledge base. The second set of transformations represents the porting itself, that is, it defines the semantic mappings between the structured Cobol and C++. For the definition of patterns in the transformations we use the original domain language with pattern variables. Pattern variables are surrounded by double brackets and are typed with rules in the domain language syntax. Figure 10 shows an example of one of the CobC porting transformations.


/*


The If1 transformation rule is a simple rewriting rule

which finds a Cobol IF statement and then replaces it

by a corresponding C++ IF statement.

COND and STMTS are the names of

pattern variables used in the

transformation. When the match occurs, COND holds the

conditional expression for the if statement and STMTS

holds zero or more statements corresponding to the

sequence of statements that are executed when the

conditional expression is true.

*/

Transform If1

Lhs: {{dast cobol.statement

IF [[expr COND]] THEN [[statement* STMTS]]

}}

Rhs: {{dast cpp.dstatement

if ( [[expression COND]] ) {

[[dstatement* STMTS]]

}

}}

Figure 10: Example of CobC transformations - IF1

The pattern variable name COND is typed as an expr (a Cobol expression following our Cobol syntax). The star (*) qualifier means sequence of 0 or more elements, so the pattern variable STMTS is defined as being of type sequence-of-Cobol-statements, indicating it may hold a sequence of 0 or more statements. In general, upper-case types refer to terminals (tokens) in the domain language syntax. Each pattern is enclosed by double braces, having an indication of what kind of object must be pointed by the main Dast locale pointer when trying to apply the transformation. The construction "dast cobol.statement" in If1 lhs indicates the transformation If1 must match a Cobol statement.

Figure 11 shows another CobC transformation mapping Cobol ACCEPT statements into a C++ method invocation statement. In this case Cobol INT is being mapped to C++ INTEGER_CONSTANT. The conversion of the Cobol simple_term to a C++ unary_expression is accomplished through the use of other transformations in the CobC transformer (see Figure 17).


/*


Transformation Accept finds Cobol ACCEPT statements and

maps then to Accept method calls in C++. The existence

of a method called Accept assumes the

cobol_data_types library is being used.

Pattern variables X and Y hold the display position for

the user input; they are of type integer, which means

type INT when referring to Cobol code and

INTEGER_CONSTANT when referring to C++ code.

Pattern variable V holds the Cobol term containing

the user input. V is assumed to be properly transformed

to a C++ object which has an Accept method taking two

integer arguments.

*/

Transform Accept

Lhs: {{dast cobol.statement

ACCEPT ( [[INT X]] , [[INT Y]] ) [[simple_term V]]

}}

Rhs: {{dast cpp.dstatement

[[unary_expression V]].Accept( [[INTEGER_CONSTANT X]] ,

[[INTEGER_CONSTANT Y]] );

}}

Figure 11: Example of CobC transformations - ACCEPT

5.3.3 CStructurer

This transformer manipulates the generated C++ programs so that they can be made more readable by C++ programmers. One of the examples of this transformer is the treatment of repetition structures implemented by if-goto-label combinations. In this specific case we have written 5 sets of transformations.


/*


If, inside a C++ compound statement, a sequence of

statements is followed by another sequence of

statements starting with a label and ending in a

condition that address this label, followed by another

sequence of statements, then this pattern is

candidate to define in reality a do-while statement.

This is what precisely the lhs of transformation

Label-If-Goto describes.

*/

Transform Label-If-Goto

Lhs: {{dast cpp.compound_statement

{

[[dstatement* ST1]]

[[IDENTIFIER LBL]]: [[dstatement ST]]

[[dstatement* ST2]]

if ( [[expression E]] )

goto [[IDENTIFIER LBL]];

[[dstatement* ST3]]

}

}}

// Label-If-Goto is only applied if there is no

// reference to label LBL inside the sequences of

// statements ST1, ST2 and ST3.

// This constraint is guaranteed through a test

// performed by the application of a set of transforms

// named FIND_LABEL_CALL. This set of transforms

// takes as parameters a label name and a

// sequence of statements and returns true if it

// finds any occurrence of this label inside the

// the sequence of statements. It returns 0 otherwise.

Post-Match: {{dast cpp.statement_list

if (apply("FIND_LABEL_CALL", "LBL", "ST1"))

return(0);

if (apply("FIND_LABEL_CALL", "LBL", "ST2"))

return(0);

if (apply("FIND_LABEL_CALL", "LBL", "ST3"))

return(0);

return(1);

}}

Rhs: {{dast cpp.compound_statement

{

[[dstatement* ST1]]

do {

[[dstatement ST]]

[[dstatement* ST2]]

} while ( [[expression E]] );

[[dstatement* ST3]]

}

}}

Figure 12: Example of CStructurer transformation

The support transformations are parameterized by a label name and have as input a list of C++ statements. The first set of transformations verifies if there are references to the label in the list. The second set replaces the occurrences of the label by break points, assuming that the statements will be nested in a repetition structure. The other two sets transform the if-goto-label structures into do-while structures when it is possible to get rid of the labels. The last set disposes label definitions proven to be useless in the previous steps. In Figure 12 we list one of the transformations responsible for the manipulation of repetition structures.

5.4 Program Knowledge Base

In order to define an infra-structure for a reverse engineering effort three questions must be clearly answered:

  1. What are the relevant aspects of a program?
  2. How the relevant program information must be extracted?
  3. How program information must be represented and accessed?

These questions are related to important issues on the organization of program knowledge bases. The first question addresses the need of a program model. This program model expresses the important aspects about a program from the point of view of reverse engineering. The second question addresses the methods for program information extraction. The extraction methods define how programs must be analyzed so it is possible to derive the relevant information. The third question addresses the encapsulation problem. It seeks for answers on what are proper representational methods so program information can be stored and accessed during the reverse engineering process.

Addressing question 1 we decided to store information about control flow, data dependence and data structures. The entities and relationships that we considered relevant are described in our program knowledge base schema, as presented in Figure 9. The following two sub-sections describe how the remaining questions were treated by our team when trying to put forward our Cobol to C++ conversion strategy.

5.4.1 Extraction Procedures

We decided to use transformations to extract information used in the program model. The basic idea was to define patterns and procedures that can identify the relevant entities and relationships. For example, it is possible to collect some Cobol file declarations in the code by using the pattern shown in Figure 13.


Lhs: {{dast cobol.statement


SELECT [[optional OP]] [[ID ANAME]]

ASSIGN TO [[file_assignment FA]]

[[file_attrib* FA1]]

FILE STATUS IS [[ID VNAME]]

[[file_attrib* FA2]].

}}

Figure 13: Finding SELECT statements

Every time this pattern matches against a code fragment, the variable ANAME will hold the name of a Cobol file declaration in a SELECT statement.

5.4.2 Encapsulation Methods

We decided to use predicates on a knowledge base to store facts about programs. As an example, if we find the Cobol SELECT statement presented in Figure 14,


SELECT RADAR-FILE


ASSIGN TO DISK

FILE STATUS IS FSTAT

ORGANIZATION IS LINE SEQUENTIAL.

Figure 14: Example of a SELECT statement

the knowledge base should encapsulate at least the facts which are shown in Figure 15:


file(RADAR-FILE).


file_status(RADAR-FILE, FSTAT).

Figure 15: Example of PKB facts

The storage of facts when a pattern is found is performed through the use of sets of transformations we call analyzers. The analyzers are mainly composed of transformations having just a lhs and a post-match control point. At the control point an insertion command is issued to the inference engine communication interface which is responsible for creating the desired facts. An example of an analysis transformation is presented in Figure 16:


/*


The analysis transform Select_Extractor finds a SELECT statement

in a Cobol program and stores in the PKB only two facts. The

first fact defines the name of the file being declared in the

program and the other fact describes what is the Cobol term that

is holding the file status.

*/

Transform Select_Extractor

Lhs: {{dast cobol.statement

SELECT [[optional OP]] [[ID ANAME]]

ASSIGN TO [[file_assignment FA]]

[[file_attrib* FA1]]

FILE STATUS IS [[ID VNAME]]

[[file_attrib* FA2]].

}}

// For the storage of the facts we use the PKB method Assert and

// the function expand.

// The function expand takes a string and substitutes

// the content of a pattern variable for any occurrence of a

// pattern variable reference in the string.

Post-Match: {{dast cpp.statement_list

PKB.Assert(expand("is_a(file, [[ANAME]])"));

PKB.Assert(expand("file_status([[ANAME]], [[VNAME]])"));

}}

Figure 16: Example of an analysis transformation

In this example, every time a file declaration is captured, two new entries are created in the program knowledge base. One of the facts states the name of the variable responsible for mapping the physical file and the other fact defines what variable is going to hold the file status.

Similarly to the case of analyzers, generators do also correspond to sets of transformations in our approach. In this case, the transformations manipulate the program using the knowledge gathered by the analyzers. The access to the stored facts is performed through the inference engine communication interface. The transformation shown in Figure 17 is responsible for converting to C++ a file reference occurring as a term in a Cobol expression.


Transform File_As_An_Expression_Term


/*

This transformation tests if a simple term in an expression

is a variable which holds the status of a file.

The test is performed through the query "file_status(*x, [[NAME]])".

If the query is successful then the Cobol term is converted

to the C++ Status field belonging to the associated file.

The inference engine take as variables any identifier preceeded by

a star (*), such as the variable x in the query.

*/

Lhs: {{dast cobol.simple_term

[[ID NAME]]

}}

Post-Match: {{dast cpp.statement_list

// Here we construct the query which tests if the content of [[ID NAME]]

// is a file status variable and put the answer on the local variable

// named result.

// Variable result is of type TLEnv which means a list of environments,

// holding all possible environment answers for the query (possible

// bindings of x once the PKB is taken).


TLEnv *result =


PKB.Solve(expand("file_status(*x, [[NAME]])"));


if (result) {


//

// if [[ID NAME]] is a file status variable (result==1) then the

// pattern variable CPP_NAME is updated with the value of x, which

// is retrieved from the first environment in result

//

SET("CPP_NAME", result->Retrieve("x"));

return(1);

}

else

return(0);

}}

Rhs: {{dast cpp.unary_expression

[[IDENTIFIER CPP_NAME]].Status

}}

Figure 17: Example of transformation using the PKB

In the above example, once a Cobol term is matched, inside the post-match control point, a query is constructed which do ask if the variable NAME is a file-status variable. To carry on this query, the inference engine works with the program knowledge base (PKB) previously constructed by the analyzers. The result of the query over the file_status predicate is stored in the variable result, which is then tested. If the term is a file-status variable then pattern variable named CPP_NAME is set and the rhs will be instantiated. The CPP_NAME refers to the C++ corresponding name of the object that encapsulates the file being used in the program. Objects, which are associated to files, do have a data member named Status holding information about the file status. The rhs of File_As_An_Expression_Term retrieves the contents of Status.

5.5 Porting a Radar Cataloging Program

In order to make our strategy more clear we show in this sub-section parts of the porting of a radar cataloging program (radar.cbl). Although the program is a small one (500 lines), it does have characteristic aspects of file management, user interface and it is not coded following the structured programming principles. In the next sequence of figures we present partial instances of the code at different times during the porting. Shadowed areas and dashed arrows between the figures help the reader follow the application of the conversion transformations.

Figure 18 shows fragments of the original program with the file description, some data description and some paragraphs that are related in terms of control flow. Figure 19 shows the same program fragment after the transformer CobStructurer has been applied. There is no change with respect to the data area, but at the procedure division we see the new labels: LIST-RADARS-FILE-ERR and LIST-RADARS-EXIT. These labels were introduced at the code in order to make possible, using the clustering heuristic, to create a block containing the paragraphs LIST-RADARS, LIST-RADARS-LOOP, LIST-RADARS-END, LIST-RADARS-FILE-ERR and LIST-RADARS-EXIT. Note that the reference to FILE-ERR is replaced by LIST-RADARS-FILE-ERR at the LIST-RADARS paragraph and that an unconditional jump to LIST-RADARS-EXIT is included before the paragraph LIST-RADARS-FILE-ERR.

Figures 20 and 21 show the direct porting from Cobol to C++. It is worth noting on Figure 21 the class declaration created to encapsulate the data file and the several declarations which correspond to the Cobol declarations. It is also interesting to observe that the paragraphs grouped by the previous transformer became a function named Func_list_radars.

Figures 22 and 23 show the state of the program after the application of the transformation Label-If-Goto while the transformer Cstructurer is being applied. Finally, Figures 24 and 25 show the listings for the final step, when the transformer Cstructurer finishes its job. Note in the last listing (Figure 25) the label suppression and the while structure replacing the if-goto-label repetition structure.

This example was run on a Sparc 2, using SunOS with 32Mb, shared on a network. It took around 36 seconds to finish the porting of the whole program which required 1112 applications of transformations from the 49 available transformations.













*


* Suppressed Code

*

FILE-CONTROL.

SELECT RADARS-FILE

ASSIGN TO DISK

FILE STATUS IS FSTAT

ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.

FILE SECTION.

FD RADARS-FILE

LABEL RECORDS STANDARD

VALUE OF FILE-ID IS "RADARS.DAT".

01 RADAR-FILE-RECORD.

05 RADAR-NAME PIC X(30).

*

* Suppressed Code

*

WORKING-STORAGE SECTION.

01 TMP-NUM PIC 99999.

01 FSTAT PIC X(02).

01 ERR-MSG PIC X(40)VALUE

"ERROR ACCESSING FILE RADARS.DAT".

01 MAIN-SCREEN1 .

04 LINE0 PIC X(30)VALUE "MAIN MENU".

04 LINE1 PIC X(30)VALUE "D- Detection/Analysis".

*

* Suppressed Code

*

LIST-RADARS.

PERFORM CLEAR-SCREEN THRU CLEAR-SCREEN-IF.

PERFORM CABC-PGM.

DISPLAY (2, 24) HEAD2 OF SCREEN-2.

PERFORM DISP-SCRN2.

OPEN INPUT RADARS-FILE.

IF FSTAT EQUAL 30 GO TO FILE-ERR.

LIST-RADARS--LOOP.

READ RADAR-FILE

AT END, GO TO LIST-RADARS-END.

DISPLAY (5, 50) RADAR-NAME.

DISPLAY (7, 50) MAX-FREQ.

DISPLAY (9, 50) MIN-FREQ.

DISPLAY (11, 50) MAX-FRP.

DISPLAY (13, 50) MIN-FRP.

DISPLAY (15, 50) MAX-LP.

DISPLAY (17, 50) MIN-LP.

DISPLAY (19, 50) MAX-SRP.

DISPLAY (21, 50) MIN-SRP.

DISPLAY (23, 20) USER-PROMPT OF SCREEN-2.

ACCEPT (23, 43) VAL.

IF VAL NOT EQUAL "n" AND VAL NOT EQUAL "N"

GO TO LIST-RADARS-LOOP.

LIST-RADARS-END.

DISPLAY (23, 20) END-OF-RADARS OF SCREEN-2.

ACCEPT (23, 43) VAL.

CLOSE RADARS-FILE.

*

* Suppressed Code

*

FILE-ERR.

PERFORM CLEAR-SCREEN THRU CLEAR-SCREEN-IF.

DISPLAY ERR-MSG.

STOP RUN.




Fig. 18: Original Program Code Fragments

*


* Suppressed Code

*

FILE-CONTROL.

SELECT RADARS-FILE

ASSIGN TO DISK

FILE STATUS IS FSTAT

ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.

FILE SECTION.

FD RADARS-FILE

LABEL RECORDS STANDARD

VALUE OF FILE-ID IS "RADARS.DAT".

01 RADAR-FILE-RECORD.

05 RADAR-NAME PIC X(30).

*

* Suppressed Code

*

WORKING-STORAGE SECTION.

01 TMP-NUM PIC 99999.

01 FSTAT PIC X(02).

01 ERR-MSG PIC X(40)VALUE

"ERROR ACCESSING FILE RADARS.DAT".

01 SCREEN-1 .

04 LINE0 PIC X(30)VALUE "MAIN MENU".

04 LINE1 PIC X(30)VALUE "D- Detection/Analysis".

*

* Suppressed Code

*

LIST-RADARS.

PERFORM CLEAR-SCREEN THRU CLEAR-SCREEN-IF.

PERFORM CABC-PGM.

DISPLAY (2, 24) HEAD2 OF SCREEN-2.

PERFORM DISP-SCREEN2.

OPEN INPUT RADARS-FILE.

IF FSTAT EQUAL 30 GO TO LIST-RADARS-FILE-ERR.

LIST-RADARS-LOOP.

READ RADAR-FILE AT END GO TO LIST-RADARS-END.

DISPLAY (5, 50) RADAR-NAME.

DISPLAY (7, 50) MAX-FREQ.

DISPLAY (9, 50) MIN-FREQ.

DISPLAY (11, 50) MAX-FRP.

DISPLAY (13, 50) MIN-FRP.

DISPLAY (15, 50) MAX-LP.

DISPLAY (17, 50) MIN-LP.

DISPLAY (19, 50) MAX-SRP.

DISPLAY (21, 50) MIN-SRP.

DISPLAY (23, 20) USER-PROMPT OF SCREEN-2.

ACCEPT (23, 43) VAL.

IF VAL NOT EQUAL "n" AND VAL NOT EQUAL "N"

GO TO LIST-RADARS-LOOP.

LIST-RADARS-END.

DISPLAY (23, 20) END-OF-RADARS OF SCREEN-2.

ACCEPT (23, 43) VAL.

CLOSE RADARS-FILE.

GO TO LIST-RADARS-EXIT.

LIST-RADARS-FILE-ERR.

PERFORM CLEAR-SCREEN THRU CLEAR-SCREEN-IF.

DISPLAY ERR-MSG.

STOP RUN.

LIST-RADARS-EXIT.

EXIT

*

* Suppressed Code

*

Fig. 19: Program Fragments after CobStructurer


*


* Suppressed Code

*

FILE-CONTROL.

SELECT RADARS-FILE

ASSIGN TO DISK

FILE STATUS IS FSTAT

ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.

FILE SECTION.

FD RADARS-FILE

LABEL RECORDS STANDARD

VALUE OF FILE-ID IS "RADARS.DAT".

01 RADAR-FILE-RECORD.

05 RADAR-NAME PIC X(30).

*

* Suppressed Code

*

WORKING-STORAGE SECTION.

01 TMP-NUM PIC 99999.

01 FSTAT PIC X(02).

01 ERR-MSG PIC X(40)VALUE

"ERROR ACCESSING FILE RADARS.DAT".

01 SCREEN-1 .

04 LINE0 PIC X(30)VALUE "MAIN MENU".

04 LINE1 PIC X(30)VALUE "D- Detection/Analysis".

*

* Suppressed Code

*

LIST-RADARS.

PERFORM CLEAR-SCREEN THRU CLEAR-SCREEN-IF.

PERFORM CABC-PGM.

DISPLAY (2, 24) HEAD2 OF SCREEN-2.

PERFORM DISP-SCREEN2.

OPEN INPUT RADARS-FILE.

IF FSTAT EQUAL 30 GO TO LIST-RADARS-FILE-ERR.

LIST-RADARS-LOOP.

READ RADAR-FILE

AT END GO TO LIST-RADARS-END.

DISPLAY (5, 50) RADAR-NAME.

DISPLAY (7, 50) MAX-FREQ.

DISPLAY (9, 50) MIN-FREQ.

DISPLAY (11, 50) MAX-FRP.

DISPLAY (13, 50) MIN-FRP.

DISPLAY (15, 50) MAX-LP.

DISPLAY (17, 50) MIN-LP.

DISPLAY (19, 50) MAX-SRP.

DISPLAY (21, 50) MIN-SRP.

DISPLAY (23, 20) USER-PROMPT OF SCREEN-2.

ACCEPT (23, 43) VAL.

IF VAL NOT EQUAL "n" AND VAL NOT EQUAL "N"

GO TO LIST-RADARS-LOOP.

LIST-RADARS-END.

DISPLAY (23, 20) END-OF-RADARS OF SCREEN-2.

ACCEPT (23, 43) VAL.

CLOSE RADARS-FILE.

GO TO LIST-RADARS-EXIT.

LIST-RADARS-FILE-ERR.

PERFORM CLEAR-SCREEN THRU CLEAR-SCREEN-IF.

DISPLAY ERR-MSG.

STOP RUN.

LIST-RADARS-EXIT.

EXIT

*

* Suppressed Code

*

Fig. 20: Program Fragments after CobStructurer

#include "CobLib.h"


class Class_Radars_File :public CblFile{

public:

Class_Radars_File() : CblFile("RADARS.DAT") {}

int Read();

void Write();

void ReWrite() {

fseek(file,last_pos,0);

Write();

}

};

CblString radar_file_record_radar_name (30);

/*

Suppressed Code

*/

CblDecimal Tmp_num (5,0);


CblString Err_msg (40,"ERROR ACCESSING FILE


RADARS.DAT");


CblString screen_1_line0 (30,"MAIN MENU");


CblString screen_1_line1 (30,"D- Detection/Analysis");

/*

Suppressed Code

*/

void Func_list_radars() {

list_radars:

Func_clear_screen();

Func_cabc_pgm();

screen_2_head2.Display(2,24);

Func_disp_screen2();

radars_file.Open("rb");

if (radars_file.Status==30)

goto list_radars_file_error;

list_radars_loop:

if (!radar_file.Read()) {

goto list_radars_end;

}

radar_file_record_radar_radar.Display(5,50);

radar_file_record_max_freq.Display(7,50);

radar_file_record_min_freq.Display(9,50);

radar_file_record_max_frp.Display(11,50);

radar_file_record_min_frp.Display(13,50);

radar_file_record_max_lp.Display(15,50);

radar_file_record_min_lp.Display(17,50);

radar_file_record_max_srp.Display(19,50);

radar_file_record_min_srp.Display(21,50);

screen_2_user_prompt.Display(23,20);

Val.Accept(23,43);

if (Val!="n"&&Val!="N")

goto list_radars_loop;

list_radars_end:

screen_2_end_of_radars.Display(23,20);

Val.Accept(23,43);

radar_file.Close();

goto list_radars_exit;

list_radars_file_error:

Func_clear_screen();

Err_msg.Display();

exit(0);

list_radars_exit:

return;

}

Fig. 21: Program Fragments after CobC

#include "CobLib.h"


class Class_Radars_File :public CblFile{

public:

Class_Radars_File() : CblFile("RADARS.DAT") {}

int Read();

void Write();

void ReWrite() {

fseek(file,last_pos,0);

Write();

}

};

CblString radar_file_record_radar_name (30);

/*

Suppressed Code

*/

CblDecimal Tmp_num (5,0);


CblString Err_msg (40,"ERROR ACCESSING FILE


RADARS.DAT");


CblString screen_1_line0 (30,"MAIN MENU");


CblString screen_1_line1 (30,"D- Detection/Analysis");

/*

Suppressed Code

*/

void Func_list_radars() {

list_radars:

Func_clear_screen();

Func_cabc_pgm();

screen_2_head2.Display(2,24);

Func_disp_screen2();

radars_file.Open("rb");

if (radars_file.Status==30)

goto list_radars_file_error;

list_radars_loop:

if (!radar_file.Read()) {

goto list_radars_end;

}

radar_file_record_radar_radar.Display(5,50);

radar_file_record_max_freq.Display(7,50);

radar_file_record_min_freq.Display(9,50);

radar_file_record_max_frp.Display(11,50);

radar_file_record_min_frp.Display(13,50);

radar_file_record_max_lp.Display(15,50);

radar_file_record_min_lp.Display(17,50);

radar_file_record_max_srp.Display(19,50);

radar_file_record_min_srp.Display(21,50);

screen_2_user_prompt.Display(23,20);

Val.Accept(23,43);

if (Val!="n"&&Val!="N")

goto list_radars_loop;

list_radars_end:

screen_2_end_of_radars.Display(23,20);

Val.Accept(23,43);

radar_file.Close();

goto list_radars_exit;

list_radars_file_error:

Func_clear_screen();

Err_msg.Display();

exit(0);

list_radars_exit:

return;

}

Fig. 22: Program Fragments after CobC

#include "CobLib.h"


class Class_Radars_File :public CblFile{

public:

Class_Radars_File() : CblFile("RADARS.DAT") {}

int Read();

void Write();

void ReWrite() {

fseek(file,last_pos,0);

Write();

}

};

CblString radar_file_record_radar_name (30);

/*

Suppressed Code

*/

CblDecimal Tmp_num (5,0);


CblString Err_msg (40,"ERROR ACCESSING FILE RADARS.DAT");



CblString screen_1_line0 (30,"MAIN MENU");


CblString screen_1_line1 (30,"D- Detection/Analysis");

/*

Suppressed Code

*/

void Func_list_radars() {

list_radars:

Func_clear_screen();

Func_cabc_pgm();

screen_2_head2.Display(2,24);

Func_disp_screen2();

radars_file.Open("rb");

if (radars_file.Status==30)

goto list_radars_file_error;

do {

if (!radar_file.Read()) {

goto list_radars_end;

}

radar_file_record_radar_name.Display(5,50);

radar_file_record_max_freq.Display(7,50);

radar_file_record_min_freq.Display(9,50);

radar_file_record_max_frp.Display(11,50);

radar_file_record_min_frp.Display(13,50);

radar_file_record_max_lp.Display(15,50);

radar_file_record_min_lp.Display(17,50);

radar_file_record_max_srp.Display(19,50);

radar_file_record_min_srp.Display(21,50);

screen_2_user_prompt.Display(23,20);

Val.Accept(23,43);

}

while (Val!="n"&&Val=="N");

list_radars_end:

screen_2_end_of_radars.Display(23,20);

Val.Accept(23,43);

radar_file.Close();

goto list_radars_exit;

list_radars_file_error:

Func_limpa_SCREEN();

Err_msg.Display();

exit(0);

list_radars_exit:

return;

}

Fig. 23: Use of the Label-If-Goto Transformation

#include "CobLib.h"


class Class_Radars_File :public CblFile{

public:

Class_Radars_File() : CblFile("RADARS.DAT") {}

int Read();

void Write();

void ReWrite() {

fseek(file,last_pos,0);

Write();

}

};

CblString radar_file_record_radar_name (30);

/*

Suppressed Code

*/

CblDecimal Tmp_num (5,0);


CblString Err_msg (40,"ERROR ACCESSING FILE


RADARS.DAT");


CblString screen_1_line0 (30,"MAIN MENU");


CblString screen_1_line1 (30,"D- Detection/Analysis");

/*

Suppressed Code

*/

void Func_list_radars() {

list_radars:

Func_clear_screen();

Func_cabc_pgm();

screen_2_head2.Display(2,24);

Func_disp_screen2();

radars_file.Open("rb");

if (radars_file.Status==30)

goto list_radars_file_error;

do {

if (!radar_file.Read()) {

goto list_radars_end;

}

radar_file_record_radar_name.Display(5,50);

radar_file_record_max_freq.Display(7,50);

radar_file_record_min_freq.Display(9,50);

radar_file_record_max_frp.Display(11,50);

radar_file_record_min_frp.Display(13,50);

radar_file_record_max_lp.Display(15,50);

radar_file_record_min_lp.Display(17,50);

radar_file_record_max_srp.Display(19,50);

radar_file_record_min_srp.Display(21,50);

screen_2_user_prompt.Display(23,20);

Val.Accept(23,43);

}

while (Val!="n"&&Val=="N");

list_radars_end:

screen_2_end_of_radars.Display(23,20);

Val.Accept(23,43);

radar_file.Close();

goto list_radars_exit;

list_radars_file_error:

Func_limpa_SCREEN();

Err_msg.Display();

exit(0);

list_radars_exit:

return;

}

Fig. 24: Program Fragments during CStructurer

#include "CobLib.h"


class Class_radars_file :public CblFile{

public:

Class_radars_file() : CblFile("RADARS.DAT") {}

int Read();

void Write();

void ReWrite() {

fseek(file,last_pos,0);

Write();

}

};

CblString radars_file_record_radar_name (30);

/*

Suppressed Code

*/

CblDecimal Tmp_num (5,0);


CblString Err_msg (40,"ERROR ACCESSING FILE


RADARS.DAT");


CblString screen_1_line0 (30,"MAIN MENU");


CblString screen_1_line1 (30,"D- Detection/Analysis");

/*

Suppressed Code

*/

void Func_list_radars() {

Func_clear_screen();

Func_cabc_pgm();

screen_2_head2.Display(2,24);

Func_disp_screen2();

radars_file.Open("rb");

if (!(Arq_radar.Status==30)) {

do {

if (!radars_file.Read()) {

break;

}

radars_file_record_radar_name.Display(5,50);

radars_file_record_max_freq.Display(7,50);

radars_file_record_min_freq.Display(9,50);

radars_file_record_max_frp.Display(11,50);

radars_file_record_min_frp.Display(13,50);

radars_file_record_max_lp.Display(15,50);

radars_file_record_min_lp.Display(17,50);

radars_file_record_max_srp.Display(19,50);

radars_file_record_min_srp.Display(21,50);

screen_2_user_prompt.Display(23,20);

Val.Accept(23,43);

} while (Val!="n" && Val!="N")

screen_2_end_of_radars.Display(23,20);

Val.Accept(23,43);

radars_file.Close();

return;

}

Func_clear_screen();

Err_msg.Display();

exit(0);

}








Fig. 25: Program Fragments after CStructurer

5.6 Porting a Payroll System

Another experiment was conducted using programs from the PUC-Rio payroll system. One of these programs was written in April of 1985 and still in use. It is 7.000 lines program for checking the consistency of input data and printing consistency reports. These reports are generated using the report-writer feature of IBM Cobol. We will focus our example on the porting of the report section. Figure 26 shows the CobLib class CBLReport that handles the main concepts for Cobol report writer.


class CblReport {


private:

int PageLimit;

int HeadingLine;

int FirstDetail;

int LastDetail;

int FootingLine;

int LineNumber; //The current line number

int PageCounter; //The current page number

// Controls do hold the set of controls

// used when the report is being generated

CblReportControls Controls;

// RH, PH, CH, DE, CF, PF and RF must be implemented inside

// classes that have CblReport as its base class. These

// methods encapsulate the functionallity of a report.

void ReportHeading();

void PageHeading();

void ControlHeading(char* ControlName);

void Detail(char* Detailname);

void ControlFooting(char* ControlName);

void PageFooting();

void ReportFooting();

// Show and ShowNTimes are private methods that are used

// inside the implemenation of RH, PH, CH, DE, CF, PF and

// RF methods.

// Show exhibits an String at Column in

// the current line (LineNumber).

// ShowNTimes repeats the exhibition of String N times

// at Column in the current line (LineNumber).

void Show(char* String, int Column);

void ShowNTimes(int N, char* String, int Column);

// AssociateWith link a CblFile to a CblReport, so that

// the output of the report be OutputFile

void AssociateWith(CblFile* OutputFile);

public:

CblReport(char* control_name,

int page_limit,

int heading_line,

int first_detail,

int last_detail,

int footing_line);

CblReport(CblReportControls* controls,

int page_limit,

int heading_line,

int first_detail,

int last_detail,

int footing_line);

void Initiate();

void Generate(char* DetailName);

void Terminate();

};

Fig. 26: The CblReport C++ Interface

Given this interface, we have developed transformations to fulfill this structure. A report description is captured by the lhs pattern in Figure 27.


Lhs: {{dast cobol.report_description


RD [[ID report_name]]

CONTROL IS [[ID primary_control]]

PAGE LIMIT IS [[INT page_limit]] LINES

HEADING [[INT heading]]

FIRST DETAIL [[INT first_detail]]

LAST DETAIL [[INT last_detail]]

FOOTING [[INT footing]].

[[report_record* records]]

}}

Fig. 27: Recognition pattern for Cobol report descriptions

Implemented in the Cobc transformer, three sets of transforms are chained in a hiererchical sequence to treat reports. Pattern variables on the above pattern are handled by these specialized transformations. Figure 28 shows the apply part of a transformation used to instantiate CblReport objects based on the data extracted by recognition pattern of Figure 27.


Pre-Apply: {{dast cpp.statement_list


//

// SET is a Draco API function

// which sets a pattern variable with the contents of a

// C++ expression.

// CblNameToCpp takes a Cobol symbol and rewrites it as

// a valid C++ symbol

//

SET("class_name", CblNameToCpp(expand("[[report_name]]")));

//

// As CblReport constructor initializer needs an string

// as its first argument we surround the contents of

// pattern variable with double quotes

//

SET("primary_control", expand("\"[[primary_control]]\""));

//

// We move the contents of workspace WSReportMethods into

// pattern variable ReportMethodsDeclaration

//

MOVE("WSReportMethodsDeclaration", "ReportMethodsDeclaration");

}}

Rhs: {{dast cpp.ext_declaration.rc("CblReport.tab")

class [[IDENTIFIER class_name]] :

public CblReport{[[IDENTIFIER class_name]]() :

CblReport ( [[STRING primary_control]],

[[INTEGER_CONSTANT page_limit]],

[[INTEGER_CONSTANT heading]],

[[INTEGER_CONSTANT first_detail]],

[[INTEGER_CONSTANT last_detail]],

[[INTEGER_CONSTANT footing]] ) {}

[[member_declaration* ReportMethodsDeclaration]]

};

}}

Fig. 28: Recognition pattern for Cobol report descriptions

Figure 29 shows a fragment of the academic management system for a report description. Figure 30 shows the same program fragment already in C++.

Linking Figures 29 and 30 dashed arrows show how mappings between the original Cobol report description and its C++ implementation counterpart are being tackled by the transformations in Cobc.

Mapping A shows how the parameters present in a report descriptor header become constructor parameters in a subclass of CblReport. Mapping B presents a line setting transformation. Mapping C depicts how constant strings are mapped inside a report description. Mapping D shows how variables are handled by the transformations. Mapping E displays how report totalizers are mapped. Mapping F shows the handling of repeated strings. Mapping G presents line increments. Mapping H show how details are transformed into conditional statements inside the body of the Detail method. Following the same guidelines control footings are mapped as it is shown in Mapping I. Mapping J depicts the transformation of report footings.

The main module report writer section encompass approximately 300 lines of code and took 5 seconds on a SunOS Sparc 2 with 32 Mb of memory. We estimate that the whole payroll system will take approximately 120 seconds to be completely ported once all transformations are set up.


























RD REL-ALTERA



CONTROL IS MATRICULA-DV-RP


PAGE LIMIT IS 65 LINES

HEADING 1

FIRST DETAIL 10

LAST DETAIL 65

FOOTING 65.

01 TYPE IS PH.

02 LINE 1.

03 COLUMN 3 PIC X(129) VALUE ALL "-".

02 LINE 3.

03 COLUMN 3 PIC X(06) VALUE "PUC-RJ".

03 COLUMN 20 PIC X(72) VALUE

"PONTIFICIA UNIVERSIDADE CATOLICA


DO RIO DE JANEIRO - GERENCIA DE PESSOAL".



02 LINE 5.


03 COLUMN 3 PIC X(08) SOURCE DATA-CORRETA.

03 COLUMN 20 PIC X(57) VALUE

"PGRR042 - RELATORIO DE ATUALIZACAO

DO CADASTRO DE PESSOAL".

03 COLUMN 100 PIC X(04) VALUE "PAG:".

03 COLUMN 104 PIC ZZ.ZZ9 SOURCE

PAGE-COUNTER OF REL-ALTERA.

02 LINE 7.

03 COLUMN 3 PIC X(129) VALUE ALL "-".

02 LINE 8.

03 COLUMN 8 PIC X(09) VALUE "MATRICULA".

03 COLUMN 20 PIC X(05) VALUE "CAMPO".

01 DETALHE-ALT-1

TYPE IS DETAIL

LINE PLUS 2.


02 COLUMN 8 PIC X(07) SOURCE MATRICULA-DV-RP.



02 COLUMN 17 PIC X(27) SOURCE CAMPO-RP.


02 COLUMN 45 PIC X(06) VALUE "DE :".

02 COLUMN 55 PIC X(50) SOURCE ALTERADO-RP.

02 COLUMN 109 PIC X(20) SOURCE DES-CONS.

01 DETALHE-ALT-2

TYPE IS DETAIL

LINE PLUS 1.

02 COLUMN 45 PIC X(06) VALUE "PARA :".

02 COLUMN 55 PIC X(50) SOURCE ALTERACAO-RP.

*

* Supressed Code

*

01 DETALHE-ALT-7

TYPE IS DETAIL

LINE PLUS 1.

02 COLUMN 1 PIC X(17) VALUE "CONTINUACAO----".

02 COLUMN 18 PIC X(029) SOURCE

LISTA-VET-ALTERADO-8-2.

01 TYPE IS CF MATRICULA-DV-RP.

03 LINE PLUS 1.

03 COLUMN 3 PIC X(03) VALUE ALL "*".

01 TYPE IS RF

LINE NEXT PAGE.

02 LINE 2.

03 COLUMN 40 PIC X(29) VALUE

"REGISTROS INCLUIDOS =".

03 COLUMN 72 PIC ZZZZ9 SOURCE INCLUIDOS.

02 LINE 3.

03 COLUMN 40 PIC X(29) VALUE


"ALTERADOS/EXCLUIDOS =".



03 COLUMN 72 PIC ZZZZ9 SOURCE ALTERADOS.




Fig. 29: Example of a Cobol report description

class Rel_altera :public CblReport{



Rel_altera(): CblReport("Matricula_dv_rp", 65,1,10,65,65) {}


void Rel_altera::PageHeading();

void Rel_altera::Detail(char * DetailName);

void Rel_altera::ControlFooting(char * ControlName);

void Rel_altera::ReportFooting();

};

void Rel_altera::PageHeading() {

LineNumber=1;

ShowNTimes(129,"-",3);

LineNumber=3;

Show("PUC-RJ",3);

Show("PONTIFICIA UNIVERSIDADE CATOLICA DO RIO

DE JANEIRO - GERENCIA DE PESSOAL", 20);

LineNumber=5;

Show(Data_correta,3);

Show("PGRR042 - RELATORIO DE ATUALIZACAO DO

CADASTRO DE PESSOAL", 20);

Show("PAG:",100);

Show(PageCounter,104);

LineNumber=7;

ShowNTimes(129,"-",3);

LineNumber=8;

Show("MATRICULA",8);

Show("CAMPO",20);

}

void Rel_altera::Detail(char * DetailName) {

if (!strcmp(DetailName,"DETALHE-ALT-1")) {

LineNumber+=2;

Show(Matricula_dv_rp, 8);

Show(Campo_rp,17);

Show("DE :",45);


Show(Alterado_rp,55);


Show(Des_cons,109);


}


if (!strcmp(DetailName,"DETALHE-ALT-2")) {

LineNumber+=1;

Show("PARA :",45);

Show(Alteracao_rp,55);

}

/*

Supressed Code

*/

if (!strcmp(DetailName,"DETALHE-ALT-7")) {

LineNumber+=1;

Show("CONTINUACAO----",1);

Show(Lista_vet_alterado_8_2,18);

}

}

void Rel_altera::ControlFooting(char * ControlName) {

if (!strcmp(ControlName,"Matricula_dv_rp")) {

LineNumber+=1;

ShowNTimes(03,"*",3);

}

}

void Rel_altera::ReportFooting() {

LineNumber=2;

Show("REGISTROS INCLUIDOS =",40);

Show(Incluidos,72);

LineNumber=3;

Show("ALTERADOS/EXCLUIDOS =",40);

Show(Alterados,72);


}



Fig. 30: Example of a report description in C++

6. Conclusion

This work describes a porting strategy based on transformations, automated through the use of the Draco-PUC transformation engine. In particular we detailed how to use a program knowledge base in conjunction with the transformational approach. A detailed example shows how the interaction between the program knowledge base and the transformations operate together.

The novelty of our strategy is the exploration of the transformational paradigm to the extent of using transformations to extract program structure from code and then using the produced information to help the application of the porting transformations.

The current implementation of the presented strategy is limited to representative Cobol programs such as the radar cataloging system. This approach permits us to test and expand Draco-PUC flexibility and power and also allows experiments on new reverse engineering techniques, opening frontiers for people wanting to take them into product-level.

About the scalability and power of our approach we have very good experiments showing that Draco can hold large Cobol systems. This claim is the result of our previous experience with Draco when we have already went through hard experiments mainly with very large C++ programs transformers for a C++ Design Extractor [20]. In this referenced work Draco has successfully handled a 42 modules, 40.000 lines of code, C++ system. The transformers presented in this paper still need to be further refined and we are looking forward to work with a commercial partner so we can continue our experiments. Another important aspect we still have not tackled are the database issues in reengineering Cobol programs.

Our research agenda is geared towards the improvement of our knowledge structure and its manipulation. Recent work on object discovery showed us that our program knowledge base encompasses the basic building blocks for a re-engineering strategy based on object-orientation. We found two recent reports that makes us believe we are on the correct path.

Gall and Klösh [21] propose heuristics for finding objects based on data store entities (DSEs) and non-data store entities (NDSEs) which act primarily over tables representing basic data dependence information. The tables are called (m, u)-tables, since they store information on the manipulation and use of variables. With the help of an expert a basic assistant model of the object oriented application architecture is devised for the production of the final reverse generated object-oriented application model (RooAM).

Yeh, Harris and Reubenstein [22] propose a more conservative approach based not just on the search of objects but more directed toward the finding of abstract data types (ADTs). The tool which encapsulates their approach, called OBAD, uses a data dependence graph between procedure and structure types as a starting point for the selection of abstract data types candidates. The procedures and structure types are the nodes of the graph, and the references by the procedures to the internal fields of the structure are the edges. The set of connected components in this graph form the set of candidate ADTs.

Encouraged by these and similar results, we are currently studying and testing some of the proposed heuristics in order to integrate them in our model.

Draco also runs on Solaris, Linux, NT and Windows 95. This example took almost the same time to run on a stand-alone 486 DX50 with 16Mb using these other operating systems. Draco is written on C++ with parts of it written on itself, which gives it a good degree of portability among operating systems.

Acknowledgments

We would like to thank Prof. José Lucas Rangel for his careful work on the revision of a previous version of this article. We would also like to thank Mr. Felipe Gouveia de Freitas for his ingenious solution given to our parser generator, without it it would be extremely painful to handle such complex grammars as those of Cobol and C++.

References

1. Proceedings of the Working Conference on Reverse Engineering, IEEE Computer Society Press, 1993.

2. ACM, Special Edition on Reverse Engineering, Communications of the ACM, vol. 37, n. 5, May 1994.

3. Proceedings of the Working Conference on Reverse Engineering, IEEE Computer Society Press, 1995.

  1. J. Leite, A. Prado, M. Sant'Anna, 'Draco-PUC, experiências e resultados de re-engenharia de software', in Proceedings of the VI Símposio Brasileiro de Engenharia de Software, 115-128, Sociedade Brasileira de Computação, 1992
  2. J. Neighbors, 'The Draco Approach to Constructing Software from Reusable Components', IEEE Transactions on Software Engineering, 10(5), 564-574, Sep.1984.

6. G. Arango , I. Baxter et al., 'TMM: Software Maintenance by Transformation', in IEEE Software, 3, 27-39, May 1986.

7. R. Balzer, T. Cheatham and C. Green. 'Software Technology in the 1990's: Using a New Paradigm', in IEEE Computer, vol. 16, n. 11, Nov. 1993.

8. REFINE User's Guide, Reasoning Systems Incorporated, Palo Alto, 1992.

9. D. Smith, G. Kotik, S. Westfold, 'KIDS: Research on Knowledge-Based Software Environments at Kestrel Institute', in IEEE Transactions on Software Engineering, 11, 1278-1295, Nov. 1985.

10. D. Wile D., 'POPART: Producer of Parsers and Related Tools System Builders'Manual', Technical Report, USC/Information Sciences Institute; Nov. 1993.

11. J. Boyle, 'Abstract Programming and Program Transformations - An Approach to Reusing Programs', in Software Reusability (Vol 1), 361-413, Ed. Ted Biggerstaff, ACM Press, 1989.

12. J. Cordy, I. Carmichael, 'The TXL Programming Language Syntax and Informal Semantics (V. 7)', in Technical Report, Queen's University at Kingston - Canada., June 1993. (TXL web site is http://www.qucis.queensu.ca/STLab/TXL)

13. J. Leite, A. Prado, M. Sant'Anna, 'Draco-PUC: A Case Study on Software Re-Engineering', in Position Paper Collection of the Second International Workshop on Software Reusability, Lucca, Italy; Mar., 1993.

14. J. Leite, M. Sant'Anna, F. Freitas, 'Draco-PUC: A Technology Assembly for Domain Oriented Software Development', in Proceedings of the Third International Conference on Software Reuse, 94-101, IEEE Computer Society Press, 1994.

15. IBM Corporation , 'IBM System/360 Operating System: Job Control Language User's Guide, Order No. Gc28-6703, Poughkeepsie, N.Y., 1970

16. H. Sneed, 'Reverse Engineering as a bridge to CASE', in Proceedings of the WCRE95, 300-313, IEEE Computer Society Press, 1995.

17. J. Leite and P. Cerqueira, 'Recovering Business Rules from Structured Analysis Specifications', in Proceedings of the WCRE95, 13-21, IEEE Computer Society Press, 1995.

18. N. Stern, R. Stern, The Wiley COBOL Syntax Reference Guide. John Wiley & Sons; 1990.

19. B. Stroustrup, The C++ Programming Language, Addison-Wesley; 1991.

20. Braga C., 'Uma Ferramenta para Geração de Documentação de Sistemas de Software', Dissertação de Mestrado, Departamento de Informática da Pontifícia Universidade Católica do Rio de Janeiro, Março de 1996.

21. H. Gall and H. Klösch, 'Finding Objects in Procedural Programs: An Alternative Approach', in Proceedings of the WCRE95, 208-216, IEEE Computer Society Press, 1995.

22. A. Yeh, D. Harris and H. Reubenstein, 'Recovering Abstract Data Types and Object Instances from a Conventional Procedural Language', in Proceedings of the WCRE95, 227-236, IEEE Computer Society Press, 1995.