HP 3000 Manuals

Using the Optimizer [ HP FORTRAN 77/iX Programmer's Guide ] MPE/iX 5.0 Documentation


HP FORTRAN 77/iX Programmer's Guide

Using the Optimizer 

The optimizer is an optional part of the compiler that modifies your
program so that machine resources are used more efficiently, using less
space and running faster.  The optimizer consists of 12 modules:  five
for level one optimization and seven for level two optimization.

This section describes the following:

   *   When and how to use the optimizer

   *   Level one and two optimization modules

   *   Optimizer assumptions

   *   How to write code that is easily optimized

   *   What to do if your optimized program fails

Introduction to the Optimizer 

You can run the optimizer in one of three ways: 

   1.  No optimization (this is the default).

   2.  Level one optimization:  This level only performs a subset of the
       available optimization modules.  The transformations performed are
       local to small subsections of code, and therefore are performed
       quickly with little run-time storage required by the compiler.
       Level one optimization should be used when some optimization is
       desired, but when compile time performance is more important than
       run time performance.

   3.  Level two optimization:  This level of optimization performs all
       of the available optimization modules.  Transformations are
       performed over the scope of each procedure.  If you use this level
       of optimization, the compiler uses more memory and takes longer to
       process your program.

When to Use the Optimizer 

Use the optimizer only on debugged code that is ready to run, because the
compiler cannot generate debug information and perform optimizations at
the same time.  After level two optimizations are performed, the code is
radically reordered and variable values might not be maintained in
memory, which makes symbolic debugging impossible.  Therefore, once a
program is optimized, you cannot use symbolic debugging unless you
recompile without optimization.

Invoking the Optimizer on MPE/iX 

Invoke the optimizer by specifying the OPTIMIZE compiler directive in
your source file or by passing the directives through the INFO string.
For level one optimization, use the command

     $OPTIMIZE LEVEL1

For level two optimization, use the command

     $OPTIMIZE

or

     $OPTIMIZE LEVEL2

Level One Optimization Modules 

The level one optimization modules are:

   *   Branch optimization

   *   Dead code elimination

   *   Faster register allocation

   *   Instruction scheduler

   *   Peephole optimization

The examples in this section are shown at the source code level wherever
possible; transformations that cannot be shown at the source level are
shown in assembly language.

Table 6-2  summarizes the assembly language routines.

          Table 6-2.  Descriptions of Assembly Language Routines 

---------------------------------------------------------------------------
|                          |                                              |
|       Instruction        |                 Description                  |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| LDW offset(sr, base),    | Loads a word from memory into register       |
| target                   | target.  sr is the space register (0 through |
|                          | 7); base is the base register (0 through     |
|                          | 31).                                         |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| ADDI const, reg, target  | Adds the constant const to the contents of   |
|                          | register reg and puts the result in register |
|                          | target.                                      |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| LDI const, target        | Loads the constant const into register       |
|                          | target.                                      |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| AND reg1, reg2, target   | Performs a bitwise AND of the contents of    |
|                          | registers reg1 and reg2 and puts the result  |
|                          | in register target.                          |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| COMIB, cond, const, reg, | Compares the constant const to the contents  |
| lab                      | of register reg and branches to label lab if |
|                          | the condition cond is true.                  |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| BB, cond, reg, num, lab  | Tests the bit number num in the contents of  |
|                          | register reg and branches to label lab if    |
|                          | the condition cond is true.                  |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| COPY  reg, target        | Copies the contents of register reg to       |
|                          | register target.                             |
|                          |                                              |
---------------------------------------------------------------------------
|                          |                                              |
| STW  reg, offset(sr,     | Stores the word in register reg to memory.   |
| base)                    | sr is the space register (0 through 7); base |
|                          | is the base register (0 through 31).         |
|                          |                                              |
---------------------------------------------------------------------------

Branch Optimization Module.   

The branch optimization module makes branch instruction sequences more
efficient whenever possible.  Examples of possible tranformations are:

   *   Deleting branches whose target is the fall-through instruction
       (that is, the target is two instructions away)

   *   When the target of a branch is an unconditional branch, changing
       the target of the first branch to be the target of the second
       unconditional branch

   *   Transforming an unconditional branch at the bottom of a loop,
       which branches to a conditional branch at the top of the loop,
       into a conditional branch at the bottom of the loop

   *   Changing an unconditional branch to the exit of a procedure into
       an exit sequence where possible

   *   Changing conditional or unconditional branch instructions that
       branch over a single instruction into a conditional nullification
       in the previous instruction

   *   Looking for conditional branches over unconditional branches,
       inverting the sense of the first branch and deleting the second
       branch.  These result from null THEN clauses and from THEN clauses
       that only contain GOTO statements.  For example, the code

                 IF (x) THEN
                      statement 1 
                 ELSE
                      GOTO 100
                 ENDIF
                 statement 2 
            100  statement 3 

       becomes

                 IF (.NOT. x) GOTO 100
                 statement 1
                   statement 2 
            100  statement 3 

Dead Code Elimination Module.   

The dead code elimination module removes unreachable code that is never
executed.

For example, the code

     if (.FALSE.) then
          a=1
     else
          a=2
     endif

becomes

     a=2

Faster Register Allocation Module.   

The faster register allocation module, used with unoptimized code,
analyzes register use faster than the advanced register allocator (a
level two module).

This module performs the following:

   *   Inserts entry and exit code

   *   Generates code for operations (such as multiplication and
       division)

   *   Eliminates unnecessary copy instructions

   *   Allocates actual registers to the dummy registers in instructions

Instruction Scheduler Module.   

The instruction scheduler module performs the following:

   *   Reorders the instructions in a basic block to improve memory
       pipelining.  (For example, where possible, a load instruction is
       separated from the use of the loaded register.)

   *   Where possible, follows a branch instruction with an instruction
       that can be executed as the branch occurs.

   *   Schedules floating point instructions.

For example, the code

     LDW     -52(0,30),r1
     ADDI    3,r1,r31    ;interlock with load of r1
     LDI     10,r19

becomes

     LDW     -52(0,sp),r1
     LDI     10,r19
     ADDI    3,r1,r31    ;use of r1 is now separated from load

Peephole Optimization Module 

The peephole optimization module is a machine-dependent module that
makes a pass through an intermediate representation of the code
applying patterns to a small window of code looking for optimization
opportunities.  The optimizations performed are:

   *   Changing the addressing mode of instructions so they use shorter
       sequences

   *   Substituting sequences of instructions that access bit fields with
       shorter, equivalent instructions

For example, the code

     LDI     32,r3
     AND     r1,r3,r2
     COMIB,= 0,r2,L1

becomes

     BB,>=   r1, 26, L1

Level Two Optimization Modules 

The level two optimization modules are:

   *   Advanced register allocation

   *   Induction variables and strength reduction

   *   Common subexpression elimination

   *   Constant folding

   *   Loop invariant code motion

   *   Store/Copy optimization

   *   Unused definition elimination

The examples in this section are shown at the source code level wherever
possible; transformations that cannot be shown at the source level are
shown in assembly language.  See Table 6-2  for a description of the
assembly language routines.

Advanced Register Allocation Module.   

The advanced register allocation module performs some copy optimizations,
as well as allocating registers.  Before the register allocator is run,
the instructions contain register numbers that do not correspond to
actual registers.  The register allocator assigns real registers to these
instructions and removes unnecessary COPY instructions.

For example, the following code shows the type of optimization the
coloring register allocation module performs.  The code

     LDI     2,r104
     COPY    r104,r103
     LDO     5(r103),r106
     COPY    r106,r105
     LDO     10(r105),r107

becomes

     LDI     2,r25
     LDO     5(r25),r26
     LDO     10(r26),r31

Strength Reduction.   

The induction variables and strength reduction module removes linear
functions of a loop counter and replaces them with the loop counter.
Variables of the same linear function are computed only once.  This
module also simplifies the function by replacing multiplication
instructions with addition instructions wherever possible.

For example, the code

     DO i = 1,10
     j(i) = i*k
     END DO

becomes

     t1 = k
     DO i = 1,10
          j(i) = t1
          t1 = t1+k
     END DO

Common Subexpression Elimination.   

The common subexpression elimination module identifies expressions that
appear more than once and have the same result, computes the result, and
substitutes the result for each occurrence of the expression.  The types
of subexpressions include instructions that load values from memory, as
well as arithmetic evaluation.

For example, the code

     a = x + y + z
     b = x + y + w

becomes

     t1 = x + y
     a = t1 + z
     b = t1 + w

Constant Folding Module.   

While the optimizer is collecting information about uses and definitions
of resources, the constant folding module replaces constant expressions
with their values.

For example, the code

     a = 1
     b = 2
     c = a + b

becomes

     a = 1
     b = 2
     c = 3

Loop Invariant Code Motion Module.   

The loop invariant code motion module recognizes instructions inside a
loop whose results do not change and moves the instructions outside the
loop.

For example, the code

     x = z
     DO i = 1,10
          a(i) = 4 * x + i
     END DO

becomes

     x = z
     t1 = 4 * x
     DO i = 1,10
          a(i) = t1 + i
     END DO

Store/Copy Optimization Module.   

Where possible, the store/copy optimization module substitutes registers
for memory locations by replacing store instructions with copy
instructions and deleting load instructions.

For example, the following FORTRAN 77 code

     INTEGER FUNCTION i
          .
          .
          .
     i = j + 23
     RETURN
     END

produces this code for the unoptimized case

     LDO     23(r26),r1
     STW     r1,-52(0,sp)
     LDW     -52(0,sp),ret0

and this code for the optimized case:

     LDO     23(r26),ret0

Unused Definition Elimination Module.   

The unused definition elimination module removes unused memory location
and register definitions.  These definitions are often a result of
transformations made by other optimization modules.

For example, the function

     INTEGER FUNCTION f(x)
     INTEGER x,a,b
     a = 1
     b = 2
     f = x * b
     RETURN
     END

becomes

     INTEGER FUNCTION f(x)
     INTEGER x,a,b
     b = 2
     f = x * b
     RETURN
     END

Optimizer Guidelines 

These guidelines will help you use the optimizer effectively and write
efficient HP FORTRAN 77 programs.

   1.  Where possible, expand procedures with fewer than five lines in
       the program or convert them to macros.  The optimizer makes better
       use of register variables if the procedures have fewer than 100
       lines.  If a loop only contains a procedure call, it is more
       efficient to put the loop in the procedure.

   2.  Make hash table sizes and field sizes of variables in powers of
       two.

   3.  Where possible, use local variables to help the optimizer promote
       variables to registers.

   4.  Where possible, construct loops so the control variable increases
       or decreases towards zero.  The code generated for a test of a
       loop termination is more efficient with a test against zero than
       for a test against some other value.

   5.  Where possible, use constants instead of variables for shift,
       multiplication, division, and remaindering.

   6.  Where possible, avoid using extensive equivalencing and memory
       mapping schemes.

HP FORTRAN 77 Optimizer Assumptions.   

During optimization, the compiler gathers information about the use of
variables and passes this information to the optimizer.  The optimizer
uses this information to ensure that every code transformation maintains
the correctness of the program (at least to the extent that the original
unoptimized program is correct).  When gathering this information, the HP
FORTRAN 77 compiler assumes that inside a routine (either a function or a
subroutine), the only variables that can be accessed directly or
indirectly or by another function call are:

   *   Common variables declared in this routine

   *   Local variables (all static variables and nonstatic variables)

   *   Parameters to this routine

In general, you do not need to be concerned about this assumption.  Good
programming practices preclude code that violates the assumption.
However, if you have code that violates the assumption, the optimizer can
change the behavior of the program in an undesired way.  In particular,
you should avoid the following coding practices to ensure correct program
execution for optimized code:

   *   Avoid referencing outside the bounds of an array.

   *   Avoid using variables that can be accessed by a process other than
       the program, such as shared common variables.  The compiler
       assumes that the program is the only process accessing its data.
       The only exception to this is if a semaphore in the form of a
       function call is used to "lock" and "unlock" access to a shared
       variable.  In this case, optimization is assumed to be correct.

OPTIMIZE Compiler Directive.   

The OPTIMIZE compiler directive gives you the ability to give information
about the program to the compiler.

The OPTIMIZE directive controls which functions are optimized and which
set of optimizations is performed.  Some directives must be placed before
the function to be optimized, while others can appear anywhere within the
function.

This is the syntax of the OPTIMIZE compiler directive: 

Syntax 

          [LEVEL1                                 ]
          [LEVEL2                                 ]
          [LEVEL2_MIN                             ]
          [LEVEL2_MAX                             ]
          [ASSUME_NO_EXTERNAL_PARMS               ]
$OPTIMIZE [ASSUME_NO_FLOATING_INVARIANT           ] [ON ]
          [ASSUME_NO_PARAMETER_OVERLAPS           ] [OFF]
          [ASSUME_NO_SHARED_COMMON_PARMS          ]
          [ASSUME_NO_SIDE_EFFECTS                 ]
          [ASSUME_PARM_TYPES_MATCHED              ]
          [LOOP_UNROLL[COPIES=n SIZE=n STATISTICS]]

ON                            Alone, specifies level 2 optimization.

                              With a preceding option, sets that option
                              off.

OFF                           Alone, specifies level 0 optimization.
                              This is the default.

                              With a preceding option, sets that option
                              off.

LEVEL1                        Specifies level 1 optimization.

LEVEL2                        Specifies level 2 optimization, with the
                              following ASSUME settings:

                                     ASSUME_NO_EXTERNAL_PARMS       ON
                                     ASSUME_NO_FLOATING_INVARIANT   ON
                                     ASSUME_NO_PARAMETER_OVERLAPS   ON
                                     ASSUME_NO_SHARED_COMMON_PARMS  ON
                                     ASSUME_NO_SIDE EFFECTS         OFF
                                     ASSUME_PARM_TYPES_MATCHED      ON
                                     LOOP_UNROLL                    ON

LEVEL2_MIN                    Specifies level 2 optimization with all the
                              ASSUME settings OFF.

LEVEL2_MAX                    Specifies level 2 optimization with all the
                              ASSUME settings ON.

ASSUME_NO_EXTERNAL_PARMS      Assumes that none of the parameters passed
                              to the current procedure are from an
                              external space, that is, different from the
                              user's own data space.  Parameters can come
                              from another space if they come from
                              operating system space or if they are in a
                              space shared by other users.

ASSUME_NO_FLOATING_INVARIANT  Assumes that no floating invariant
                              operations are executed conditionally with
                              loops.

ASSUME_NO_PARAMETER_OVERLAPS  Assumes that no actual parameters passed to
                              a procedure overlap each other.

ASSUME_NO_SHARED_COMMON_PARMS This directive should be ON when all of the
                              following are true:
                                 *   The parameter passed to the current
                                     procedure is part of a common block
                                     used by that procedure.
                                 *   The parameter is named differently
                                     than the variable name it has in the
                                     common block.
                                 *   The parameter is reassigned with the
                                     same value within the procedure.

ASSUME_NO_SIDE_EFFECTS        Assumes that the current procedure changes
                              only local variables.  It does not change
                              any variables in COMMON, nor does it change
                              parameters.

ASSUME_PARM_TYPES_MATCHED     Assumes that all of the actual parameters
                              passed were the type expected by this
                              subroutine.

LOOP_UNROLL                   Unrolls DO loops having 60 or less
                              operations four times.  For further
                              details, see "Loop Unrolling" in this
                              chapter.  The default is ON.

There are five levels of optimization:

Level 0          Does no optimizing.  This is obtained by specifying
                 $OPTIMIZE OFF.

Level 1          Optimizes only within each basic block.  This is
                 obtained by specifying $OPTIMIZE LEVEL1 ON.

Level 2 minimum  Optimizes within each procedure with no assumptions on
                 interactions of procedures.  That is, the compiler
                 assumes nothing, making this the most conservative level
                 2 optimization.  This level is obtained by specifying
                 $OPTIMIZE LEVEL2_MIN ON within each procedure.

Level 2 normal   Optimizes within each procedure with normal assumptions
                 on interactions of procedures set as described earlier.
                 In general, these settings are appropriate for most
                 FORTRAN programs.  This level is obtained by specifying
                 $OPTIMIZE LEVEL2 ON, $OPTIMIZE ON or just $OPTIMIZE
                 within each procedure.

Level 2 maximum  Optimizes within each procedure with all assumptions on
                 interactions of procedures set to OFF. This is obtained
                 by specifying $OPTIMIZE LEVEL2_MAX ON within each
                 procedure.

A basic block is a set of instructions to be executed in sequence, with
one entrance, the first instruction, and one exit, the last; the block
contains no branches.

Parameters can come from another space if they come from the operating
system or if they are in a space shared by other users.

With level two optimization, the compiler and optimizer can achieve very
sophisticated optimization.  Use the ASSUME options to provide the
information required for level two optimization.

ASSUME_NO_PARAMETER_OVERLAPS tells the compiler that the parameters
passed to the current routine never overlap each other, as in the
following code:

     subroutine a(i,j,k)
          .
          .
          .
     PROGRAM b
     CALL a(l,m,n)
     END

The ASSUME_NO_PARAMETER_OVERLAPS option should usually be set to ON.
However, for the following code

     subroutine a(i,j,k)
          .
          .
          .
     END
     PROGRAM b
     CALL a(l,l,m)
     END

the ASSUME_NO_PARAMETER_OVERLAPS option should not be set to ON because
the first two parameters passed to A are actually the same variable (that
is, the parameters overlap).

ASSUME_NO_SIDE_EFFECTS ON tells the compiler that all the procedure calls
after this option do not change any of the common variables or the
contents of the parameters being passed.  For example, in the following
code

           PROGRAM a
           COMMON c,d,e
     $OPTIMIZE ASSUME_NO_SIDE_EFFECTS ON
           CALL s1(i,j,k)
           CALL s2(l,m,n)
           END

the compiler assumes that subroutines s1 and s2 will not change the
values of parameters i, j, k, l, m, n, or common variables c, d, or e.

ASSUME_NO_PARM_TYPES_MATCHED ON tells the compiler that the type
declaration of each of the parameters in the called routine is the same
as that of the caller.  For example, in the following code

           PROGRAM a
           INTEGER i,j,k
     $OPTIMIZE ASSUME_PARM_TYPES_MATCHED ON
           CALL s1(j)
           END

           SUBROUTINE s1(j)
           INTEGER j
           j = 1
           END

the type declaration of parameter i in the called routine is integer,
matching the type declaration in the caller, PROGRAM a.

However, for the following code

     PROGRAM a
     INTEGER i,j,k
     CALL s1(i)
     END

     SUBROUTINE s1(j)
     INTEGER j(3)
     j(1) = 1
     END

the ASSUME_PARAMETERS_MATCHED option has to be set to OFF before the call
to s1 because the called subroutine s1 declares parameter j to be an
integer array, which is not the same as an integer in the caller a.
Notice that s1 is actually intended to change the contents of j(2), which
is not i but the variable following i.  However, what follows i is system
dependent.

ASSUME_NO_EXTERNAL_PARMS ON tells the compiler that none of the
parameters passed to the current procedure are from an external space.
That is, none are different from the user's own data space.  For example,
if you are accessing data in the operating system, you are accessing data
from an external space.  Shared data or shared common variables fall into
this category.  If ASSUME_NO_EXTERNAL_PARMS is OFF, the compiler is
unable to perform certain optimizations, such as array accessing
optimization.

The following options are meaningful only when the compiler is performing
level 2 optimization, that is, only if the option ON, LEVEL2, LEVEL2_MIN,
or LEVEL2_MAX has been specified:

     ASSUME_NO_PARAMETER_OVERLAPS
     ASSUME_NO_SIDE_EFFECTS
     ASSUME_PARM_TYPES_MATCHED
     ASSUME_NO_EXTERNAL_PARMS
     ASSUME_NO_FLOATING_INVARIANT
     LOOP_UNROLL

Default               Off.

Location              The following OPTIMIZE options must appear before
                      any nondirective statements in the program unit:

                           OFF
                           ON
                           LEVEL1
                           LEVEL2
                           LEVEL2_MIN
                           LEVEL2_MAX
                           ASSUME_NO_PARAMETER_OVERLAPS
                           ASSUME_NO_EXTERNAL_PARMS
                           ASSUME_NO_SHARED_COMMON_PARMS
                           ASSUME_NO_FLOATING_INVARIANT

                      These options can appear anywhere within a program
                      unit:

                           ASSUME_NO_SIDE_EFFECTS
                           ASSUME_PARM_TYPES_MATCHED
                           LOOP_UNROLL

Toggling/ Duration    The optimize options remain in effect until they
                      are changed by another OPTIMIZE directive.

Impact on             This directive can improve performance.  Loop
Performance           unrolling, which usually improves performance, can
                      occasionally degrade performance because of large
                      loops (register spilling) and code expansion
                      (crossing the page boundary causing cache misses
                      and TLB misses.)

Flagging Uninitialized Variables 

When the compiler is performing level 2 optimization, it will detect any
uninitialized non-static simple local variables.  However, it will not
detect uninitialized common variables, static variables, or variables of
character and complex type.  For example:

     $OPTIMIZE
         FUNCTION func(type)
         COMMON /a/comvar
         SAVE statvar
         REAL foo,type
         type = 10.2
         foo = comvar
         foo = statvar
         foo = typo
         RETURN
         END

The variable typo is flagged as an uninitialized variable because it was
typed incorrectly and, therefore, not initialized.  However, statvar
and comvar are not flagged because of their global and static
characteristics.  A warning message will be issued when an uninitialized
variable is detected.

Example 

     C     Start with minimum level 2 optimization.
     $OPTIMIZE LEVEL2_MIN

           PROGRAM FEQ7
           INTEGER num(10), ans, calculate
           CHARACTER*2 option(10)
     C
     C     For the next two calls, the parameter type declarations are the same in
     C     the main program and the subroutine or function.  Therefore, we can
     C     further optimize the program by setting the following optimizer option.
     C
     $OPTIMIZE ASSUME_PARM_TYPES_MATCHED ON
           call getnum_option(num,option)
     C
     C     For the next call, the function will not change the parameter value or
     C     any global variables in COMMON blocks.  Therefore, we can further
     C     optimize the program by setting the following optimizer option.
     C
     $OPTIMIZE ASSUME_NO_SIDE_EFFECTS ON
           ans= calculate(num,option)
     $OPTIMIZE ASSUME_NO_SIDE_EFFECTS OFF
           WRITE(6,*) 'Result = ',ans
           END
     C
     C     For the next subroutine, you know that the actual parameters passed
     C     to this subroutine are not overlapped with each
     C     other or from a space different from your program.
     C     Thus, you can further optimize
     C     the program by setting the following optimizer options.
     C
     $OPTIMIZE ASSUME_NO_PARAMETER_OVERLAPS ON
     $OPTIMIZE ASSUME_NO_EXTERNAL_PARMS ON
           SUBROUTINE getnum_option(value,operation)
           INTEGER value(10)
           CHARACTER*2 operation(10)
           DO 10  i = 1,10
     20    WRITE(6,*) 'Please input operation type and integer value :'
           READ(5,*) operation(i),value(i)

           IF (operation(i).EQ.' ') GOTO 30

           IF ((operation(i).NE.'**').AND.
          /    (operation(i).NE.'*' ).AND.
          /    (operation(i).NE.'/' ).AND.
          /    (operation(i).NE.'-' ).AND.
          /    (operation(i).NE.'+' )) GOTO 20
     10    CONTINUE
     30    RETURN
           END
     C
     C     For the next subroutine, you know that the actual parameters passed to
     C     this subroutine are not overlapped with each
     C     other and not from an external space.
     C     Thus, you can leave the
     C     ASSUME_NO_PARAMETER_OVERLAPS and ASSUME_NO_EXTERNAL_PARMS
     C     settings ON.
     C

           FUNCTION calculate(value,operation)
           INTEGER value(10),calculate,ans
           CHARACTER*2 operation(10)

           ans = 0
           DO 10  i = 1,10
           IF (operation(i).EQ.' ') GOTO 30

           IF (operation(i).EQ.'**') THEN
                ans = ans ** value(i)
           ELSE IF (operation(i).EQ.'*' ) THEN
                ans = ans * value(i)
           ELSE IF (operation(i).EQ.'/' ) THEN
                ans = ans / value(i)
           ELSE IF (operation(i).EQ.'-' ) THEN
                ans = ans - value(i)
           ELSE IF (operation(i).EQ.'+' ) THEN
                ans = ans + value(i)
           ENDIF
     10    CONTINUE
     30    calculate = ans
           RETURN
           END

The ASSUME_NO_FLOATING_INVARIANT option should be set to ON unless you
need to turn it off for a specific subprogram.  The following example
illustrates this option.

     C This program gets a divide by zero trap when compiled without
     C ASSUME_NO_FLOATING_INVARIANT ON specified (the default).  Because
     C b/a is an invariant floating point operation (FLOP), it is moved
     C out of the loop and executed whether the condition i.GT.10 is
     C true or false.  The ASSUME_NO_FLOATING_INVARIANT directive tells
     C the optimizer not to perform this code for FLOPs that are executed
     C conditionally.

     $OPTIMIZE ASSUME_NO_FLOATING_INVARIANT OFF
           PROGRAM test
           REAL a, b, c
           DATA c/1.0/, b/1.0/, a/0.0/
           READ *, n

           DO i=1,n
             IF (i .GT. 10) THEN
               c = b/a
             ENDIF

           c = c + i
           END DO

           PRINT *, a, b, c
           END

Loop Unrolling 

                      [ON        ]
                      [OFF       ]
$OPTIMIZE LOOP_UNROLL [COPIES = n]
                      [SIZE = n  ]
                      [STATISTICS]

ON                    Turns on loop unrolling.  ON is the default at
                      level 2.

OFF                   Turns off loop unrolling.

COPIES = n            Tells the compiler to unroll the loop n times.  The
                      default is four times.

SIZE = n              Tells the compiler to unroll the loops that have
                      less than n operations.  The default is 60
                      operations.

STATISTICS            Tells the compiler to give statistics about the
                      unrolled loops.

Limits on Use.   

DO loops at level 2 are unrolled four times by default.  If the loop
limit is either not known at compile time or is less than four times, an
extra copy of the DO loop body is generated.  This is called unrolling
the loop four or more times.

Although loop unrolling optimization usually increases performance, it
can occasionally degrade performance because of large loops (register
spilling) and code expansion (crossing the page boundary causing cache
misses and TLB misses.)  When you encounter these circumstances, you can
turn off loop unrolling locally by using the compiler directive.  Use the
compiler directive $OPTIMIZE to specify optimization level in the source
and for changing the assumptions made by the compiler.  You can use a
suboption LOOP_UNROLL to control some constraints:

     $OPTIMIZE LOOP_UNROLL

You can also use the LOOP_UNROLL suboption on the $OPTIMIZE directive to
change the DO LOOP constraints for unrolling dynamically:

   *   You can unroll a DO loop more than four times.

   *   You can force a DO loop to unroll despite its large size.

   *   You can find the reason why a DO loop is not unrolled.

The highest level of optimization must be on for LOOP_UNROLL to work.
Otherwise, LOOP_UNROLL is ignored.  If LOOP_UNROLL is ignored, but
STATISTICS has been specified, you will still get the DO loop statistics.


NOTE The number of operations reported by STATISTICS is approximate. Each assignment, arithmetic operation, and logical operation counts as an operation. Each subscript of a subscripted variable counts as a separate operation.
To unroll the loop two times instead of four times (which is the default), use $OPTIMIZE LOOP_UNROLL COPIES=2 To unroll a DO loop that is larger than the default, use $OPTIMIZE LOOP_UNROLL COPIES=2, SIZE=500 substituting an appropriate size for the digit 500. Example. C Example to illustrate the use of LOOP_UNROLL $OPTIMIZE ON PROGRAM UNROLL_EXAMPLE DIMENSION A(10), B(10,10) DIMENSION X(10,10,10), Y(10,10,10), Z(10,10,10) ... ... C The inner loop has only one statement. The loop can be unrolled C 10 times avoiding a branch and an extra copy of the loop. A straight C line code is generated for the inner loop. $OPTIMIZE LOOP_UNROLL COPIES=10 DO 20 J=1,10 DO 10 I=1,10 A(I) = A(I) + B(I,J) 10 CONTINUE 20 CONTINUE C Change COPIES back to default. $OPTIMIZE LOOP_UNROLL COPIES=4 . . . . C This DO loop has more than 60 operations. C This does not get unrolled by default. The LOOP_UNROLL option is used C to unroll it two times by increasing the SIZE to a large value. $OPTIMIZE LOOP_UNROLL COPIES=2, SIZE=200 DO 40 I=1,10 DO 30 J=1,20 V1 = X(I,J+1,K) - X(I,J-1,K) V2 = Y(I,J+1,K) - Y(I,J-1,K) V3 = Z(I,J+1,K) - Z(I,J-1,K) X(I,J,K) = X(I,J,K) + A11*V1 + A2*V2 + * A3*V3 + S*(Y(I+1,J,K)-2.0*X(I,J,K)+X(I-1,J,K)) Y(I,J,K) = Y(I,J,K) + A1*V1 + A2*V2 + * A3*V3 + S*(Y(I+1,J,K)-2.0*Y(I,J,K)+Y(I-1,J,K)) Z(I,J,K) = Z(I,J,K) + A1*V1 + A2*V2 + * A3*V3 + S*(Z(I+1,J,K)-2.0*Z(I,J,K)+Z(I-1,J,K)) 30 CONTINUE 40 CONTINUE C Change the options back to the default values. $OPTIMIZE LOOP_UNROLL COPIES=4, SIZE=60 . . . . STOP END What to Do If the Optimized Program Fails Occasionally a program works differently after optimization. If this happens: 1. Make sure that optimizer assumptions were not violated. If they were, correct the code and recompile, or recompile the code without optimization. 2. Isolate the problem code and first try optimization with level one modules. If that does not work, recompile the code without optimization. If the problem still occurs, contact the HP Software Support Center or your HP representative.


MPE/iX 5.0 Documentation