Using the Optimizer [ HP FORTRAN 77/iX Programmer's Guide ] MPE/iX 5.0 Documentation
HP FORTRAN 77/iX Programmer's Guide
Using the Optimizer
The optimizer is an optional part of the compiler that modifies your
program so that machine resources are used more efficiently, using less
space and running faster. The optimizer consists of 12 modules: five
for level one optimization and seven for level two optimization.
This section describes the following:
* When and how to use the optimizer
* Level one and two optimization modules
* Optimizer assumptions
* How to write code that is easily optimized
* What to do if your optimized program fails
Introduction to the Optimizer
You can run the optimizer in one of three ways:
1. No optimization (this is the default).
2. Level one optimization: This level only performs a subset of the
available optimization modules. The transformations performed are
local to small subsections of code, and therefore are performed
quickly with little run-time storage required by the compiler.
Level one optimization should be used when some optimization is
desired, but when compile time performance is more important than
run time performance.
3. Level two optimization: This level of optimization performs all
of the available optimization modules. Transformations are
performed over the scope of each procedure. If you use this level
of optimization, the compiler uses more memory and takes longer to
process your program.
When to Use the Optimizer
Use the optimizer only on debugged code that is ready to run, because the
compiler cannot generate debug information and perform optimizations at
the same time. After level two optimizations are performed, the code is
radically reordered and variable values might not be maintained in
memory, which makes symbolic debugging impossible. Therefore, once a
program is optimized, you cannot use symbolic debugging unless you
recompile without optimization.
Invoking the Optimizer on MPE/iX
Invoke the optimizer by specifying the OPTIMIZE compiler directive in
your source file or by passing the directives through the INFO string.
For level one optimization, use the command
$OPTIMIZE LEVEL1
For level two optimization, use the command
$OPTIMIZE
or
$OPTIMIZE LEVEL2
Level One Optimization Modules
The level one optimization modules are:
* Branch optimization
* Dead code elimination
* Faster register allocation
* Instruction scheduler
* Peephole optimization
The examples in this section are shown at the source code level wherever
possible; transformations that cannot be shown at the source level are
shown in assembly language.
Table 6-2 summarizes the assembly language routines.
Table 6-2. Descriptions of Assembly Language Routines
---------------------------------------------------------------------------
| | |
| Instruction | Description |
| | |
---------------------------------------------------------------------------
| | |
| LDW offset(sr, base), | Loads a word from memory into register |
| target | target. sr is the space register (0 through |
| | 7); base is the base register (0 through |
| | 31). |
| | |
---------------------------------------------------------------------------
| | |
| ADDI const, reg, target | Adds the constant const to the contents of |
| | register reg and puts the result in register |
| | target. |
| | |
---------------------------------------------------------------------------
| | |
| LDI const, target | Loads the constant const into register |
| | target. |
| | |
---------------------------------------------------------------------------
| | |
| AND reg1, reg2, target | Performs a bitwise AND of the contents of |
| | registers reg1 and reg2 and puts the result |
| | in register target. |
| | |
---------------------------------------------------------------------------
| | |
| COMIB, cond, const, reg, | Compares the constant const to the contents |
| lab | of register reg and branches to label lab if |
| | the condition cond is true. |
| | |
---------------------------------------------------------------------------
| | |
| BB, cond, reg, num, lab | Tests the bit number num in the contents of |
| | register reg and branches to label lab if |
| | the condition cond is true. |
| | |
---------------------------------------------------------------------------
| | |
| COPY reg, target | Copies the contents of register reg to |
| | register target. |
| | |
---------------------------------------------------------------------------
| | |
| STW reg, offset(sr, | Stores the word in register reg to memory. |
| base) | sr is the space register (0 through 7); base |
| | is the base register (0 through 31). |
| | |
---------------------------------------------------------------------------
Branch Optimization Module.
The branch optimization module makes branch instruction sequences more
efficient whenever possible. Examples of possible tranformations are:
* Deleting branches whose target is the fall-through instruction
(that is, the target is two instructions away)
* When the target of a branch is an unconditional branch, changing
the target of the first branch to be the target of the second
unconditional branch
* Transforming an unconditional branch at the bottom of a loop,
which branches to a conditional branch at the top of the loop,
into a conditional branch at the bottom of the loop
* Changing an unconditional branch to the exit of a procedure into
an exit sequence where possible
* Changing conditional or unconditional branch instructions that
branch over a single instruction into a conditional nullification
in the previous instruction
* Looking for conditional branches over unconditional branches,
inverting the sense of the first branch and deleting the second
branch. These result from null THEN clauses and from THEN clauses
that only contain GOTO statements. For example, the code
IF (x) THEN
statement 1
ELSE
GOTO 100
ENDIF
statement 2
100 statement 3
becomes
IF (.NOT. x) GOTO 100
statement 1
statement 2
100 statement 3
Dead Code Elimination Module.
The dead code elimination module removes unreachable code that is never
executed.
For example, the code
if (.FALSE.) then
a=1
else
a=2
endif
becomes
a=2
Faster Register Allocation Module.
The faster register allocation module, used with unoptimized code,
analyzes register use faster than the advanced register allocator (a
level two module).
This module performs the following:
* Inserts entry and exit code
* Generates code for operations (such as multiplication and
division)
* Eliminates unnecessary copy instructions
* Allocates actual registers to the dummy registers in instructions
Instruction Scheduler Module.
The instruction scheduler module performs the following:
* Reorders the instructions in a basic block to improve memory
pipelining. (For example, where possible, a load instruction is
separated from the use of the loaded register.)
* Where possible, follows a branch instruction with an instruction
that can be executed as the branch occurs.
* Schedules floating point instructions.
For example, the code
LDW -52(0,30),r1
ADDI 3,r1,r31 ;interlock with load of r1
LDI 10,r19
becomes
LDW -52(0,sp),r1
LDI 10,r19
ADDI 3,r1,r31 ;use of r1 is now separated from load
Peephole Optimization Module
The peephole optimization module is a machine-dependent module that
makes a pass through an intermediate representation of the code
applying patterns to a small window of code looking for optimization
opportunities. The optimizations performed are:
* Changing the addressing mode of instructions so they use shorter
sequences
* Substituting sequences of instructions that access bit fields with
shorter, equivalent instructions
For example, the code
LDI 32,r3
AND r1,r3,r2
COMIB,= 0,r2,L1
becomes
BB,>= r1, 26, L1
Level Two Optimization Modules
The level two optimization modules are:
* Advanced register allocation
* Induction variables and strength reduction
* Common subexpression elimination
* Constant folding
* Loop invariant code motion
* Store/Copy optimization
* Unused definition elimination
The examples in this section are shown at the source code level wherever
possible; transformations that cannot be shown at the source level are
shown in assembly language. See Table 6-2 for a description of the
assembly language routines.
Advanced Register Allocation Module.
The advanced register allocation module performs some copy optimizations,
as well as allocating registers. Before the register allocator is run,
the instructions contain register numbers that do not correspond to
actual registers. The register allocator assigns real registers to these
instructions and removes unnecessary COPY instructions.
For example, the following code shows the type of optimization the
coloring register allocation module performs. The code
LDI 2,r104
COPY r104,r103
LDO 5(r103),r106
COPY r106,r105
LDO 10(r105),r107
becomes
LDI 2,r25
LDO 5(r25),r26
LDO 10(r26),r31
Strength Reduction.
The induction variables and strength reduction module removes linear
functions of a loop counter and replaces them with the loop counter.
Variables of the same linear function are computed only once. This
module also simplifies the function by replacing multiplication
instructions with addition instructions wherever possible.
For example, the code
DO i = 1,10
j(i) = i*k
END DO
becomes
t1 = k
DO i = 1,10
j(i) = t1
t1 = t1+k
END DO
Common Subexpression Elimination.
The common subexpression elimination module identifies expressions that
appear more than once and have the same result, computes the result, and
substitutes the result for each occurrence of the expression. The types
of subexpressions include instructions that load values from memory, as
well as arithmetic evaluation.
For example, the code
a = x + y + z
b = x + y + w
becomes
t1 = x + y
a = t1 + z
b = t1 + w
Constant Folding Module.
While the optimizer is collecting information about uses and definitions
of resources, the constant folding module replaces constant expressions
with their values.
For example, the code
a = 1
b = 2
c = a + b
becomes
a = 1
b = 2
c = 3
Loop Invariant Code Motion Module.
The loop invariant code motion module recognizes instructions inside a
loop whose results do not change and moves the instructions outside the
loop.
For example, the code
x = z
DO i = 1,10
a(i) = 4 * x + i
END DO
becomes
x = z
t1 = 4 * x
DO i = 1,10
a(i) = t1 + i
END DO
Store/Copy Optimization Module.
Where possible, the store/copy optimization module substitutes registers
for memory locations by replacing store instructions with copy
instructions and deleting load instructions.
For example, the following FORTRAN 77 code
INTEGER FUNCTION i
.
.
.
i = j + 23
RETURN
END
produces this code for the unoptimized case
LDO 23(r26),r1
STW r1,-52(0,sp)
LDW -52(0,sp),ret0
and this code for the optimized case:
LDO 23(r26),ret0
Unused Definition Elimination Module.
The unused definition elimination module removes unused memory location
and register definitions. These definitions are often a result of
transformations made by other optimization modules.
For example, the function
INTEGER FUNCTION f(x)
INTEGER x,a,b
a = 1
b = 2
f = x * b
RETURN
END
becomes
INTEGER FUNCTION f(x)
INTEGER x,a,b
b = 2
f = x * b
RETURN
END
Optimizer Guidelines
These guidelines will help you use the optimizer effectively and write
efficient HP FORTRAN 77 programs.
1. Where possible, expand procedures with fewer than five lines in
the program or convert them to macros. The optimizer makes better
use of register variables if the procedures have fewer than 100
lines. If a loop only contains a procedure call, it is more
efficient to put the loop in the procedure.
2. Make hash table sizes and field sizes of variables in powers of
two.
3. Where possible, use local variables to help the optimizer promote
variables to registers.
4. Where possible, construct loops so the control variable increases
or decreases towards zero. The code generated for a test of a
loop termination is more efficient with a test against zero than
for a test against some other value.
5. Where possible, use constants instead of variables for shift,
multiplication, division, and remaindering.
6. Where possible, avoid using extensive equivalencing and memory
mapping schemes.
HP FORTRAN 77 Optimizer Assumptions.
During optimization, the compiler gathers information about the use of
variables and passes this information to the optimizer. The optimizer
uses this information to ensure that every code transformation maintains
the correctness of the program (at least to the extent that the original
unoptimized program is correct). When gathering this information, the HP
FORTRAN 77 compiler assumes that inside a routine (either a function or a
subroutine), the only variables that can be accessed directly or
indirectly or by another function call are:
* Common variables declared in this routine
* Local variables (all static variables and nonstatic variables)
* Parameters to this routine
In general, you do not need to be concerned about this assumption. Good
programming practices preclude code that violates the assumption.
However, if you have code that violates the assumption, the optimizer can
change the behavior of the program in an undesired way. In particular,
you should avoid the following coding practices to ensure correct program
execution for optimized code:
* Avoid referencing outside the bounds of an array.
* Avoid using variables that can be accessed by a process other than
the program, such as shared common variables. The compiler
assumes that the program is the only process accessing its data.
The only exception to this is if a semaphore in the form of a
function call is used to "lock" and "unlock" access to a shared
variable. In this case, optimization is assumed to be correct.
OPTIMIZE Compiler Directive.
The OPTIMIZE compiler directive gives you the ability to give information
about the program to the compiler.
The OPTIMIZE directive controls which functions are optimized and which
set of optimizations is performed. Some directives must be placed before
the function to be optimized, while others can appear anywhere within the
function.
This is the syntax of the OPTIMIZE compiler directive:
Syntax
[LEVEL1 ]
[LEVEL2 ]
[LEVEL2_MIN ]
[LEVEL2_MAX ]
[ASSUME_NO_EXTERNAL_PARMS ]
$OPTIMIZE [ASSUME_NO_FLOATING_INVARIANT ] [ON ]
[ASSUME_NO_PARAMETER_OVERLAPS ] [OFF]
[ASSUME_NO_SHARED_COMMON_PARMS ]
[ASSUME_NO_SIDE_EFFECTS ]
[ASSUME_PARM_TYPES_MATCHED ]
[LOOP_UNROLL[COPIES=n SIZE=n STATISTICS]]
ON Alone, specifies level 2 optimization.
With a preceding option, sets that option
off.
OFF Alone, specifies level 0 optimization.
This is the default.
With a preceding option, sets that option
off.
LEVEL1 Specifies level 1 optimization.
LEVEL2 Specifies level 2 optimization, with the
following ASSUME settings:
ASSUME_NO_EXTERNAL_PARMS ON
ASSUME_NO_FLOATING_INVARIANT ON
ASSUME_NO_PARAMETER_OVERLAPS ON
ASSUME_NO_SHARED_COMMON_PARMS ON
ASSUME_NO_SIDE EFFECTS OFF
ASSUME_PARM_TYPES_MATCHED ON
LOOP_UNROLL ON
LEVEL2_MIN Specifies level 2 optimization with all the
ASSUME settings OFF.
LEVEL2_MAX Specifies level 2 optimization with all the
ASSUME settings ON.
ASSUME_NO_EXTERNAL_PARMS Assumes that none of the parameters passed
to the current procedure are from an
external space, that is, different from the
user's own data space. Parameters can come
from another space if they come from
operating system space or if they are in a
space shared by other users.
ASSUME_NO_FLOATING_INVARIANT Assumes that no floating invariant
operations are executed conditionally with
loops.
ASSUME_NO_PARAMETER_OVERLAPS Assumes that no actual parameters passed to
a procedure overlap each other.
ASSUME_NO_SHARED_COMMON_PARMS This directive should be ON when all of the
following are true:
* The parameter passed to the current
procedure is part of a common block
used by that procedure.
* The parameter is named differently
than the variable name it has in the
common block.
* The parameter is reassigned with the
same value within the procedure.
ASSUME_NO_SIDE_EFFECTS Assumes that the current procedure changes
only local variables. It does not change
any variables in COMMON, nor does it change
parameters.
ASSUME_PARM_TYPES_MATCHED Assumes that all of the actual parameters
passed were the type expected by this
subroutine.
LOOP_UNROLL Unrolls DO loops having 60 or less
operations four times. For further
details, see "Loop Unrolling" in this
chapter. The default is ON.
There are five levels of optimization:
Level 0 Does no optimizing. This is obtained by specifying
$OPTIMIZE OFF.
Level 1 Optimizes only within each basic block. This is
obtained by specifying $OPTIMIZE LEVEL1 ON.
Level 2 minimum Optimizes within each procedure with no assumptions on
interactions of procedures. That is, the compiler
assumes nothing, making this the most conservative level
2 optimization. This level is obtained by specifying
$OPTIMIZE LEVEL2_MIN ON within each procedure.
Level 2 normal Optimizes within each procedure with normal assumptions
on interactions of procedures set as described earlier.
In general, these settings are appropriate for most
FORTRAN programs. This level is obtained by specifying
$OPTIMIZE LEVEL2 ON, $OPTIMIZE ON or just $OPTIMIZE
within each procedure.
Level 2 maximum Optimizes within each procedure with all assumptions on
interactions of procedures set to OFF. This is obtained
by specifying $OPTIMIZE LEVEL2_MAX ON within each
procedure.
A basic block is a set of instructions to be executed in sequence, with
one entrance, the first instruction, and one exit, the last; the block
contains no branches.
Parameters can come from another space if they come from the operating
system or if they are in a space shared by other users.
With level two optimization, the compiler and optimizer can achieve very
sophisticated optimization. Use the ASSUME options to provide the
information required for level two optimization.
ASSUME_NO_PARAMETER_OVERLAPS tells the compiler that the parameters
passed to the current routine never overlap each other, as in the
following code:
subroutine a(i,j,k)
.
.
.
PROGRAM b
CALL a(l,m,n)
END
The ASSUME_NO_PARAMETER_OVERLAPS option should usually be set to ON.
However, for the following code
subroutine a(i,j,k)
.
.
.
END
PROGRAM b
CALL a(l,l,m)
END
the ASSUME_NO_PARAMETER_OVERLAPS option should not be set to ON because
the first two parameters passed to A are actually the same variable (that
is, the parameters overlap).
ASSUME_NO_SIDE_EFFECTS ON tells the compiler that all the procedure calls
after this option do not change any of the common variables or the
contents of the parameters being passed. For example, in the following
code
PROGRAM a
COMMON c,d,e
$OPTIMIZE ASSUME_NO_SIDE_EFFECTS ON
CALL s1(i,j,k)
CALL s2(l,m,n)
END
the compiler assumes that subroutines s1 and s2 will not change the
values of parameters i, j, k, l, m, n, or common variables c, d, or e.
ASSUME_NO_PARM_TYPES_MATCHED ON tells the compiler that the type
declaration of each of the parameters in the called routine is the same
as that of the caller. For example, in the following code
PROGRAM a
INTEGER i,j,k
$OPTIMIZE ASSUME_PARM_TYPES_MATCHED ON
CALL s1(j)
END
SUBROUTINE s1(j)
INTEGER j
j = 1
END
the type declaration of parameter i in the called routine is integer,
matching the type declaration in the caller, PROGRAM a.
However, for the following code
PROGRAM a
INTEGER i,j,k
CALL s1(i)
END
SUBROUTINE s1(j)
INTEGER j(3)
j(1) = 1
END
the ASSUME_PARAMETERS_MATCHED option has to be set to OFF before the call
to s1 because the called subroutine s1 declares parameter j to be an
integer array, which is not the same as an integer in the caller a.
Notice that s1 is actually intended to change the contents of j(2), which
is not i but the variable following i. However, what follows i is system
dependent.
ASSUME_NO_EXTERNAL_PARMS ON tells the compiler that none of the
parameters passed to the current procedure are from an external space.
That is, none are different from the user's own data space. For example,
if you are accessing data in the operating system, you are accessing data
from an external space. Shared data or shared common variables fall into
this category. If ASSUME_NO_EXTERNAL_PARMS is OFF, the compiler is
unable to perform certain optimizations, such as array accessing
optimization.
The following options are meaningful only when the compiler is performing
level 2 optimization, that is, only if the option ON, LEVEL2, LEVEL2_MIN,
or LEVEL2_MAX has been specified:
ASSUME_NO_PARAMETER_OVERLAPS
ASSUME_NO_SIDE_EFFECTS
ASSUME_PARM_TYPES_MATCHED
ASSUME_NO_EXTERNAL_PARMS
ASSUME_NO_FLOATING_INVARIANT
LOOP_UNROLL
Default Off.
Location The following OPTIMIZE options must appear before
any nondirective statements in the program unit:
OFF
ON
LEVEL1
LEVEL2
LEVEL2_MIN
LEVEL2_MAX
ASSUME_NO_PARAMETER_OVERLAPS
ASSUME_NO_EXTERNAL_PARMS
ASSUME_NO_SHARED_COMMON_PARMS
ASSUME_NO_FLOATING_INVARIANT
These options can appear anywhere within a program
unit:
ASSUME_NO_SIDE_EFFECTS
ASSUME_PARM_TYPES_MATCHED
LOOP_UNROLL
Toggling/ Duration The optimize options remain in effect until they
are changed by another OPTIMIZE directive.
Impact on This directive can improve performance. Loop
Performance unrolling, which usually improves performance, can
occasionally degrade performance because of large
loops (register spilling) and code expansion
(crossing the page boundary causing cache misses
and TLB misses.)
Flagging Uninitialized Variables
When the compiler is performing level 2 optimization, it will detect any
uninitialized non-static simple local variables. However, it will not
detect uninitialized common variables, static variables, or variables of
character and complex type. For example:
$OPTIMIZE
FUNCTION func(type)
COMMON /a/comvar
SAVE statvar
REAL foo,type
type = 10.2
foo = comvar
foo = statvar
foo = typo
RETURN
END
The variable typo is flagged as an uninitialized variable because it was
typed incorrectly and, therefore, not initialized. However, statvar
and comvar are not flagged because of their global and static
characteristics. A warning message will be issued when an uninitialized
variable is detected.
Example
C Start with minimum level 2 optimization.
$OPTIMIZE LEVEL2_MIN
PROGRAM FEQ7
INTEGER num(10), ans, calculate
CHARACTER*2 option(10)
C
C For the next two calls, the parameter type declarations are the same in
C the main program and the subroutine or function. Therefore, we can
C further optimize the program by setting the following optimizer option.
C
$OPTIMIZE ASSUME_PARM_TYPES_MATCHED ON
call getnum_option(num,option)
C
C For the next call, the function will not change the parameter value or
C any global variables in COMMON blocks. Therefore, we can further
C optimize the program by setting the following optimizer option.
C
$OPTIMIZE ASSUME_NO_SIDE_EFFECTS ON
ans= calculate(num,option)
$OPTIMIZE ASSUME_NO_SIDE_EFFECTS OFF
WRITE(6,*) 'Result = ',ans
END
C
C For the next subroutine, you know that the actual parameters passed
C to this subroutine are not overlapped with each
C other or from a space different from your program.
C Thus, you can further optimize
C the program by setting the following optimizer options.
C
$OPTIMIZE ASSUME_NO_PARAMETER_OVERLAPS ON
$OPTIMIZE ASSUME_NO_EXTERNAL_PARMS ON
SUBROUTINE getnum_option(value,operation)
INTEGER value(10)
CHARACTER*2 operation(10)
DO 10 i = 1,10
20 WRITE(6,*) 'Please input operation type and integer value :'
READ(5,*) operation(i),value(i)
IF (operation(i).EQ.' ') GOTO 30
IF ((operation(i).NE.'**').AND.
/ (operation(i).NE.'*' ).AND.
/ (operation(i).NE.'/' ).AND.
/ (operation(i).NE.'-' ).AND.
/ (operation(i).NE.'+' )) GOTO 20
10 CONTINUE
30 RETURN
END
C
C For the next subroutine, you know that the actual parameters passed to
C this subroutine are not overlapped with each
C other and not from an external space.
C Thus, you can leave the
C ASSUME_NO_PARAMETER_OVERLAPS and ASSUME_NO_EXTERNAL_PARMS
C settings ON.
C
FUNCTION calculate(value,operation)
INTEGER value(10),calculate,ans
CHARACTER*2 operation(10)
ans = 0
DO 10 i = 1,10
IF (operation(i).EQ.' ') GOTO 30
IF (operation(i).EQ.'**') THEN
ans = ans ** value(i)
ELSE IF (operation(i).EQ.'*' ) THEN
ans = ans * value(i)
ELSE IF (operation(i).EQ.'/' ) THEN
ans = ans / value(i)
ELSE IF (operation(i).EQ.'-' ) THEN
ans = ans - value(i)
ELSE IF (operation(i).EQ.'+' ) THEN
ans = ans + value(i)
ENDIF
10 CONTINUE
30 calculate = ans
RETURN
END
The ASSUME_NO_FLOATING_INVARIANT option should be set to ON unless you
need to turn it off for a specific subprogram. The following example
illustrates this option.
C This program gets a divide by zero trap when compiled without
C ASSUME_NO_FLOATING_INVARIANT ON specified (the default). Because
C b/a is an invariant floating point operation (FLOP), it is moved
C out of the loop and executed whether the condition i.GT.10 is
C true or false. The ASSUME_NO_FLOATING_INVARIANT directive tells
C the optimizer not to perform this code for FLOPs that are executed
C conditionally.
$OPTIMIZE ASSUME_NO_FLOATING_INVARIANT OFF
PROGRAM test
REAL a, b, c
DATA c/1.0/, b/1.0/, a/0.0/
READ *, n
DO i=1,n
IF (i .GT. 10) THEN
c = b/a
ENDIF
c = c + i
END DO
PRINT *, a, b, c
END
Loop Unrolling
[ON ]
[OFF ]
$OPTIMIZE LOOP_UNROLL [COPIES = n]
[SIZE = n ]
[STATISTICS]
ON Turns on loop unrolling. ON is the default at
level 2.
OFF Turns off loop unrolling.
COPIES = n Tells the compiler to unroll the loop n times. The
default is four times.
SIZE = n Tells the compiler to unroll the loops that have
less than n operations. The default is 60
operations.
STATISTICS Tells the compiler to give statistics about the
unrolled loops.
Limits on Use.
DO loops at level 2 are unrolled four times by default. If the loop
limit is either not known at compile time or is less than four times, an
extra copy of the DO loop body is generated. This is called unrolling
the loop four or more times.
Although loop unrolling optimization usually increases performance, it
can occasionally degrade performance because of large loops (register
spilling) and code expansion (crossing the page boundary causing cache
misses and TLB misses.) When you encounter these circumstances, you can
turn off loop unrolling locally by using the compiler directive. Use the
compiler directive $OPTIMIZE to specify optimization level in the source
and for changing the assumptions made by the compiler. You can use a
suboption LOOP_UNROLL to control some constraints:
$OPTIMIZE LOOP_UNROLL
You can also use the LOOP_UNROLL suboption on the $OPTIMIZE directive to
change the DO LOOP constraints for unrolling dynamically:
* You can unroll a DO loop more than four times.
* You can force a DO loop to unroll despite its large size.
* You can find the reason why a DO loop is not unrolled.
The highest level of optimization must be on for LOOP_UNROLL to work.
Otherwise, LOOP_UNROLL is ignored. If LOOP_UNROLL is ignored, but
STATISTICS has been specified, you will still get the DO loop statistics.
NOTE The number of operations reported by STATISTICS is approximate.
Each assignment, arithmetic operation, and logical operation counts
as an operation. Each subscript of a subscripted variable counts
as a separate operation.
To unroll the loop two times instead of four times (which is the
default), use
$OPTIMIZE LOOP_UNROLL COPIES=2
To unroll a DO loop that is larger than the default, use
$OPTIMIZE LOOP_UNROLL COPIES=2, SIZE=500
substituting an appropriate size for the digit 500.
Example.
C Example to illustrate the use of LOOP_UNROLL
$OPTIMIZE ON
PROGRAM UNROLL_EXAMPLE
DIMENSION A(10), B(10,10)
DIMENSION X(10,10,10), Y(10,10,10), Z(10,10,10)
...
...
C The inner loop has only one statement. The loop can be unrolled
C 10 times avoiding a branch and an extra copy of the loop. A straight
C line code is generated for the inner loop.
$OPTIMIZE LOOP_UNROLL COPIES=10
DO 20 J=1,10
DO 10 I=1,10
A(I) = A(I) + B(I,J)
10 CONTINUE
20 CONTINUE
C Change COPIES back to default.
$OPTIMIZE LOOP_UNROLL COPIES=4
. .
. .
C This DO loop has more than 60 operations.
C This does not get unrolled by default. The LOOP_UNROLL option is used
C to unroll it two times by increasing the SIZE to a large value.
$OPTIMIZE LOOP_UNROLL COPIES=2, SIZE=200
DO 40 I=1,10
DO 30 J=1,20
V1 = X(I,J+1,K) - X(I,J-1,K)
V2 = Y(I,J+1,K) - Y(I,J-1,K)
V3 = Z(I,J+1,K) - Z(I,J-1,K)
X(I,J,K) = X(I,J,K) + A11*V1 + A2*V2 +
* A3*V3 + S*(Y(I+1,J,K)-2.0*X(I,J,K)+X(I-1,J,K))
Y(I,J,K) = Y(I,J,K) + A1*V1 + A2*V2 +
* A3*V3 + S*(Y(I+1,J,K)-2.0*Y(I,J,K)+Y(I-1,J,K))
Z(I,J,K) = Z(I,J,K) + A1*V1 + A2*V2 +
* A3*V3 + S*(Z(I+1,J,K)-2.0*Z(I,J,K)+Z(I-1,J,K))
30 CONTINUE
40 CONTINUE
C Change the options back to the default values.
$OPTIMIZE LOOP_UNROLL COPIES=4, SIZE=60
. .
. .
STOP
END
What to Do If the Optimized Program Fails
Occasionally a program works differently after optimization. If this
happens:
1. Make sure that optimizer assumptions were not violated. If they
were, correct the code and recompile, or recompile the code
without optimization.
2. Isolate the problem code and first try optimization with level one
modules. If that does not work, recompile the code without
optimization.
If the problem still occurs, contact the HP Software Support Center or
your HP representative.
MPE/iX 5.0 Documentation