To ensure the best performance from a parallel program, do
not run more than one parallel program on a multiprocessor machine
at the same time. Running two or more parallel programs simultaneously,
or running one parallel program on a heavily loaded system, will
slow performance.
You should run a parallel-executing program at a higher priority
than any other user program; see rtprio(1)
for information about setting real-time priorities.
Conditions Inhibiting Loop Parallelization
The following sections describe different conditions that
can inhibit parallelization.
Calling Routines with Side Effects
The compiler will not parallelize any loop containing a call
to a routine that has side effects. A routine has side effects if
it does any of the following:
Modifies an extern,
static, or global
variable.
Redefines variables that are local to the calling
routine.
Calls another subroutine or function that does any
of the above.
Indeterminate Iteration Counts
If the compiler cannot determine the loop's iteration
count before the loop executes, it does not parallelize the loop.
The reason for this limitation is that the runtime code must know
the iteration count in order to know how many iterations to distribute
to the different processors for execution.
The following conditions can prevent a runtime count:
The loop is an infinite loop.
The loop contains a conditional break
statement or a goto that exits the loop.
The loop modifies either the loop-control or loop-limit
variable.
The loop is a while
construct and the condition being tested is defined within the loop.
When a loop is parallelized, the iterations are executed independently
on different processors, and the order of execution differs from
the serial order that occurs on a single processor. This reordering
poses no problem when the iterations are independent: they can be
executed in any order with no effect on the results. Consider the
following loop:
for (i=0; i<5; i++) a[i] = a[i] * b[i];
In this example, the array a
would always end up with the same data regardless of whether the
order of execution were 0-1-2-3-4, 4-3-2-1-0, 3-1-4-0-2, or any
other order. The independence of each iteration from the others
makes the loop an eligible candidate for parallelization.
Such is not the case in the following:
for (i=1; i<5; i++) a[i] = a[i-1] * b[i];
In this loop, the order of execution does matter: the data
used in iteration i depends upon the data produced
in the previous iteration, i-1. The array
a would end up
with very different data if the order of execution were anything other
than 1-2-3-4. The data dependence thus makes this loop ineligible
for parallelization.
Not all data dependences inhibit parallelization. The
following paragraphs discuss some of the exceptions.
Nested Loops and Matrices
Some nested loops that operate on matrices may have a data
dependence in the inner loop only, allowing the outer loop to be
parallelized. Consider the following:
for (i=0; i<10; i++) for (j=1; j<100; j++) a[i][j] = a[i][j-1] + 1;
The data dependence in this nested loop occurs in the inner
(j) loop: each element a[i][j]
depends upon the element a[i][j-1]
assigned in the previous iteration. If the iterations
of the j loop
were to execute in any order other than the one in which they would
execute on a single processor, the matrix would be assigned different
values. The inner loop, therefore, must not be parallelized.
But no such data dependence appears in the outer loop: each
row of the matrix is computed independently of every other row. Consequently,
the compiler can safely distribute entire rows of the matrix
to execute on different processors; the data assignments will be
the same regardless of the order in which the rows are executed,
so long as each inner loop executes in serial order.
When analyzing a loop, the compiler errs on the safe side:
it assumes that what looks like a data dependence really is one,
and so it does not parallelize the loop. Consider the following:
for (i=100; i<200; i++) a[i] = a[i-k];
The compiler assumes that a data dependence exists in this
loop because it appears that data defined in a previous
iteration is being used in a later iteration. However, if the value
of k is 100,
the dependence is apparent rather than real, because every a[i-k]
is defined outside the loop.