PoP - Syracuse University Physics Computational Cluster
PoP - Performance
Single node
Alan Middleton has timed several of his simulation codes on a range
of Intel and Alpha machines running Linux, including the PoP nodes.
Have a look at his comparison table.
Whetstone
For what it is worth, I ran the Whetstone benchmark on a PoP node
(2 x 300MHz PII, fort77 v1.14a (which invokes f2c), gcc v2.7.2.3, 10 outer
loops, 10000 inner loops, double precision version) and got:
No optimization flag (equiv -O0)
Single job ~119Mwhet/s
Two jobs ~119Mwhet/s each -> ~238Mwhet/s total
With optimization (-O3)
Single job ~177Mwhet/s
Two jobs ~177Mwhet/s each -> ~354Mwhet/s total
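This benchmark is a small loop of CPU-bound math functions, so it
scales very well: with two jobs running, each still gets the full
single-job rate. Overall performance is fine even when Linux has
to deal with more jobs than processors. Optimization makes a big
difference: -O3 is about 50% faster than no optimization.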
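The two claims above (good scaling and a big gain from optimization) can be checked directly from the quoted figures. The following is only a small sketch of that arithmetic; the function names are made up for illustration and the rates are the approximate Mwhet/s values reported above.

```python
# Approximate figures quoted above, in Mwhet/s (Fortran via f2c, gcc 2.7.2.3).
RATES = {
    "-O0": {"single": 119.0, "two_jobs_each": 119.0},
    "-O3": {"single": 177.0, "two_jobs_each": 177.0},
}


def scaling_efficiency(single, each, n_jobs):
    """Total rate of n_jobs concurrent jobs relative to n_jobs ideal copies."""
    return (each * n_jobs) / (single * n_jobs)


def speedup(baseline, optimized):
    """Ratio of the optimized rate to the baseline rate."""
    return optimized / baseline


if __name__ == "__main__":
    for flag, r in RATES.items():
        eff = scaling_efficiency(r["single"], r["two_jobs_each"], 2)
        print(f"{flag}: two-job scaling efficiency {eff:.2f}")
    s = speedup(RATES["-O0"]["single"], RATES["-O3"]["single"])
    print(f"-O3 over -O0: {s:.2f}x")
```

An efficiency of 1.00 means the second job costs the first one nothing, which is what one would hope for from a CPU-bound loop on a two-processor node.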
There is a Mac-biased comparison table available which shows some
strangely low performance figures for Intel processors but quotes
149Mwhet/s for a G3 processor at 317MHz and 169Mwhet/s for a 604e
processor at 350MHz. It doesn't say anything about the compiler.
I also found a C version of the Whetstone benchmark which has two of
the 12 original sections removed. Under gcc v2.7.2.3 I got:
With optimization (-O3)
Single job ~227Mwhet/s
Two jobs ~227Mwhet/s each -> ~454Mwhet/s total
This is, however, slightly different code, so we can't compare these
numbers with the Fortran numbers.
Linpack
After some messing with the timing I compiled the C version of the
single processor linpack benchmark
(C translation Bonnie Toy 5/88, bug fix Jack Dongarra 25/2/94).
Using gcc 2.7.2.3, with -O4 I got:
- Double precision, rolled ~25Mflops
- Double precision, unrolled ~28Mflops
- Single precision, rolled ~62Mflops
- Single precision, unrolled ~85Mflops
These numbers are a little suspect since the timing seems very inaccurate
(up to 20% variation between runs). However, they give an approximation.
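One common way to cope with noisy timings like these is to repeat the measurement several times and report the best and median results along with the spread. The sketch below illustrates that idea only; it is not the Linpack code, the `daxpy` kernel is a simple stand-in, and the function names and repeat count are assumptions chosen for the example.

```python
import statistics
import time


def daxpy(a, x, y):
    """Return a*x + y element-wise; a stand-in kernel, not the Linpack code."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def time_runs(func, repeats=5):
    """Call func() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (best, median, relative spread) for a list of timings."""
    best = min(timings)
    median = statistics.median(timings)
    spread = (max(timings) - best) / best if best > 0 else float("inf")
    return best, median, spread


if __name__ == "__main__":
    n = 100_000
    x = [1.0] * n
    y = [2.0] * n
    timings = time_runs(lambda: daxpy(3.0, x, y))
    best, median, spread = summarize(timings)
    flops = 2 * n  # one multiply and one add per element
    print(f"best {flops / best / 1e6:.1f} Mflops, "
          f"median {flops / median / 1e6:.1f} Mflops, "
          f"spread {spread:.0%}")
```

The best time is usually the least disturbed by other activity on the machine, while the spread gives a rough idea of how far any single run can be trusted.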
Overall
Watch this space for parallel benchmarks...
Aggregates
From Whetstone (reduced C version, -O3) at 227Mwhet/s/proc with 32 processors we have ~7.3Gwhet/s.
From Linpack (single precision, unrolled) at 85Mflops/proc with 32 processors we have ~2.7Gflops.
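The aggregate figures are simply the per-processor rate multiplied by the processor count, assuming perfect scaling across all 32 processors. As a quick sketch of that arithmetic (the function name is hypothetical):

```python
def aggregate_giga(per_proc_mega, n_procs):
    """Ideal aggregate rate in G-units from a per-processor rate in M-units."""
    return per_proc_mega * n_procs / 1000.0


if __name__ == "__main__":
    n_procs = 32
    print(f"Whetstone: {aggregate_giga(227, n_procs):.2f} Gwhet/s")  # prints 7.26
    print(f"Linpack:   {aggregate_giga(85, n_procs):.2f} Gflops")    # prints 2.72
```

These are upper bounds taken from the best single-node numbers; real parallel codes will see less once communication is involved.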
Written by Simeon Warner, maintained by Dan Kirkpatrick
Last updated 06 August 2010