LAPACK Benchmark

This section contains performance numbers for selected LAPACK driver routines. These routines provide complete solutions for the most common problems of numerical linear algebra, and are the routines users are most likely to call:

Solve an n-by-n system of linear equations with 1 right hand side using SGESV/DGESV.
Solve an n-by-n linear least squares problem with 1 right hand side using SGELS/DGELS.
Find only the eigenvalues of an n-by-n nonsymmetric matrix using SGEEV/DGEEV.
Find the eigenvalues and right eigenvectors of an n-by-n nonsymmetric matrix using SGEEV/DGEEV.
Find only the eigenvalues of an n-by-n symmetric matrix using SSYEVD/DSYEVD.
Find the eigenvalues and eigenvectors of an n-by-n symmetric matrix using SSYEVD/DSYEVD.
Find only the singular values of an n-by-n matrix using SGESVD/DGESVD.
Find the singular values and right and left singular vectors of an n-by-n matrix using SGESVD/DGESVD.
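
As a rough guide to how the eight benchmark cases above map onto driver calls, the following sketch pairs each case with the job arguments that would select it. The argument names (JOBVL, JOBVR, JOBZ, JOBU, JOBVT) are the standard LAPACK ones. The choice of 'A' for SGESVD/DGESVD is an assumption, since the text does not say which option was timed, and the list layout and the helper routine_names are purely illustrative.

```python
# Illustrative only: pairs each benchmark case with the driver and the job
# arguments that would select it. The layout and helper name are hypothetical.
BENCHMARK_CASES = [
    ("linear system, 1 right-hand side", "GESV", {}),
    ("least squares, 1 right-hand side", "GELS", {}),
    ("nonsymmetric eigenvalues only", "GEEV", {"JOBVL": "N", "JOBVR": "N"}),
    ("nonsymmetric eigenvalues + right eigenvectors", "GEEV",
     {"JOBVL": "N", "JOBVR": "V"}),
    ("symmetric eigenvalues only", "SYEVD", {"JOBZ": "N"}),
    ("symmetric eigenvalues + eigenvectors", "SYEVD", {"JOBZ": "V"}),
    ("singular values only", "GESVD", {"JOBU": "N", "JOBVT": "N"}),
    # Assumption: 'A' (all vectors). For a square matrix 'S' gives the same sizes.
    ("singular values + left and right vectors", "GESVD",
     {"JOBU": "A", "JOBVT": "A"}),
]


def routine_names(base):
    """Return the single and double precision names, e.g. ('SGESV', 'DGESV')."""
    return "S" + base, "D" + base


for description, base, jobs in BENCHMARK_CASES:
    single, double = routine_names(base)
    print(f"{single}/{double:<8} {jobs!s:<32} {description}")
```

The same driver appears twice for the eigenvalue and singular value problems because the two benchmark cases differ only in whether vectors are requested.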

Data is provided for a variety of vector computers, shared-memory parallel computers, and high-performance workstations. All timings were obtained using the machine-specific optimized BLAS available on each machine. For the IBM RISC Sys/6000-550 and IBM POWER2 model 590, the ESSL BLAS were used. In all cases the data consisted of 64-bit floating point numbers (single precision on the CRAY C90 and double precision on the other machines). For each machine and each driver, a small problem (N = 100 with LDA = 101) and a large problem (N = 1000 with LDA = 1001) were run. Block sizes NB = 1, 16, 32 and 64 were tried, and only the fastest run is reported in the tables below. Similarly, both UPLO = 'L' and UPLO = 'U' were timed for SSYEVD/DSYEVD, but only the times for UPLO = 'U' are reported. For SGEEV/DGEEV, ILO = 1 and IHI = N. The test matrices were generated with randomly distributed entries. All run times are reported in seconds, and the block size is denoted by nb. The value of nb was chosen to give the best performance for N = 1000; it is not necessarily the best choice for N = 100. See Section 6.2 for details.
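
To make the role of LDA and the block-size selection concrete, here is a minimal sketch. It assumes the usual Fortran column-major layout, in which the 1-based entry A(i, j) is stored at offset (i - 1) + (j - 1) * LDA, and it picks the fastest of several runs from a dictionary of timings. The function names and the timing values are hypothetical and are not taken from the tables.

```python
def offset(i, j, lda):
    """0-based offset of the 1-based entry A(i, j) in column-major storage."""
    return (i - 1) + (j - 1) * lda


def fastest_block_size(timings):
    """Return (nb, seconds) for the fastest run in a {nb: seconds} mapping."""
    nb = min(timings, key=timings.get)
    return nb, timings[nb]


# Small benchmark problem from the text: N = 100 stored with LDA = 101.
n, lda = 100, 101

# Number of unused storage locations between the end of column 1 and the
# start of column 2. With LDA = N + 1 this is one padding element per column.
padding = offset(1, 2, lda) - offset(n, 1, lda) - 1

# Made-up run times in seconds for NB = 1, 16, 32, 64 (illustration only).
example_timings = {1: 4.0, 16: 2.5, 32: 2.0, 64: 2.2}
best_nb, best_time = fastest_block_size(example_timings)
print(padding, best_nb, best_time)
```

With these illustrative numbers, only the NB = 32 run would be reported, which mirrors how a single nb appears per table entry.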

The performance data is reported using three or four statistics. First, the run-time in seconds is given. The second statistic measures how well our performance compares to the speed of the BLAS, specifically SGEMM/DGEMM. This ``equivalent matrix multiplies'' statistic is calculated as

\[
\frac{\text{run-time(LAPACK driver on an } n\text{-by-}n \text{ matrix)}}{\text{run-time(SGEMM/DGEMM on an } n\text{-by-}n \text{ matrix)}}
\]

and is labeled as this time ratio in the tables. The performance information for the BLAS routines
SGEMV/DGEMV (TRANS='N') and SGEMM/DGEMM (TRANSA='N', TRANSB='N') is provided in Table 3.8, along with the clock speed for each machine in Table 3.2. The third statistic is the true megaflop rating. For the eigenvalue and singular value drivers, a fourth ``synthetic megaflop'' statistic is also presented. We provide this statistic because the number of floating point operations needed to find eigenvalues and singular values depends on the input data, unlike linear equation solving or linear least squares solving with SGELS/DGELS. The synthetic megaflop rating is defined to be the ``standard'' number of flops required to solve the problem, divided by the run-time in microseconds. This ``standard'' number of flops is taken to be the average for a standard algorithm over a variety of problems, as given in Table 3.9 (we ignore terms of order $n^2$) [45].
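
The statistics described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used to produce the tables: the function names are hypothetical, the ``standard'' flop count is modeled as a coefficient times n^3 (lower-order terms ignored), and the coefficient and run times in the example are made-up values rather than entries from Table 3.9 or the timing tables.

```python
def equivalent_matrix_multiplies(driver_seconds, gemm_seconds):
    """Driver run-time expressed as a multiple of the GEMM run-time."""
    return driver_seconds / gemm_seconds


def megaflops(flop_count, seconds):
    """Flops divided by run-time in microseconds (1 second = 1.0e6 microseconds)."""
    return flop_count / (seconds * 1.0e6)


def synthetic_megaflops(coefficient, n, seconds):
    """Megaflop rate based on a 'standard' count of coefficient * n**3 flops."""
    standard_flops = coefficient * n ** 3
    return megaflops(standard_flops, seconds)


# Hypothetical numbers for an n = 1000 eigenvalue problem (illustration only).
n = 1000
driver_seconds = 8.0        # made-up driver run-time
gemm_seconds = 0.5          # made-up GEMM run-time for the same n
actual_flops = 1.0e9        # made-up count of flops actually performed
coefficient = 4.0           # placeholder, NOT a value from Table 3.9

ratio = equivalent_matrix_multiplies(driver_seconds, gemm_seconds)
true_rate = megaflops(actual_flops, driver_seconds)
synthetic_rate = synthetic_megaflops(coefficient, n, driver_seconds)
print(ratio, true_rate, synthetic_rate)
```

When a driver performs fewer flops than the standard algorithm, as in this made-up case, the synthetic rate exceeds the true rate even though the run-time is the same.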

Table 3.8: Execution time and Megaflop rates for SGEMV/DGEMV and SGEMM/DGEMM

Note that the synthetic megaflop rating is much higher than the true megaflop rating for
SSYEVD/DSYEVD in Table 3.15; this is because SSYEVD/DSYEVD performs many fewer floating point operations than the standard algorithm, SSYEV/DSYEV.

Table 3.9: ``Standard'' floating point operation counts for LAPACK drivers for n-by-n matrices