BLAS libraries benchmarks
Andrzej Wójtowicz
Document generation date: 2016-06-03 15:36:31
Configuration
R software: Microsoft R Open (3.2.4).
Libraries:
| CPU (single-threaded) | CPU (multi-threaded) | GPU |
|---|---|---|
| Netlib (debian package, blas 1.2.20110419, lapack 3.5.0) | OpenBLAS (debian package, 0.2.12) | NVIDIA cuBLAS (NVBLAS 6.5 + Intel MKL) |
| ATLAS (debian package, 3.10.2) | ATLAS (dev branch, 3.11.38) | |
| | GotoBLAS2 (Survive fork, 3.141) | |
| | Intel MKL (part of RevoMath package, 3.2.4) | |
| | BLIS (dev branch, 0.2.0+/17.05.2016) | |
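The table separates single-threaded CPU libraries from multi-threaded ones. Multi-threaded BLAS builds usually read their thread count from environment variables when they load. The simplest way to control that count is therefore to set it in the environment of the benchmark process before the process starts. The Python sketch below shows one way to do this. It assumes the `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` variables used by OpenMP runtimes, OpenBLAS and Intel MKL. The helper names `blas_thread_env` and `run_pinned` are hypothetical and are not part of this repository's scripts.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Thread-count variables read by OpenMP runtimes, OpenBLAS and Intel MKL.
# Other libraries in the table may use different mechanisms.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def blas_thread_env(n_threads: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) with BLAS thread counts pinned."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    env = dict(os.environ if base is None else base)
    for var in THREAD_VARS:
        env[var] = str(n_threads)
    return env


def run_pinned(cmd: List[str], n_threads: int) -> int:
    """Run `cmd` as a child process whose BLAS sees `n_threads` threads; return its exit code."""
    completed = subprocess.run(cmd, env=blas_thread_env(n_threads), check=False)
    return completed.returncode
```

For example, `run_pinned(["Rscript", "benchmark-urbanek.R"], 1)` would run one of the benchmark scripts with a single BLAS thread. This assumes the script needs no further arguments.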
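Hosts: Intel Core i5-4590 + NVIDIA GeForce GT 430, Intel Core i5-4590 + NVIDIA GeForce GTX 750 Ti, Intel Core i5-3570, Intel Core i3-2120, Intel Core i3-3120M, Intel Core i5-3317U + NVIDIA GeForce GT 630M, Intel Pentium Dual-Core E5300.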
Benchmarks: Urbanek, Revolution, Gcbd.
Results per host
Intel Core i5-4590 + NVIDIA GeForce GT 430
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
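Each of the six Urbanek panels above times a single dense linear-algebra operation on random data. The NumPy sketch below approximates those operations, so the same kernels can be timed through whatever BLAS/LAPACK NumPy is linked against. It is not the code of benchmark-urbanek.R. The operations are inferred from the panel titles, and the regression test is assumed to be a QR-based least-squares solve. `urbanek_tests` and `time_runs` are hypothetical names, and `scale` shrinks every dimension so the kernels can be checked quickly.

```python
import time
from typing import Callable, Dict, List

import numpy as np


def time_runs(fn: Callable[[], object], runs: int = 10) -> List[float]:
    """Wall-clock seconds for each of `runs` independent calls to `fn`."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def urbanek_tests(scale: float = 1.0, seed: int = 0) -> Dict[str, Callable[[], object]]:
    """Build the six matrix kernels; scale < 1 shrinks every size for quick checks."""
    rng = np.random.default_rng(seed)

    def size(n: int) -> int:
        return max(2, int(n * scale))

    # Sizes in the comments below apply at scale=1.
    a_cross = rng.standard_normal((size(2800), size(2800))) / 10
    a_lm = rng.standard_normal((size(3000), size(3000)))
    b_lm = np.arange(1, size(3000) + 1, dtype=float)
    a_eig = rng.standard_normal((size(640), size(640)))
    a_det = rng.standard_normal((size(2500), size(2500)))
    m = rng.standard_normal((size(3000), size(3000)))
    a_chol = m.T @ m  # symmetric positive definite with probability 1
    a_inv = rng.standard_normal((size(1600), size(1600)))

    def linear_regression() -> np.ndarray:
        # Least squares through a QR factorisation, similar in spirit to R's qr.coef()
        q, r = np.linalg.qr(a_lm)
        return np.linalg.solve(r, q.T @ b_lm)

    return {
        "crossprod": lambda: a_cross.T @ a_cross,          # 2800x2800 cross-product
        "linear_regression": linear_regression,            # 3000x3000 system
        "eigenvalues": lambda: np.linalg.eigvals(a_eig),   # 640x640, non-symmetric
        "determinant": lambda: np.linalg.det(a_det),       # 2500x2500
        "cholesky": lambda: np.linalg.cholesky(a_chol),    # 3000x3000
        "inverse": lambda: np.linalg.inv(a_inv),           # 1600x1600
    }


if __name__ == "__main__":
    for name, fn in urbanek_tests(scale=0.1).items():
        times = time_runs(fn, runs=10)
        print(f"{name}: median {np.median(times):.4f} s over {len(times)} runs")
```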
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
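Escoufier's method differs from the other Urbanek tests because it is an iterative variable-selection loop rather than one library call. At each step it adds the column whose inclusion maximises the RV coefficient between the selected variables and all variables:

RV = tr(R_xy^T R_xy) / sqrt(tr(R_yy^2) tr(R_xx^2))

Here R_yy is the correlation matrix of all variables, R_xx is that of the selected ones, and R_xy holds their cross-correlations. A NumPy sketch of this greedy selection follows, assuming this is the variant the benchmark implements. It computes the p x p correlation matrix once and slices it, so its run time is not comparable with the benchmark's. `escoufier` and `rv_coefficient` are hypothetical names.

```python
from typing import List, Sequence, Tuple

import numpy as np


def rv_coefficient(R: np.ndarray, selected: Sequence[int]) -> float:
    """RV coefficient between the selected variables and all variables.

    R is the p x p correlation matrix of the full data set.
    """
    idx = np.asarray(selected)
    r_xx = R[np.ix_(idx, idx)]  # correlations among the selected variables
    r_xy = R[idx, :]            # selected variables vs. all variables
    num = np.trace(r_xy.T @ r_xy)
    den = np.sqrt(np.trace(R @ R) * np.trace(r_xx @ r_xx))
    return float(num / den)


def escoufier(x: np.ndarray) -> Tuple[List[int], List[float]]:
    """Greedy forward selection of the columns of x by maximal RV coefficient.

    Returns the selection order and the RV value reached after each step.
    """
    p = x.shape[1]
    R = np.corrcoef(x, rowvar=False)
    remaining = list(range(p))
    order: List[int] = []
    rv_path: List[float] = []
    for _ in range(p):
        best_k, best_rv = remaining[0], -np.inf
        for k in remaining:
            rv = rv_coefficient(R, order + [k])
            if rv > best_rv:
                best_k, best_rv = k, rv
        order.append(best_k)
        remaining.remove(best_k)
        rv_path.append(best_rv)
    return order, rv_path


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((45, 45))
    order, rv_path = escoufier(x)
    print(order[:5], rv_path[:5])
```

Once every column has been selected, R_xx and R_xy are permutations of R_yy, so the last RV value is 1. The informative part of the output is how quickly the path approaches 1.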
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
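The Revolution panels time five higher-level operations. This document does not state their matrix sizes, so the sketch below takes the data dimensions as parameters. It is a NumPy approximation, not benchmark-revolution.R:

- PCA is computed from the SVD of the centred data, as R's `prcomp` does.
- LDA is reduced to a two-class Fisher discriminant solved with the pooled within-class scatter matrix, which is simpler than a multi-class routine such as `MASS::lda`.
- All function names are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

import numpy as np


def pca(x: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
    """Top-k principal axes (rows) and their variances via SVD of centred data."""
    xc = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    variances = s**2 / (x.shape[0] - 1)
    return vt[:k], variances[:k]


def fisher_lda(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Two-class Fisher direction w solving S_w w = mean_1 - mean_0."""
    x0, x1 = x[y == 0], x[y == 1]
    # Pooled within-class scatter matrix
    s_w = np.cov(x0, rowvar=False) * (len(x0) - 1) + np.cov(x1, rowvar=False) * (len(x1) - 1)
    return np.linalg.solve(s_w, x1.mean(axis=0) - x0.mean(axis=0))


def revolution_tests(n: int, p: int, seed: int = 0) -> Dict[str, Callable[[], object]]:
    """Kernels on an n x p data matrix; n and p are placeholders, not the benchmark's sizes."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, p))
    spd = a.T @ a  # p x p, positive definite when n >= p
    y = (rng.random(n) < 0.5).astype(int)
    return {
        "matrix_multiply": lambda: a @ a.T,
        "cholesky": lambda: np.linalg.cholesky(spd),
        "svd": lambda: np.linalg.svd(a, full_matrices=False),
        "pca": lambda: pca(a, k=min(10, p)),
        "lda": lambda: fisher_lda(a, y),
    }


def median_time(fn: Callable[[], object], runs: int = 10) -> float:
    """Median wall-clock seconds over `runs` calls."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return float(np.median(times))
```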
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
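The Gcbd panels differ from the fixed-size ones in three ways. Each sweeps the matrix size. Judging by the caption, each repeats small sizes more often than large ones (from 50 runs down to 5). Each also repeats the curves on a log scale. If the time axis is logarithmic, a constant speed-up between two libraries shows up as a constant vertical gap, since log(t1) - log(t2) = log(t1/t2).

The sketch below reproduces such a sweep with NumPy under stated assumptions:

- The size grid and the linear interpolation of run counts are hypothetical, not the gcbd settings.
- "Triangular Decomposition" is read as an LU factorisation. It is represented here by `np.linalg.solve`, which NumPy documents as calling LAPACK's `gesv` (LU factorisation plus triangular solves).

```python
import time
from typing import Callable, Dict, List, Sequence

import numpy as np


def runs_schedule(sizes: Sequence[int], max_runs: int = 50, min_runs: int = 5) -> List[int]:
    """Run counts falling linearly from max_runs (smallest size) to min_runs (largest)."""
    if len(sizes) == 1:
        return [max_runs]
    step = (max_runs - min_runs) / (len(sizes) - 1)
    return [round(max_runs - i * step) for i in range(len(sizes))]


KERNELS: Dict[str, Callable[[np.ndarray], object]] = {
    "Matrix Multiply": lambda a: a @ a,
    "QR Decomposition": lambda a: np.linalg.qr(a),
    "Singular Value Decomposition": lambda a: np.linalg.svd(a),
    # LU via LAPACK gesv; the right-hand side is a vector of ones
    "Triangular Decomposition": lambda a: np.linalg.solve(a, np.ones(a.shape[0])),
}


def sweep(sizes: Sequence[int], seed: int = 0) -> Dict[str, List[float]]:
    """Mean seconds per call for each kernel and size, with fewer runs for larger sizes."""
    rng = np.random.default_rng(seed)
    results: Dict[str, List[float]] = {name: [] for name in KERNELS}
    for n, runs in zip(sizes, runs_schedule(sizes)):
        a = rng.standard_normal((n, n))
        for name, kernel in KERNELS.items():
            start = time.perf_counter()
            for _ in range(runs):
                kernel(a)
            results[name].append((time.perf_counter() - start) / runs)
    return results
```

For instance, `sweep([100, 250, 500, 1000])` returns one list of mean times per kernel. These can be plotted against the sizes on linear and logarithmic axes to mimic the two-panel layout.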
Intel Core i5-4590 + NVIDIA GeForce GTX 750 Ti
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Intel Core i5-3570
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Intel Core i3-2120
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Intel Core i3-3120M
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Intel Core i5-3317U + NVIDIA GeForce GT 630M
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Intel Pentium Dual-Core E5300
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
BLIS hangs in this test
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Results per library
Netlib
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
ATLAS (st)
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
OpenBLAS
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
ATLAS (mt)
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
GotoBLAS2
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
MKL
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
BLIS
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
BLIS hangs in this test on the Intel Pentium Dual-Core E5300
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
cuBLAS
Urbanek benchmark
2800x2800 cross-product matrix
Time in seconds - 10 runs - lower is better
Linear regr. over a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Eigenvalues of a 640x640 random matrix
Time in seconds - 10 runs - lower is better
Determinant of a 2500x2500 random matrix
Time in seconds - 10 runs - lower is better
Cholesky decomposition of a 3000x3000 matrix
Time in seconds - 10 runs - lower is better
Inverse of a 1600x1600 random matrix
Time in seconds - 10 runs - lower is better
Escoufier's method on a 45x45 matrix
Time in seconds - 10 runs - lower is better
Revolution benchmark
Matrix Multiply
Time in seconds - 10 runs - lower is better
Cholesky Factorization
Time in seconds - 10 runs - lower is better
Singular Value Decomposition
Time in seconds - 10 runs - lower is better
Principal Components Analysis
Time in seconds - 10 runs - lower is better
Linear Discriminant Analysis
Time in seconds - 10 runs - lower is better
Gcbd benchmark
Matrix Multiply
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
QR Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Singular Value Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better
Triangular Decomposition
Time in seconds as a function of matrix size - right panel on log scale - from 50 to 5 runs - lower is better