BLAS libraries benchmarks

Andrzej Wójtowicz

Document generation date: 2016-05-26 19:23:05

Table of Contents

  1. Configuration
  2. Results per host
  3. Results per library

Configuration

R software: Microsoft R Open.

Libraries:

| CPU (single-threaded) | CPU (multi-threaded) | GPU |
|---|---|---|
| Netlib (Debian package) | OpenBLAS (Debian package) | NVIDIA cuBLAS (NVBLAS + Intel MKL) |
| ATLAS (Debian package) | ATLAS (dev branch) | |
| | GotoBLAS2 (Survive fork) | |
| | Intel MKL (part of Microsoft R Open) | |
| | BLIS | |

Hosts:

| No. | CPU | GPU |
|---|---|---|
| 1. | Intel Core i5-4590 | NVIDIA GeForce GT 430 |
| 2. | Intel Core i5-3570 | - |
| 3. | Intel Core i3-2120 | - |
| 4. | Intel Core i3-3120M | - |

Benchmarks: Urbanek, Revolution, Gcbd.
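
The charts in this document report wall-clock time in seconds over repeated runs (10 runs for the Urbanek and Revolution tasks). As a rough illustration of that measurement scheme, here is a minimal Python sketch that times a cross-product task over 10 runs. It is not the benchmark code itself, which runs in R. The helper names `crossprod` and `time_runs` and the small matrix size are assumptions made for this example only.

```python
import random
import statistics
import time


def crossprod(a):
    """Return the cross-product (transpose of a, times a) of a matrix given as a list of rows."""
    cols = list(zip(*a))  # columns of a
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]


def time_runs(fn, runs=10):
    """Call fn() `runs` times and return the elapsed wall-clock seconds of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical small stand-in for the 2800x2800 matrix named in the benchmark;
# pure Python is far too slow for the real size.
n = 40
rng = random.Random(0)
a = [[rng.random() for _ in range(n)] for _ in range(n)]

timings = time_runs(lambda: crossprod(a), runs=10)
print(f"median over {len(timings)} runs: {statistics.median(timings):.6f} s")
```

Keeping every per-run timing, rather than a single total, is what lets the spread across runs be shown alongside a summary value. The measured numbers depend entirely on the machine and are not comparable to the results below.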

Results per host

Intel Core i5-4590 + NVIDIA GeForce GT 430

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Intel Core i5-3570

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Intel Core i3-2120

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Intel Core i3-3120M

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Results per library

Netlib

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

ATLAS (st)

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

OpenBLAS

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

ATLAS (mt)

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

GotoBLAS2

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

MKL

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

BLIS

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

cuBLAS

Urbanek benchmark

2800x2800 cross-product matrix

Time in seconds - 10 runs - lower is better

Linear regr. over a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Eigenvalues of a 640x640 random matrix

Time in seconds - 10 runs - lower is better

Determinant of a 2500x2500 random matrix

Time in seconds - 10 runs - lower is better

Cholesky decomposition of a 3000x3000 matrix

Time in seconds - 10 runs - lower is better

Inverse of a 1600x1600 random matrix

Time in seconds - 10 runs - lower is better

Escoufier's method on a 45x45 matrix

Time in seconds - 10 runs - lower is better

Revolution benchmark

Matrix Multiply

Time in seconds - 10 runs - lower is better

Cholesky Factorization

Time in seconds - 10 runs - lower is better

Singular Value Decomposition

Time in seconds - 10 runs - lower is better

Principal Components Analysis

Time in seconds - 10 runs - lower is better

Linear Discriminant Analysis

Time in seconds - 10 runs - lower is better

Gcbd benchmark

Matrix Multiply

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

QR Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Singular Value Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better

Triangular Decomposition

Time in seconds versus matrix size - right panel on log scale - from 50 to 5 runs - lower is better