Memory-efficient self-cross-product for large matrices using R and Python

Mohammad Ali Nilforooshan

Abstract


In many quantitative studies, matrix self-cross-products (B’B) must be calculated, where B is any matrix of interest. Because B’B is an m × m matrix, storing it can exceed the available memory when B has many columns. Calculating B’B also has a computational complexity of m²n, where n and m are the numbers of rows and columns of B, respectively. Because B’B is symmetric, almost half of the calculations and of the memory usage are redundant. The half-matrix multiplication algorithm (HMMA) is introduced, which computes and stores only the upper triangle of B’B. The R matrix-multiplication operator %*% and function crossprod, the Python function numpy.dot, and the user-defined HMMA functions hmma_r (R) and hmma_py (Python) were compared for matrices B containing 40,000 real numbers arranged in various dimensions. Calculating B’B took less than a second when B had more than 4 rows. The longest runtime was for B with 1 row, using crossprod (21.3 s), followed by numpy.dot (9.7 s). For B with 4 or fewer rows, hmma_py, %*%, and hmma_r ranked first to third for the shortest runtime. A 40,000 × 40,000 B’B occupied 12.8 GB of memory, and the main advantage of HMMA was halving this memory usage.
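The half-matrix idea can be illustrated with a short Python sketch. It assumes B is a dense two-dimensional NumPy float64 array with n rows and m columns. The names half_crossprod, unpack_upper and storage_bytes are hypothetical, and this is not the paper's hmma_py. Each entry (i, j) of B’B with i ≤ j is computed once and stored in a packed 1-D array of length m(m+1)/2, so roughly half of the m²n multiply-adds and half of the memory of the full product are needed.

```python
import numpy as np


def half_crossprod(B):
    """Return the upper triangle of B'B packed row by row into a 1-D array.

    Hypothetical helper illustrating the half-matrix idea (not the paper's
    hmma_py). Only the m*(m+1)/2 entries with i <= j are computed and stored.
    """
    B = np.asarray(B, dtype=np.float64)
    m = B.shape[1]
    packed = np.empty(m * (m + 1) // 2)
    pos = 0
    for i in range(m):
        # Row i of the upper triangle: column i dotted with columns i..m-1.
        row = B[:, i] @ B[:, i:]
        packed[pos:pos + row.size] = row
        pos += row.size
    return packed


def unpack_upper(packed, m):
    """Expand a packed upper triangle into the full symmetric m x m matrix (for checking)."""
    full = np.zeros((m, m))
    full[np.triu_indices(m)] = packed  # triu_indices is in row-major order
    return full + np.triu(full, 1).T   # mirror the strict upper part below the diagonal


def storage_bytes(m, itemsize=8):
    """Bytes needed for the full m x m product vs. its packed upper triangle."""
    full = m * m * itemsize
    packed = m * (m + 1) // 2 * itemsize
    return full, packed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 6))  # n = 4 rows, m = 6 columns
    packed = half_crossprod(B)
    print(packed.size)                                    # 21 = 6 * 7 / 2
    print(np.allclose(unpack_upper(packed, 6), B.T @ B))  # True
    print(storage_bytes(40_000))                          # (12800000000, 6400160000)
```

For m = 40,000 and 8-byte doubles, storage_bytes gives 12.8 × 10^9 bytes for the full matrix and about 6.4 × 10^9 bytes for the packed triangle, in line with the halving reported above. The Python loop over columns is chosen for readability; a faster variant might process blocks of columns to reduce per-call overhead.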


Published: 2020-02-13

How to Cite this Article:

Mohammad Ali Nilforooshan, Memory-efficient self-cross-product for large matrices using R and Python, J. Math. Comput. Sci., 10 (2020), 497-506

Copyright © 2020 Mohammad Ali Nilforooshan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

J. Math. Comput. Sci.

ISSN: 1927-5307
