Memory-efficient self-cross-product for large matrices using R and Python
Abstract
In many quantitative studies, the matrix self-cross-product (B’B) must be calculated, where B is any matrix of interest. For a matrix B with many columns, there may be memory limitations in storing B’B, because B’B is m × m for B with m columns. Calculating B’B also has a computational complexity of m²n, for B with n rows and m columns. Because B’B is symmetric, almost half of the calculations and the memory usage are redundant. The half-matrix multiplication algorithm (HMMA) was introduced, which creates B’B in upper-triangular form. The matrix multiplication functions %*% and crossprod in R, numpy.dot in Python, and the user-defined HMMA functions hmma_r and hmma_py in R and Python were compared, for a matrix B holding 40,000 real numbers in various dimensions. The runtime of the B’B calculation was less than a second when B had more than 4 rows. The longest runtime was for B with 1 row, using crossprod (21.3 sec), followed by numpy.dot (9.7 sec). For B with 4 or fewer rows, hmma_py, %*%, and hmma_r ranked first to third for the shortest runtime. The memory usage of a (40,000 × 40,000) B’B was 12.8 GB, and the main advantage of HMMA was reducing it by half.
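The HMMA idea described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual hmma_py function: it exploits the symmetry of B’B by computing only the upper triangle, multiplying each column of B against itself and the columns to its right.

```python
import numpy as np

def hmma_sketch(B):
    """Illustrative half-matrix multiplication: return the upper
    triangle of B'B, computing each entry only once.

    This is a hypothetical sketch of the HMMA concept, not the
    paper's hmma_py implementation."""
    n, m = B.shape
    C = np.zeros((m, m))
    for i in range(m):
        # Column i is multiplied only against columns i..m-1,
        # so roughly half of the m*m products are skipped.
        C[i, i:] = B[:, i] @ B[:, i:]
    return C
```

The full symmetric product can be recovered from the triangle when needed, e.g. `C + C.T - np.diag(np.diag(C))`, while storing only the upper triangle keeps the memory footprint near half of the full m × m matrix.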
Copyright ©2024 JMCS