Einstein Notation and Fast Matrix Multiplications

We discuss how Einstein notation can help clarify why kij loops are used when implementing naive matrix multiplication.

30 November 2025

This algorithm is still relevant as of the time of this blog post: the naive way is fundamentally how most (all?) libraries implement matrix multiplication, albeit with architecture-specific blocking for performance. Asymptotically faster theoretical methods have not taken over yet.

Suppose we want to compute the product of two matrices A, B and save it into a matrix C. In Einstein notation, we would write the entry (i, j) of C as

\begin{equation}
C^i_j = A^i_k B^k_j,
\end{equation}

where k is a dummy index (it is summed over). To completely fill in matrix C, we must loop over i and j as well. Altogether, we must loop over indices i, j, k, but the order in which we do so does not matter, by Fubini's theorem / commutativity of addition. This raises the question of which index we should put into the hot (innermost) loop.
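As a baseline, here is a minimal sketch of the naive triple loop in C (assuming square n×n row-major matrices of doubles, with C zero-initialized by the caller; the function name is ours):

```c
#include <stddef.h>

/* Naive matrix multiplication: C += A * B, all n x n, row-major.
   C must be zero-initialized by the caller. The three loops can be
   permuted freely; only the memory-access pattern changes. */
void matmul_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```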

The insight is that in row-major languages, to obey spatial locality, we fix the row and iterate over the columns. In Einstein notation, this corresponds to fixing the top index and moving the bottom one. Looking at equation (1), we see that index j repeats on both sides of the equation (namely on matrices C and B). Therefore, we choose j as the hot index, which means choosing either the ikj or the kij order.
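A sketch of the kij order under the same assumptions (square n×n row-major matrices of doubles, C zero-initialized by the caller): with j innermost, both C and B are walked along a row with stride 1, and the A entry is loop-invariant, so it can be hoisted into a scalar.

```c
#include <stddef.h>

/* kij ordering: C += A * B, all n x n, row-major.
   The hot (innermost) loop runs over j, so rows C[i][.] and B[k][.]
   are both traversed contiguously; A[i][k] is hoisted to a scalar. */
void matmul_kij(size_t n, const double *A, const double *B, double *C)
{
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++) {
            double a_ik = A[i*n + k];  /* fixed while j varies */
            for (size_t j = 0; j < n; j++)
                C[i*n + j] += a_ik * B[k*n + j];
        }
}
```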

Contrast this with the naive ijk implementation, where the dummy index k is hot. In that case, B disobeys spatial locality: since B is stored row-major, incrementing k jumps a full row at a time, an access with stride m, where m is the number of columns of B.
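To make the stride concrete, here is a small illustrative helper (the function names and the m-columns notation are our own): for a row-major matrix with m columns, stepping the row index k jumps a full row, while stepping the column index j moves to the adjacent element.

```c
#include <stddef.h>

/* For a row-major matrix with m columns, element (k, j) lives at
   offset k*m + j. These helpers compute the distance (in elements)
   between consecutive accesses of the hot loop in each ordering. */
size_t stride_when_k_hot(size_t m, size_t k, size_t j)
{
    /* next access is (k+1, j): one full row away */
    return ((k + 1) * m + j) - (k * m + j);
}

size_t stride_when_j_hot(size_t m, size_t k, size_t j)
{
    /* next access is (k, j+1): adjacent in memory */
    return (k * m + (j + 1)) - (k * m + j);
}
```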