A Bitwise Convolution Tutorial

Prerequisites

An introduction

It is an important and popular fact that the things we classify as FFT on code-drills are not always actual Fast Fourier Transforms (if we're being overly technical). Suppose you are given two arrays \(A\) and \(B\) and an operation \(*\). We will call the \(*\) convolution of \(A\) and \(B\) the array \(C\) such that: $$ C_p = (A*B)_p = \sum_{i*j=p}{A_i B_j} $$ In the case of the actual FFT, \(*\) is addition, but it might as well be something else, like AND, OR, XOR, or something like gcd or lcm (which require techniques far less advanced than the ones presented here). This article is intended to introduce the (*sighs* rare) reader to techniques that allow us to efficiently compute such convolutions by linear algebraic means.
I will also present solutions to some problems that involve these techniques, such as Random Nim Generator.
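To ground the definition, here is a brute-force \(O(n^2)\) reference implementation of the XOR case (a sketch; the function name is mine), useful for checking the fast transforms later on:

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Naive O(n^2) XOR convolution straight from the definition:
// C[p] = sum over all pairs (i, j) with i ^ j == p of A[i] * B[j].
vector<long long> naive_xor_convolution(const vector<long long> &A,
                                        const vector<long long> &B) {
    int n = A.size(); // n is assumed to be a power of two
    vector<long long> C(n, 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            C[i ^ j] += A[i] * B[j];
    return C;
}
```

For example, `naive_xor_convolution({1, 2}, {3, 4})` gives `{11, 10}`, since \(1\cdot3 + 2\cdot4 = 11\) lands at position \(0\) and \(1\cdot4 + 2\cdot3 = 10\) at position \(1\).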

Throwback to FFT

As you should know, FFT is basically a divide and conquer algorithm designed to reduce the complexity of the multiplication of a vector with a specific Vandermonde matrix: $$ V_n = \begin{bmatrix} \omega^{0 \cdot 0} & \omega^{0 \cdot 1} & \omega^{0 \cdot 2} & \dots & \omega^{0 \cdot (n-1)} \\ \omega^{1 \cdot 0} & \omega^{1 \cdot 1} & \omega^{1 \cdot 2} & \dots & \omega^{1 \cdot (n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \omega^{(n-1) \cdot 0} & \omega^{(n-1) \cdot 1} & \omega^{(n-1) \cdot 2} & \dots & \omega^{(n-1) \cdot (n-1)} \end{bmatrix} \large{ , \omega=e^{ \frac{2\pi i}{n} } } $$ Let's recap the FFT process a bit: multiplying by \(V_n\) evaluates a polynomial at the \(n\)-th roots of unity, we multiply the evaluations pointwise, and multiplying by \(V_n^{-1}\) interpolates back, giving \(C=V_n^{-1}((V_n A) \odot (V_n B))\).

The Bitwise Convolutions Matrices

Now that we're done with the short "classical FFT" recap, we may dive into XOR, AND and OR convolutions, and there is good news! They are much easier to understand, don't require complex numbers or roots of unity in some field, run much faster and are shorter to code. All of that matrix decomposition maths was there to point out that most convolution algorithms (but not the gcd convolution, for example) are based on such matrix decompositions, and to present a familiar example, even though it is one of the most complex ones from this perspective.

Remember the \(C=V_n^{-1}((V_n A) \odot (V_n B))\) relation from the beginning of the article? Let's change \(V_n\) to \(T_n\) (for "transformation") to avoid confusion. Now the meaning of \(C=T_n^{-1}((T_n A) \odot (T_n B))\) is that we apply some transformation to each of the vectors, obtain another vector through element-by-element multiplication ('\(\odot\)') and then reverse the transformation in order to get the result of the convolution. In this equation, \(T_n\) (the transformation) is the unknown. In the case of "classical FFT", the Vandermonde matrix over the roots of unity worked, but now we must find another one that yields, for example, the XOR convolution. Let's see how we find the values of \(T_n\) in that case!
We will use \( \oplus \) as the operator symbol of XOR, as it is shorter and looks fancier, and \($\) as the XOR convolution symbol. Also, \(n\) will be a power of two so that every index \(k \oplus p\) stays inside the array.
Let's see how this convolution would work for vectors of size 2. By definition: $$ (A$B)_p=\sum_{k=0}^{n - 1}{A_k B_{k \oplus p}} $$ Additionally: $$ A$B=T_n^{-1}((T_nA) \odot \ (T_nB)) $$ And we'll set $$ A= \begin{bmatrix} a_0 \\ a_1 \\ \end{bmatrix}, B= \begin{bmatrix} b_0 \\ b_1 \\ \end{bmatrix}, T_2= \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} \\ \text{so} \\ A$B= \begin{bmatrix} a_0b_0 + a_1b_1 \\ a_0b_1 + a_1b_0 \\ \end{bmatrix} $$ So now we can start solving for \(T_2\). $$ A$B = T_n^{-1}((T_nA) \odot \ (T_nB)) \implies T_n (A$B) = (T_nA) \odot \ (T_nB) \\ \implies \\ \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} \begin{bmatrix} a_0b_0 + a_1b_1 \\ a_0b_1 + a_1b_0 \\ \end{bmatrix} = \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \end{bmatrix} \odot \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ \end{bmatrix} \\ \implies \\ \begin{bmatrix} w (a_0b_0 + a_1b_1) + x (a_0b_1 + a_1b_0) \\ y (a_0b_0 + a_1b_1) + z (a_0b_1 + a_1b_0) \\ \end{bmatrix} = \begin{bmatrix} w a_0 + x a_1 \\ y a_0 + z a_1 \\ \end{bmatrix} \odot \begin{bmatrix} w b_0 + x b_1 \\ y b_0 + z b_1 \\ \end{bmatrix} \\ \implies \\ \begin{bmatrix} w (a_0b_0 + a_1b_1) + x (a_0b_1 + a_1b_0) \\ y (a_0b_0 + a_1b_1) + z (a_0b_1 + a_1b_0) \\ \end{bmatrix} = \begin{bmatrix} (w a_0 + x a_1) (w b_0 + x b_1) \\ (y a_0 + z a_1) (y b_0 + z b_1) \\ \end{bmatrix} \\ \implies \\ \begin{bmatrix} w (a_0b_0 + a_1b_1) + x (a_0b_1 + a_1b_0) \\ y (a_0b_0 + a_1b_1) + z (a_0b_1 + a_1b_0) \\ \end{bmatrix} = \begin{bmatrix} w^2 a_0b_0 + x^2 a_1b_1 + wx (a_0b_1 + a_1b_0) \\ y^2 a_0b_0 + z^2 a_1b_1 + yz (a_0b_1 + a_1b_0) \\ \end{bmatrix} $$ Now, because \(a_0, a_1, b_0, b_1\) are free variables, matching coefficients gives us a system of equations: $$ \begin{cases} w^2=w, x^2=w, x=wx \\ y^2=y, z^2=y, z=yz \\ \end{cases} $$ Which has the following solutions: $$ (w,x) \in \{(0, 0), (1, 1), (1, -1) \} \\ \land \\ (y,z) \in \{(0, 0), (1, 1), (1, -1)\} $$ But there is a 
catch: \(T_2\) must be invertible, that is \(\det{T_2} \neq 0\), which leaves us with these solutions $$ T_2 \in \left \{ \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix}, \begin{bmatrix} 1 & -1 \\ 1 & 1 \\ \end{bmatrix} \right \} $$ And indeed, these are the matrices whose corresponding transformation gives us the XOR convolution for arrays of length \(n=2\). So we'd better figure out a way to extend this to larger powers of two, let's say \(n=2^k\). Then the transformation matrix is \(T_n=T_2^{\otimes k}\) (the \(k\)-fold Kronecker power), which in our case is the Hadamard matrix. The proof of this is again based on induction (and actually not that hard) and will appear in case of popular demand. But the main principle behind it is that for XOR, OR and AND the bits are independent, and the Kronecker product has a distributive multiplication effect, as in the case of "classical FFT", on the sequence partitioned in buckets of size 2, then 4, then 8 and so on.
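To make the Kronecker power concrete, here is the \(n=4\) case for the first of the two matrices above: $$ T_4 = T_2 \otimes T_2 = \begin{bmatrix} T_2 & T_2 \\ T_2 & -T_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} $$ In general, the entry of this matrix at row \(p\), column \(q\) is \((-1)^{\text{popcount}(p \mathbin{\&} q)}\).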
There's just one more thing to point out about the Kronecker product: $$ (M^{\otimes n})^{-1} = (M^{-1})^{\otimes n} $$
Let's see how we can efficiently multiply these Kronecker products with vectors. Recall that if $$ T_2= \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} \text{ then } T_2^{-1}= \frac{1}{wz-xy} \begin{bmatrix} z & -x \\ -y & w \\ \end{bmatrix} $$

void transform(vector<double> &pol) {
    // w, x, y, z are the entries of T_2, assumed defined elsewhere
    int n = size(pol);

    for (int step = 1; step < n; step *= 2) {
        for (int pos = 0; pos < n; pos += step * 2) {
            for (int i = 0; i < step; ++i) {
                // replace pol[pos + i], pol[pos + i + step] with their product with T_2
                double a = pol[pos + i];
                double b = pol[pos + i + step];
                pol[pos + i] = w * a + x * b;
                pol[pos + i + step] = y * a + z * b;
            }
        }
    }
}

void inverse_transform(vector<double> &pol) {
    const double determinant = w * z - x * y;
    int n = size(pol);

    for (int step = 1; step < n; step *= 2) {
        for (int pos = 0; pos < n; pos += step * 2) {
            for (int i = 0; i < step; ++i) {
                // replace pol[pos + i], pol[pos + i + step] with their product with the inverse of T_2
                double a = pol[pos + i];
                double b = pol[pos + i + step];
                pol[pos + i] = (z * a - x * b) / determinant;
                pol[pos + i + step] = (w * b - y * a) / determinant;
            }
        }
    }
}

And in case you want an XOR convolution, just set \(w, x, y, z\) to the values from the specific XOR transformation matrix (for example \(w=x=y=1\), \(z=-1\)).
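As a sanity check, here is a self-contained sketch of the whole pipeline for the XOR case, with \(w=x=y=1\), \(z=-1\) baked in and integer arithmetic (the function names are mine):

```cpp
#include <cassert>
#include <vector>
using namespace std;

// In-place Walsh-Hadamard transform, i.e. T_2 = [[1, 1], [1, -1]].
// With invert = true it also divides by n, since here T_n^{-1} = T_n / n.
void wht(vector<long long> &pol, bool invert) {
    int n = pol.size();
    for (int step = 1; step < n; step *= 2)
        for (int pos = 0; pos < n; pos += step * 2)
            for (int i = 0; i < step; ++i) {
                long long a = pol[pos + i], b = pol[pos + i + step];
                pol[pos + i] = a + b;
                pol[pos + i + step] = a - b;
            }
    if (invert)
        for (long long &v : pol) v /= n;
}

// C = T^{-1}((T A) .* (T B)) — the XOR convolution of A and B.
vector<long long> xor_convolution(vector<long long> A, vector<long long> B) {
    wht(A, false);
    wht(B, false);
    for (size_t i = 0; i < A.size(); ++i) A[i] *= B[i];
    wht(A, true);
    return A;
}
```

For instance, `xor_convolution({1, 2}, {3, 4})` returns `{11, 10}`, matching the definition \(C_p = \sum_{i \oplus j = p} A_i B_j\).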

Some Pen and Paper Exercises

Random Nim Generator

"How many sequences of length \(n\) containing numbers from \(0\) to \(k\) have their total XOR sum greater than \(0\)?"
Let's see a DP approach! $$ \text{dp[a][b]} = \sum_{i=0}^{k}{\text{dp[a - 1][}i \oplus b\text{]}} $$ where \(\text{a}\) is the number of elements taken so far and \(\text{b}\) is their current XOR sum. The DP is initialized with \(\text{dp[1][i]}=1, \forall i \in [0..k]\), and \(0\) everywhere else.
Let's define \(K=[1, 1, 1, ..., 1, 0, 0, ..., 0]\) (the first \(k + 1\) values are \(1\) and the rest are \(0\), the array's length being the smallest power of \(2\) strictly greater than \(k\)). Now we can write the recurrence as $$ \text{dp[a]} = \text{dp[a-1]}$K \\ \text{but} \\ \text{dp[1]}=K \\ \text{so} \\ \text{dp[a]} = K$K$K$ \dots $K \text{ (a times)} $$ So you just have to apply the XOR transform to \(K\), raise each value to the \(n\)-th power (which convolutes \(K\) with itself \(n\) times), apply the inverse transform, and output the sum of the values at the positions other than \(0\) of the resulting vector.
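A minimal self-contained sketch of this method (plain `long long`, no modular arithmetic, so it only handles small answers; the function names are mine):

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Integer power helper, fine for the small values in this sketch.
long long ipow(long long base, long long e) {
    long long r = 1;
    while (e--) r *= base;
    return r;
}

// Counts length-n sequences with values in [0, k] whose XOR sum is > 0:
// XOR-transform K, raise every value to the n-th power, transform back,
// and sum the positions other than 0.
long long count_nonzero_xor(long long n, int k) {
    int sz = 1;
    while (sz <= k) sz *= 2; // smallest power of two strictly greater than k
    vector<long long> K(sz, 0);
    for (int i = 0; i <= k; ++i) K[i] = 1;

    // forward Walsh-Hadamard transform (w = x = y = 1, z = -1)
    for (int step = 1; step < sz; step *= 2)
        for (int pos = 0; pos < sz; pos += step * 2)
            for (int i = 0; i < step; ++i) {
                long long a = K[pos + i], b = K[pos + i + step];
                K[pos + i] = a + b;
                K[pos + i + step] = a - b;
            }

    for (auto &v : K) v = ipow(v, n); // pointwise power = n-fold convolution

    // the inverse transform is the same butterfly followed by division by sz
    for (int step = 1; step < sz; step *= 2)
        for (int pos = 0; pos < sz; pos += step * 2)
            for (int i = 0; i < step; ++i) {
                long long a = K[pos + i], b = K[pos + i + step];
                K[pos + i] = a + b;
                K[pos + i + step] = a - b;
            }

    long long answer = 0;
    for (int p = 1; p < sz; ++p) answer += K[p] / sz;
    return answer;
}
```

For example, with \(n=2\), \(k=2\) there are \(9\) sequences in total, of which \((0,0), (1,1), (2,2)\) have XOR sum \(0\), so the answer is \(6\) — and `count_nonzero_xor(2, 2)` indeed returns `6`.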
Here is my source code.

AND closure

Just apply a "self-AND-convolution" to the frequency array \(\log\) times (each round closes the set under ANDs of twice as many elements) and output the positions with non-zero corresponding values.
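A sketch of that idea (the AND transform uses \(w=1, x=1, y=0, z=1\), i.e. the superset-sum transform; the function name and the 0/1 renormalization between rounds are mine):

```cpp
#include <cassert>
#include <vector>
using namespace std;

// Returns the closure of `values` under pairwise AND, by repeatedly
// AND-convolving the 0/1 indicator array with itself log(sz) times.
// sz must be a power of two strictly greater than every value.
vector<int> and_closure(const vector<int> &values, int sz) {
    vector<long long> f(sz, 0);
    for (int v : values) f[v] = 1;

    for (int round = 1; round < sz; round *= 2) {
        // forward AND transform: f[p] becomes the sum of f[i] over supersets i of p
        for (int step = 1; step < sz; step *= 2)
            for (int pos = 0; pos < sz; pos += step * 2)
                for (int i = 0; i < step; ++i)
                    f[pos + i] += f[pos + i + step];
        for (auto &v : f) v *= v;                 // pointwise square = self-convolution
        // inverse transform: T_2^{-1} = [[1, -1], [0, 1]]
        for (int step = 1; step < sz; step *= 2)
            for (int pos = 0; pos < sz; pos += step * 2)
                for (int i = 0; i < step; ++i)
                    f[pos + i] -= f[pos + i + step];
        for (auto &v : f) v = (v != 0);           // keep only reachability, not counts
    }

    vector<int> closure;
    for (int p = 0; p < sz; ++p)
        if (f[p]) closure.push_back(p);
    return closure;
}
```

For example, `and_closure({3, 5}, 8)` returns `{1, 3, 5}`, since \(3 \mathbin{\&} 5 = 1\) is the only new element.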
Here is my source code.