BCH Codes

Consider this common scenario in today’s world: trying to send a message across a noisy network.
More concretely, when we send a \(n\)-bit binary string \(s\in\{0,1\}^n\) across the channel, the receiver will receive some possibly altered \(n\)-bit string \(s’\in\{0,1\}^n\) due to errors.

In face of such a dilemma, we can use some error-correcting code to detect and even correct those errors.
In particular we will cover BCH codes, which are highly effective in their error correcting capabilities as well as having efficient encoding/decoding algorithms. They also present a nice application of algebra to computing.

Finite Fields

Finite fields have a ton of intersesting properties, but we’ll only cover the essentials to construct BCH codes.

All finite fields must have prime characteristic, meaning 1 in the field added to itself some prime number of times results in 0. \(1+\ldots+1 = 0\)
e.g. the numbers modulo some prime \(\mathbb{Z}_p\) are all invertible and form a field in the “obvious” way.

For a given prime \(p\), all fields of characteristic \(p\) are of the form \(\mathbb{F}_{p^r}\) or fields with \(p^r\) elements.

If we have the polynomial \(x^n-1\) over \(\mathbb{Z}_2 = \mathbb{F}_2\), there is some field extension of the form \(\mathbb{F}_{2^r}\) that as \(n\) distinct roots of \(x^n-1\).
All of the roots of \(x^n-1\) will be powers of some \(\omega\)
i.e. The roots are \(1, \omega, \omega^2, \ldots\) and \(\omega\) is called the primitive \(n\)th root of unity.

Coding Preliminaries

A \((n, k)\)-block code where \(n > k\) is an error-correcting code that encodes information in \(k\)-bit blocks, and sends data across the noisy channel in \(n\)-bit blocks. Given some \(k\)-bit string of data, there is some function \(E: \mathbb{Z}_2^k \to \mathbb{Z}_2^n\) to encode and a similar function \(D: \mathbb{Z}_2^n \to \mathbb{Z}_2^k\) to decode.

A codeword is any \(n\)-bit string in the image of \(E\).

Define the distance between two strings as the number of bits that differ between them.
Symbolically, \(d(010, 011) = 1\). This is also called the hamming distance.

The weight of a binary string is the number of 1s. i.e. \(w(1010) = 2\)

For codes, instead of analyzing distances between individual codes, we typically analyze the minimum distance \(d_{min}\) between all pairs of distinct codewords. For some code \(\mathcal{C}\) to be \(m\) error detecting, \(d_{min} \geq m+1\). For the code to be \(m\) error correcting, we must have \(d_{min} \geq 2m+1\).

Now let’s impart some more structure upon the codes that we are studying to utilize existing knowledge on groups and rings. A linear code is a code where the codewords form a vector space and is thus closed under linear combinations.

With this given structure, notice that the difference between two codewords \(x-y\) is itself a codeword, so for linear codes \(d_{min} = w_{min}\) where \(w_{min}\) is the minimum weight of any codeword.

With these definitions, we can already build some interesting block codes using knowledge from linear algebra.

Polynomial Codes

A cyclic code is a code that is closed under bit shifts. i.e. \(1101\) shifted once to the right is \(1110\).

We are going to take a look at polynomial codes, which are all cyclic codes.

Given a tuple representing a codeword \(a_0, a_2, \ldots, a_{n-1}\), we can associate it with the polynomial \(a_0 + a_1x + \ldots + a_{n-1}x^{n-1}\). We will use this encoding to generate polynomial codes.

Suppose we want to construct a \(n, k\) code. We can start with some polynomial of degree at most \(n-k\), \(g(x) = a_0+a_1x+\ldots+a_{n-k-1}x^{n-k-1}\) and encode any block of degree at most \(k-1\), \(f(x) = b_0+b_1x + \ldots + b_{k-1}x^{k-1}\) by simply multiplying the polynomials: \(f(x)g(x)\). Codes constructed in this way are polynomial codes. Every codeword has degree less than \(n\) and is divisible by \(g(x)\).

The entire space of possible polynomial of degree less than \(n\): \(R_n\) is isomorphic to \(\mathbb{Z}_2[x]/ \langle x^n-1\rangle\) as vector spaces. Recall that when given a polynomial \(g(x)\), the possible codewords are all polynomials divisible by \(g(x)\). Or rather, the set of all possible multiples of \(g(x)\) is the set of all possible codes i.e. The space of all possible polynomials is the ideal \(\langle g(x)\rangle\) in \(\mathbb{Z}_2[x]/\langle x^n-1\rangle\).

However, there is an easy way to categorize the ideals in \(\mathbb{Z}_2[x]/\langle x^n-a\rangle\). By the first isomorphism theorem for rings, these target ideals are in bijective correspondence with ideals in \(\mathbb{Z}_2[x]\) that contain the ideal \(\langle x^n-1\rangle\). The bijection between ideals is given by the quotient map: \(\mathbb{Z}_2[x] \to \mathbb{Z}_2[x]/\langle x^n-1\rangle\).

Since \(\mathbb{Z}_2[x]\) is a principle ideal domain, every target ideal containing \(\langle x^n-1\rangle\) are of the form \(\langle g(x)\rangle\) where \(g(x)\) divides \(x^n-1\) in \(\mathbb{Z}_2[x]\).

This drastically limits the polynomials we need analyze that correspond to polynomial codes.

BCH Code Construction

Lemma

Before proceeding to defining BCH codes, we must cover a short lemma.
Let \(\omega\) be the primitive \(n\)th root of of unity over \(\mathbb{Z}_2\). If \(g(x)\) is the polynomial generating some polynomial code that contains \(s\) consecutive powers of \(\omega\) as roots, then the code generated by \(g(x)\) has minimum distance at least \(d\).

Suppose that \(\omega^r, \omega^{r+1}, \ldots, \omega^{r+s-1}\) are roots of \(g(x)\). Suppose \(f(x)\) is some codeword with weight at most \(s\).
i.e. \(f(x) = \sum_{j=1}^s a_jx^{i_j}\)

We know that \(\omega^r, \ldots, \omega^{r+s-1}\) are all roots of this polynomial since \(g(x)\) divides \(f(x)\).

\[a_1(\omega^r)^{i_1} + a_2(\omega^r)^{i_2} + \ldots + a_s(\omega^r)^{i_s} = 0 \\ a_1(\omega^{r+1})^{i_1} + a_2(\omega^{r+1})^{i_2} + \ldots + a_s(\omega^{r+1})^{i_s} = 0 \\ \vdots \\ a_1(\omega^{r+s-1})^{i_1} + a_2(\omega^{r+s-1})^{i_2} + \ldots + a_s(\omega^{r+s-1})^{i_s} = 0\]

This implies that \(a_i\)’s are solutions to the following system of equations

\[(\omega^{i_1})^rx_1+(\omega^{i_2})^rx_2+\ldots + (\omega^{i_s})^rx_s = 0 \\ (\omega^{i_1})^{r+1}x_1+(\omega^{i_2})^{r+1}x_2+\ldots + (\omega^{i_s})^{r+1}x_s = 0 \\ \vdots \\ (\omega^{i_1})^{r+s-1}x_1+(\omega^{i_2})^{r+s-1}x_2+\ldots + (\omega^{i_s})^{r+s-1}x_s = 0 \\\]

However, the following matrix has nonzero determinant since it is vandermonde.

\[\begin{bmatrix} (\omega^{i_1})^r & (\omega^{i_2})^r & \cdots & (\omega^{i_s})^r \\ (\omega^{i_1})^{r+1} & (\omega^{i_2})^{r+1} & \cdots & (\omega^{i_s})^{r+1} \\ \vdots & \vdots & \ddots & \vdots \\ (\omega^{i_1})^{r+s-1} & (\omega^{i_2})^{r+s-1} & \cdots & (\omega^{i_s})^{r+s-1} \end{bmatrix}\]

BCH Construction

We’ve built all the necessary machinery to understand BCH codes; describing BCH codes is straightforward.

Suppose that we want a code that can correct up to \(r\) errors so the weight of our desired code is \(d=2r+1\). We will need some primitive root \(\omega\) of order at least \(d\). We define \(g(x) = \text{lcm}[m_1(x), \ldots, m_{2r}(x)]\) where \(m_i\) is the minimal polynomial of \(\omega^i\) over \(\mathbb{Z}_2\).

Based on our previous lemma, the code generated by this polynomial will have distance at least \(d\) as desired.

Decoding

Besides the optimality of BCH codes with regards to error correction, they are easy to decode using syndrome decoding.

Suppose we have some codeword: \(c(x) = m(x)g(x)\) where \(g(x)\) is the generator polynomial. The received polynomial is \(r(x) = c(x) + e(x)\) where \(e(x)\) is the polynomial representing errors.

Let \(\omega\) be the primitive root used to construct the BCH code and \(e(x) = \sum_{j=1}^t x^{i_j}\) is the error polynomial with \(t\) errors. The received polynomial is thus \(r(x) = c(x) + e(x)\).

Then, define the syndrome as the sequence \(s_i = r(\omega^i) = e(\omega^i)\) where the minimal polyonmial of \(\omega^i\) was used to construct \(g(x)\). The \(c(x)\) portion of \(r(x)\) vanishes since the powers of \(\omega\) are by construction roots of the generator polynomial. We also expect at least \(2d\) syndromes where \(d\) is the desired min distance used to generate the code.

If the error polynomial is \(e(x) = x^{i_1}+x^{i_2}+\ldots+x^{i_t}\), define the error locator polynomial as \(\Lambda(x) = \prod_{j=1}^t (1-x\omega^{i_j})\).
Knowing the error-locator polynomial, we can identify the location of the errors by solving for the roots.
If there is an error at position \(x^{i_k}\), \(\omega^{-i_k}\) is a root of \(\Lambda\).

Suppose we write out this error-locator polynomial: \(\Lambda = \sum_{\ell=0}^t \Lambda_\ell x^\ell\)

We can sum the equation with \((\omega^{i_j})^k\) as coefficients for \(k \in \{t+1,\ldots, 2t\}\)

\[0 = \sum_{j=1}^t (\omega^{i_j})^k \Lambda(\omega^{-i_j})\\ 0 = \sum_{j=1}^t (\omega^{i_j})^k \sum_{\ell=0}^t \Lambda_\ell(\omega^{-i_j})^\ell\\ 0 = \sum_{\ell=0}^t \Lambda_\ell \sum_{j=1}^t(\omega^{i_j})^{k-\ell}\\ 0 = \sum_{\ell=0}^t \Lambda_\ell s_{k-\ell}\]

i.e. The coefficients of \(\Lambda\) have a linear relationship with the syndromes \(s_i\).

Thus we can solve for the coefficients of \(\Lambda\) with the following equation

\[\begin{bmatrix} s_1 & s_2 & \cdots & s_t \\ s_2 & s_3 & \cdots & s_{t+1} \\ \vdots & \vdots & \ddots & \vdots\\ s_t & s_{t+1} & \cdots & s_{2t-1} \end{bmatrix} \begin{bmatrix} \Lambda_t\\ \Lambda_{t-1}\\ \vdots \\ \Lambda_0 \end{bmatrix} = \begin{bmatrix} -s_{t+1}\\ -s_{t+2}\\ \vdots\\ -s_{2t} \end{bmatrix}\]

Once we have the coefficients of \(\Lambda\), we just need to solve for the roots, and since the roots must be in some finite field, we can test them one by one. Chien search would be an efficient way to do so.

The PGZ algorithm uses these observations to find the error locator polynomial by iterating downward the possible degree of the error-locator polynomial until the above matrix equation is solvable and the error-locator polynomial can be identified..