November 29, 2023
When people think about linear algebra, the first thought is often "matrices", and any interest in the subject evaporates instantly. This is truly unfortunate given how deeply linear algebra has rooted itself in our modern society. That is the main motivation behind this blog post: I want to make linear algebra not only accessible to a general audience, but also convey its importance in a fun and easy-to-digest manner. I am going to do this via "eigenvalues", another convoluted-sounding term in linear algebra that is in fact simple to understand, and one that almost single-handedly shows why linear algebra makes modern technology possible, which is why this central branch of math is so significant in our society in the first place.
The following equation is actually how modern societies operate:
\begin{align} y &= a_1x_1 + a_2x_2 + \dots + a_nx_n \\ \\ &= \sum_{k = 1}^n a_kx_k \\ \end{align}
The above equation is called a "linear equation" because it represents a line when you graph it out (try it with the simple case \(y = a_1 x_1\), for example). The second line is simply using the summation symbol "\(\sum\)" to shorten the addition.
Why is the above equation so prevalent in our society? As I alluded to in my earlier blog post about "Making a Pi(e)" (or a cake, or whatever food your mind fancies right now), making a pie is exactly what the aforementioned linear equation is doing. The \(x\)'s are the ingredients of the pie, and the \(y\) is the finished product. The amount of each ingredient to add to the batter so that we can bake our delicious pie is exactly the \(a_k\) for the corresponding \(x_k\). For example, if \(x_1\) represents flour and we need two cups of it, then \(a_1 = 2\), and so on. Of course, the example here is just a metaphor. In wireless communication, the "pie" in \(y\) would be the received message, the ingredients in the \(x\)'s are the time-delayed copies of the transmitted signal, and the \(a\)'s are the strengths of those time-delayed copies. In machine learning, the pie in \(y\) is the probability that the input matches the target outcome based on the trained model, the ingredients in the \(x\)'s are the different features factored into the decision, and the \(a\)'s are the weights given to each feature, which together comprise the trained model itself. The truth is that there are so many different "pies" we would like to make in our modern world, and once we know the ingredients and the corresponding amounts, we invoke the linear equation above to model the technology that will bake the pie. Linear algebra studies the above equation, and that's why it's everywhere in our world.
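If you like to see things in code, here is a tiny MATLAB sketch of the pie-baking equation above; the ingredient amounts and weights are completely made up for illustration.

% The "pie" y as a weighted sum of ingredients x with amounts a.
% All numbers here are hypothetical.
x = [2; 3; 1; 0.5];     % the ingredients (flour, sugar, butter, ...)
a = [2; 1; 4; 8];       % how much of each ingredient goes in
y = sum(a .* x);        % the finished pie: y = a1*x1 + a2*x2 + ... + an*xn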
Here I want to address the elephant in the room: the very mention of matrices and vectors is probably the biggest reason preventing people from developing a greater interest in linear algebra. The concepts behind those things just sound inherently tedious. Anyone who has studied linear algebra has probably dreaded multiplying matrices, finding the reduced forms of matrices, and so on. To digress into some social commentary, I believe that it is wrong to teach matrix multiplication by diving into the formulas and the technicalities, let alone going deeper into reducing a matrix to its...[let me check]...row echelon form via Gauss-Jordan elimination. I must admit that even for a math aficionado, having those things as my first exposure to linear algebra almost made me quit math forever. I am glad it didn't quite complete the job.
Instead, if I were to teach matrix multiplication, I would start with vector multiplication by sticking close to the roots: the linear equations that linear algebra studies in the first place. The main message: vectors and matrices are nothing but shorthand notations for writing out linear equations. Don't believe me? Let's look at the linear equation from earlier:
\begin{align} y &= a_1x_1 + a_2x_2 + \dots + a_nx_n \\ \end{align}
Here, the goal is not to use the summation symbol or any Greek/Latin letters to re-write the equation, but to introduce a new concept that packages the above equation, which contains multiple terms, into one that contains only one term. That concept is called "vectors". The way the concept works is that we define a single entity, called \(\mathbf{a}\), that contains the elements \(a_k\)'s. That is:
\begin{align} \mathbf{a} &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \\ \end{bmatrix}\\ \end{align}
(On paper, we would draw an arrow on top of our vector "a", but on computer, it is common to bold the letters to denote vectors.)
By convention, all vectors are column vectors, that is, the elements are stacked on top of each other. If we want row vectors, we simply "transpose" the vector as follows:
\begin{align} \mathbf{a}^T &= \begin{bmatrix} a_1 & a_2 & \dots & a_n \\ \end{bmatrix}\\ \end{align}
If the \(a_k\)'s are complex and we want to transpose the vector, it is almost always the case that we have to take its complex conjugate at the same time. Then, this "transpose" operation becomes finding the vector's "Hermitian". We use "H" instead of "T" in this case, as follows:
\begin{align} \mathbf{a}^H &= \begin{bmatrix} a_1^* & a_2^* & \dots & a_n^* \\ \end{bmatrix}\\ \end{align}
Where "*" denotes complex conjugation.
Now, multiplying two vectors to produce a scalar often involves taking the "dot product", or more generally, the "inner product", between those two vectors. This is simply the sum of the products of the corresponding elements, namely, our original linear equation!
\begin{align} y &= a_1x_1 + a_2x_2 + \dots + a_nx_n \\ \end{align}
The inner product between two vectors is the above form, and using vectors, it is written as:
\begin{align} y &= a_1x_1 + a_2x_2 + \dots + a_nx_n \\ \\ &= \mathbf{a}^H \mathbf{x} \\ \end{align}
Strictly speaking, the Hermitian conjugates the entries of \(\mathbf{a}\), so for complex vectors the inner product is really \(\sum_{k=1}^n a_k^* x_k\); if the vectors only contain real numbers, the conjugation does nothing and we can simply write \(y = \mathbf{a}^T \mathbf{x}\). The point is, vector multiplication between a row and a column vector is nothing but taking the inner product, or the dot product, between the two vectors, forming our pie-making linear equation! This form of vector multiplication is what you will most often see in linear algebra literature.
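Here is a quick MATLAB sketch, with made-up numbers, showing that writing out the linear equation, summing the element-wise products, and multiplying a row vector by a column vector all give the same scalar.

% Three equivalent ways of computing the same inner product (real case).
a = [1; -2; 0.5];
x = [4;  3; -1];
y1 = a(1)*x(1) + a(2)*x(2) + a(3)*x(3);   % the linear equation, written out
y2 = sum(a .* x);                         % the summation form
y3 = a' * x;                              % row vector times column vector
% y1, y2, and y3 are all the same number. Note that MATLAB's "'" is the
% Hermitian (conjugate transpose); for complex vectors it matches the
% a^H x convention above, and ".'" gives a plain transpose.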
***
Matrices, then, can be thought of as the "stacked" version of "vectors". You may have been taught that whenever a matrix multiplies a vector, the matrix acts as a "linear transformation" of the vector. While this concept makes graphical sense, a better way to understand the significance of matrix multiplication is the concept of self-terms and cross-terms.
Before we get into that, I would like to make one public service announcement: matrix multiplication isn't hard, if you follow the one principle defining it:
Given matrices \(\mathbf{A}\) and \(\mathbf{B}\), the \(i\)th row and \(j\)th column entry of the product matrix from the matrix multiplication \(\mathbf{A} \mathbf{B}\) is simply the dot product between the \(i\)th row of \(\mathbf{A}\) and the \(j\)th column of \(\mathbf{B}\). Knowing this one principle, you will be able to grasp matrix multiplication.
The above principle also explains why the inner dimensions of \(\mathbf{A}\) and \(\mathbf{B}\) must agree for the multiplication to make sense, and why the product matrix takes on the outer dimensions of \(\mathbf{A}\) and \(\mathbf{B}\).
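To make that one principle concrete, here is a short MATLAB sketch with two arbitrary matrices. It builds the product entry by entry as row-times-column dot products, which matches the built-in product.

% Build A*B one entry at a time from the one principle above.
A = [1 2; 3 4; 5 6];            % 3 by 2
B = [1 0 -1 2; 2 1 0 -3];       % 2 by 4 (inner dimensions agree: 2 and 2)
C = zeros(size(A, 1), size(B, 2));   % the product takes the outer dimensions
for i = 1:size(A, 1)
    for j = 1:size(B, 2)
        C(i, j) = A(i, :) * B(:, j); % dot product of i-th row and j-th column
    end
end
% norm(C - A*B) is zero: C is exactly the built-in product A*B.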
Now, consider an n by n matrix \(\mathbf{A}\) multiplying an n by 1 vector \(\mathbf{v}\). The resulting product, then, is simply an n by 1 vector whose elements are the respective inner products between the row vectors of \(\mathbf{A}\) and \(\mathbf{v}\). If we write out the entire entries for both the matrix and the vector, we get the following:
\begin{align} \mathbf{A} \mathbf{v} &= \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & \dots & a_{1,n} \\ a_{2,1} & a_{2,2} & a_{2,3} & \dots & a_{2,n} \\ a_{3,1} & a_{3,2} & a_{3,3} & \dots & a_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & a_{n,3} & \dots & a_{n,n} \\ \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \\ \end{bmatrix} \\ \\ & = \begin{bmatrix} \mathbf{a_{1,1}v_1} + a_{1,2}v_2 + a_{1,3}v_3 + \dots + a_{1,n}v_n \\ a_{2,1}v_1 + \mathbf{a_{2,2}v_2} + a_{2,3}v_3 + \dots + a_{2,n}v_n \\ a_{3,1}v_1 + a_{3,2}v_2 + \mathbf{a_{3,3}v_3} + \dots + a_{3,n}v_n \\ \vdots\\ a_{n,1}v_1 + a_{n,2}v_2 + a_{n,3}v_3 + \dots + \mathbf{a_{n,n}v_n} \\ \end{bmatrix} \end{align}
I purposely bolded the "self-term" in each entry of the above product vector. Matrix multiplication is just forming weighted sums of the entries of the original vector, with the weights coming from the matrix. However, each element of the resulting vector does not depend only on the corresponding element of the original vector, but also on all the other elements. For example, for \(v_1\), the corresponding element in the resulting vector is \(\mathbf{a_{1,1}v_1} + \dots + a_{1,n}v_n\), indicating that it depends not only on \(v_1\) but also on the other elements \(v_2\) through \(v_n\). The terms involving the other elements are essentially "cross-talk", or the "cross-terms", of the resulting element.
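As a small illustration, this MATLAB sketch, with arbitrary numbers, splits a matrix-vector product into its self-terms and its cross-terms.

% Split A*v into self-terms (diagonal of A) and cross-terms (everything else).
A = [3 1 2; 0 5 1; 4 2 6];
v = [1; -1; 2];
selfTerms  = diag(A) .* v;              % a_{k,k} * v_k for each element
crossTerms = (A - diag(diag(A))) * v;   % contributions from the other elements
y = selfTerms + crossTerms;             % equals A*v; the cross-terms couple everything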
This can be extremely problematic in the world of engineering. One important use case for the above matrix-vector multiplication is the modeling of massive MIMO systems used by advanced wireless communication networks, such as 5G and 6G. In this case, the resulting vector is the received signal, and our goal is to decode the original transmitted signal in \(\mathbf{v}\). Although \(\mathbf{A}\), called the "channel matrix" in this case, is known ahead of time, decoding the original \(\mathbf{v}\) is still impractically complex because of the cross-terms, and the complexity scales dramatically with larger and larger \(n\), which is often the case for massive MIMO.
To solve the above problem, we invoke an unlikely hero--eigenvalues. You might have been taught that the eigenvalues of a square matrix \(\mathbf{A}\) are the values \(\lambda\) such that, for a particular non-zero vector \(\mathbf{v}\), called the corresponding "eigenvector" for that eigenvalue \(\lambda\), we have:
\begin{align} \mathbf{A} \mathbf{v} &= \lambda \mathbf{v} \\ \end{align}
You might also have been taught that the significance of an eigenvalue is that the linear transformation represented by \(\mathbf{A}\) preserves the direction of the corresponding eigenvector (it only scales it). However, here I am presenting a different perspective on eigenvectors: they are used to diagonalize the matrix \(\mathbf{A}\).
A diagonal matrix is simply a square matrix whose only non-zero entries reside along its main diagonal; every off-diagonal entry is zero. For example, if the matrix \(\mathbf{\Lambda}\) is diagonal, then we have:
\begin{align} \mathbf{\Lambda} &= \begin{bmatrix} \lambda_{1,1} & 0 & 0 & \dots & 0 \\ 0 & \lambda_{2,2} & 0 & \dots & 0 \\ 0& 0 & \lambda_{3,3} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \lambda_{n,n} \\ \end{bmatrix} \end{align}
Applying matrix multiplication for \(\mathbf{\Lambda}\) on vector \(\mathbf{v}\), we get:
\begin{align} \mathbf{\Lambda} \mathbf{v} &= \begin{bmatrix} \lambda_{1,1} & 0 & 0 & \dots & 0 \\ 0 & \lambda_{2,2} & 0 & \dots & 0 \\ 0& 0 & \lambda_{3,3} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \lambda_{n,n} \\ \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \\ \end{bmatrix} \\ \\ &= \begin{bmatrix} \mathbf{\lambda_{1,1}v_1} + 0v_2 + 0v_3 + \dots + 0v_n \\ 0v_1 + \mathbf{\lambda_{2,2}v_2} + 0v_3 + \dots + 0v_n \\ 0v_1 + 0v_2 + \mathbf{\lambda_{3,3}v_3} + \dots + 0v_n \\ \vdots\\ 0v_1 + 0v_2 + 0v_3 + \dots + \mathbf{\lambda_{n,n}v_n} \\ \end{bmatrix} \\ \\ &= \begin{bmatrix} \lambda_{1,1}v_1 \\ \lambda_{2,2}v_2 \\ \lambda_{3,3}v_3 \\ \vdots\\ \lambda_{n,n}v_n \end{bmatrix} \end{align}
Lo and behold! Our cross-terms now disappear! We are left with only the self-terms. Given the resulting vector, getting back the original vector is a cinch: all we have to do is multiply each element by \(\frac{1}{\lambda_{k, k}}\), provided the corresponding \(\lambda\) is not zero. The complexity scales linearly with \(n\), and 6G wireless communication is now possible!
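Here is a minimal MATLAB sketch of why a diagonal system is so easy to undo; the diagonal entries and the vector below are hypothetical.

% With a diagonal matrix, recovering the original vector is just
% element-wise division: n divisions, no cross-terms to untangle.
lambda = [2; 5; 0.5; 4];        % hypothetical non-zero diagonal entries
Lambda = diag(lambda);
v      = [1; -3; 2; 0.5];       % the original vector
y      = Lambda * v;            % what we observe
vHat   = y ./ lambda;           % recovers v exactly, element by element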
By the way, the diagonal entries of a diagonal matrix are often called "column multipliers" because when the diagonal matrix multiplies a column vector, each element of the product is simply the corresponding element of the vector multiplied by the corresponding diagonal entry of the matrix. When a row vector multiplies the diagonal matrix, the diagonal entries are correspondingly called "row multipliers". The two cases are analogous: in both, each element gets scaled by the matching diagonal entry.
Of course, the channel matrix won't be diagonal in general. However, it is possible to draw a connection between the channel matrix, \(\mathbf{A}\), and a diagonal matrix, \(\mathbf{\Lambda}\), through the matrix diagonalization of the former. How do we go from \(\mathbf{A}\) to \(\mathbf{\Lambda}\)? Well, we first need to examine the definition of the eigenvalues carefully. How many separate equations does the following equation represent?
\begin{align} \mathbf{A} \mathbf{v} &= \lambda \mathbf{v} \\ \end{align}
If \(\mathbf{A}\) is n by n and \(\mathbf{v}\) is n by 1, then we already have n equations, each representing one element of the resulting product vector. However, how many eigenvalues does \(\mathbf{A}\) have? Recall that we find the eigenvalues by solving the following equation:
\begin{align} \det{(\lambda\mathbf{I} - \mathbf{A})} &= 0\\ \end{align}
The determinant expression is actually a polynomial in \(\lambda\) of degree \(n\). By the Fundamental Theorem of Algebra, the polynomial has \(n\) complex roots, counted with multiplicity. So we have \(n\) eigenvalues for \(\mathbf{A}\). That means our equation for the definition of eigenvalues actually represents \(n^2\) separate equations.
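As a side note, MATLAB can show this connection numerically: poly returns the coefficients of the characteristic polynomial \(\det{(\lambda\mathbf{I} - \mathbf{A})}\), and its roots are the eigenvalues. A tiny sketch with an arbitrary matrix:

% The determinant condition as a degree-n polynomial in lambda.
A = [2 1; 1 3];
coeffs  = poly(A);        % coefficients of det(lambda*I - A)
lambdas = roots(coeffs);  % its n complex roots, i.e., the eigenvalues of A
% eig(A) returns the same values (possibly in a different order).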
Remember that the purpose behind the concept of vectors and matrices is to package multiple equations into one neat equation. How can we do that for the \(n^2\) separate equations behind \(\mathbf{A} \mathbf{v} = \lambda \mathbf{v}\)?
Well, we start on the right-hand side. The original equation involves a single eigenvalue. For \(n\) eigenvalues, we need the right-hand side to become a matrix, built by stacking column vectors side by side, whose \(k\)th column is \(\lambda_k \mathbf{v}_k\). That smells like column/row multipliers! It's time to bring out the \(\mathbf{\Lambda}\).
Let's then define a matrix, \(\mathbf{Q}\), whose columns are the corresponding eigenvectors for each of the eigenvalues. That is, \(\mathbf{Q}\) is an n by n matrix where:
\begin{align} \mathbf{Q} &= \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_n \\ \end{bmatrix} \end{align}
One neat thing about matrix multiplication is that matrices made from stacked vectors, like \(\mathbf{Q}\), can be treated like a vector. When we do \(\mathbf{Q} \mathbf{\Lambda}\), the matrix multiplication boils down to treating the diagonal entries of the diagonal matrix \(\mathbf{\Lambda}\) as "row multipliers" to the "row vector" \(\mathbf{Q}\), as follows:
\begin{align} \mathbf{Q} \mathbf{\Lambda} &= \begin{bmatrix} \lambda_1\mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \dots & \lambda_n\mathbf{v}_n \\ \end{bmatrix} \end{align}
Therefore, \(\mathbf{\Lambda}\) is an n by n diagonal matrix whose non-zero diagonal entries are the corresponding eigenvalues of \(\mathbf{A}\), and \(\mathbf{Q}\) is an n by n matrix whose columns are the corresponding eigenvectors for each of the eigenvalues.
Now let's examine the left-hand side \(\mathbf{A} \mathbf{v}\). Our goal is to have an expression that is equal to \(\mathbf{Q} \mathbf{\Lambda}\). Recalling that \(\mathbf{A} \mathbf{v}\) is simply a vector whose \(k\)th element is the inner product between the \(k\)th row of \(\mathbf{A}\) and \(\mathbf{v}\), we can re-write \(\mathbf{A}\) as a vertical stack of its rows, treated as row vectors:
\begin{align} \mathbf{A} &= \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \\ \end{bmatrix} \end{align}
Then, we have \(\mathbf{A} \mathbf{Q}\) as our desired result:
\begin{align} \mathbf{A} \mathbf{Q} &= \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \\ \end{bmatrix} \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_n \\ \end{bmatrix} \\ \\ &= \begin{bmatrix} \mathbf{a}_1\mathbf{v}_1 & \mathbf{a}_1\mathbf{v}_2 & \dots & \mathbf{a}_1\mathbf{v}_n \\ \mathbf{a}_2\mathbf{v}_1 & \mathbf{a}_2\mathbf{v}_2 & \dots & \mathbf{a}_2\mathbf{v}_n \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_n\mathbf{v}_1 & \mathbf{a}_n\mathbf{v}_2 & \dots & \mathbf{a}_n\mathbf{v}_n \\ \end{bmatrix} \\ \\ &= \begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \dots & \mathbf{A}\mathbf{v}_n \\ \end{bmatrix} \\ \end{align}
Putting everything together, we have:
\begin{align} \begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \dots & \mathbf{A}\mathbf{v}_n \\ \end{bmatrix} &= \begin{bmatrix} \lambda_1\mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \dots & \lambda_n\mathbf{v}_n \\ \end{bmatrix} \\ \\ \mathbf{A} \mathbf{Q} &= \mathbf{Q} \mathbf{\Lambda} \end{align}
In other words, the equation \(\mathbf{A} \mathbf{Q} = \mathbf{Q} \mathbf{\Lambda}\) encapsulates the entire \(n^2\) separate equations from the definition of eigenvalues and eigenvectors \(\mathbf{A} \mathbf{v} = \lambda \mathbf{v}\).
Furthermore, if the eigenvalues of \(\mathbf{A}\) are all distinct, then the corresponding eigenvectors will be linearly independent of each other. This makes \(\mathbf{Q}\) full rank and therefore invertible, and we can actually solve for \(\mathbf{A}\):
\begin{align} \mathbf{A} \mathbf{Q} &= \mathbf{Q} \mathbf{\Lambda} \\ \\ \mathbf{A} &= \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^{-1} \end{align}
In other words, whenever \(\mathbf{Q}\) is invertible (which is guaranteed when the eigenvalues of \(\mathbf{A}\) are all distinct), we say that \(\mathbf{A}\) is diagonalizable, and its matrix diagonalization is given by the equation above.
To summarize, in order to diagonalize \(\mathbf{A}\), we do the following (a quick numerical check of the recipe is sketched right after this list):
1. Find the eigenvalues of \(\mathbf{A}\) by solving \(\det{(\lambda\mathbf{I} - \mathbf{A})} = 0\).
2. Find a corresponding eigenvector for each eigenvalue from \(\mathbf{A} \mathbf{v} = \lambda \mathbf{v}\).
3. Stack the eigenvectors as the columns of \(\mathbf{Q}\), and place the eigenvalues along the diagonal of \(\mathbf{\Lambda}\) in the same order.
4. Provided \(\mathbf{Q}\) is invertible, write \(\mathbf{A} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^{-1}\).
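Here is that quick numerical check, using MATLAB's eig on an arbitrary symmetric matrix (chosen so that diagonalization is guaranteed to work); this is just a sanity-check sketch, not part of the derivation.

% Verify A*Q = Q*Lambda and A = Q*Lambda*inv(Q) for a small example.
A = [4 1 0; 1 3 1; 0 1 2];
[Q, Lambda] = eig(A);          % columns of Q: eigenvectors; diagonal of Lambda: eigenvalues
err1 = norm(A*Q - Q*Lambda);   % ~0
err2 = norm(A - Q*Lambda/Q);   % ~0 ("/Q" is MATLAB shorthand for multiplying by inv(Q))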
So we just diagonalized \(\mathbf{A}\). How can we take advantage of this?
Going back to the massive MIMO example, recall that \(\mathbf{A}\) is known in advance. Therefore, we can attempt to diagonalize it. If we succeed, we get a much less complex signal decoder. Let \(\mathbf{x}\) be the n by 1 transmitted signal, which is what we want to find, and let \(\mathbf{y}\) be the n by 1 received signal, which is what we observe and must use, along with \(\mathbf{A}\), to solve for \(\mathbf{x}\). Starting with the massive MIMO model, we have:
\begin{align} \mathbf{y} &= \mathbf{A} \mathbf{x}\\ \\ \mathbf{y} &= \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^{-1} \mathbf{x}\\ \\ \mathbf{Q}^{-1}\mathbf{y} &= \mathbf{\Lambda} \mathbf{Q}^{-1} \mathbf{x} \\ \\ \mathbf{\tilde{y}} &= \mathbf{\Lambda} \mathbf{\tilde{x}} \\ \end{align}
So we can get our pre-processed transmitted signal, \(\mathbf{\tilde{x}}\), from the post-processed received signal, \(\mathbf{\tilde{y}}\), by dividing each element of \(\mathbf{\tilde{y}}\) by the corresponding eigenvalue of \(\mathbf{A}\) found along the diagonal of \(\mathbf{\Lambda}\). We can then recover \(\mathbf{x}\) by pre-multiplying \(\mathbf{\tilde{x}}\) with \(\mathbf{Q}\). It might seem that the multiplications in the pre-processing and post-processing of the signals incur extra overhead, but that overhead is negligible compared to the ease of decoding the transmitted message with a diagonalized channel matrix. Furthermore, the matrices \(\mathbf{Q}\), \(\mathbf{\Lambda}\), and \(\mathbf{Q}^{-1}\) can be pre-computed and stored given \(\mathbf{A}\). Computing the inverse of \(\mathbf{A}\) directly, however, would not be trivial in the context of a massive MIMO decoder. We even have a term for this application of matrix diagonalization: channel diagonalization. Finally, the decoder using the diagonalized matrix can leverage the fact that only self-terms are involved for each element of the received signal and use a distributed system to parallelize the operation efficiently.
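To make the decoding chain concrete, here is a toy MATLAB sketch of channel diagonalization; the channel matrix and transmitted signal are made up, and a real massive MIMO decoder is of course far more involved (noise, channel estimation, and so on).

% Toy channel-diagonalization decoder.
A = [5 1 0 1; 1 4 1 0; 0 1 6 1; 1 0 1 3];   % stand-in "channel matrix" (diagonalizable)
[Q, Lambda] = eig(A);                       % pre-computed once, offline
x = [1; -2; 0.5; 3];                        % transmitted signal (unknown to the receiver)
y = A * x;                                  % received signal

yTilde = Q \ y;                  % post-process the received signal: Q^{-1} * y
xTilde = yTilde ./ diag(Lambda); % divide element-wise by the eigenvalues
xHat   = Q * xTilde;             % recover the transmitted signal
% norm(xHat - x) is ~0, up to numerical precision.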
My blog post, thus far, has hopefully achieved its purpose of demystifying linear algebra and conveying its usefulness and significance to a general audience via eigenvalues. However, I have more advanced topics that I would like to share, and I must warn that I will have to get a little more technical from here on. So, if you feel like you have read enough math, please feel free to stop here. However, if you would like to boldly go further down the rabbit hole, it's not too late to join!
The world of matrices is more fascinating than one might think, if only the boring stigma can be removed, and I hope this blog post plays a small part in that effort. The truth is that the matrix world is like an onion, and there are layers upon layers to peel through to find more and more useful types of matrices. You start off with matrices of all sizes. Then, you find out that a subset of those matrices are the square ones, whose row and column dimensions are equal. Keep going, and among the square matrices you find the Hermitian ones. That's where things get really interesting. In fact, many of the matrices encountered in engineering are Hermitian. What are Hermitian matrices, and why are they ubiquitous in the engineering literature? By definition, a matrix \(\mathbf{A}\) is Hermitian if:
\begin{align} \mathbf{A} &= \mathbf{A}^H \\ \end{align}
That is, given \(\mathbf{A}\) with complex entries, if you transpose it and take its complex conjugate and you get back the same matrix \(\mathbf{A}\), it is Hermitian. If \(\mathbf{A}\) only contains real entries, the condition reduces to \(\mathbf{A} = \mathbf{A}^T\), which means that \(\mathbf{A}\) is symmetric. In other words, for real matrices, being symmetric and being Hermitian are the same thing, and real symmetric matrices are just special cases of Hermitian matrices.
If you take any matrix and multiply it by its Hermitian, or vice versa, the resulting product matrix is guaranteed to be Hermitian. Such operations occur very often in engineering, such as in signal processing and machine learning, and hence why Hermitian matrices are everywhere.
Let's see what happens to the eigenvalues of a Hermitian matrix. By the Fundamental Theorem of Algebra (applied to the determinant equation from earlier), \(\mathbf{A}\) has at least one eigenvalue; let that be \(\lambda\), with corresponding eigenvector \(\mathbf{v}\). We start off by considering the expression \(\lambda ||\mathbf{v}||^2\). Since an eigenvector is non-zero by definition, \(||\mathbf{v}||^2 > 0\). Also, the L2-norm squared of a vector is simply the dot product between the vector and itself, which is just its Hermitian multiplied by itself.
\begin{align} \lambda ||\mathbf{v}||^2 &= \lambda \mathbf{v}^H \mathbf{v} \\ &= \mathbf{v}^H (\lambda \mathbf{v}) \\ &= \mathbf{v}^H (\mathbf{A} \mathbf{v}) \\ &= \mathbf{v}^H \mathbf{A} \mathbf{v} \\ &= \mathbf{v}^H \mathbf{A}^H \mathbf{v} \\ &= (\mathbf{A} \mathbf{v})^H \mathbf{v} \\ &= (\lambda \mathbf{v})^H \mathbf{v} \\ &= \lambda^* \mathbf{v}^H \mathbf{v} \\ &= \lambda^* ||\mathbf{v}||^2 \end{align}
Since \(||\mathbf{v}||^2 > 0\), it follows that \(\lambda = \lambda^*\). That is, the eigenvalues of the Hermitian matrix \(\mathbf{A}\) are all real numbers. This also means that the eigenvalues of all symmetric matrices are real numbers as well.
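A quick MATLAB sanity check, with a made-up complex Hermitian matrix: its eigenvalues come out real (up to round-off).

% A small complex Hermitian matrix and its eigenvalues.
A = [2, 1+2i, -1i; 1-2i, 3, 4; 1i, 4, 1];
% norm(A - A') is 0, so A is Hermitian.
lambdas = eig(A);   % every entry is a real number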
This is a very nice result, because real numbers can be compared, that is, they have ordinality. We can say that 5 is greater than 2, but we cannot say 1 + 4i is greater than 2 - 3i. Going deeper into the rabbit hole, we can find further subsets of Hermitian matrices: positive/negative (semi-)definite matrices. A positive definite matrix is an n by n Hermitian matrix \(\mathbf{A}\) defined as follows: for every non-zero column vector \(\mathbf{v}\), where the vector can have complex values in general, we have \(\mathbf{v}^H \mathbf{A} \mathbf{v} > 0\). We literally denote this as \(\mathbf{A} > 0\). I will leave it to my readers to verify that \(\mathbf{v}^H \mathbf{A} \mathbf{v}\) gives a scalar value. A theorem states that \(\mathbf{A} > 0\) is equivalent to saying that the eigenvalues of \(\mathbf{A}\) are all greater than zero. Please note that we have just shown that the eigenvalues are all real for Hermitian \(\mathbf{A}\), so we are not dealing with complex numbers here for the eigenvalues.
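Here is a small MATLAB sketch of the two equivalent views of positive definiteness, using a classic positive definite matrix; the random test vector is only there for illustration.

% Two ways of looking at A > 0 for a Hermitian (here real symmetric) matrix.
A = [2 -1 0; -1 2 -1; 0 -1 2];      % a well-known positive definite matrix
allPositive = all(eig(A) > 0);      % eigenvalue view: returns true
v = randn(3, 1) + 1i*randn(3, 1);   % an arbitrary non-zero complex vector
quadForm = real(v' * A * v);        % v^H A v is real for Hermitian A; real() just
                                    % strips round-off, and the value is positive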
The proof utilizes matrix diagonalization. But before diving into the proof, there is another interesting result regarding the eigenvectors of a Hermitian \(\mathbf{A}\) that is needed to complete it: they can all be chosen orthogonal to each other, and when they are normalized, they form an orthonormal basis of the vector space on which \(\mathbf{A}\) operates, that is, the vector space containing \(\mathbf{v}\). I talk about the significance of the orthonormal basis in one of my previous blog posts. Please give it a read if it piques your interest. The proof of this result is as follows:
First, applying the Fundamental Theorem of Algebra to \(\det(\lambda \mathbf{I} - \mathbf{A}) = 0\), the equation we solve to find the eigenvalues in the first place, \(\mathbf{A}\) has at least one eigenvalue and an associated eigenvector. Denote those as \(\lambda_1\) and \(\mathbf{v}_1\), respectively. Let a subspace of the vector space containing \(\mathbf{v}\) be the linear span of \(\mathbf{v}_1\). Clearly, \(\mathbf{A} \mathbf{v}_1\) is in this subspace, as it is equal to \(\lambda_1 \mathbf{v}_1\).
Consider two vectors, \(\mathbf{x}\) and \(\mathbf{y}\) in the vector space containing \(\mathbf{v}\). If \(\mathbf{A}\) is Hermitian, then:
\begin{align} &(\mathbf{A} \mathbf{x})^H \mathbf{y} \\ &= \mathbf{x}^H \mathbf{A}^H \mathbf{y} \\ &= \mathbf{x}^H (\mathbf{A}^H \mathbf{y}) \\ &= \mathbf{x}^H (\mathbf{A} \mathbf{y}) \\ \end{align}
That is, the inner product between the transformed \(\mathbf{x}\) by Hermitian \(\mathbf{A}\) and \(\mathbf{y}\) is equal to the inner product between \(\mathbf{x}\) and the transformed \(\mathbf{y}\) by \(\mathbf{A}\). We now use this result by letting \(\mathbf{x}\) be \(\mathbf{v}_1\) and \(\mathbf{y}\) be a vector in the orthogonal subspace to that spanned by \(\mathbf{v}_1\). Thus, by definition, the inner product between \(\mathbf{A} \mathbf{v}_1\), which is still in the subspace spanned by \(\mathbf{v}_1\), and \(\mathbf{y}\), is zero. Hence, we have:
\begin{align} 0 &= (\mathbf{A} \mathbf{v}_1)^H \mathbf{y} \\ &= \mathbf{v}_1^H (\mathbf{A} \mathbf{y}) \\ \end{align}
This means \(\mathbf{A} \mathbf{y}\) stays in the subspace containing \(\mathbf{y}\), as it is also orthogonal to the subspace spanned by \(\mathbf{v}_1\). In other words, \(\mathbf{A}\) maps the orthogonal subspace into itself, and restricted to that subspace it is still Hermitian, so we can apply the same argument again to find a second eigenvalue and eigenvector \(\mathbf{v}_2\) inside it, which is therefore orthogonal to \(\mathbf{v}_1\). We started off with the full vector space containing \(\mathbf{v}\) and found our first eigenvector spanning its own subspace; repeating this logic until we reach \(\mathbf{v}_n\), we conclude that all of the eigenvectors of the Hermitian \(\mathbf{A}\), chosen this way, are orthogonal to each other. Normalizing each one nets us an orthonormal basis of the vector space containing \(\mathbf{v}\). This means:
A Hermitian \(\mathbf{A}\) is guaranteed to be diagonalizable, with the resulting normalized eigenvectors forming an orthonormal basis for the vector space on which \(\mathbf{A}\) operates. Please also note that the eigenvalues themselves may still be zero or repeated; what is guaranteed is that the eigenvectors can be chosen orthogonal, and orthogonal vectors are automatically linearly independent of each other.
The above result is also known as the "Spectral Theorem".
If the eigenvectors form an orthonormal basis, then \(\mathbf{Q} = [\mathbf{v}_1 \quad \mathbf{v}_2 \quad \dots \quad \mathbf{v}_n]\) is said to be unitary. That is:
\begin{align} \mathbf{Q}^{-1} &= \mathbf{Q}^H \\ \end{align}
Therefore, finding the inverse of \(\mathbf{Q}\) is much easier by just transposing and taking the complex conjugate, and for real \(\mathbf{Q}\), just a transpose. In all, we have, for Hermitian \(\mathbf{A}\), the following diagonalization:
\begin{align} \mathbf{A} &= \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^H \\ \end{align}
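A short MATLAB sketch of this, with an arbitrary symmetric matrix: for Hermitian input, eig returns orthonormal eigenvectors, so Q behaves as a unitary matrix.

% Check that Q is unitary and that A = Q*Lambda*Q^H for a Hermitian A.
A = [4 1 2; 1 3 0; 2 0 5];             % symmetric, hence Hermitian
[Q, Lambda] = eig(A);
errUnitary = norm(Q'*Q - eye(3));      % ~0: Q^H Q = I, so inv(Q) = Q^H
errDecomp  = norm(A - Q*Lambda*Q');    % ~0: the diagonalization A = Q*Lambda*Q^H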
We are now ready to prove that \(\mathbf{A} > 0\) is equivalent to the eigenvalues of \(\mathbf{A}\) being all greater than zero.
For the forward direction, given that \(\mathbf{A} > 0\), starting with the definition:
\begin{align} 0 &< \mathbf{v}^H \mathbf{A} \mathbf{v}\\ &= \mathbf{v}^H \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^H \mathbf{v}\\ &= (\mathbf{Q}^H \mathbf{v})^H \mathbf{\Lambda} (\mathbf{Q}^H \mathbf{v}) \\ &= \mathbf{u}^H \mathbf{\Lambda} \mathbf{u} \\ \end{align}
where we let \(\mathbf{u} = \mathbf{Q}^H \mathbf{v}\). Writing out the last expression, we simply get:
\begin{align} \mathbf{u}^H \mathbf{\Lambda} \mathbf{u} &= \sum_{k = 1}^n \lambda_k |u_k|^2 > 0\\ \end{align}
Here, since the \(u_k\)'s are complex in general, each term involves the square of the magnitude, \(|u_k|^2\), which by the way is equal to \(u_k u_k^*\). The inequality holds for every non-zero \(\mathbf{v}\). In particular, for each \(k\) we can choose \(\mathbf{v}\) to be the \(k\)th eigenvector, so that \(\mathbf{u} = \mathbf{Q}^H \mathbf{v}\) is the vector whose \(k\)th element is 1 and the rest are zeros; the sum then collapses to \(\lambda_k\), and therefore \(\lambda_k > 0\) for every \(k\).
For the backward direction, given that all eigenvalues of the Hermitian \(\mathbf{A}\) are greater than zero, we need to show that \(\mathbf{A} > 0\). To do this, we prove the contrapositive statement: if there exists at least one non-zero \(\mathbf{v}\) such that \(\mathbf{v}^H \mathbf{A} \mathbf{v} \leq 0\), then there is at least one eigenvalue that is less than or equal to zero.
The reason why we invoke the contrapositive is that working with "there exists" is much easier than working with "for all": the hypothesis hands us one concrete vector to manipulate, rather than requiring us to check every possible \(\mathbf{v}\).
So, take that \(\mathbf{v}\) and let \(\mathbf{u} = \mathbf{Q}^H \mathbf{v}\). Since \(\mathbf{Q}\) is invertible and \(\mathbf{v}\) is non-zero, \(\mathbf{u}\) is non-zero as well, so at least one \(|u_k|^2\) is strictly positive. Hence:
\begin{align} 0 &\geq \mathbf{v}^H \mathbf{A} \mathbf{v} \\ \\ &= \mathbf{u}^H \mathbf{\Lambda} \mathbf{u} \\ \\ &= \sum_{k = 1}^n \lambda_k |u_k|^2 \\ \end{align}
If every \(\lambda_k\) were greater than zero, this sum would be strictly positive, since at least one \(|u_k|^2 > 0\). Therefore at least one eigenvalue must be less than or equal to zero, which is exactly the contrapositive statement.
Q.E.D.
In MATLAB, there is a function "eig" that takes in a matrix \(\mathbf{A}\) and outputs two matrices: \(\mathbf{Q}\) and \(\mathbf{\Lambda}\), respectively. The diagonal entries of the second output are the eigenvalues of \(\mathbf{A}\) with the corresponding eigenvectors in the respective columns of \(\mathbf{Q}\). The eigenvectors are by default normalized such that they are unit vectors.
If we apply that to a symmetric matrix, then the resulting \(\mathbf{Q}\) will be unitary, that is, its columns form an orthonormal basis for the n-dimensional Euclidean space. Let's see an example in MATLAB:
% Start off with any 3 x 3 matrix:
A = [1 2 3; -2 -1.5 4; 0 2 -3];

% Make a symmetric matrix out of it.
% The "'" takes the Hermitian of A, which in this case is the
% transpose of A
B = A'*A;

>> B =

    5.0000    5.0000   -5.0000
    5.0000   10.2500   -6.0000
   -5.0000   -6.0000   34.0000

% Clearly, B is symmetric
% We diagonalize B via "eig":
[Q, D] = eig(B);

>> Q =

    0.8700    0.4549   -0.1901
   -0.4911    0.8338   -0.2522
    0.0438    0.3127    0.9488

>> D =

    1.9263         0         0
         0   10.7277         0
         0         0   36.5960
Here, the eigenvalues of \(\mathbf{B}\) are 1.9263, 10.7277, and 36.5960. Therefore, \(\mathbf{B}\) is also positive-definite, i.e. \(\mathbf{B} > 0\).
Since \(\mathbf{B}\) is also symmetric, then \(\mathbf{Q}\) is a unitary matrix where \(\mathbf{Q}^{-1} = \mathbf{Q}^T\). The diagonalization of \(\mathbf{B}\) is \(\boxed{\mathbf{B} = \mathbf{Q} \mathbf{D} \mathbf{Q}^T}\):
>> B =

    5.0000    5.0000   -5.0000
    5.0000   10.2500   -6.0000
   -5.0000   -6.0000   34.0000

>> Q*D*Q'

ans =

    5.0000    5.0000   -5.0000
    5.0000   10.2500   -6.0000
   -5.0000   -6.0000   34.0000
We can further show that the columns of \(\mathbf{Q}\) form an orthonormal basis by applying the Gram-Schmidt process to it and seeing that it returns itself. Here is the process implemented in MATLAB:
function U = getOrthoNormalBasis(V)
% Given a complex N by N full-rank matrix V containing the column
% vectors spanning a vector space, return the equivalent orthonormal
% basis in the output unitary matrix U.
% We utilize the Gram-Schmidt process

if abs(det(V)) < eps
    error('Input matrix is not invertible!');
end

[~, numBasis] = size(V);
U = V;
U(:, 1) = U(:, 1) / norm(U(:, 1));

for k = 2:numBasis
    diffVec = 0;
    for j = 1:k-1
        diffVec = diffVec + getProjection(V(:, k), U(:, j));
    end
    U(:, k) = V(:, k) - diffVec;          % Get the orthogonal vector
    U(:, k) = U(:, k) / norm(U(:, k));    % Normalize the orthogonal vector
end

end

function [proj, t] = getProjection(v, u)
% Helper function
% Returns proj_u v, and the corresponding scalar multiplier t

t = dot(u, v)/(norm(u))^2;
proj = t*u;

end
When we use it on \(\mathbf{Q}\):
>> Q

Q =

    0.8700    0.4549   -0.1901
   -0.4911    0.8338   -0.2522
    0.0438    0.3127    0.9488

>> M = getOrthoNormalBasis(Q)

M =

    0.8700    0.4549   -0.1901
   -0.4911    0.8338   -0.2522
    0.0438    0.3127    0.9488
We get the same matrix, as expected.
We can find linear algebra in all fields of science and engineering, and even within math itself it collaborates readily with other areas, such as calculus. For one such cross-collaboration, please take a look at baking a delicious pie, part 2!