Comparing Two Popular Linear Algebra Textbooks: Gilbert Strang vs. Theodore Shifrin

聞浩凱 Hao-Kai Wen
5 min read · Jan 28, 2024


In college, I used Linear Algebra: A Geometric Approach by Theodore Shifrin and Malcolm R. Adams. Over the years, however, the papers I read in computer vision and data analysis kept mentioning singular value decomposition (SVD) and positive definite matrices, usually with only a sentence or two of explanation. I knew when to use them but lacked a conceptual understanding. So I picked up a textbook more mainstream in computing fields, Linear Algebra and Its Applications by Gilbert Strang, to fill in the missing concepts. Having now finished both books, I can share my comparison and my personal take on them.

Shifrin’s Geometric Perspective

As its title promises, Shifrin introduces linear algebra through a geometric lens, and this is strongly felt even before the chapters on eigenvectors and eigenvalues. Span, bases, and sets are woven through the entire book, and geometric descriptions recur, such as the zero vector being orthogonal to every subspace, or a line being perpendicular to a hyperplane. He also brings up abstract vector spaces: if functions are vectors, then the inner product of two functions is an integral. This abstraction simplifies things and makes them easy to remember, so that later signal processing textbooks read more easily, without getting bogged down in long integral derivations.
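As a minimal sketch of that abstraction (the interval and the two functions are my own choice of illustration, not necessarily the book's):

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\,g(x)\,dx, \qquad \langle \sin, \cos \rangle = \int_{-\pi}^{\pi} \sin x \cos x \,dx = 0,$$

so sin and cos are "perpendicular" in exactly the sense that two orthogonal arrows are, which is the coordinate-like behavior that Fourier series exploit.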

Strang’s Practical Perspective

Strang emphasizes the speed and precision issues of real-world computation. On speed: Gaussian elimination on a dense matrix is O(n³), but on a tridiagonal matrix the solve costs only O(n). Once you know this, certain image processing problems become transparent at a glance: the special constraints used to construct the matrix are there to make the system tridiagonal and thus fast to solve. On precision: computer variables carry only a finite number of significant digits, so the order of operations affects the stability of the answer. For example, with A = [1, 1; 1, 1.001] and B = [.001, 1; 1, 1], Gaussian elimination can lose accuracy because of the limited precision of the multiplications involved. Being able to analyze such issues is vital for developing and improving solvers. Reading Shifrin alone, you would not understand why so many different methods exist for the same problem; reading Strang, you learn that the variations come from speed and precision trade-offs.

Strang also gives fuller coverage of positive definite matrices and singular value decomposition (SVD), which opens the door to quadratic forms, numerically stable least-squares solutions, and, later on, a clean connection to principal component analysis (PCA). Before moving on to textbooks in other fields, it is best to understand SVD first, because those texts assume the reader already knows it; otherwise you keep encountering it while brushing past its essence.
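A minimal sketch of the precision point, using the text's B = [.001, 1; 1, 1] with a right-hand side chosen here for illustration; forcing float32 arithmetic makes the damage from a tiny pivot visible:

```python
import numpy as np

def solve_2x2(a, b, dtype=np.float32):
    """Gaussian elimination on a 2x2 system with NO row exchanges."""
    a = a.astype(dtype)
    b = b.astype(dtype)
    m = a[1, 0] / a[0, 0]                # multiplier from the (possibly tiny) pivot
    a[1] = a[1] - m * a[0]               # eliminate below the pivot
    b[1] = b[1] - m * b[0]
    x1 = b[1] / a[1, 1]                  # back substitution
    x0 = (b[0] - a[0, 1] * x1) / a[0, 0]
    return np.array([x0, x1])

B = np.array([[0.001, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])

print(solve_2x2(B, b))              # tiny pivot used as-is: accuracy degrades
print(solve_2x2(B[::-1], b[::-1]))  # rows exchanged first (partial pivoting): stable
print(np.linalg.solve(B, b))        # float64 reference: [1.001001..., 0.998998...]
```

The speed point has the same flavor: routines such as scipy.linalg.solve_banded exist precisely so that tridiagonal and banded systems are solved in linear time rather than by dense elimination.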

Differing Reading Impressions

Strang opens by saying that the two central problems of linear algebra are Ax = b and Ax = λx; learning to solve them is immediately useful, and much of what makes the subject worth appreciating flows from them. Shifrin reads more like an introduction to what the core concepts of linear algebra are, moving from vectors and matrices through spaces to bases and projections; up to Chapter 3 you do not yet see what they are for, but you learn to think through calculations with them.
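For concreteness, here are the two problems side by side in NumPy (the matrix is my own toy example, not one from either book):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 3.0])

x = np.linalg.solve(A, b)   # problem 1: Ax = b        -> x = [1, 1]
w, V = np.linalg.eig(A)     # problem 2: Ax = lambda*x -> eigenvalues 3 and 1
print(x, w)
```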

Contrasting the Two Books Through the Diagonalization Section

The two books differ greatly in how they prove and illustrate results. Take the diagonalization section as an example. Strang writes:

“Suppose the n by n matrix A has n linearly independent eigenvectors. If these eigenvectors are columns of a matrix S, then S⁻¹AS is a diagonal matrix Λ. The eigenvalues of A are on the diagonal of Λ.”

After proving this, he guides readers to observe the phenomenon:

“Diagonalizability of A depends on enough eigenvectors.”

Sometimes we get too few independent eigenvectors, that is, some of the eigenvectors are linearly dependent. Strang then presents some non-diagonalizable matrices and computes their eigenvalues and eigenvectors. Shifrin's diagonalization section, by contrast, states:

“Theorem 2.1. Let T: V → V be a linear transformation. Suppose v1, …, vk are eigenvectors of T with distinct corresponding eigenvalues λ1, …, λk. Then {v1, …, vk} is a linearly independent set of vectors.”

This theorem then directly implies:

“Corollary 2.2. Suppose V is an n-dimensional vector space and T: V → V has n distinct (real) eigenvalues. Then T is diagonalizable.”

In describing diagonalization, Shifrin additionally brings in the linear transformation T: V → V, tying diagonalization to the choice of basis: T is diagonalizable precisely when some basis, namely one made of eigenvectors, represents it by a diagonal matrix. The two concepts are bound together. Shifrin's proofs build step by step from propositions, lemmas, theorems, and corollaries; they are complete, but he rarely adds extra examples. Strang instead offers many Remarks pointing out observable phenomena, and his proofs go only as far as needed to convey the idea: he avoids devices like mathematical induction, and "if and only if" statements appear comparatively rarely.
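A minimal numerical sketch of both quoted statements (the matrices are my own examples): with enough independent eigenvectors, S⁻¹AS comes out diagonal, while a defective matrix with a repeated eigenvalue does not have enough of them.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])              # distinct eigenvalues 5 and 2
w, S = np.linalg.eig(A)                 # columns of S are eigenvectors
print(np.round(np.linalg.inv(S) @ A @ S, 10))  # diagonal, with 5 and 2 on the diagonal

D = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # classic non-diagonalizable example
w_d, S_d = np.linalg.eig(D)             # eigenvalue 1 is repeated...
print(np.linalg.matrix_rank(S_d))       # ...and S_d has rank 1: too few eigenvectors
```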

Personal Recommendations

For a foundational subject like linear algebra, I recommend studying from multiple textbooks so that the concepts complement each other. Shifrin offers more reinforcement of the connections among spaces, bases, and sets, along with complete proofs; Strang offers concrete, guided examples, numerical analysis, and a fuller treatment of eigenvalues and eigenvectors. My personal preference is to read Shifrin as the main text, since its lower page count (372 vs. 487 pages) keeps the focus on the big ideas, and to use Strang as a supplement, especially for the eigenvalue and eigenvector chapters.
