
Linear Algebra for Natural Language Processing

Blog: Think Data Analytics Blog

The field of Natural Language Processing involves building techniques to process text written in natural language by people like you and me, and to extract insights from it for a variety of tasks, from interpreting user queries on search engines and returning relevant web pages, to answering customer queries as a chatbot assistant.

Representing every word in a form that captures its meaning and overall context becomes crucial, especially when major decisions are based on insights extracted from text at a large scale, like forecasting stock price changes from social media.

In this article, we’ll begin with the basics of linear algebra to build an intuition for vectors and their significance in representing specific types of information. We’ll then cover the different ways of representing text in vector space and how the concept has evolved into the state-of-the-art models we have now.

We’ll step through the following areas –

Unit vectors in our coordinate system

i -> denotes a unit vector (a vector of length 1) pointing in the x-direction

j -> denotes a unit vector pointing in the y-direction

Together, they form the standard basis of our 2D coordinate vector space.

We’ll return to the term basis later in the article.

Standard Unit vectors — Image by Author

A vector in 2D X-Y space — Image by Author
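As a minimal sketch with illustrative variable names, the standard unit vectors and the vector a = (3, 5) from the plot can be written with NumPy. The vector a is simply 3 times i plus 5 times j:

```python
import numpy as np

# Standard unit vectors of the 2D X-Y coordinate space
i = np.array([1.0, 0.0])  # unit vector along x
j = np.array([0.0, 1.0])  # unit vector along y

# Both have length (Euclidean norm) 1
print(np.linalg.norm(i), np.linalg.norm(j))

# The vector a = (3, 5) is the combination 3i + 5j
a = 3 * i + 5 * j
print(a)
```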

Linear Combination of 2 vectors

If u and v are two vectors in a 2-dimensional space, then their linear combination, resulting in a vector l, is represented by –

l = x1·u + x2·v

The above expression of linear combination is equivalent to the following linear system –

Bx = l

where B is the matrix whose columns are u and v, and x = (x1, x2) is the vector of coefficients.
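The following small sketch checks this equivalence numerically, using example values assumed for illustration. Stacking u and v as the columns of B and multiplying by x = (x1, x2) gives the same vector as the explicit linear combination:

```python
import numpy as np

# Two example vectors in 2D space (values chosen for illustration)
u = np.array([1.0, -1.0])
v = np.array([0.0, 1.0])

# Coefficients x1, x2 of the linear combination
x1, x2 = 1.5, 2.0

# Explicit linear combination l = x1*u + x2*v
l_explicit = x1 * u + x2 * v

# Same result as the matrix-vector product Bx, where B has columns u and v
B = np.column_stack((u, v))
x = np.array([x1, x2])
l_matrix = B @ x

print(l_explicit, l_matrix)
```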

Let’s understand this with an example using vectors u and v in a 2-dimensional space –

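import numpy as np
import matplotlib.pyplot as plt
# Vectors u & v are 3D; we'll only use the first 2 dimensions
u_vec = np.array([1, -1, 0])
v_vec = np.array([0, 1, -1])
# Vector x holds the coefficients x1, x2
x_vec = np.array([1.5, 2])
# Linear combination l = Bx, where the columns of B are u and v
B = np.column_stack((u_vec[:2], v_vec[:2]))
l_vec = B @ x_vec
# Plotting them: u (black), v (green), l (red), all starting at the origin
data = np.vstack((u_vec[:2], v_vec[:2], l_vec))
origin = np.zeros_like(data)
QV = plt.quiver(origin[:, 0], origin[:, 1], data[:, 0], data[:, 1], color=['black', 'green', 'red'], angles='xy', scale_units='xy', scale=1.)
plt.xlim(-1, 2)
plt.ylim(-2, 2)
plt.show()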

Linear Combination of vectors — Image by Author

We can also understand this from a similar example in 3 dimensions, given by a professor in his notes –

Linear Algebra, Chapter 1: Introduction to Vectors, MIT

Here are the 3 vectors from the example in the image, plotted in 3D space. Note that the axis units in the plot differ from those of the vectors.

import numpy as np
import matplotlib.pyplot as plt

u_vec = np.array([1, -1, 0])
v_vec = np.array([0, 1, -1])
w_vec = np.array([0, 0, 1])
# Stack the vectors as rows; every arrow starts at the origin
data = np.vstack((u_vec, v_vec, w_vec))
origin = np.zeros_like(data)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.quiver(origin[:, 0], origin[:, 1], origin[:, 2], data[:, 0], data[:, 1], data[:, 2])
ax.set_xlim([-1, 2])
ax.set_ylim([-1, 2])
ax.set_zlim([-1, 2])
plt.show()

Vectors in 3D space — Image by Author


The span of two vectors a and b is the set of all their linear combinations. In 2D space, span(a, b) = R² (all vectors in 2D space), provided a and b are not collinear.
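One way to check whether two 2D vectors span R² is to stack them as the columns of a matrix and compute its rank. The NumPy sketch below does this. A rank of 2 means every vector in the plane is some combination of the two, while collinear vectors only reach rank 1. The helper name `spans_r2` is hypothetical.

```python
import numpy as np

def spans_r2(a, b):
    """Return True if the 2D vectors a and b span all of R^2 (hypothetical helper)."""
    # The columns of M are a and b; rank 2 means they are not collinear
    M = np.column_stack((a, b))
    return bool(np.linalg.matrix_rank(M) == 2)

print(spans_r2(np.array([1, -1]), np.array([0, 1])))   # not collinear
print(spans_r2(np.array([1, 2]), np.array([-2, -4])))  # collinear: b = -2a
```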


Collinearity is the case when we have p different predictor variables but some of them are linear combinations of others, so they don’t add any new information. Two collinear vectors/variables have a correlation close to +1 or -1, which can be detected from their correlation matrix.

Multicollinearity exists when more than 2 vectors are linearly related (one of them is a linear combination of several others), even though no single pair of them necessarily has a high correlation.
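A small NumPy sketch with made-up data shows both the correlation check and the multicollinearity caveat. Here, y is an exact linear function of x, so their correlation is -1. By contrast, c = a + b is fully determined by a and b, yet each pairwise correlation is only about 0.71. The rank of the data matrix exposes the redundancy that the pairwise correlations miss.

```python
import numpy as np

# Two collinear variables: y is an exact linear function of x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = -3 * x + 2
print(np.corrcoef(x, y)[0, 1])  # correlation close to -1

# Multicollinearity: c is the sum of a and b (made-up values)
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
c = a + b

# Correlation matrix of the three variables (one variable per row)
corr = np.corrcoef(np.vstack((a, b, c)))
print(np.round(corr, 3))

# Rank of the data matrix: only 2 of the 3 variables are independent
rank = np.linalg.matrix_rank(np.column_stack((a, b, c)))
print(rank)
```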

Linear Independence

We say that v1, v2, ..., vn are linearly independent if none of them is a linear combination of the others. This is equivalent to saying that x1·v1 + x2·v2 + ... + xn·vn = 0 implies x1 = x2 = ... = xn = 0.
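One practical test for linear independence, shown here as a NumPy sketch, uses the fact that n vectors are independent exactly when the matrix with those vectors as columns has rank n. In that case, the only solution of x1·v1 + ... + xn·vn = 0 is the zero vector. Below, u, v and w are the three vectors from the MIT example. The second set replaces w with 2u, which is collinear with u. The helper name is hypothetical.

```python
import numpy as np

def linearly_independent(*vectors):
    """True if the given vectors are linearly independent (hypothetical helper)."""
    # Columns of M are the vectors; full column rank means only the trivial
    # combination x1 = ... = xn = 0 produces the zero vector
    M = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(M) == len(vectors))

u = np.array([1, -1, 0])
v = np.array([0, 1, -1])
w = np.array([0, 0, 1])

print(linearly_independent(u, v, w))      # independent
print(linearly_independent(u, v, 2 * u))  # 2u is collinear with u
```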

Since collinear vectors can be expressed as linear combinations of each other, they are linearly dependent.


A basis of a vector space is a set of linearly independent vectors that span that space.

We call these vectors basis vectors.
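Every vector in the space can be written uniquely as a combination of the basis vectors. Its coefficients are found by solving the linear system Bx = l from the linear combination section. The minimal sketch below assumes a non-standard 2D basis, u = (1, -1) and v = (0, 1), chosen only for illustration:

```python
import numpy as np

# A non-standard basis of R^2 (two linearly independent vectors)
u = np.array([1.0, -1.0])
v = np.array([0.0, 1.0])
B = np.column_stack((u, v))

# Target vector a = (3, 5), written in the standard i, j basis
a = np.array([3.0, 5.0])

# Solve Bx = a for the coordinates of a in the basis {u, v}
x = np.linalg.solve(B, a)
print(x)  # coordinates (x1, x2) such that a = x1*u + x2*v
```

The same vector a has coordinates (3, 5) in the standard basis but different coordinates in the {u, v} basis. The vector itself is unchanged; only its description changes.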

Vector space Models in NLP

A vector space is a set V of vectors on which two operations, vector addition and scalar multiplication, are defined, and which is closed under both. For example, if two vectors u and v are in V, then their sum w = u + v also lies in V, and so does any scalar multiple c·u.

A 2D vector space is spanned by 2 linearly independent basis vectors, which define its 2 axes.

Each axis represents a dimension in the vector space.

Recall the earlier plot of the vector a = (3, 5) = 3i + 5j. This vector is represented in a 2D space with 2 linearly independent basis vectors along X and Y, which also represent the 2 axes as well as the 2 dimensions of the space.

Here, 3 and 5 are the x and y components of this vector in the X-Y 2D space.

Vector in 2D X-Y plane — Image by Author

Vector space model in NLP

A vector space model is a representation of text in vector space.

Here, each distinct word in the corpus vocabulary is a linearly independent basis vector, and each basis vector represents an axis in the vector space.
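The sketch below makes this concrete. The vocabulary and document are hypothetical examples. Each vocabulary word gets a one-hot basis vector, and a document’s raw term-count vector is the sum of the basis vectors of the words it contains:

```python
import numpy as np

# Hypothetical vocabulary: one axis per distinct word
vocab = ["good", "house", "car"]
index = {word: k for k, word in enumerate(vocab)}

# One-hot basis vectors: the rows of the identity matrix
basis = np.eye(len(vocab))

# A hypothetical tokenized document
doc = ["good", "house", "good", "car"]

# Term-count vector = linear combination of the words' basis vectors
doc_vec = sum(basis[index[word]] for word in doc)
print(doc_vec)  # counts along the good, house, car axes
```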

For 3 words, we’ll have a 3D vector model represented like this –

Vector Space Model for 3 words — Image by Author

The table above the graph represents the TF-IDF incidence matrix.

D1 = (0.91, 0, 0.0011) represents a document vector along the 3 axes: good, house and car. Similarly, we have the D2 and D3 document vectors.
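The exact TF-IDF weights in the table depend on the weighting variant used, which the figure does not state. The rough sketch below assumes the common variant tf(t, d) × log(N / df(t)), with raw counts for tf, and a made-up three-document corpus. It computes document vectors over the good, house and car axes:

```python
import numpy as np

vocab = ["good", "house", "car"]

# Hypothetical corpus, already tokenized
docs = [
    ["good", "good", "house"],
    ["house", "car"],
    ["good", "house", "car", "car"],
]

# Raw term counts: one row per document, one column per vocabulary word
tf = np.array([[doc.count(word) for word in vocab] for doc in docs], dtype=float)

# Document frequency: in how many documents each word appears
df = (tf > 0).sum(axis=0)

# Inverse document frequency with the plain log(N / df) variant
N = len(docs)
idf = np.log(N / df)

# TF-IDF matrix: each row is a document vector in the 3-axis space
tfidf = tf * idf
print(np.round(tfidf, 3))
```

In this toy corpus, "house" appears in every document, so its IDF, and therefore its weight, is 0 everywhere. Words that occur in all documents do not help tell the documents apart.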

How does representation in vector space help us, though?

For example, for the search token ‘buy’, we would want to retrieve all the documents containing different forms of this word (buying, bought) and even synonyms of ‘buy’. Such documents cannot be captured by more rudimentary methods that represent documents as a binary incidence matrix.

This is achieved through similarity measures such as cosine similarity between the document and query vectors, where the documents closest to the query are ranked highest.
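Here is a minimal ranking sketch. D1 is the document vector quoted above, while D2, D3 and the query vector are made-up values for illustration. Cosine similarity divides the dot product by the product of the vector lengths, so it measures the angle between vectors regardless of their magnitudes:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document vectors along the good, house, car axes
docs = {
    "D1": np.array([0.91, 0.0, 0.0011]),  # from the table above
    "D2": np.array([0.0, 0.8, 0.3]),      # hypothetical
    "D3": np.array([0.2, 0.1, 0.9]),      # hypothetical
}

# Hypothetical query vector containing only the term "good"
query = np.array([1.0, 0.0, 0.0])

# Score every document and rank from most to least similar
scores = {name: cosine_similarity(vec, query) for name, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```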

Dense vectors


This article introduced the concept of vector spaces from linear algebra and highlighted related concepts as they apply to Natural Language Processing, where text documents are represented as vectors for semantic representation and extraction tasks.

Word embeddings have since been extended to more advanced and wider applications, with much better performance than earlier approaches.

You can also refer to the source code and run it on Colab via my Git repository here.

Hope you enjoyed reading it. 🙂



The post Linear Algebra for Natural Language Processing appeared first on Big Data, Data Analytics, IOT, Software Testing, Blockchain, Data Lake – Submit Your Guest Post.
