
Assignment 1 - All About torch.Tensor

Deep Learning with PyTorch: Zero to GANs

About PyTorch

PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab (FAIR). It is free and open-source software released under the Modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.

A number of pieces of Deep Learning software are built on top of PyTorch, including Uber's Pyro, HuggingFace's Transformers, and Catalyst.

PyTorch provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based automatic differentiation system

Currently, PyTorch competes with other well-known deep learning frameworks such as TensorFlow and Apache MXNet.

PyTorch tensors

PyTorch defines a class called Tensor (torch.Tensor) to store and operate on homogeneous multidimensional rectangular arrays of numbers. PyTorch Tensors are similar to NumPy Arrays, but can also be operated on a CUDA-capable Nvidia GPU. PyTorch supports various sub-types of Tensors.
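As a small sketch of the NumPy similarity mentioned above (the array values here are arbitrary):

```python
import numpy as np
import torch

# A tensor can be created directly from a NumPy array and converted back.
a = np.array([[1., 2.], [3., 4.]])
t = torch.from_numpy(a)   # shares memory with the NumPy array
back = t.numpy()          # zero-copy view back into NumPy

print(t.dtype)            # torch.float64, inherited from the NumPy array
```

Because the memory is shared, modifying `t` in place also changes `a`; a GPU copy would instead be made explicitly with `t.to('cuda')` on a CUDA-capable machine.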

This notebook is an attempt to explore some of the PyTorch functions that operate on tensors.

The functions explained in this notebook are:

  • torch.trace(input) → Tensor
  • torch.tril(input, diagonal=0, out=None) → Tensor
  • torch.tril_indices(row, col, offset=0, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor
  • torch.addbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None) → Tensor
  • torch.dot(input, tensor) → Tensor
In [1]:
# Import torch and other required modules
import torch

Function 1 - torch.trace(input) → Tensor

Returns the trace (i.e., sum of the elements of the diagonal) of the input 2-D matrix.

In [2]:
# Example 1 - working
x = torch.arange(34., 43.).view(3, 3)
torch.trace(x)
Out[2]:
tensor(114.)
  1. Creates a 3 * 3 tensor with values from 34 to 43 (43 excluded) and stores it in the variable x

  2. Prints the trace of the tensor x

In [3]:
# Example 2 - working
torch.trace(torch.arange(404., 420.).view(4, 4))
Out[3]:
tensor(1646.)

One-liner version of torch.trace(input) with values from 404 to 420 (420 excluded).

In [4]:
# Example 3 - breaking (to illustrate when it breaks)
x = torch.arange(0., 0.).view(0 , 0)
torch.trace(x)
# torch.trace(torch.arange(0., 0.).view(0 , 0))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-eb39b5ba65d6> in <module>
      1 # Example 3 - breaking (to illustrate when it breaks)
      2 x = torch.arange(0., 0.).view(0 , 0)
----> 3 torch.trace(x)
      4 # torch.trace(torch.arange(0., 0.).view(0 , 0))

RuntimeError: invalid argument 1: expected a matrix at /Users/distiller/project/conda/conda-bld/pytorch_1587428061935/work/aten/src/TH/generic/THTensorMoreMath.cpp:303

The above example fails because we are passing a tensor of size 0 * 0, which is not a valid matrix, and torch.trace(input) expects its input to be a 2-D matrix.

A one-liner version of this example is included as a comment, because pinpointing the failing call is harder when everything is on one line.

Applications - torch.trace(input) computes the trace of a matrix, so any application of the matrix trace (for example, the sum of eigenvalues of a square matrix, or the Frobenius inner product) is an application of this function.
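One such application can be sketched as follows: the identity trace(Aᵀ @ B) = Σ (A ⊙ B), i.e., the trace of Aᵀ B equals the Frobenius inner product of A and B (the values below are arbitrary):

```python
import torch

# trace(A.T @ B) equals the elementwise product of A and B, summed.
A = torch.arange(1., 5.).view(2, 2)
B = torch.arange(5., 9.).view(2, 2)

lhs = torch.trace(A.t() @ B)
rhs = (A * B).sum()
print(lhs, rhs)  # both tensor(70.)
```

This identity is handy because the right-hand side avoids materializing the full matrix product when only the trace is needed.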

Function 2 - torch.tril(input, diagonal=0, out=None) → Tensor

Returns the lower triangular part (i.e., the elements on and below the specified diagonal) of the matrix; the remaining elements are set to 0.

The argument diagonal controls which diagonal to consider.

  • If diagonal = 0, all elements on and below the main diagonal are retained.
  • A positive value includes just as many diagonals above the main diagonal.
  • A negative value excludes just as many diagonals below the main diagonal.
Parameters
  • input (Tensor) – the input tensor.
  • diagonal (int, optional) – the diagonal to consider
  • out (Tensor, optional) – the output tensor.
In [5]:
# Example 1 - working
x = torch.randn(3, 3)
torch.tril(x)
Out[5]:
tensor([[ 0.5205,  0.0000,  0.0000],
        [ 2.0953,  0.8648,  0.0000],
        [ 0.9816, -0.8634, -0.1185]])

All the elements on or below the main diagonal (i.e., the lower triangular part of the matrix) are kept as they are, and the remaining elements are set to 0.

In [6]:
# Example 2 - working
torch.tril(x, diagonal=-1)
Out[6]:
tensor([[ 0.0000,  0.0000,  0.0000],
        [ 2.0953,  0.0000,  0.0000],
        [ 0.9816, -0.8634,  0.0000]])

Same as the above example, except that here the elements on the main diagonal are also set to 0.

You can picture the diagonals as numbered lines parallel to the main diagonal: the main diagonal is 0, diagonals above it are positive, and diagonals below it are negative. With diagonal=-1, everything on and below the -1 diagonal is retained, while everything above it (including the main diagonal) is set to 0.

In [7]:
# Example 3 - breaking (to illustrate when it breaks)
torch.tril(x, diagonal=1, out=y)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-8e4510f5e617> in <module>
      1 # Example 3 - breaking (to illustrate when it breaks)
----> 2 torch.tril(x, diagonal=1, out=y)

NameError: name 'y' is not defined

The argument out must be given a pre-defined tensor as its value; here y was never defined, hence the NameError.

Applications - This function can be used to mask out part of a tensor relative to a chosen diagonal, keeping only the elements on and below it.

Note - torch.triu is the analogous function for the upper triangular part of a matrix.
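The masking application above can be sketched with a causal (look-ahead) mask of the kind used in autoregressive attention; the sequence length and scores here are arbitrary:

```python
import torch

seq_len = 4
scores = torch.randn(seq_len, seq_len)           # raw attention scores
mask = torch.tril(torch.ones(seq_len, seq_len))  # 1s on and below the diagonal

# Positions above the diagonal are set to -inf so softmax gives them weight 0,
# i.e., a token cannot attend to future tokens.
masked = scores.masked_fill(mask == 0, float('-inf'))
weights = torch.softmax(masked, dim=-1)
print(weights)
```

The first row of `weights` is [1, 0, 0, 0]: the first token can only attend to itself.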

Function 3 - torch.tril_indices(row, col, offset=0, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor

Returns the indices of the lower triangular part of a row * col matrix in a 2 * N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns.

The argument offset plays the same role as the diagonal argument of

torch.tril(input, diagonal=0, out=None) → Tensor

NOTE

When running on CUDA, row * col must be less than 2^59 to prevent overflow during calculation.

Parameters

  • row (int) – number of rows in the 2-D matrix.
  • col (int) – number of columns in the 2-D matrix.
  • offset (int) – diagonal offset from the main diagonal. Default: if not provided, 0.
  • dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, torch.long.
  • device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
  • layout (torch.layout, optional) – currently only support torch.strided.
In [8]:
# Example 1 - working
y = torch.tril_indices(3, 3, 1)
y
Out[8]:
tensor([[0, 0, 1, 1, 1, 2, 2, 2],
        [0, 1, 0, 1, 2, 0, 1, 2]])

Read the above tensor vertically (one column at a time) to get the positions of the retained elements of the lower triangular part.

For example, the first retained position is (0,0), then (0,1), then (1,0), and so on.

Note that position (0,2) does not appear among the indices: it lies above the offset=1 diagonal, so it is excluded.

In [9]:
# Example 2 - working
torch.tril_indices(6, 4, 2, dtype=float)
Out[9]:
tensor([[0., 0., 0., 1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3., 4., 4., 4.,
         4., 5., 5., 5., 5.],
        [0., 1., 2., 0., 1., 2., 3., 0., 1., 2., 3., 0., 1., 2., 3., 0., 1., 2.,
         3., 0., 1., 2., 3.]], dtype=torch.float64)

Same as the previous example, just written as a one-liner and with dtype set to float (returned as torch.float64).

In [10]:
# Example 3 - breaking (to illustrate when it breaks)
torch.tril_indices(4, 3, 2, float)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-27db1613a6d8> in <module>
      1 # Example 3 - breaking (to illustrate when it breaks)
----> 2 torch.tril_indices(4, 3, 2, float)

TypeError: tril_indices() takes from 2 to 3 positional arguments but 4 were given

The error is self-explanatory: torch.tril_indices accepts only 2 or 3 positional arguments (row, col, and optionally offset), and the remaining arguments are keyword-only. Passing float positionally therefore raises a TypeError; it must be written as dtype=float.

Applications - This function can be used to obtain the indices of the retained elements, making it easy to access (or assign to) just the lower triangular part of a tensor.

Note - torch.triu_indices is the analogous function for the upper triangular part of a matrix.
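The indexing application above can be sketched as follows (the matrix values are arbitrary): the two rows returned by torch.tril_indices can be used directly for advanced indexing to pull out the lower triangular entries as a flat tensor.

```python
import torch

x = torch.arange(1., 10.).view(3, 3)
idx = torch.tril_indices(3, 3)   # offset defaults to 0

# Row coordinates in idx[0], column coordinates in idx[1].
lower_vals = x[idx[0], idx[1]]
print(lower_vals)  # tensor([1., 4., 5., 7., 8., 9.])
```

The same index pair also works on the left-hand side of an assignment, e.g., to overwrite only the lower triangular part of x.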

Function 4 - torch.addbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None) → Tensor

Performs a batch matrix-matrix product of matrices stored in batch1 and batch2, with a reduced add step (all matrix multiplications get accumulated along the first dimension). input is added to the final result.

batch1 and batch2 must be 3-D tensors each containing the same number of matrices.

If batch1 is a (b×n×m) tensor, batch2 is a (b×m×p) tensor, input must be broadcastable with a (n×p) tensor and out will be a (n×p) tensor.

out = β · input + α · Σ_{i=0}^{b-1} (batch1_i @ batch2_i)

For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers.

Parameters

  • input (Tensor) – matrix to be added
  • batch1 (Tensor) – the first batch of matrices to be multiplied
  • batch2 (Tensor) – the second batch of matrices to be multiplied
  • beta (Number, optional) – multiplier for input (β)
  • alpha (Number, optional) – multiplier for batch1 @ batch2 (α)
  • out (Tensor, optional) – the output tensor.
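The formula above can be checked against torch.bmm; a minimal sketch (shapes chosen arbitrarily):

```python
import torch

inp = torch.randn(3, 4)
b1 = torch.randn(5, 3, 2)
b2 = torch.randn(5, 2, 4)

# addbmm = beta * input + alpha * (batch matrix products summed over batch dim)
out = torch.addbmm(inp, b1, b2, beta=2, alpha=3)
manual = 2 * inp + 3 * torch.bmm(b1, b2).sum(dim=0)
print(torch.allclose(out, manual, atol=1e-5))  # True
```

This makes explicit that addbmm fuses three steps (batched matmul, reduction over the batch dimension, and the scaled add) into a single call.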
In [11]:
# Example 1 - working
ip = torch.randn(3, 6)
batch1 = torch.randn(20, 3, 5)
batch2 = torch.randn(20, 5, 6)
torch.addbmm(ip, batch1, batch2)
Out[11]:
tensor([[  3.9890,  -5.1335,  -6.9768,  -4.6171,   0.8772, -10.7906],
        [ -1.0072, -21.3228,   5.4748, -21.9994,   2.9079,  -9.0580],
        [ 13.9873,   8.4095,   9.0454,  -1.1092,  -3.1338, -17.1364]])

ip is the input matrix. batch1 and batch2 contain batches of 20 matrices each, with dimensions 3 * 5 and 5 * 6 respectively, filled with random values.

The function multiplies each matrix in batch1 with the corresponding matrix in batch2, sums the 20 products along the batch dimension, adds ip to the accumulated result, and returns it.

In [12]:
# Example 2 - working
ip = torch.arange(1., 5.).view(2,2)
batch1 = torch.randn(20, 2, 5)
batch2 = torch.randn(20, 5, 2)
torch.addbmm(ip, batch1, batch2, alpha=2, beta=2)
Out[12]:
tensor([[  6.5363,   3.8822],
        [-12.8039,  24.2070]])

Same as the previous example, except that ip now contains the values 1 to 5 (5 excluded) reshaped as a 2 * 2 matrix, and the function is called with α = β = 2.

In [13]:
# Example 3 - breaking (to illustrate when it breaks)
ip = torch.arange(1., 5.)
batch1 = torch.randn(20, 2, 5)
batch2 = torch.randn(20, 5, 2)
torch.addbmm(ip, batch1, batch2, alpha=3, beta=2)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-63071ffa5567> in <module>
      3 batch1 = torch.randn(20, 2, 5)
      4 batch2 = torch.randn(20, 5, 2)
----> 5 torch.addbmm(ip, batch1, batch2, alpha=3, beta=2)

RuntimeError: The expanded size of the tensor (2) must match the existing size (4) at non-singleton dimension 1. Target sizes: [2, 2]. Tensor sizes: [4]

The ip tensor here is 1-D with 4 elements, while the matrix obtained by multiplying and accumulating the batches is a 2 * 2 matrix.

Because ip is not broadcastable to the shape of that result, the addition fails with an error.

In [14]:
# Example 4 - breaking (to illustrate when it breaks)
ip = torch.arange(1., 5.).view(2, 2)
batch1 = torch.randn(20, 2, 5)
batch2 = torch.randn(20, 5, 6)
torch.addbmm(ip, batch1, batch2, alpha=3, beta=2)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-342b3b13e72d> in <module>
      3 batch1 = torch.randn(20, 2, 5)
      4 batch2 = torch.randn(20, 5, 6)
----> 5 torch.addbmm(ip, batch1, batch2, alpha=3, beta=2)

RuntimeError: The expanded size of the tensor (6) must match the existing size (2) at non-singleton dimension 1. Target sizes: [2, 6]. Tensor sizes: [2, 2]

The matrices in batch1 are of dimension 2 * 5 and those in batch2 are of dimension 5 * 6, so the batch matrix multiplication itself is valid and produces 2 * 6 matrices.

The error occurs because the 2 * 2 ip matrix is not broadcastable with the 2 * 6 result it must be added to, as the traceback's target and tensor sizes show.

In [15]:
# Example 5 - breaking (to illustrate when it breaks)
ip = torch.arange(1., 5.).view(2, 2)
batch1 = torch.randn(20, 2, 5)
batch2 = torch.randn(120, 5, 2)
torch.addbmm(ip, batch1, batch2, alpha=3, beta=2)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-698a4e4ed66e> in <module>
      3 batch1 = torch.randn(20, 2, 5)
      4 batch2 = torch.randn(120, 5, 2)
----> 5 torch.addbmm(ip, batch1, batch2, alpha=3, beta=2)

RuntimeError: invalid argument 2: equal number of batches expected, got 20, 120 at /Users/distiller/project/conda/conda-bld/pytorch_1587428061935/work/aten/src/TH/generic/THTensorMath.cpp:342

The number of matrices in the two batches is not the same (20 vs. 120), so the batch matrix multiplication is not possible, and hence we see the error.

Application - This function is useful whenever you need the sum of a batch of matrix products added to an existing matrix, accumulated in a single call rather than in a Python loop.

Function 5 - torch.dot(input, tensor) → Tensor

Computes the dot product (inner product) of two tensors.

NOTE

This function does not broadcast.

In [16]:
# Example 1 - working
x = torch.tensor([3, 5])
y = torch.tensor([2, 7])
torch.dot(x, y)
Out[16]:
tensor(41)

Calculates the dot product of the 2 input tensors x and y: 3 * 2 + 5 * 7 = 41.

In [17]:
# Example 2 - working
torch.dot(torch.tensor([12, 15]), torch.tensor([20, 10]))
Out[17]:
tensor(390)

One-liner for calculating dot product of 2 tensors.

In [18]:
# Example 3 - breaking (to illustrate when it breaks)
torch.dot(torch.tensor([7, 3, 5]), torch.tensor([9, 3]))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-0e09f76c4688> in <module>
      1 # Example 3 - breaking (to illustrate when it breaks)
----> 2 torch.dot(torch.tensor([7, 3, 5]), torch.tensor([9, 3]))

RuntimeError: inconsistent tensor size, expected tensor [3] and src [2] to have the same number of elements, but got 3 and 2 elements respectively

The error is clear in itself: both 1-D tensors need to have the same number of elements to compute their dot product, since torch.dot does not broadcast.

Applications - The dot product appears throughout linear algebra and machine learning, e.g., in projections, similarity measures, and as the building block of matrix multiplication.
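One such application can be sketched as a cosine similarity between two 1-D tensors, built from torch.dot and vector norms (the helper name and input values are illustrative):

```python
import torch

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return torch.dot(a, b) / (a.norm() * b.norm())

a = torch.tensor([1., 0.])
b = torch.tensor([1., 1.])
print(cosine_similarity(a, b))  # ~0.7071, i.e., cos 45 degrees
```

PyTorch also ships torch.nn.functional.cosine_similarity for the batched case; the sketch above just shows how torch.dot underlies it.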

Conclusion

5 of the many different functions available in PyTorch were covered in this notebook: 4 of them operate on matrices, and the last one computes the dot product.

Reference Links

  • Official documentation for torch.Tensor: https://pytorch.org/docs/stable/tensors.html
  • PyTorch math operations reference (torch.trace, torch.tril, torch.tril_indices, torch.addbmm, torch.dot): https://pytorch.org/docs/stable/torch.html

In [19]:
!pip install jovian --upgrade --quiet
In [20]:
import jovian
In [ ]:
jovian.commit()
[jovian] Attempting to save notebook..
In [ ]: