
Assignment 1

A sequential series of interesting functions

PyTorch is a machine learning library for Python. Developed primarily by Facebook, it also offers C++ interoperability. The library emphasises integration with NumPy and provides an open-source deep learning system built on tape-based autodiff.

This document highlights functions I found interesting on first contact with the library, and will be extended as my knowledge of data science grows.

  • f1: from_numpy
  • f2: logspace
  • f3: poisson
  • f4: topk
  • f5: lerp
In [1]:
# Import torch and other required modules
import numpy as np
import torch
## Function 1 - torch.from_numpy

Demonstrating the interoperability of the library, this function converts NumPy arrays and matrices to Torch tensors. The expected input is a numpy.ndarray (n-dimensional array), and the output shares the same memory as the input, so changes to one are reflected in the other.
In [2]:
# Example 1 - instantiation
n_array = np.array([0,3,5,7,9])
t_tensor = torch.from_numpy(n_array)
t_tensor
Out[2]:
tensor([0, 3, 5, 7, 9])

As can be seen, the output keeps the same shape, size, and contents, but has become a tensor.

In [5]:
# Example 2 - dimensions and editing
n2 = np.array([[0,2],[3,4]])
t2 = torch.from_numpy(n2)
print(t2)
t2[0,0] = 1
t2
tensor([[0, 2],
        [3, 4]])
Out[5]:
tensor([[1, 2],
        [3, 4]])

From this, multidimensional arrays convert just as well, and the resulting tensor can be edited using Python's standard indexing and slicing operations for multidimensional structures. Because the memory is shared, the edit to t2 also changes the underlying array n2.
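As a quick check (not part of the original run), printing n2 after the edit confirms that the change to the tensor propagated back to the shared NumPy array:

In [ ]:
# The tensor and the array share memory, so the edit to t2 appears in n2
print(n2)  # [[1 2]
           #  [3 4]]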

In [6]:
# Example 3 - Errors
n3 = {3:4,5:6,7:8}
t3 = torch.from_numpy(n3)
t3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-0ad81429f274> in <module>
      1 # Example 3 - Errors
      2 n3 = {3:4,5:6,7:8}
----> 3 t3 = torch.from_numpy(n3)
      4 t3

TypeError: expected np.ndarray (got dict)

If the input is not in the expected format (in this instance a dictionary rather than an np.ndarray), the function throws a TypeError.

This function should be used when importing data which has already been parsed into NumPy ndarray format. CSVs and other raw data not in standard form should first be converted to an ndarray before being passed to this function.
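As a minimal sketch of that workflow (the file name data.csv is hypothetical), a CSV could first be parsed into an ndarray with NumPy and then wrapped as a tensor:

In [ ]:
# Hypothetical CSV file; np.loadtxt parses it into an np.ndarray first
data = np.loadtxt("data.csv", delimiter=",")
t_csv = torch.from_numpy(data)  # shares memory with `data`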

## Function 2 - torch.logspace

The function takes the arguments: start, end, steps=100, base=10.0, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False

and outputs a 1-D tensor of (steps) points spaced evenly on a logarithmic scale in a preset base, running from base^start to base^end. The defaults are shown above, but the output can be tailored to specific datatypes and layouts, and requires_grad can be set for use in gradient computations. For multi-processor or hybrid operations, the tensor can also be placed on a particular runtime device.
In [8]:
# Example 1 - default, short
tl1 = torch.logspace(start = 0, end = 10, steps = 10)
tl1
Out[8]:
tensor([1.0000e+00, 1.2915e+01, 1.6681e+02, 2.1544e+03, 2.7826e+04, 3.5938e+05,
        4.6416e+06, 5.9948e+07, 7.7426e+08, 1.0000e+10])

As shown by the output, the tensor is one-dimensional with 10 elements, running from 10^0 to 10^10 with evenly spaced exponents.

In [9]:
# Example 2 - other bases
tl2 = torch.logspace(start=-2, end = 8, steps = 20, base = 2.0)
tl2
Out[9]:
tensor([2.5000e-01, 3.6006e-01, 5.1858e-01, 7.4688e-01, 1.0757e+00, 1.5493e+00,
        2.2313e+00, 3.2136e+00, 4.6284e+00, 6.6661e+00, 9.6008e+00, 1.3828e+01,
        1.9915e+01, 2.8683e+01, 4.1310e+01, 5.9497e+01, 8.5690e+01, 1.2341e+02,
        1.7775e+02, 2.5600e+02])

Other bases are supported, base 2 being an obvious example for binary systems.

In [12]:
# Example 3 - errors
tl3 = torch.logspace(start=1, end=-3, steps = 3, base = np.sqrt(-1))
tl3
/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in sqrt
Out[12]:
tensor([nan, nan, nan])

Input values must be real rather than imaginary; here np.sqrt(-1) evaluates to nan (raising a RuntimeWarning), and rather than an explicit error being thrown, the resulting tensor is filled with invalid nan values.

This function could be used to create a span of candidate learning rates for tuning the stochastic gradient descent process. A range of small values, spaced over several orders of magnitude, would be needed to gauge the efficiency of the process.
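As an illustrative sketch of that idea (the range and step count chosen here are arbitrary), a logarithmic span of candidate learning rates could be generated and iterated over:

In [ ]:
# Five candidate learning rates from 1e-05 to 1e-01
lrs = torch.logspace(start=-5, end=-1, steps=5)
for lr in lrs:
    print(f"trying learning rate {lr.item():.0e}")
    # ... train the model with this rate and record the resulting loss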

## Function 3 - torch.poisson

The poisson function maps an input tensor to a tensor of the same size, with each element sampled from a Poisson distribution whose rate parameter is the corresponding input element.

In [15]:
# Example 1 - prebuilt
a1 = np.array([[0,0.1],[1.2,1.3],[2.4,2.5]])
ttemp = torch.from_numpy(a1)
print(ttemp)
p1 = torch.poisson(ttemp)
p1
tensor([[0.0000, 0.1000],
        [1.2000, 1.3000],
        [2.4000, 2.5000]], dtype=torch.float64)
Out[15]:
tensor([[0., 0.],
        [2., 0.],
        [4., 1.]], dtype=torch.float64)

Dimensions are retained, but each element is used as the rate parameter generating the corresponding Poisson sample.

In [17]:
# Example 2 - random
ran1 = torch.rand(9,9)*4 # randomly generate rate parameters between 0 and 4
print(ran1)
p2 = torch.poisson(ran1)
p2
tensor([[3.6803, 3.8725, 3.6442, 3.7891, 3.1042, 3.1249, 3.9296, 2.5792, 2.3546],
        [3.9961, 2.1462, 3.9884, 0.2278, 0.5966, 0.6280, 2.0477, 1.9546, 3.0229],
        [2.5418, 3.3211, 3.7102, 3.1386, 1.0724, 1.9750, 0.2267, 2.9545, 2.2982],
        [0.6873, 3.7455, 2.0195, 2.0759, 3.2619, 0.2647, 3.3168, 0.1077, 3.6809],
        [2.1439, 1.9245, 3.3398, 3.0691, 0.9339, 0.9902, 1.2317, 3.1335, 2.3144],
        [3.7257, 2.1171, 1.0752, 3.7600, 1.6500, 2.5766, 1.9876, 0.1398, 2.5168],
        [1.4570, 0.3975, 0.6329, 3.5321, 1.8946, 0.9837, 2.1877, 2.2225, 3.8710],
        [1.4682, 1.7233, 3.1722, 2.8318, 0.6266, 3.5368, 0.7054, 3.4038, 1.7050],
        [3.8963, 0.7931, 1.8661, 0.2279, 3.4955, 0.7878, 1.4015, 1.7845, 2.4964]])
Out[17]:
tensor([[2., 1., 1., 0., 9., 4., 7., 0., 5.],
        [2., 2., 3., 0., 2., 4., 6., 3., 3.],
        [5., 3., 3., 3., 1., 2., 0., 3., 1.],
        [3., 5., 2., 2., 1., 1., 3., 0., 4.],
        [2., 3., 8., 2., 1., 0., 1., 8., 5.],
        [2., 1., 2., 6., 1., 1., 3., 0., 3.],
        [2., 0., 3., 3., 4., 0., 3., 1., 2.],
        [3., 3., 1., 1., 0., 5., 2., 4., 4.],
        [3., 0., 4., 1., 4., 1., 2., 2., 1.]])

This works for square and non-square shapes, and for random data as well. The output always has the same shape as the original.

In [19]:
# Example 3 - errors
t4 = torch.tensor([[1,2],[3,4],[5,6]])
print(t4)
p3 = torch.poisson(t4)
p3
tensor([[1, 2],
        [3, 4],
        [5, 6]])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-b3807fc00697> in <module>
      2 t4 = torch.tensor([[1,2],[3,4],[5,6]])
      3 print(t4)
----> 4 p3 = torch.poisson(t4)
      5 p3

RuntimeError: "poisson_cpu" not implemented for 'Long'

Despite appearances, the error here concerns the data type rather than the hardware: torch.poisson has no CPU implementation for integer (Long) tensors, so the input must first be converted to a floating-point type (for example with t4.float()).

If sufficient metadata is understood about an input dataset, a specific distribution (such as the Poisson) might be used for the randomly generated aspects of the model, as sketched below. This can speed up the training process, or optimise the data flow for particular hardware constraints.
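As a minimal sketch of that use (the rate of 3.0 and the sample size are arbitrary choices), Poisson-distributed counts could be generated to simulate event data:

In [ ]:
# Simulate 100 event counts with a constant rate of 3 events per interval
rates = torch.full((100,), 3.0)
counts = torch.poisson(rates)
print(counts.mean())  # approaches 3.0 as the sample size grows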

## Function 4 - torch.topk

Takes the following arguments: input, k, dim=None, largest=True, sorted=True, out=None -> (Tensor, LongTensor)

For a given input tensor, it returns the k top elements along a given dimension, sorted, starting with either the largest (the default) or the smallest. If no dimension is specified, the last dimension is used. The output also contains the positions of these values in the original tensor.

In [21]:
# Example 1 - simple
tn1 = torch.tensor([1,2,3,4,9,8,7,6,5])
k1 = torch.topk(tn1, 2)
k1
Out[21]:
torch.return_types.topk(
values=tensor([9, 8]),
indices=tensor([4, 5]))

The two greatest values are returned, along with their locations as indices into the original tensor.

In [33]:
# Example 2 - dimension specific
tn2 = torch.tensor([[[1,2,3,4],[5,6,7,8]],[[9,10,11,12],[13,14,15,16]]])
print(tn2)
k2 = torch.topk(tn2,1,-2)
k2
tensor([[[ 1,  2,  3,  4],
         [ 5,  6,  7,  8]],

        [[ 9, 10, 11, 12],
         [13, 14, 15, 16]]])
Out[33]:
torch.return_types.topk(
values=tensor([[[ 5,  6,  7,  8]],

        [[13, 14, 15, 16]]]),
indices=tensor([[[1, 1, 1, 1]],

        [[1, 1, 1, 1]]]))

For multidimensional tensors, the dimension to read along can be specified, and the output retains the shape of the input with that dimension reduced to k. In this instance, with dim=-2, the larger of the two rows in each 2x4 block was retained.

In [34]:
# Example 3 - errors
tn3 = torch.tensor([1,2,3,4])
k3 = torch.topk(tn3,1,2)
k3
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-dc153506a78d> in <module>
      1 # Example 3 - errors
      2 tn3 = torch.tensor([1,2,3,4])
----> 3 k3 = torch.topk(tn3,1,2)
      4 k3

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 2)

The specified dimension must be within the range of the input tensor.

This function could be used equally to pull out highlights of data for presentation and display, or to single out particular slices of the training process for different treatment: say, for example, the top values of a given array corresponded to outstanding or unusual conditions within the dataset, or produced particularly desirable results.
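As a short sketch of the presentation use (random scores stand in for real model outputs here), topk can report the top predictions of a classifier:

In [ ]:
# Random scores standing in for a classifier's output over 10 classes
scores = torch.rand(10)
values, indices = torch.topk(scores, 3)  # three highest scores and their indices
for v, i in zip(values, indices):
    print(f"class {i.item()}: score {v.item():.3f}")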

## Function 5 - torch.lerp

This function takes the arguments: input, end, weight, out=None

It runs a linear interpolation between two tensors of start and end points for a given weight, returning a new tensor: out = input + weight * (end - input).

In [38]:
# Example 1 - single dimensional
s1 = torch.arange(1.,10.)
print(s1)
e1 = torch.empty(9).fill_(11)
print(e1)
l1 = torch.lerp(s1,e1,0.05)
l1
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor([11., 11., 11., 11., 11., 11., 11., 11., 11.])
Out[38]:
tensor([1.5000, 2.4500, 3.4000, 4.3500, 5.3000, 6.2500, 7.2000, 8.1500, 9.1000])

Each new value lies the weighted fraction of the way between its start and end points: with a weight of 0.05, the output sits 5% of the way from each start value towards 11.

In [39]:
# Example 2 - randomised multidimensional
s2 = torch.rand(3,3) * 3 #random generation of a 3x3 array between 0-3
print(s2)
e2 = torch.empty(3,3).fill_(4)
print(e2)
l2 = torch.lerp(s2,e2,0.8)
l2
tensor([[2.1147, 0.1992, 1.3576],
        [2.7839, 1.0169, 0.1321],
        [2.6306, 1.3754, 0.0152]])
tensor([[4., 4., 4.],
        [4., 4., 4.],
        [4., 4., 4.]])
Out[39]:
tensor([[3.6229, 3.2398, 3.4715],
        [3.7568, 3.4034, 3.2264],
        [3.7261, 3.4751, 3.2030]])

Multidimensional data is also accepted, and the weighting can be changed to fit the model in use; here a weight of 0.8 places each output 80% of the way towards the end points.

In [40]:
# Example 3 - errors
s3 = torch.arange(1,6)
e3 = torch.rand(2,5)*2
l3 = torch.lerp(s3,e3,0.5)
l3
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-40-d32fb474efb1> in <module>
      2 s3 = torch.arange(1,6)
      3 e3 = torch.rand(2,5)*2
----> 4 l3 = torch.lerp(s3,e3,0.5)
      5 l3

RuntimeError: expected dtype long int for `end` but got dtype float

This example trips the first of two requirements:

  1. The data types of the two tensors must match: torch.arange(1, 6) produces an integer (Long) tensor, while torch.rand produces floats.
  2. The shapes of the two tensors must be broadcastable against each other; here the (5,) range does broadcast against the (2, 5) tensor, so only the dtype mismatch is reported. A corrected version is sketched below.
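
A corrected sketch of the failing example (casting the integer range to float resolves the reported dtype mismatch, and the shapes then broadcast):

In [ ]:
# Cast the integer range to float so the dtypes of input and end match
s3 = torch.arange(1, 6).float()
e3 = torch.rand(2, 5) * 2
l3 = torch.lerp(s3, e3, 0.5)  # s3 (shape (5,)) broadcasts against e3 (2, 5)
print(l3)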

This function can be used once a model has been generated, in order to expand the dataset for testing purposes. Interpolated data can be produced and its real-world equivalent checked, to discern the accuracy of the modelling.

## Conclusion

In this notebook a quick look was taken at some of the basic functions of the PyTorch library, chosen as a set of inter-relating functionality. This shows the range of options available for the generation and manipulation of tensors and data objects within the package.

To progress, a greater understanding of the data science behind these functions must be researched, and hands-on experience with the package and its functionality is needed to fully appreciate their usage.

Detailed below are some of the links I found interesting during my learning.

In [17]:
!pip install jovian --upgrade --quiet
In [18]:
import jovian
In [ ]:
jovian.commit()
[jovian] Attempting to save notebook..