
[Pytorch] Deep Learning Frameworks

jstar0525 2022. 1. 13. 17:29

* Note: The experiments in this post were run in a Colab .ipynb notebook.

 

1. Deep Learning Basics

Deep Learning

  • Learns hierarchical representations and the prediction model simultaneously, so that both are optimal for the given task.
  • Deep in the sense that it has multiple levels of non-linear feature transformations.

Artificial Neural Networks

  • Systems of interconnected neurons that learn parameters which non-linearly convert inputs to an output.
  • Each neuron combines inputs from the lower layer, non-linearly transforms them using an activation function 𝑓, and then feeds the output to the neurons in the upper layer (see the sketch below).
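As a minimal sketch of this idea (not from the original post; the input values, weights, bias, and the choice of torch.sigmoid as 𝑓 are illustrative assumptions), a single neuron in PyTorch is just a weighted sum of its inputs followed by a non-linear activation:
import torch

x = torch.tensor([0.5, -1.0, 2.0])   # inputs coming from the lower layer
w = torch.tensor([0.1,  0.3, -0.2])  # learnable weights (illustrative values)
b = torch.tensor(0.05)               # learnable bias (illustrative value)

z = torch.dot(w, x) + b              # linear combination of the inputs
a = torch.sigmoid(z)                 # non-linear activation f
print(a)                             # this output is fed to the upper-layer neurons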

Deep Neural Networks

  • A deep neural network is simply a neural network with multiple hidden layers (a minimal sketch follows this list).
  • Trades longer training time for greater expressive power.
  • Requires a large amount of training data in order not to overfit.
  • Neural networks saw a resurgence after a breakthrough in the mid-2000s, when a greedy layer-by-layer learning method was introduced.
  • Gained even more popularity after successes in speech and visual recognition.
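For illustration only (the layer sizes and ReLU activations below are arbitrary assumptions, not from the original post), a network with two hidden layers can be sketched with torch.nn.Sequential:
import torch
from torch import nn

# a small multilayer perceptron: input -> hidden -> hidden -> output
model = nn.Sequential(
    nn.Linear(10, 32),   # first hidden layer
    nn.ReLU(),
    nn.Linear(32, 32),   # second hidden layer
    nn.ReLU(),
    nn.Linear(32, 1),    # output layer
)

x = torch.randn(4, 10)   # a batch of 4 examples with 10 features each
print(model(x).shape)    # torch.Size([4, 1])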

Success of Deep Learning:

Increase in Data Size

  • The size of benchmark datasets has greatly increased over time, which has enabled the training of very large deep networks.

Increase in Model Size

  • ANNs have doubled in size roughly every 2.4 years.
  • Language models are becoming extremely large (for comparison, the human brain has about 100 trillion synapses).

Advances in Computing Devices

  • Training and evaluating DNNs have been expedited greatly by GPUs and other accelerator chips.

Development of Deep Learning Frameworks

  • Deep learning has become highly accessible thanks to open source machine learning frameworks such as Tensorflow, Caffe and Pytorch, and code sharing via GitHub.

 

2. Deep Learning Frameworks

What is a framework?

  • A structure containing basic concepts that can be used to solve complex problems or implement software in computer programming.
  • Basically, it can be understood as an external library in which various functions and classes are defined.

Tensorflow

  • It was developed by the Google Brain Team for research and product development, and its compatible languages include C++, Python, JavaScript, and Swift.
  • Following the 1.x series, the 2.x series was released; at the time of writing, the stable version is 2.4.
  • With the 2.x releases, the use of Session is almost unnecessary, and Keras is built in.

Pytorch

  • A library developed by Facebook AI Research Lab for research purposes. Compatible languages include C++ and Python.
  • In the beginning, the community was relatively small, but it has grown rapidly and has surpassed Tensorflow in research use.
  • PyTorch makes it easy to build flexible models, because models are defined and executed as dynamic computation graphs.
  • It is advantageous for developing a variety of deep learning models such as RNNs, CNNs, and GANs, because dynamic graphs make it possible to experiment with data of different sizes in real time (see the sketch after this list).
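As a rough sketch of what "dynamic graph" means (the layer size and the random loop count are arbitrary assumptions), the computation graph in PyTorch is built while ordinary Python code runs, so its structure can change from one forward pass to the next:
import torch
from torch import nn

layer = nn.Linear(8, 8)
x = torch.randn(2, 8)

# The graph is built at run time: how many times `layer` is applied
# depends on a value that is only known while the code executes.
h = x
for _ in range(int(torch.randint(1, 4, (1,)).item())):
    h = torch.relu(layer(h))
print(h.shape)  # torch.Size([2, 8])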

 

3. Pytorch

Tensors

  • Tensors are a specialized data structure that are very similar to arrays and matrices.
  • In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.
  • Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other hardware accelerators.
  • Tensors are also optimized for automatic differentiation.

Initializing a Tensor

  • Directly from data
    • Tensors can be created directly from data. The data type is automatically inferred.
import torch

data = [[1,2], [3,4]]
x_data = torch.tensor(data)
x_data

#  tensor([[1, 2],
#          [3, 4]])
  • From a NumPy array
    • Tensors can be created from NumPy arrays (and vice versa).
import numpy as np

np_array = np.array(data)
x_np = torch.from_numpy(np_array)
x_np

#  tensor([[1, 2],
#          [3, 4]])

Attributes of a Tensor

  • Tensor attributes describe their shape, datatype, and the device on which they are stored.
tensor = torch.rand(3,4)

print(f"Shape of tensor: {tensor.shape}")  # Shape of tensor: torch.Size([3, 4])
print(f"Datatype of tensor: {tensor.dtype}")  # Datatype of tensor: torch.float32
print(f"Device tensor is stored on: {tensor.device}")  # Device tensor is stored on: cpu

Operations on Tensors

  • Over 100 tensor operations are available, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), and sampling (a few are illustrated after the code below).
  • Each of these operations can be run on the GPU (at typically higher speeds than on a CPU).
    • If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU.
  • By default, tensors are created on the CPU.
  • We need to explicitly move tensors to the GPU using the .to method.
if torch.cuda.is_available():
    tensor = tensor.to('cuda')

print(f"Shape of tensor: {tensor.shape}")  # Shape of tensor: torch.Size([3, 4])
print(f"Datatype of tensor: {tensor.dtype}")  # Datatype of tensor: torch.float32
print(f"Device tensor is stored on: {tensor.device}")  # Device tensor is stored on: cuda:0

Bridge with Numpy

  • Tensors on the CPU and NumPy arrays can share their underlying memory locations, and changing one will change the other (the reverse direction is sketched after the example below).
t = torch.ones(5)
print(f"t: {t}")  # t: tensor([1., 1., 1., 1., 1.])
n = t.numpy()
print(f"n: {n}")  # n: [1. 1. 1. 1. 1.]
t.add_(1)
print(f"t: {t}")  # t: tensor([2., 2., 2., 2., 2.])
print(f"n: {n}")  # n: [2. 2. 2. 2. 2.]

Automatic Differentiation

  • When training neural networks, the most frequently used algorithm is back propagation.
  • In this algorithm, model parameters are adjusted according to the gradient of the loss function with respect to the given parameters.
  • To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd.
  • It supports automatic computation of gradients for any computational graph.
  • Consider the simplest one-layer neural network, with input x, parameters w and b, and some loss function.
x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
  • In this network, w and b are parameters, which we need to optimize.
  • Thus, we need to be able to compute the gradients of loss function with respect to those variables.
  • In order to do that, we set the requires_grad property of those tensors.
  • You can set the value of requires_grad when creating a tensor, or later by using the x.requires_grad_(True) method (a short sketch follows the code below).
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
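For reference, enabling gradient tracking after creation looks like this (a minimal sketch; the tensor values are arbitrary):
p = torch.randn(5, 3)   # created without gradient tracking
print(p.requires_grad)  # False
p.requires_grad_(True)  # enable gradient tracking in place
print(p.requires_grad)  # True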
  • A function that we apply to tensors to construct the computational graph is in fact an object of class Function.
  • This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step.
  • Every operation performed on tensors creates a new Function object that performs the computation and records that it happened (see the grad_fn example below).
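This can be inspected through the grad_fn attribute, which references the backward function of the operation that produced each tensor. A small sketch using the z and loss defined above (the exact backward class names in the output may differ between PyTorch versions):
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

#  Gradient function for z = <AddBackward0 object at 0x...>
#  Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x...>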

Computing Gradients

  • To optimize weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to parameters under some fixed values of x and y.
  • To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad.
  • We can only obtain the grad properties for the leaf nodes of the computational graph, which have the requires_grad property set to True.
loss.backward()
print(w.grad)

'''
tensor([[0.1498, 0.0086, 0.1783],
        [0.1498, 0.0086, 0.1783],
        [0.1498, 0.0086, 0.1783],
        [0.1498, 0.0086, 0.1783],
        [0.1498, 0.0086, 0.1783]])
'''

print(b.grad)  # tensor([0.1498, 0.0086, 0.1783])

Disabling Gradient Tracking

  • All tensors with requires_grad=True are tracking their computational history and support gradient computation.
  • However, there are some cases when we do not need to do that, for example, when we have trained the model and just want to apply it to some input data.
  • We can stop tracking computations by surrounding our computation code with a torch.no_grad() block.
z = torch.matmul(x, w)+b
print(z.requires_grad)  # True

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)  # False
  • Another way to achieve the same result is to use the detach() method on the tensor:
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)  # False

Example

  • Find x which satisfies y=2x^2+3 for y=[3,4].
x = torch.tensor(data=[2., 3.], requires_grad=True)  # initial guess for x
target = torch.tensor([3., 4.])                      # target values of y
for i in range(50):
    y = 2*(x**2) + 3                          # forward pass: y = 2x^2 + 3
    print('y    : ', y)
    loss = torch.sum(torch.abs(y - target))   # L1 loss between y and the target
    print('loss : ', loss.item())
    loss.backward()                           # compute d(loss)/dx
    grad_x = x.grad
    print('grad : ', grad_x)
    x_ = x - 0.01*grad_x                      # gradient-descent step
    # re-create x as a fresh leaf tensor for the next iteration
    # (clone().detach() is recommended over torch.tensor(x_))
    x = x_.clone().detach().requires_grad_(True)
    
'''
y    :  tensor([11., 21.], grad_fn=<AddBackward0>)
loss :  25.0
grad :  tensor([ 8., 12.])
y    :  tensor([10.3728, 19.5888], grad_fn=<AddBackward0>)
loss :  22.96160125732422
grad :  tensor([ 7.6800, 11.5200])
y    :  tensor([ 9.7948, 18.2882], grad_fn=<AddBackward0>)
loss :  21.083011627197266
grad :  tensor([ 7.3728, 11.0592])
y    :  tensor([ 9.2621, 17.0896], grad_fn=<AddBackward0>)
loss :  19.351703643798828
grad :  tensor([ 7.0779, 10.6168])
y    :  tensor([ 8.7711, 15.9850], grad_fn=<AddBackward0>)
loss :  17.75613021850586
grad :  tensor([ 6.7948, 10.1922])
y    :  tensor([ 8.3187, 14.9670], grad_fn=<AddBackward0>)
loss :  16.28565216064453
grad :  tensor([6.5230, 9.7845])
y    :  tensor([ 7.9017, 14.0288], grad_fn=<AddBackward0>)
loss :  14.930456161499023
grad :  tensor([6.2621, 9.3931])
y    :  tensor([ 7.5174, 13.1641], grad_fn=<AddBackward0>)
loss :  13.681507110595703
grad :  tensor([6.0116, 9.0174])
y    :  tensor([ 7.1632, 12.3673], grad_fn=<AddBackward0>)
loss :  12.530477523803711
grad :  tensor([5.7711, 8.6567])
y    :  tensor([ 6.8368, 11.6329], grad_fn=<AddBackward0>)
loss :  11.469688415527344
grad :  tensor([5.5403, 8.3104])
y    :  tensor([ 6.5360, 10.9560], grad_fn=<AddBackward0>)
loss :  10.492064476013184
grad :  tensor([5.3187, 7.9780])
y    :  tensor([ 6.2588, 10.3323], grad_fn=<AddBackward0>)
loss :  9.591086387634277
grad :  tensor([5.1059, 7.6589])
y    :  tensor([6.0033, 9.7574], grad_fn=<AddBackward0>)
loss :  8.760746002197266
grad :  tensor([4.9017, 7.3525])
y    :  tensor([5.7678, 9.2277], grad_fn=<AddBackward0>)
loss :  7.99550199508667
grad :  tensor([4.7056, 7.0584])
y    :  tensor([5.5508, 8.7394], grad_fn=<AddBackward0>)
loss :  7.290255069732666
grad :  tensor([4.5174, 6.7761])
y    :  tensor([5.3509, 8.2894], grad_fn=<AddBackward0>)
loss :  6.640299320220947
grad :  tensor([4.3367, 6.5050])
y    :  tensor([5.1666, 7.8747], grad_fn=<AddBackward0>)
loss :  6.041299819946289
grad :  tensor([4.1632, 6.2448])
y    :  tensor([4.9967, 7.4926], grad_fn=<AddBackward0>)
loss :  5.489261150360107
grad :  tensor([3.9967, 5.9950])
y    :  tensor([4.8402, 7.1403], grad_fn=<AddBackward0>)
loss :  4.980503082275391
grad :  tensor([3.8368, 5.7552])
y    :  tensor([4.6959, 6.8157], grad_fn=<AddBackward0>)
loss :  4.511631011962891
grad :  tensor([3.6834, 5.5250])
y    :  tensor([4.5629, 6.5166], grad_fn=<AddBackward0>)
loss :  4.079519271850586
grad :  tensor([3.5360, 5.3040])
y    :  tensor([4.4404, 6.2409], grad_fn=<AddBackward0>)
loss :  3.6812849044799805
grad :  tensor([3.3946, 5.0919])
y    :  tensor([4.3275, 5.9868], grad_fn=<AddBackward0>)
loss :  3.314272403717041
grad :  tensor([3.2588, 4.8882])
y    :  tensor([4.2234, 5.7526], grad_fn=<AddBackward0>)
loss :  2.9760336875915527
grad :  tensor([3.1284, 4.6927])
y    :  tensor([4.1275, 5.5368], grad_fn=<AddBackward0>)
loss :  2.6643123626708984
grad :  tensor([3.0033, 4.5050])
y    :  tensor([4.0391, 5.3379], grad_fn=<AddBackward0>)
loss :  2.377030849456787
grad :  tensor([2.8832, 4.3248])
y    :  tensor([3.9576, 5.1546], grad_fn=<AddBackward0>)
loss :  2.112271547317505
grad :  tensor([2.7678, 4.1518])
y    :  tensor([3.8825, 4.9857], grad_fn=<AddBackward0>)
loss :  1.868269681930542
grad :  tensor([2.6571, 3.9857])
y    :  tensor([3.8134, 4.8300], grad_fn=<AddBackward0>)
loss :  1.643397569656372
grad :  tensor([2.5508, 3.8263])
y    :  tensor([3.7496, 4.6866], grad_fn=<AddBackward0>)
loss :  1.436155080795288
grad :  tensor([2.4488, 3.6732])
y    :  tensor([3.6908, 4.5543], grad_fn=<AddBackward0>)
loss :  1.2451605796813965
grad :  tensor([2.3509, 3.5263])
y    :  tensor([3.6367, 4.4325], grad_fn=<AddBackward0>)
loss :  1.0691399574279785
grad :  tensor([2.2568, 3.3852])
y    :  tensor([3.5867, 4.3202], grad_fn=<AddBackward0>)
loss :  0.9069192409515381
grad :  tensor([2.1666, 3.2498])
y    :  tensor([3.5407, 4.2167], grad_fn=<AddBackward0>)
loss :  0.7574167251586914
grad :  tensor([2.0799, 3.1198])
y    :  tensor([3.4983, 4.1213], grad_fn=<AddBackward0>)
loss :  0.6196351051330566
grad :  tensor([1.9967, 2.9950])
y    :  tensor([3.4593, 4.0334], grad_fn=<AddBackward0>)
loss :  0.49265575408935547
grad :  tensor([1.9168, 2.8752])
y    :  tensor([3.4233, 3.9524], grad_fn=<AddBackward0>)
loss :  0.47091078758239746
grad :  tensor([ 1.8402, -2.7602])
y    :  tensor([3.3901, 4.0301], grad_fn=<AddBackward0>)
loss :  0.42015981674194336
grad :  tensor([1.7665, 2.8706])
y    :  tensor([3.3595, 3.9493], grad_fn=<AddBackward0>)
loss :  0.4101886749267578
grad :  tensor([ 1.6959, -2.7558])
y    :  tensor([3.3313, 4.0268], grad_fn=<AddBackward0>)
loss :  0.35809803009033203
grad :  tensor([1.6281, 2.8660])
y    :  tensor([3.3053, 3.9463], grad_fn=<AddBackward0>)
loss :  0.35906362533569336
grad :  tensor([ 1.5629, -2.7514])
y    :  tensor([3.2814, 4.0235], grad_fn=<AddBackward0>)
loss :  0.3049006462097168
grad :  tensor([1.5004, 2.8615])
y    :  tensor([3.2593, 3.9433], grad_fn=<AddBackward0>)
loss :  0.3160884380340576
grad :  tensor([ 1.4404, -2.7470])
y    :  tensor([3.2390, 4.0202], grad_fn=<AddBackward0>)
loss :  0.2592334747314453
grad :  tensor([1.3828, 2.8569])
y    :  tensor([3.2203, 3.9402], grad_fn=<AddBackward0>)
loss :  0.28003358840942383
grad :  tensor([ 1.3275, -2.7426])
y    :  tensor([3.2030, 4.0170], grad_fn=<AddBackward0>)
loss :  0.21996331214904785
grad :  tensor([1.2744, 2.8523])
y    :  tensor([3.1871, 3.9372], grad_fn=<AddBackward0>)
loss :  0.24985527992248535
grad :  tensor([ 1.2234, -2.7382])
y    :  tensor([3.1724, 4.0137], grad_fn=<AddBackward0>)
loss :  0.18612885475158691
grad :  tensor([1.1745, 2.8477])
y    :  tensor([3.1589, 3.9342], grad_fn=<AddBackward0>)
loss :  0.22466683387756348
grad :  tensor([ 1.1275, -2.7338])
y    :  tensor([3.1464, 4.0105], grad_fn=<AddBackward0>)
loss :  0.15691208839416504
grad :  tensor([1.0824, 2.8432])
'''