
Classification and regression

jstar0525 2022. 1. 13. 17:42

Categorization of machine learning

Supervised Learning

  • Given a set of labeled examples $D = \{(x_t, y_t)\}_{t=1}^{N}$, learn a mapping $f: X \to Y$ that minimizes the loss $L(\hat{Y} = f(X),\, Y)$

Classification

  • The desired outputs $y_t$ are discrete class labels (categorical information)

  • The goal is to classify new inputs correctly

Regression

  • The desired outputs $y_t$ are continuous values

  • The goal is to predict the output values for new inputs

Unsupervised Learning

  • Given a set of unlabeled examples $D = \{x_t\}_{t=1}^{N}$, learn a meaningful representation of the data

Clustering

  • Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)
  • K-means, spectral clustering, etc.
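
As a rough illustration of clustering, here is a minimal k-means sketch in PyTorch; the synthetic 2-D blobs, K=2, and the iteration count are assumptions made for this example, not anything from the post.

```python
import torch

def kmeans(x, k, iters=20):
    # Initialize centroids from k randomly chosen data points.
    centroids = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = torch.cdist(x, centroids).argmin(dim=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            if (assign == j).any():
                centroids[j] = x[assign == j].mean(dim=0)
    return assign, centroids

# Two synthetic 2-D blobs (assumed data for illustration).
x = torch.cat([torch.randn(100, 2) + 4.0, torch.randn(100, 2) - 4.0])
labels, centers = kmeans(x, k=2)
print(centers)  # roughly (4, 4) and (-4, -4)
```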

Dimensionality reduction

  • Transformation of data from a high-dimensional space into a low-dimensional space, so that the low-dimensional representation retains some meaningful properties of the original data
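
A common concrete example of dimensionality reduction is PCA. Below is a minimal sketch using torch.pca_lowrank; the 100×50 random data and the target dimension of 2 are assumptions for illustration.

```python
import torch

# 100 samples living in a 50-dimensional space (assumed synthetic data).
x = torch.randn(100, 50)

# PCA via a low-rank approximation; V holds the top principal directions.
U, S, V = torch.pca_lowrank(x, q=2)

# Project the centered data onto the top-2 components.
x_2d = (x - x.mean(dim=0)) @ V[:, :2]
print(x_2d.shape)  # torch.Size([100, 2])
```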

 

Image classification

  • Image classification: a core task in machine learning
    • An image is a tensor of integers in [0, 255]
    • e.g. 800×600×3 (3 RGB channels); a small tensor sketch follows below
  • Problem: the semantic gap. The computer sees only a grid of numbers, while a human sees a semantic concept such as "cat"
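
To make the "an image is a tensor of integers" point concrete, here is a tiny sketch; the random values stand in for real pixel data.

```python
import torch

# A hypothetical 800x600 RGB image: integers in [0, 255], shape HxWxC.
img = torch.randint(0, 256, (800, 600, 3), dtype=torch.uint8)
print(img.shape, img.dtype, int(img.min()), int(img.max()))

# A 32x32x3 image flattened into the 3072-dim vector used by the
# linear classifier later in this post.
small = torch.randint(0, 256, (32, 32, 3), dtype=torch.uint8)
x = small.reshape(-1).float()  # shape (3072,)
print(x.shape)
```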

Challenges in image classification

  • viewpoint variation
  • background clutter
  • illumination
  • occlusion
  • deformation
  • intraclass variation

 

Classifier

Image classifier

  • Unlike a simple list of numbers (e.g. temperature, humidity, wind speed), an image is just a large grid of pixel values
  • There is no obvious way to hard-code an algorithm for recognizing a cat, or any other class

Previous attempts

  • Edges and local features
  • Hand-crafted feature representations for image patches

Data-driven approach

  1. Collect a dataset of images and labels
  2. Use Machine Learning algorithms to train a classifier
  3. Evaluate the classifier on new images

A simple classifier: k-NN method

  • K-Nearest Neighbors: instead of copying the label of the single nearest neighbor, take a majority vote over the K closest training points (see the sketch after this list)
  • Distance metric (e.g. L1 or L2 distance between pixel vectors)

  • Hyperparameters
    • What is the best value of k to use?
    • What is the best distance to use?
    • Very problem/dataset-dependent
    • Must try them all out and see what works best
  • Hyperparameter setting (e.g. K)
    1. Use only train data : Choose hyperparameters that work best on the training data
      • BAD : K=1 always works perfectly on training data
    2. Split data into train, val : Choose hyperparameters that work best on val data
      • BAD : No idea how algorithm will perform on new data
    3. Split data into train, val, test : Choose hyperparameters that work best on val and evaluate on test
      • Better!
    4. Cross-Validation : Split data into folds, try each fold as validation and average the results
      • Useful for small datasets, but not used too frequently in deep learning
  • Limitations of k-NN method
    • Distance metrics on pixels are not informative
    • Very slow at test time
    • Curse of dimensionality
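
Below is a minimal k-NN sketch in PyTorch, including choosing the hyperparameter K on a validation split as described above; the random 3072-dimensional "images", the split sizes, and the candidate K values are assumptions for illustration.

```python
import torch

def knn_predict(train_x, train_y, test_x, k, p=2):
    # Pairwise Lp distances between test and train points.
    dists = torch.cdist(test_x, train_x, p=p)      # (num_test, num_train)
    _, idx = dists.topk(k, dim=1, largest=False)   # indices of k nearest neighbors
    neighbor_labels = train_y[idx]                 # (num_test, k)
    # Majority vote over the k neighbor labels.
    return neighbor_labels.mode(dim=1).values

# Fake "images": 3072-dim vectors (32x32x3 flattened), 10 classes.
x = torch.randn(600, 3072)
y = torch.randint(0, 10, (600,))
train_x, val_x = x[:500], x[500:]
train_y, val_y = y[:500], y[500:]

# Choose the hyperparameter K on the validation split, never on the test set.
for k in (1, 3, 5, 7):
    acc = (knn_predict(train_x, train_y, val_x, k) == val_y).float().mean().item()
    print(f"k={k}: val accuracy {acc:.3f}")
```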

Another simple classifier: Linear classifier

  • $f(x, W)_{10 \times 1} = W_{10 \times 3072}\, x_{3072 \times 1} + b_{10 \times 1}$
    • $x$ : image, flattened from 32×32×3 into a 3072×1 vector
    • $W$ : parameters or weights (10×3072)
    • $b$ : bias (10×1)
    • $f(x, W)$ : scores for the 10 classes (10×1)

  • A neural network can be viewed as a stack of linear classifiers (with nonlinearities between them)
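
A minimal sketch of the linear classifier above with the stated shapes, plus the "stacked" view as a small PyTorch module; the random weights and inputs are placeholders, not trained values.

```python
import torch
import torch.nn as nn

# f(x, W) = Wx + b with the shapes from the post.
x = torch.rand(3072)              # flattened 32x32x3 image
W = 0.01 * torch.randn(10, 3072)  # weights, one row per class
b = torch.zeros(10)               # bias
scores = W @ x + b                # (10,) one score per class
print(scores.shape, int(scores.argmax()))

# The same classifier as a module; nn.Linear stores W and b internally.
linear = nn.Linear(3072, 10)
scores = linear(x.unsqueeze(0))   # (1, 10)

# "Stacked" linear classifiers with a nonlinearity in between,
# i.e. a small neural network.
net = nn.Sequential(nn.Linear(3072, 100), nn.ReLU(), nn.Linear(100, 10))
scores = net(x.unsqueeze(0))      # (1, 10)
```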

Regression

Linear regression

  • The relationships between variables and responses are modeled using linear functions whose unknown parameters are estimated from the data
  • $\hat{Y} = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$
    • $x_1, \ldots, x_n$ : feature values
    • $\theta_0, \theta_1, \ldots, \theta_n$ : model parameters
    • $\hat{Y}$ : predicted value
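
A minimal linear-regression sketch that fits $\hat{Y} = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$ by gradient descent; the synthetic data, true parameters, learning rate, and step count are assumptions for illustration.

```python
import torch

# Synthetic data: y = 4.0 + x @ [2, -1, 0.5] plus a little noise.
n = 3
X = torch.randn(200, n)
true_theta = torch.tensor([2.0, -1.0, 0.5])
y = 4.0 + X @ true_theta + 0.1 * torch.randn(200)

model = torch.nn.Linear(n, 1)   # weight = theta_1..theta_n, bias = theta_0
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)  # mean squared error
    loss.backward()
    opt.step()

# Should approach bias ~ 4.0 and weights ~ [2, -1, 0.5].
print(model.bias.data, model.weight.data)
```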

 
