DL implementation study – Jan 21, 2024

Notes

  • What can be turned into a torch.tensor?
    • a py list or a sequence
    • a shape can also be passed to a tensor-creation function such as torch.zeros or torch.randn, which constructs a tensor of those dimensions. Like this:
    • torch.zeros([2, 4], dtype=torch.int32) or torch.randn((2, 2, 2, 2, 2, 2, 2, 2, 2, 2)) – see the first sketch after these notes
  • in the Bengio 2003 NLP paper, why is an embedding used in the input layer for those 17,000 words? – AN: in one word, efficiency. The traditional one-hot word representation is very sparse and inefficient, and it suffers from the curse of dimensionality. Because the embedding is compact, it makes larger vocabularies feasible to process.
    • When we call this a curse of dimensionality, “dimension” refers to the length of a one-hot vector, which is the number of words in the vocabulary. It does not refer to the dimensionality of the training set, which is always a 2D matrix.
  • what exactly is the curse of dimensionality? – AN: many ML problems become exceedingly difficult as the number of variables (aka dimensions) increases, because the data space grows exponentially with the number of dimensions, so ever more examples are needed to fill/explain it adequately.
  • Two different interpretations of the embedding layer:
    • w/o one-hot encoding: simply a matrix-value lookup, where the embedding is retrieved as a row of the matrix C given an index
    • w/ one-hot encoding: a linear layer in the ANN where C is the weight matrix. The one-hot encoded char, which looks like [0, 0, 0, 1, 0, 0, …], is matrix-multiplied by C (i.e., a linear operation) to get the output of this layer. Both views are compared in the second sketch below.
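
A quick sketch of the two tensor-creation paths noted above; the shapes and dtype are just illustrative:

```python
import torch

# Path 1: from existing data – a Python list (or other sequence) is copied into a tensor.
t1 = torch.tensor([[1, 2], [3, 4]])
print(t1.shape)  # torch.Size([2, 2])

# Path 2: from a shape request – creation functions take the desired
# dimensions and build a tensor of that shape.
t2 = torch.zeros([2, 4], dtype=torch.int32)  # 2x4 tensor of int32 zeros
t3 = torch.randn((2, 2, 2))                  # 3-D tensor of standard-normal samples
print(t2.shape, t3.shape)  # torch.Size([2, 4]) torch.Size([2, 2, 2])
```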
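
And a minimal sketch of the two embedding interpretations; the vocabulary size and embedding width are made-up numbers here, and C is just a random matrix:

```python
import torch
import torch.nn.functional as F

vocab_size, emb_dim = 27, 2           # made-up sizes, for illustration only
C = torch.randn(vocab_size, emb_dim)  # the embedding matrix

idx = torch.tensor([5])               # index of one char/word

# Interpretation 1 (no one-hot): a plain lookup – retrieve row 5 of C.
e1 = C[idx]

# Interpretation 2 (with one-hot): a linear layer whose weight matrix is C.
one_hot = F.one_hot(idx, num_classes=vocab_size).float()
e2 = one_hot @ C

print(torch.allclose(e1, e2))  # True: both views yield the same embedding
```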

Qs:

  • what would happen if I unbind the last dimension, torch.unbind(emb, 2)? Do the resulting tensors have shape torch.Size([32, 3]) or torch.Size([32, 3, 1])? My guess is the second. – The correct answer is torch.Size([32, 3]), because torch.unbind REMOVES the specified dimension; see the sketch below.
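
A quick check of this; emb is given shape [32, 3, 2] here, where the size of the last dimension is my assumption and doesn't affect the answer:

```python
import torch

emb = torch.randn(32, 3, 2)  # assumed shape; only the first two dims matter here

# unbind returns a tuple of slices along the given dimension,
# and that dimension is removed from each slice.
pieces = torch.unbind(emb, 2)

print(len(pieces))      # 2 – one slice per entry along dim 2
print(pieces[0].shape)  # torch.Size([32, 3]) – dim 2 is gone, not kept as size 1
```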

