Building Micrograd and Reading Deep Learning with PyTorch
- A gradient accumulates when a parameter is used more than once: each use contributes a term to that parameter's gradient, so the backward pass adds to the gradient rather than overwriting it (see the Value sketch after this list).
- Backprop happens on a computation graph, not necessarily an ANN.
- Backward passes can be implemented at any level of granularity; it doesn't matter whether a mathematical operation is treated as composite (e.g., tanh) or atomic (addition, multiplication). This is because the derivative of the output with respect to a particular input doesn't change with the level of composition: d(tanh)/dx is exactly what you get by opening up tanh and chaining the derivatives of its atomic operations (see the numeric check after this list).
- PyTorch tensors: the multidimensional array is the fundamental data structure, whose superpowers are fast operations on GPUs, distributed computing, and so on (a small example follows).
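
A minimal micrograd-style sketch of the first two points (my own reconstruction, not Karpathy's exact code): a `Value` class that records the computation graph and accumulates gradients with `+=` during the backward pass. Note the graph here is a plain arithmetic expression, not a neural network.

```python
# Minimal micrograd-style autograd sketch: gradients must be accumulated
# with +=, because a node that feeds several downstream ops receives one
# contribution from each of them.

class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # +=, not =: accumulate contributions
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the computation graph, then apply the chain
        # rule node by node from the output back to the leaves.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

a = Value(3.0)
b = a * a + a      # a is used more than once: db/da = 2a + 1 = 7 at a = 3
b.backward()
print(a.grad)      # 7.0
```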
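
A quick numeric check of the granularity point, assuming the decomposition tanh(x) = (e^{2x} - 1)/(e^{2x} + 1): chaining the derivatives of the atomic operations gives the same number as treating tanh as a single primitive with local derivative 1 - tanh(x)^2.

```python
import math

def dtanh_chained(x):
    # Open up tanh(x) = (e^{2x} - 1) / (e^{2x} + 1) and chain the
    # derivatives of its atomic pieces (exp, +, -, /).
    u = math.exp(2 * x)              # du/dx = 2u
    num, den = u - 1.0, u + 1.0      # tanh = num / den
    dt_du = (den - num) / den**2     # quotient rule with respect to u
    return dt_du * 2 * u             # chain rule: dt/du * du/dx

def dtanh_direct(x):
    # Treat tanh as a single primitive with a known local derivative.
    t = math.tanh(x)
    return 1.0 - t * t

x = 0.7
print(dtanh_chained(x), dtanh_direct(x))  # both ≈ 0.6347
```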
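
And a tiny example of the tensor as a multidimensional array (assuming `torch` is installed; the GPU lines only run if CUDA is available):

```python
import torch

# A 2-D tensor of random numbers: shape, dtype, and device are
# first-class properties of the data structure.
x = torch.randn(3, 4)
w = torch.randn(4, 2)

y = x @ w                      # fast matrix multiply
print(y.shape)                 # torch.Size([3, 2])

# The same code moves to a GPU just by changing the device.
if torch.cuda.is_available():
    y_gpu = x.to("cuda") @ w.to("cuda")
    print(y_gpu.device)        # cuda:0
```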