Building Micrograd and Reading Deep Learning with PyTorch
- A gradient accumulates when a parameter is used more than once: each use contributes a term to that parameter's gradient, so the backward pass adds to the gradient rather than overwriting it (see the Value sketch after this list).
- Backprop happens on a computation graph, not necessarily an ANN.
- Backward passes can be implemented at any level of granularity; it doesn't matter whether a mathematical operation is treated as composite (e.g., tanh) or atomic (addition, multiplication). This is because the derivative of the output with respect to a particular input doesn't change with the level of composition: d(tanh)/dx is exactly what you get by opening up tanh and chaining the derivatives of its atomic operations (see the numeric check after this list).
- PyTorch tensors: the multidimensional array is the fundamental data structure, whose superpowers are fast operations on GPUs, distributed computing, and so on (a small example follows).
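
A minimal micrograd-style sketch of the first two points (my own reconstruction, not Karpathy's exact code): a `Value` class that records the computation graph and accumulates gradients with `+=` during the backward pass. Note the graph here is a plain arithmetic expression, not a neural network.

```python
# Minimal micrograd-style autograd sketch: gradients must be accumulated
# with +=, because a node that feeds several downstream ops receives one
# contribution from each of them.

class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # +=, not =: accumulate contributions
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the computation graph, then apply the chain
        # rule node by node from the output back to the leaves.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

a = Value(3.0)
b = a * a + a      # a is used more than once: db/da = 2a + 1 = 7 at a = 3
b.backward()
print(a.grad)      # 7.0
```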
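
A quick numeric check of the granularity point, assuming the decomposition tanh(x) = (e^{2x} - 1)/(e^{2x} + 1): chaining the derivatives of the atomic operations gives the same number as treating tanh as a single primitive with local derivative 1 - tanh(x)^2.

```python
import math

def dtanh_chained(x):
    # Open up tanh(x) = (e^{2x} - 1) / (e^{2x} + 1) and chain the
    # derivatives of its atomic pieces (exp, +, -, /).
    u = math.exp(2 * x)              # du/dx = 2u
    num, den = u - 1.0, u + 1.0      # tanh = num / den
    dt_du = (den - num) / den**2     # quotient rule with respect to u
    return dt_du * 2 * u             # chain rule: dt/du * du/dx

def dtanh_direct(x):
    # Treat tanh as a single primitive with a known local derivative.
    t = math.tanh(x)
    return 1.0 - t * t

x = 0.7
print(dtanh_chained(x), dtanh_direct(x))  # both ≈ 0.6347
```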
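
And a tiny example of the tensor as a multidimensional array (assuming `torch` is installed; the GPU lines only run if CUDA is available):

```python
import torch

# A 2-D tensor of random numbers: shape, dtype, and device are
# first-class properties of the data structure.
x = torch.randn(3, 4)
w = torch.randn(4, 2)

y = x @ w                      # fast matrix multiply
print(y.shape)                 # torch.Size([3, 2])

# The same code moves to a GPU just by changing the device.
if torch.cuda.is_available():
    y_gpu = x.to("cuda") @ w.to("cuda")
    print(y_gpu.device)        # cuda:0
```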