building micrograd
Notes
- a derivative tells us how much the loss L changes when a single parameter is nudged slightly (see the numerical sketch after this list)
- one-sentence summary of backprop by Andrej: recursive application of the chain rule, backward through the computation graph
- The computation graph Andrej drew at the beginning of the lecture is not an ANN but a graph of an arbitrary expression standing in for a loss function
- Concept flow of the lecture: derivatives -> chain rule -> backprop -> one optimization step -> neuron (gradients are computed manually up to here) -> step-wise gradient automation -> graph-wide gradient automation (see the Value-style sketch after this list)
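
A minimal sketch of the first note, that a derivative measures the impact on L of changing a parameter: nudge one parameter by a tiny h and compare the loss before and after. The toy `loss` function here is hypothetical, just something simple to differentiate.

```python
# estimate dL/da numerically: nudge a by h, see how the loss responds
def loss(a, b, c):
    # arbitrary scalar expression standing in for a loss function (hypothetical)
    return a * b + c**2

h = 1e-6
a, b, c = 2.0, -3.0, 10.0

L_before = loss(a, b, c)
L_after = loss(a + h, b, c)      # nudge only the parameter a

dL_da = (L_after - L_before) / h # slope: change in L per unit change in a
print(dL_da)                     # ~ -3.0, matching the analytic derivative dL/da = b
```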
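And a compressed sketch of where the concept flow ends up: each node in the graph stores its data, its gradient, and a local `_backward` closure that applies the chain rule; `backward()` walks the graph in reverse topological order. This is a simplified re-creation of the micrograd idea, not the lecture's exact code.

```python
class Value:
    """A scalar node in the computation graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0                  # dL/d(this node), filled in by backward()
        self._backward = lambda: None    # local chain-rule step for this node
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(out)/d(self) = other.data, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # reverse topological order so each node's grad is complete
        # before it propagates to its children
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0                  # dL/dL = 1
        for node in reversed(topo):
            node._backward()

# usage: L = a*b + c, so dL/da should equal b
a, b, c = Value(2.0), Value(-3.0), Value(10.0)
L = a * b + c
L.backward()
print(a.grad)   # -3.0
```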