building micrograd
Notes
- a derivative tells us how much the loss L changes when a single parameter is nudged slightly (see the numerical sketch after this list)
- one-sentence summary of backprop by Andrej: recursive application of the chain rule, backward through the computation graph
- The computation graph Andrej drew at the beginning of the lecture is not an ANN but a graph of an arbitrary expression standing in for a loss function
- Concept flow of the lecture: derivatives -> chain rule -> backprop -> one optimization step -> neuron (gradients are computed manually up to here) -> step-wise gradient automation -> graph-wide gradient automation (see the Value-style sketch after this list)
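
A minimal sketch of the first note, that a derivative measures the impact on L of changing a parameter: nudge one parameter by a tiny h and compare the loss before and after. The toy `loss` function here is hypothetical, just something simple to differentiate.

```python
# estimate dL/da numerically: nudge a by h, see how the loss responds
def loss(a, b, c):
    # arbitrary scalar expression standing in for a loss function (hypothetical)
    return a * b + c**2

h = 1e-6
a, b, c = 2.0, -3.0, 10.0

L_before = loss(a, b, c)
L_after = loss(a + h, b, c)      # nudge only the parameter a

dL_da = (L_after - L_before) / h # slope: change in L per unit change in a
print(dL_da)                     # ~ -3.0, matching the analytic derivative dL/da = b
```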
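And a compressed sketch of where the concept flow ends up: each node in the graph stores its data, its gradient, and a local `_backward` closure that applies the chain rule; `backward()` walks the graph in reverse topological order. This is a simplified re-creation of the micrograd idea, not the lecture's exact code.

```python
class Value:
    """A scalar node in the computation graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0                  # dL/d(this node), filled in by backward()
        self._backward = lambda: None    # local chain-rule step for this node
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(out)/d(self) = other.data, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # reverse topological order so each node's grad is complete
        # before it propagates to its children
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0                  # dL/dL = 1
        for node in reversed(topo):
            node._backward()

# usage: L = a*b + c, so dL/da should equal b
a, b, c = Value(2.0), Value(-3.0), Value(10.0)
L = a * b + c
L.backward()
print(a.grad)   # -3.0
```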