DL implementation study – Jan 17, 2024

building micrograd

Notes

  • a derivative tells us how much the loss L changes when we make a small change to a parameter, i.e. the sensitivity of L to that parameter (a quick numerical sketch follows this list)
  • one-sentence summary of backprop by Andrej: recursive application of the chain rule, backward through the computation graph
  • The computation graph Andrej drew at the beginning of the lecture is not an ANN but the graph of an arbitrary loss function
  • Concept flow of the lecture: derivatives -> chain rule -> backprop -> one optimization step -> neuron (gradients are computed manually up to here) -> step-wise gradient automation -> graph-wide gradient automation (see the Value sketch after this list)
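
To make the derivative bullet concrete, here is a minimal numerical sketch; the toy function L and the nudge size h are my own illustrative choices, not the lecture's exact example.

```python
# Numerical derivative sketch (illustrative; L and h are my own choices).
def L(a, b, c):
    # an arbitrary scalar "loss" built from three parameters
    return a * b + c

a, b, c = 2.0, -3.0, 10.0
h = 1e-5  # small nudge to the parameter

# slope of L with respect to a: how much L moves when a moves by h
dL_da = (L(a + h, b, c) - L(a, b, c)) / h
print(dL_da)  # ~ -3.0, since dL/da = b for L = a*b + c
```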

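For the step-wise and graph-wide automation bullets, here is a stripped-down autograd sketch in the spirit of micrograd (my own condensation, not the exact lecture code): each op records a local _backward chain-rule step, and backward() runs them in reverse topological order over the whole graph.

```python
# Stripped-down autograd sketch in the spirit of micrograd (not the exact lecture code).
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # step-wise: each op stores its local chain-rule step
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(out)/d(self) = 1
            other.grad += out.grad       # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other.data
            other.grad += self.data * out.grad   # d(out)/d(other) = self.data
        out._backward = _backward
        return out

    def backward(self):
        # graph-wide: topological sort, then apply each node's _backward in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# usage: L = a*b + c, then one gradient computation over the whole graph
a, b, c = Value(2.0), Value(-3.0), Value(10.0)
L = a * b + c
L.backward()
print(a.grad, b.grad, c.grad)  # -3.0, 2.0, 1.0
```

From there, one optimization step in the lecture's sense is just nudging each parameter against its gradient, e.g. a.data += -0.01 * a.grad.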