Building Micrograd
Complete Micrograd notebook: https://github.com/gangfang/micrograd/blob/main/micrograd.ipynb
- 4 different representations of an ANN: graphical, mathematical, in human language, and in code
- This explains why the lecture involves so LITTLE math: the entire concept is represented and conveyed through graphs and code.
- The computation graph obtained with draw_dot() doesn’t LOOK the same as the ANN it represents, even though they are the same thing. The difference is that the draw_dot graph expands the ANN graph and shows the computation at the atomic level (see the neuron sketch after this list).
- One minor note: we don’t run backward() on this ANN draw_dot graph because we want the derivative of L, not that of ypred. I point this out because multiple graphs have been used in the lecture.
- A follow-up note to the above: the computation graph of the loss is actually built on top of the ANN graph, because `sum((ygt - yout)**2 for ygt, yout in zip(ys, ypred))` uses ypred, and ypred is the output of the ANN’s computation (see the loss sketch after this list).
- Implementing all the atomic math operations in the Value() class is what lets the computation graph be built ENTIRELY out of Value() objects as the building block, so that backward() can be run on the graph’s output, which is itself a Value object, and the derivatives of that output with respect to ALL earlier Value objects can be obtained in one go (see the Value sketch after this list).
- IMPORTANT NOTE: Several years ago I read a comment from a scientist who dismissed the success of ANNs as merely the tweaking of topology. I shared the same sentiment. Now I have changed my mind: sometimes the HOW of arranging existing things is all that matters. And that is topology. We design inductive biases into computations by inventing new topology.
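
For the draw_dot() note above, a minimal sketch of the “atomic level” idea, in plain Python. The values echo the lecture’s single-neuron example, but the exact numbers are incidental; each intermediate line corresponds to one node that draw_dot() would render.

```python
import math

# One neuron (x*w + b, then tanh) spelled out as atomic operations.
x1, x2 = 2.0, 0.0            # inputs
w1, w2 = -3.0, 1.0           # weights
b = 6.8813735870195432       # bias

x1w1 = x1 * w1               # node: *
x2w2 = x2 * w2               # node: *
x1w1_x2w2 = x1w1 + x2w2      # node: +
n = x1w1_x2w2 + b            # node: +
o = math.tanh(n)             # node: tanh
print(o)                     # ~0.7071
```

Drawn as an ANN diagram, this is a single neuron; drawn with draw_dot(), every line above becomes its own node, which is exactly the expansion described in the note.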
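
For the notes above about which graph backward() runs on: a minimal sketch, assuming the pip-installable micrograd package (`pip install micrograd`); the notebook’s own Value/MLP classes play the same role, and the inputs/targets here are toy values, not the lecture’s data.

```python
from micrograd.nn import MLP

model = MLP(3, [4, 4, 1])                    # a tiny ANN
xs = [[2.0, 3.0, -1.0], [3.0, -1.0, 0.5]]    # toy inputs
ys = [1.0, -1.0]                             # toy ground-truth targets

ypred = [model(x) for x in xs]               # forward pass: this builds the ANN graph

# The loss expression consumes ypred, so its graph is built on top of the ANN graph.
loss = sum((ygt - yout)**2 for ygt, yout in zip(ys, ypred))

loss.backward()                              # run on the loss, not on ypred,
                                             # so every weight gets dLoss/dweight
print(loss.data, model.parameters()[0].grad)
```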
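
For the Value() note above, a stripped-down sketch of why this works: only + and * are shown here (the notebook’s class covers more operations), but the mechanism is the same. Every atomic op returns a new Value that remembers its inputs and how to route gradients back to them, so one backward() call on the final output fills in every grad.

```python
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)
        self._backward = lambda: None        # how to push grad into the children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad            # d(a+b)/da = 1
            other.grad += out.grad           # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Reversed topological order guarantees a node's grad is complete
        # before that node propagates it to its children.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0                      # d(output)/d(output) = 1
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
L = a * b + a                                # the graph's output is itself a Value
L.backward()                                 # one call...
print(a.grad, b.grad)                        # ...gives dL/da = -2.0 and dL/db = 2.0
```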