Category: AI
-
DL implementation study – Feb 16, 2024
makemore repo with the completed bigram model (lecture 2): https://github.com/gangfang/makemore Notes: Questions:
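For context, here is a minimal sketch of a count-based bigram character model of the kind that lecture covers. This is my own illustration, not code from the repo above, and it assumes a plain-text word list such as names.txt with one name per line.

import torch

words = open('names.txt').read().splitlines()    # assumed word list, one name per line
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0                                    # '.' marks the start/end of a name
itos = {i: s for s, i in stoi.items()}

# Count how often each character follows each character.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    cs = ['.'] + list(w) + ['.']
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Turn counts into row-wise probabilities (+1 smoothing) and sample one name.
P = (N + 1).float()
P /= P.sum(1, keepdim=True)
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print(''.join(out))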
-
DL implementation study – Jan 5, 2024
building makemore before part 2: # Doing the same thing but with an ANN # imagine what the model looks like before watching the video: # input is a char, output is the next char # one-hot encoding to make each output neuron a binary classifier # softmax to get the prob dist # N is not…
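A rough sketch of the model those comments seem to describe: a single linear layer fed a one-hot encoded character and trained with softmax/cross-entropy to predict the next character. This is my own reconstruction under assumptions (27-token vocabulary, hand-made toy training pairs), not the notebook's actual code.

import torch
import torch.nn.functional as F

vocab_size = 27                               # 26 letters + a '.' boundary token (assumed)
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((vocab_size, vocab_size), generator=g, requires_grad=True)

# Toy (current char -> next char) index pairs, purely illustrative.
xs = torch.tensor([0, 5, 13, 13, 1])
ys = torch.tensor([5, 13, 13, 1, 0])

for _ in range(100):
    xenc = F.one_hot(xs, num_classes=vocab_size).float()   # one-hot inputs
    logits = xenc @ W                                       # one linear layer
    loss = F.cross_entropy(logits, ys)                      # softmax + negative log-likelihood
    W.grad = None
    loss.backward()
    W.data += -10.0 * W.grad                                # plain gradient descent
print(f'final loss: {loss.item():.4f}')

Note that F.cross_entropy applies the softmax internally, which is where the "softmax to get the prob dist" step lives.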
-
DL study – Feb 3, 2024
Building Makemore. The No Free Lunch Theorem and OOD generalization. I was wondering whether the NFL theorem is an implication of the impossibility of OOD generalization. The NFL theorem talks about the impossibility of a universal learning algorithm, which sounds closely related to the goal of out-of-distribution generalization. But I think there are nuances that make…
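For reference, the supervised-learning version of the theorem being compared against is usually stated roughly as follows (my paraphrase of Wolpert's result, not part of the original note): averaged uniformly over all possible target functions f, any two learning algorithms A and B have the same expected off-training-set error,

\[
\sum_{f} \mathbb{E}\!\left[\mathrm{err}_{\text{OTS}}(A \mid f, d)\right]
= \sum_{f} \mathbb{E}\!\left[\mathrm{err}_{\text{OTS}}(B \mid f, d)\right]
\quad \text{for any fixed training set } d,
\]

so any advantage one algorithm gains on some class of targets is paid for on the complement of that class.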
-
Generalization and the Scaling laws – Jan 31, 2024
Traditionally, the central challenge in ML is generalization, or out-of-distribution generalization. The problem of generalization only arises when the model is used to make predictions on previously unobserved inputs. However, with the current approach to LLM training, which uses the “entire Internet”, there doesn’t seem to be any unobserved data left and therefore, OOD…
-
DL and intelligence – Jan 28, 2024
This talk was phenomenally interesting; here are some notes:
-
DL study – Jan 25, 2024
The scaling law I have been hearing about: how much truth is there to it? Looked at from the opposite angle, the question is: how much does neural architecture matter? Or are all those invariances and inductive biases that different architectures possess simply a “shortcut” that facilitates more complex problem solving with…
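For concreteness, the "scaling law" usually refers to empirical power-law fits of loss against model size and data, e.g. the parametric form used by Hoffmann et al.; the symbols below are the standard ones from that literature, not from this note:

\[
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\]

where N is the number of parameters, D the number of training tokens, and E, A, B, alpha, beta are fitted constants. Notably, the fit contains no architecture term, which is exactly what the question above is probing.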