DL study – Jan 25, 2024

The scaling law I have been hearing about: how much truth is there to it? Looking at it from the opposite angle, the question is: how much does neural architecture matter? Or are all those invariances and biases that different architectures possess simply a “shortcut” that facilitates more complex problem solving at limited scale? To put the question concretely: can an infinitely massive MLP solve everything?

TODO: I will run experiments to get to my own answer.

This is the exact experiment I want to run: https://arxiv.org/abs/2306.13575 & https://github.com/gregorbachmann/scaling_mlps. I will reproduce this.
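Before reproducing the paper, a toy version of the core setup helps fix the idea: flatten images into vectors and train a plain MLP with no convolutional or attention structure at all. This is only a minimal sketch with made-up shapes and random data standing in for a real dataset, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 64 "images" of 3x8x8 pixels, flattened, 10 classes.
# All sizes here are illustrative assumptions.
n, d_in, d_h, n_cls = 64, 3 * 8 * 8, 128, 10
X = rng.normal(size=(n, d_in))
y = rng.integers(0, n_cls, size=n)

W1 = rng.normal(scale=0.02, size=(d_in, d_h)); b1 = np.zeros(d_h)
W2 = rng.normal(scale=0.02, size=(d_h, n_cls)); b2 = np.zeros(n_cls)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return h, h @ W2 + b2             # hidden activations, logits

def loss_fn(logits, y):
    # numerically stable softmax cross-entropy
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

_, logits = forward(X)
loss0 = loss_fn(logits, y)            # should start near ln(10)

lr = 0.5
for step in range(50):
    h, logits = forward(X)
    # gradient of softmax cross-entropy w.r.t. logits
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), y] -= 1.0
    p /= n
    gW2 = h.T @ p;  gb2 = p.sum(0)
    dh = (p @ W2.T) * (h > 0)         # backprop through ReLU
    gW1 = X.T @ dh; gb1 = dh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, logits = forward(X)
loss1 = loss_fn(logits, y)
print(f"loss: {loss0:.3f} -> {loss1:.3f}")
```

The point of the exercise, as I understand the paper, is whether this architecture-free recipe keeps improving as you scale width, depth, data, and compute.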


Applying the concept of flow to explain improvisation isn’t just helpful for artists. I am increasingly realizing that it is useful in the context of understanding consciousness. Self-observation of thoughts is immensely interesting, and what I have been observing about daydreaming is the sequential nature of it: what I am daydreaming about simply comes one after another, and they are all related and connected in some way. Like this morning before getting out of bed, the sequence I observed was:

The transformer arch -> GFlowNets vs. transformers as two diff arch -> will GFlowNets become a new hot thing -> explaining why it can be: it combines multiple ML ideas and I wasn't sure if unsupervised learning is one -> I need to get up and read the GFN blog post again -> I need to watch Ilya's lecture on unsupervised learning on Saturday -> I imagined doing that at the UofT library -> using those non-standing desks might make my upper back hurt -> how can I minimize the hurting -> I need a laptop stand on the desk -> why did I throw away my old stand.. -> I could simply use books -> but I don't have a secondary trackpad -> I reimbursed it through the company though -> should I buy a new one, nope -> should I take the one at the office -> I would have to carry my computer -> I would have to carry my heavier work computer next week because I would be oncall -> I would be doing two oncall shifts in Feb... -> why did I not accept my colleague's offer to swap... -> I guess I wanted to be kind -> am I sacrificing myself too much -> it would eat up time for the outreach project -> it would add more stress -> OMG, I realized all these thoughts were sequential -> I had similar realizations before.

These sequential thoughts are more expansive than they appear in writing. They are mostly in the form of imaginations and visualizations, so each has a focal point, but there are also peripherals.

Another characteristic of daydreaming, as opposed to reasoning, is that the sequence doesn’t appear to be compositional: the thoughts are not intentionally put together to reach some conclusion; they simply flow from one to the next, connected by some common property in the imagination.


Building Makemore exercise:

  • For a character-predicting model, an example in the dataset should be interpreted like this: take the name “isabella”. It tells us that “i” is likely the first char, “s” is likely to follow “i”, “a” is likely to follow “is”, and so on; finally, “a” is likely to follow the sequence “isabell” and is likely the last char in a sequence.
  • Bigram: only models the local structure of two adjacent characters
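The counting version of the bigram model can be sketched in a few lines: add a “.” token to mark the start and end of each name, count adjacent character pairs, and normalize the counts per first character to get conditional probabilities. The tiny name list here is just an illustration, not the actual makemore dataset.

```python
from collections import defaultdict

# Illustrative stand-in for the names dataset
names = ["isabella", "olivia", "emma"]

counts = defaultdict(int)
for name in names:
    chars = ["."] + list(name) + ["."]    # "." marks start and end of a name
    for c1, c2 in zip(chars, chars[1:]):
        counts[(c1, c2)] += 1

# P(next char | current char) by normalizing counts per first character
totals = defaultdict(int)
for (c1, _), n in counts.items():
    totals[c1] += n
prob = {(c1, c2): n / totals[c1] for (c1, c2), n in counts.items()}

# After the start token ".", each of i/o/e begins exactly one of the three names
print(prob[(".", "i")])   # 1/3
```

Sampling a new name is then just repeatedly drawing the next character from `prob` conditioned on the current one, starting from “.” and stopping when “.” is drawn again.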
