DL study – Jan 27, 2024

Building makemore

  • Q: Why is a loss function needed for a generative model like makemore? And what does that look like? I ask because the end goal seems to be simply to sample from the model.
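A hedged sketch of an answer (not makemore's actual code): a generative model is trained by maximizing the likelihood of the training data, so the loss is the average negative log-likelihood (NLL) of each observed next character under the model's predicted distribution. Sampling uses those same learned probabilities, so a lower loss means the samples look more like the data. The toy word list and add-one smoothing below are my own illustrative choices:

```python
import math

words = ["emma", "olivia", "ava"]  # toy dataset of names, as in makemore

# Count character bigrams, with '.' as a start/end token.
counts = {}
for w in words:
    chs = ["."] + list(w) + ["."]
    for a, b in zip(chs, chs[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1

# Turn counts into conditional probabilities P(b | a), with add-one
# smoothing so unseen bigrams do not produce log(0).
vocab = sorted({c for a, b in counts for c in (a, b)})

def prob(a, b):
    total = sum(counts.get((a, c), 0) for c in vocab) + len(vocab)
    return (counts.get((a, b), 0) + 1) / total

# The loss: average NLL over the training set. Lower means the model
# assigns higher probability to the data it was trained on.
nll, n = 0.0, 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for a, b in zip(chs, chs[1:]):
        nll += -math.log(prob(a, b))
        n += 1
print(f"average NLL: {nll / n:.3f}")
```

In makemore itself the counts are replaced by a neural net's logits and the same NLL is computed as a cross-entropy loss, but the training objective is identical in spirit.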

Lecture: https://youtu.be/-u_5ukgYyhg?si=i2mXgp-U95shYtUz

  • Bengio thinks the confidently wrong answers produced by LLMs are a symptom of overfitting.
  • He thinks the current LLM training paradigm, which uses a single model, overfits the world model and underfits the inference machine.
  • A world model is usually much smaller (think of physics as a world model), while the inference machine has to answer every specific question; the question space is effectively infinite, so the inference machine has to be much bigger.
  • Therefore, separating the world model from the inference machine makes sense, since it allows each to have the different level of capacity it requires.
  • This seems to go by, or at least be related to, "model-based machine learning", whose key ideas are Bayesian inference, factor graphs, and probabilistic programming.
