I have been challenging my notion that, for a system to conduct deliberate thinking, its design has to include certain inductive biases that underpin that ability in humans. But after reading that neuroscience's influence on deep learning is more limited these days than I used to think, I started to doubt whether those inductive biases are necessary at all.
What if the success of LLMs ultimately comes down to scale? That is a possibility. It isn't impossible that the Transformer architecture matters less than scale, and that we might achieve the same results with a simpler architecture given enough scale. Who knows? Also: what exactly is the attention mechanism, and how does it contribute to the Transformer architecture? These are all questions I need to answer. (I will get to the Transformer architecture in the last of Andrej's lectures.)
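As a placeholder until that lecture, here is a minimal sketch of scaled dot-product attention, the core operation inside the Transformer; the function name, shapes, and toy data are my own choices for illustration, not from any particular codebase:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention. Q, K, V: arrays of shape (seq_len, d)."""
    d = Q.shape[-1]
    # Similarity of every query to every key, scaled for softmax stability.
    scores = Q @ K.T / np.sqrt(d)                      # (seq_len, seq_len)
    # Row-wise softmax turns similarities into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted average of the value vectors.
    return weights @ V                                 # (seq_len, d)

# Toy self-attention: 4 tokens, 8-dimensional embeddings, Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)     # (4, 8)
```

The full Transformer wraps this in multiple heads, learned projections, residual connections, and feed-forward layers, none of which is captured here.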
——
On a different topic, two other approaches aimed at achieving AGI that I heard about recently are LLM self-improvement (video) and hyperdimensional computing.
——
A different way of looking at System 2 AI, per Bengio, is to view it as an AI scientist. It has two components:
- Understanding machine ~= P(theta|D)
- Question-answering machine ~= P(y|x, D)
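The glue between the two components, as I understand it (this is my paraphrase, not Bengio's exact notation), is ordinary Bayesian marginalization: the question-answering machine averages its answer over the world models that the understanding machine considers plausible given the data D:

$$P(y \mid x, D) = \int P(y \mid x, \theta)\, P(\theta \mid D)\, d\theta$$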
Bengio claims that this decomposition ensures there is no agency or autonomy in the system, whereas current LLMs are trained end-to-end, so the world model and the inference machine are mixed together. That mixture makes it harder for humans to both understand and control LLM behavior. I think this theory is certainly plausible, but the argument's premise isn't necessarily valid: an LLM might not contain two such components to begin with, let alone in any particular relation (mixed or separate). Unless the science of interpreting LLMs can clearly show that these models perform both functions (understanding and inference), the comparison between the AI scientist and LLMs might not be entirely appropriate.
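To make concrete what a "separate" design would look like operationally, here is a toy sketch of the decomposed pipeline: sample candidate world models from the posterior, then average their answers. Everything in it (the coin-flip setup, the Beta posterior, the function names) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "understanding machine": a posterior P(theta | D) over one parameter.
# D is a record of coin flips and theta is the heads probability, so with a
# uniform prior the posterior is Beta(1 + heads, 1 + tails).
D = np.array([1, 0, 1, 1, 0, 1, 1, 1])
heads, tails = D.sum(), len(D) - D.sum()

def sample_theta(n):
    """Draw candidate world models from P(theta | D)."""
    return rng.beta(1 + heads, 1 + tails, size=n)

# Toy "question-answering machine": P(y | x, theta) for the question
# x = "is the next flip heads?" is simply theta itself.
def answer_given_theta(theta):
    return theta

# P(y | x, D) by marginalizing over theta with Monte Carlo averaging.
p_heads = answer_given_theta(sample_theta(10_000)).mean()
print(f"P(next flip is heads | D) ~= {p_heads:.3f}")
```

An end-to-end model would instead map D and x straight to an answer, with no explicit posterior to inspect, which is exactly the mixing described above.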
I also appreciate the renaming of the two components from "world model" to "understanding machine" and from "inference machine" to "question-answering machine". The new terms are more descriptive of what those components actually do.