Now I have finished studying Peter Spirtes’ Introduction to Causal Inference. I am coming back to these big questions.
The bigger question to ask when I study causal inference: what principles can I draw from the field of causal inference to build causal understanding into Yoshua Bengio’s world models?
My Thought: To see their relationship, I first need to articulate each of them. For CI, I have problems 3 and 4/5, which are to construct causal models and then to use them to predict the effects of (counterfactual) manipulations. For the world models, I don’t quite know how to describe them statistically. But let me imagine…
If I press the “down arrow” button on the standing desk, I can expect the desk to go lower because the probability of that outcome is overwhelmingly large. Other outcomes exist, but they are so exceptional that they are ignored in day-to-day life. But we can indeed model this as P(Outcome|World || Pressing button). And this model can be applied to everything in the world, which means the world can be represented probabilistically: everything is a probability density.
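To make the expression P(Outcome|World || Pressing button) concrete for myself, here is a minimal sketch in Python. It assumes the double bar means "under the manipulation", so the button press is set from outside rather than merely observed. The variables (power, press, lower) and all the probabilities are made up for illustration. The point is only that observing a press and forcing a press can give different answers when something (here, whether the desk has power) influences both the press and the outcome.

```python
from itertools import product

# Hypothetical toy world; every number below is invented for illustration.
P_POWER = {1: 0.9, 0: 0.1}              # P(power): is the desk plugged in?
P_PRESS_GIVEN_POWER = {1: 0.5, 0: 0.1}  # P(press=1 | power): people rarely press a dead desk
# P(lower=1 | press, power), keyed by (press, power)
P_LOWER_GIVEN = {(1, 1): 0.99, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}


def joint(power, press, lower, do_press=None):
    """Joint probability of one full assignment.

    If do_press is given, the mechanism that normally decides `press`
    is replaced by the manipulation (press is forced to that value).
    """
    p = P_POWER[power]
    if do_press is None:
        p_press1 = P_PRESS_GIVEN_POWER[power]
        p *= p_press1 if press == 1 else 1.0 - p_press1
    else:
        p *= 1.0 if press == do_press else 0.0
    p_lower1 = P_LOWER_GIVEN[(press, power)]
    p *= p_lower1 if lower == 1 else 1.0 - p_lower1
    return p


def prob_lower_given_press(do_press=None):
    """P(lower=1 | press=1), observational or under the manipulation."""
    numerator = 0.0
    denominator = 0.0
    for power, lower in product((0, 1), repeat=2):
        p = joint(power, 1, lower, do_press)
        denominator += p
        if lower == 1:
            numerator += p
    return numerator / denominator


p_obs = prob_lower_given_press()           # P(lower | press), just watching
p_do = prob_lower_given_press(do_press=1)  # P(lower || press), forcing the press
print(f"observed:    {p_obs:.3f}")
print(f"manipulated: {p_do:.3f}")
```

In this toy world the observed probability is higher than the manipulated one, because seeing someone press the button is itself evidence that the desk has power. Forcing the press carries no such evidence.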
My Answer (last of today): the question isn’t phrased in a way that fits this answer. But the idea is to learn P(Y|X), where X is represented by a causal Bayesian network over abstract variables. Because the causal models are learned generatively and the discriminative function (X->Y) is already learned, this P(Y|X) should do better at out-of-distribution (OOD) generalization.
How is a world model even represented statistically?
My answer: P(Outcome|World || Pressing button)
My interpretation of Yoshua Bengio’s answer: P(Y|X) with X being the causal Bayesian Network at the abstract level, learned by DL.
Notes
“Using GFlowNet as an ideal tool, we can bring that (the system 2 inductive biases psychology and cognitive science have discovered) into the design of probabilistic machine learning based on deep learning as the building block”
“One of those inductive biases is causality. Humans think in a causal way. And causality uses a graphical language. The reason why GFlowNet is the ideal tool for the job is it is good at representing distributions and sampling over graphs. And GFlowNet will help us with causal discovery, while remaining as a DL system.”
Note:
Discriminative models vs generative models
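Discriminative models differ from generative models. While discriminative models focus on the boundary between classes, that is, they learn P(Y|X) directly, generative models focus on modeling the distribution of each individual class, P(X|Y), which together with the class prior P(Y) gives the joint distribution P(X, Y).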
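A small sketch of the distinction, under my own assumptions: the data is a made-up one-dimensional toy set, the generative model is one Gaussian per class plus a class prior, and the discriminative model is a logistic regression fitted by plain gradient descent (with the feature centred on the training mean, purely as a convenience). Neither choice comes from the talk; they are just the simplest instances I know of each family.

```python
import math

# Made-up 1-D training data: class 0 sits around 2, class 1 around 5.
xs0 = [1.0, 1.5, 2.0, 2.5, 3.0]
xs1 = [4.0, 4.5, 5.0, 5.5, 6.0]

# Generative: model each class's distribution P(x | y) and the prior P(y).
def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

prior1 = len(xs1) / (len(xs0) + len(xs1))
params0 = fit_gaussian(xs0)
params1 = fit_gaussian(xs1)

def generative_posterior(x):
    """P(y=1 | x) obtained through Bayes' rule from P(x | y) and P(y)."""
    joint0 = (1.0 - prior1) * gaussian_pdf(x, *params0)
    joint1 = prior1 * gaussian_pdf(x, *params1)
    return joint1 / (joint0 + joint1)

# Discriminative: model P(y | x) directly, with no model of x itself.
data = [(x, 0) for x in xs0] + [(x, 1) for x in xs1]
center = sum(x for x, _ in data) / len(data)  # centring the feature

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    errors = [(sigmoid(w * (x - center) + b) - y, x - center) for x, y in data]
    grad_w = sum(e * xc for e, xc in errors) / len(data)
    grad_b = sum(e for e, _ in errors) / len(data)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def discriminative_posterior(x):
    """P(y=1 | x) from the fitted logistic regression."""
    return sigmoid(w * (x - center) + b)

for x in (2.0, 3.5, 5.0):
    print(x, round(generative_posterior(x), 3), round(discriminative_posterior(x), 3))
```

Both models end up answering the same question, P(y=1 | x), but only the generative one also holds a description of what each class looks like (its mean and variance), so in principle it could be used to sample new x values for a class.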