Causal ML can lead to better out-of-distribution (OOD) generalization.
The canonical representation of causal relationships is a causal directed acyclic graph (DAG), also called a causal diagram. It can encode a priori assumptions about the causal structure of interest.
A Bayesian network is a probabilistic graphical model that represents the probabilistic relationships between random variables.
One reason graphs are helpful for modeling probability distributions over random variables, in both probabilistic and causal modeling, is that they conveniently express how a joint distribution over a set of random variables factorizes.
Conditional independence relationships: two random variables \(X\) and \(Y\) are conditionally independent given a third variable \(Z\) if the probability distribution governing \(X\) and \(Y\) factorizes in this manner:
\[ P(X, Y \mid Z) = P(X \mid Z) \times P(Y \mid Z) \]
In other words, once the value of \(Z\) is known, learning the value of \(X\) provides no additional information about \(Y\), and vice versa.
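To make the factorization concrete, here is a minimal sketch that builds a small joint distribution over three binary variables and checks the conditional independence condition numerically. The probability tables and the helper names (`marginal`, `is_cond_independent`, `is_marg_independent`) are invented for illustration and are not from any library.

```python
from itertools import product

# Hypothetical conditional probability tables for binary X, Y, Z.
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # p_y_given_z[z][y]

# Joint over (x, y, z), constructed so that X and Y are
# conditionally independent given Z.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}


def marginal(table, keep):
    """Sum the joint over every variable whose index is not in `keep`."""
    out = {}
    for assignment, p in table.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def is_cond_independent(table, tol=1e-12):
    """Check P(X, Y | Z) == P(X | Z) * P(Y | Z) for every assignment."""
    pz = marginal(table, (2,))
    pxz = marginal(table, (0, 2))
    pyz = marginal(table, (1, 2))
    for (x, y, z), p in table.items():
        lhs = p / pz[(z,)]
        rhs = (pxz[(x, z)] / pz[(z,)]) * (pyz[(y, z)] / pz[(z,)])
        if abs(lhs - rhs) > tol:
            return False
    return True


def is_marg_independent(table, tol=1e-12):
    """Check P(X, Y) == P(X) * P(Y), ignoring Z."""
    pxy = marginal(table, (0, 1))
    px = marginal(table, (0,))
    py = marginal(table, (1,))
    return all(abs(pxy[(x, y)] - px[(x,)] * py[(y,)]) <= tol for (x, y) in pxy)


print(is_cond_independent(joint))  # True
print(is_marg_independent(joint))  # False
```

With these numbers, X and Y are independent once Z is known, yet dependent when Z is ignored, because Z influences both.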
The chain rule of probability is a fundamental concept in probability theory and is particularly useful in settings like Bayesian networks and probabilistic graphical models.
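For reference, the chain rule can be written out as follows. The first line holds for any joint distribution and any ordering of the variables. The second line is the simplified form it takes in a Bayesian network, where \mathrm{Pa}(X_i) denotes the parents of X_i in the graph (a notational choice made here, not one used elsewhere in these notes).

```latex
\[
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})
\]
\[
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
```

The second form follows from the first when the variables are listed in an order compatible with the graph, because each variable is conditionally independent of its other predecessors given its parents.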
Predictive modeling:
If the insurance company is not attempting to change anyone's behavior, then the question of whether drinking the right amount of red wine prevents heart attacks is irrelevant to its concerns; the only relevant question is whether the amount of red wine that someone drinks predicts heart attack rates. People who drink an average of 1 to 2 glasses of red wine per day for 5 years may have lower rates of heart attacks because of confounding: socio-economic factors that cause both the average daily consumption of red wine and other lifestyle factors that prevent heart attacks. But even if moderate red wine consumption does not prevent heart attacks, the insurance company can still use the conditional probability to help determine the rates to charge.
The question: does the causal relationship of interest emerge through the comparison between the unmanipulated density P(O) and the manipulated density P(O||M)?
I know it is not the comparison between the conditional density P(O|C) and the original P(O), because confounding factors can make those two differ even when there is no causal effect.
A manipulation is reflected in the corresponding manipulated density, which is why a manipulation is represented mathematically by P’.
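The difference between conditioning and manipulating can be sketched with the red wine example. The model below is entirely hypothetical: S is high socio-economic status, W is moderate wine drinking, H is a heart attack, and all probabilities are invented. By assumption, S causes both W and H while W has no effect on H. Under one reading of the notation above, P(H=1) plays the role of P(O), P(H=1 | W=1) the role of P(O|C), and the probability of H=1 after forcing W=1 the role of P(O||M).

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_S = {1: 0.5, 0: 0.5}             # P(S=s)
P_W_GIVEN_S = {1: 0.8, 0: 0.2}     # P(W=1 | S=s)
P_H_GIVEN_S = {1: 0.05, 0: 0.15}   # P(H=1 | S=s, W=w); W has no effect by assumption


def bernoulli(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(manipulate_w=None):
    """Joint table over (s, w, h).

    If manipulate_w is given, W is forced to that value: the factor
    P(W | S) is replaced by a point mass and all other factors are kept.
    """
    table = {}
    for s, w, h in product((0, 1), repeat=3):
        if manipulate_w is None:
            p_w = bernoulli(P_W_GIVEN_S[s], w)
        else:
            p_w = 1.0 if w == manipulate_w else 0.0
        table[(s, w, h)] = P_S[s] * p_w * bernoulli(P_H_GIVEN_S[s], h)
    return table


def prob_h(table, w=None):
    """P(H=1), or P(H=1 | W=w) when w is given, from a joint table."""
    rows = [(key, p) for key, p in table.items() if w is None or key[1] == w]
    total = sum(p for _, p in rows)
    return sum(p for key, p in rows if key[2] == 1) / total


observed = joint()
p_h = prob_h(observed)                        # unmanipulated
p_h_given_w1 = prob_h(observed, w=1)          # conditional on drinking
p_h_given_w0 = prob_h(observed, w=0)          # conditional on not drinking
p_h_manip_w1 = prob_h(joint(manipulate_w=1))  # everyone made to drink
p_h_manip_w0 = prob_h(joint(manipulate_w=0))  # everyone made to abstain

print(f"{p_h:.2f} {p_h_given_w1:.2f} {p_h_given_w0:.2f}")  # 0.10 0.07 0.13
print(f"{p_h_manip_w1:.2f} {p_h_manip_w0:.2f}")            # 0.10 0.10
```

In this toy model the conditional probabilities differ from the baseline, so wine drinking is useful for prediction, while the manipulated probabilities equal the baseline, so forcing wine consumption changes nothing. That is the gap between the two comparisons.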