GFlowNet is really MANY THINGS IN ONE THING. And I need to put all the aspects together in order to fully understand it. Like, how is this a stochastic policy in light of it being a generative model, i.e., P(y,x)?
Generative active learning, reinforcement learning, stochastic policy, generative model, energy-based probabilistic modelling, variational models and inference, non-parametric Bayesian modelling, and unsupervised or self-supervised learning of abstract representations
GFlowNets high level:
What I find exciting is that they open so many doors, but in particular for implementing the system 2 inductive biases I have been discussing in many of my papers and talks since 2017, that I argue are important to incorporate causality and deal with out-of-distribution generalization in a rational way. They allow neural nets to model distributions over data structures like graphs, to sample from them as well as to estimate all kinds of probabilistic quantities which otherwise look intractable.
Firstly, “a GFlowNet is a stochastic policy”.
What is a stochastic policy? – https://ai.stackexchange.com/a/12275
A stochastic policy is a family of conditional probability distributions, \pi(A|S). It's a family because, for each state s in the set of states S, we have a probability distribution over the set of actions A. This differs from a deterministic policy in that the action taken in a given state is sampled rather than fixed.
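As a tiny illustration (the states, actions, and probabilities below are made up), a stochastic policy is just a per-state distribution that you sample from:

```python
# Tiny illustration: a stochastic policy maps each state to a
# distribution over actions, rather than to one fixed action.
import random

pi = {
    "s0": {"a0": 0.7, "a1": 0.3},
    "s1": {"a0": 0.1, "a1": 0.9},
}

def act(state):
    actions = list(pi[state])
    weights = list(pi[state].values())
    return random.choices(actions, weights=weights)[0]  # sampled, not a fixed argmax

print(act("s0"))  # usually "a0", sometimes "a1"
```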
Why is a GFlowNet a stochastic policy?
Because compositional objects can be constructed sequentially, piece by piece, and a stochastic policy is what carries out this sequential construction: the policy samples each of the forward-going constructive actions, and thereby samples the compositional objects themselves.
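Here is that construction loop as a minimal sketch; the toy action set and the uniform stand-in policy are my own assumptions, not anything from the paper:

```python
# A sketch of sequential construction (toy setup of my own): start from an
# empty object and repeatedly sample forward constructive actions until a
# stop action; the terminal state is the finished compositional object.
import random

ACTIONS = ["add_A", "add_B", "stop"]

def sample_action(state):
    # Stand-in for a learned policy pi(a_t | s_t); uniform here for simplicity.
    return random.choice(ACTIONS)

def sample_object(max_steps=8):
    state = []                       # s_0: the empty object
    for _ in range(max_steps):
        a = sample_action(state)
        if a == "stop":
            break
        state.append(a)              # each action extends the object
    return state                     # terminal state s_n is the object x

print(sample_object())
```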
The following summarizes a GFlowNet and its training objective:
“The last state s_n of a complete trajectory \tau is an object x \in \cal X that the GFlowNet can sample, and the training objective aims at making it sample x with probability proportional to R(x).”
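Concretely, "with probability proportional to R(x)" means P(x) = R(x)/Z, where Z = \sum_{x' \in \cal X} R(x'). A toy example (the numbers are mine): if \cal X = \{x_1, x_2\} with R(x_1) = 1 and R(x_2) = 3, then Z = 4, so a perfectly trained GFlowNet samples x_1 a quarter of the time and x_2 three quarters of the time.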
But rewarding only at the terminal state also means the reward signal is sparse along the trajectory. This reward sparsity is addressed in their GFlowNet Foundations paper.
Why is a GFlowNet a generative model?
Because, after training, it generates action samples given the state, and chaining those sampled actions produces objects x; the trained policy therefore defines a generative distribution over \cal X.
Energy Functions:
R is not an external quantity (like in typical RL) but an internal quantity (e.g. corresponding to an energy function in a world model).
Why is an energy function in a world model an internal quantity? And what are energy functions?
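My reading (standard energy-based modelling, not specific to the paper): an energy function E(x) assigns a scalar "badness" to each configuration x and defines an unnormalized probability p(x) \propto \exp(-E(x)). If the world model itself computes E(x), then setting R(x) = \exp(-E(x)) makes R an internal quantity: the reward comes from querying the learned model rather than from an external environment, and a trained GFlowNet then samples x from p(x).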
Training:
Training the GFlowNet can be done by repeatedly querying that reward function (see the sketch after this list). And this leads to:
- the size of the dataset no longer matters with respect to the quality of the training
- the size of the neural net can be as large as compute can afford without worrying about overfitting
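Here is a minimal training-loop sketch. It uses the trajectory balance objective (from the follow-up paper "Trajectory Balance: Improved Credit Assignment in GFlowNets", not the Foundations paper); the toy bit-string task, the reward, and the network sizes are my own assumptions. Each state here has exactly one parent (drop the last bit), so the backward-policy term vanishes:

```python
# Trajectory-balance sketch on a hypothetical toy task: build a binary string
# of length N by appending bits; R(x) = 1 + (number of ones), always positive.
import torch
import torch.nn as nn

N = 6

def reward(bits):
    return 1.0 + sum(bits)  # hypothetical reward, strictly positive

def encode(bits):
    # Fixed-size encoding of a variable-length prefix: pad with -1.
    x = torch.full((N,), -1.0)
    if bits:
        x[: len(bits)] = torch.tensor(bits, dtype=torch.float32)
    return x

policy = nn.Sequential(nn.Linear(N, 64), nn.ReLU(), nn.Linear(64, 2))
log_Z = nn.Parameter(torch.zeros(()))  # learned estimate of the log-partition function
opt = torch.optim.Adam(list(policy.parameters()) + [log_Z], lr=1e-2)

for step in range(2000):
    bits, log_pf = [], torch.zeros(())
    for _ in range(N):  # roll out one complete trajectory tau
        dist = torch.distributions.Categorical(logits=policy(encode(bits)))
        a = dist.sample()
        log_pf = log_pf + dist.log_prob(a)  # accumulate log P_F(s_{t+1} | s_t)
        bits.append(int(a))
    # TB loss: (log Z + sum_t log P_F - log R(x))^2   (log P_B = 0 here)
    loss = (log_Z + log_pf - torch.log(torch.tensor(reward(bits)))) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, sampling trajectories from the policy should produce each string with probability roughly proportional to 1 + (number of ones), and log_Z should approach the true log-partition function. Note that only the reward function was queried, never a dataset.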
Importantly, when the reward function R represents the product of a prior (over some random variable) times a likelihood (measuring how well that choice of value of the random variable fits some data), the GFlowNet will learn to sample from the corresponding Bayesian posterior.
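Spelling that out (standard Bayes; D denotes the data): if R(x) = p(x)\,p(D|x), then the trained sampler gives P(x) = R(x)/Z = p(x)p(D|x) / \sum_{x'} p(x')p(D|x') = p(x|D), which is exactly the Bayesian posterior.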
Neural net architectures:
The simplest GFlowNet architecture is one where we have a neural net that outputs a stochastic policy \pi(a_t|s_t).
In regular GFlowNets, choosing a_t from s_t deterministically yields some s_{t+1}, which means that we can also write \pi(a_t|s_t)=P_F(s_{t+1}|s_t) for that policy.
A GFlowNet is a stochastic policy, so why is "deterministic" used here? My interpretation: the choice of action at state s_t is probabilistic, but once the action is chosen, that action deterministically, as in "always", leads to one particular s_{t+1}. The stochasticity lives in the policy, not in the environment's transition.
Since the state is a variable-size object, the neural net had better have an appropriate architecture for taking such objects as input.
The state s_t is the input of the GFlowNet stochastic policy.
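One simple way to sketch such an architecture (DeepSets-style sum pooling over the state's elements; every name and size here is illustrative, not the paper's design): encode each element of the state, pool into a fixed-size summary, and map that to action logits.

```python
# A minimal sketch of a policy network that accepts variable-size states
# (here: a set of feature vectors), using DeepSets-style sum pooling.
import torch
import torch.nn as nn

class SetPolicy(nn.Module):
    def __init__(self, node_dim=8, hidden=64, n_actions=10):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(node_dim, hidden), nn.ReLU())   # per-element encoder
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))             # pooled summary -> action logits

    def forward(self, state):            # state: (n_elements, node_dim), n_elements varies
        pooled = self.phi(state).sum(0)  # permutation-invariant, size-independent summary
        return self.rho(pooled)          # logits defining pi(a_t | s_t)

policy = SetPolicy()
s_t = torch.randn(5, 8)                              # a state with 5 elements
pi = torch.distributions.Categorical(logits=policy(s_t))
a_t = pi.sample()                                    # stochastic action choice
```

Sum pooling works for any number of elements; for graph-structured states one would reach for a graph neural net instead.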
A trained GFlowNet is both a sampler (to generate objects x with probability proportional to R(x)) and an inference machine (it can be used to answer questions and predict probabilities about some variables in x given other variables, marginalizing over the others)
This describes what a trained GFlowNet outputs.
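For example (the decomposition and notation are mine): if x splits into parts (x_A, x_B, x_C), the "inference machine" claim is that a trained GFlowNet can estimate quantities like P(x_A|x_B) = \sum_{x_C} P(x_A, x_C|x_B), marginalizing over the unobserved part x_C.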