Stat/prob/ML concepts

Latent and observable variables

In statistics, latent variables (from Latin: present participle of lateo, “lie hidden”) are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured.[1]

Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. In this situation, the term hidden variables is commonly used (reflecting the fact that the variables are meaningful, but not observable).

The use of latent variables can serve to reduce the dimensionality of data.
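A minimal sketch of inferring a latent variable, using made-up numbers: suppose one of two coins (fair or biased) generated a sequence of flips. The coin's identity is latent; only the flips are observed, so the identity must be inferred indirectly through a model.

```python
flips = "HHTHH"  # observed data

priors = {"fair": 0.5, "biased": 0.5}   # prior belief over the latent variable
p_heads = {"fair": 0.5, "biased": 0.9}  # model: P(heads | coin)

def likelihood(coin):
    """P(observed flips | coin identity)."""
    p = 1.0
    for f in flips:
        p *= p_heads[coin] if f == "H" else 1 - p_heads[coin]
    return p

# Posterior over the latent coin identity, via Bayes' rule.
unnorm = {c: priors[c] * likelihood(c) for c in priors}
total = sum(unnorm.values())
posterior = {c: v / total for c, v in unnorm.items()}
print(posterior)  # four heads in five flips shifts belief toward "biased"
```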

https://en.wikipedia.org/wiki/Latent_and_observable_variables

Bayesian statistics

Bayesian statistics is underpinned by Bayes’ rule, but it is not reducible to it. It’s a statistical theory based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event.

Bayesian statistical methods use Bayes’ theorem to compute and update probabilities after obtaining new data.

Bayes’ theorem: Posterior = Likelihood × Prior ÷ Evidence, i.e. P(H | D) = P(D | H) · P(H) / P(D)
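A worked instance of Posterior = Likelihood × Prior ÷ Evidence, using the classic diagnostic-test setup (the numbers are illustrative):

```python
prior = 0.01        # P(disease)
sensitivity = 0.95  # P(positive | disease) — the likelihood
false_pos = 0.05    # P(positive | no disease)

# Evidence: total probability of observing a positive test.
evidence = sensitivity * prior + false_pos * (1 - prior)

posterior = sensitivity * prior / evidence  # P(disease | positive)
print(posterior)  # ≈ 0.161: even a positive test leaves the disease unlikely
```

The counterintuitive result (a 95%-accurate test yields only ~16% posterior probability) comes from the low prior, which is exactly the kind of prior knowledge the paragraph above describes.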

I.I.D.: Independent and Identically Distributed

In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent.[1] This property is usually abbreviated as i.i.d., iid, or IID. IID was first defined in statistics and finds application in different fields such as data mining and signal processing.
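A tiny demonstration of the two properties: each draw below comes from the same distribution (identically distributed) and no draw depends on any other (independent). By the law of large numbers, the sample mean of i.i.d. draws approaches the distribution mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# 10,000 i.i.d. draws from Normal(0, 1).
samples = [random.gauss(0, 1) for _ in range(10_000)]

# The sample mean should be close to the true mean, 0.
print(statistics.mean(samples))
```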

Marginal distribution and Marginalization

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables.

These concepts are “marginal” because they can be found by summing values in a table along rows or columns, and writing the sum in the margins of the table.[1] The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing (that is, focusing on the sums in the margin) over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.

Marginalization refers to the process of deriving the marginal probability of a subset of the variables from a larger set.
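The "sum along rows or columns, write the result in the margins" picture maps directly onto code. A sketch with an illustrative 2×2 joint table:

```python
# Joint distribution P(X, Y): rows index X, columns index Y.
joint = [
    [0.10, 0.20],  # X = 0
    [0.30, 0.40],  # X = 1
]

# Marginalize out Y: sum across each row to get P(X).
p_x = [sum(row) for row in joint]

# Marginalize out X: sum down each column to get P(Y).
p_y = [sum(col) for col in zip(*joint)]

print(p_x)  # the row margins: P(X=0), P(X=1)
print(p_y)  # the column margins: P(Y=0), P(Y=1)
```

Here Y has been "marginalized out" of `p_x`: it gives the probabilities of X without reference to the values of Y.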

Variational inference

Variational inference (VI) is a method from machine learning that approximates probability densities through optimization.

TODO: study

https://arxiv.org/abs/1601.00670
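A minimal sketch of the idea, under simplifying assumptions: pick a family of tractable densities q (here a Gaussian with parameters mu, sigma), and optimize those parameters to make q close to the target, by maximizing the ELBO (evidence lower bound). For a Gaussian target and Gaussian q the ELBO has a closed form, so a crude grid search suffices to illustrate the optimization view; real VI uses gradient methods.

```python
import math

# Unnormalized target: log p~(z) = -(z - 3)**2 / 2, i.e. Normal(3, 1) up to a constant.
# Variational family: q(z) = Normal(mu, sigma).
# ELBO(mu, sigma) = E_q[log p~(z)] + H[q]
#                 = -((mu - 3)**2 + sigma**2) / 2 + log(sigma) + const
def elbo(mu, sigma):
    return -((mu - 3) ** 2 + sigma ** 2) / 2 + math.log(sigma)

# Optimization step (grid search here, for simplicity).
best_mu, best_sigma = max(
    ((mu / 10, sigma / 10) for mu in range(-50, 80) for sigma in range(1, 30)),
    key=lambda p: elbo(*p),
)
print(best_mu, best_sigma)  # lands on (3.0, 1.0), the true posterior's parameters
```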

Active learning

Active learning is a special case of machine learning where a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs at new data points. In other words, the algorithm is not just passively learning from a given dataset; instead, it is actively seeking out the data it needs to learn effectively.

A simple example:

Suppose we’re building a machine learning model to classify emails as either “spam” or “not spam”. Initially, we don’t have any labeled data. Instead of randomly choosing emails for our human expert to label, which could be time-consuming and inefficient, we can use active learning.

First, we might start by choosing a small random sample of emails for our expert to label. We then train our initial model on this small set. Next, the active learning part comes in. Our model makes predictions on the rest of the unlabeled emails and identifies the ones it’s least confident about. These uncertain cases are likely to be those where it will learn the most.

For instance, the model might be unsure about emails that contain certain words or phrases that it hasn’t seen often in its training data. It will then ask the expert to label these uncertain cases. This newly labeled data is added to the training set, and the model is retrained. This process is repeated, with the model getting progressively better as it learns from the most informative examples.
