Distributed representation is at the core of why ANNs work.
What is distributed representation in ML?
Distributed representation in machine learning refers to a way of representing data where each input is described by numerous elements or features, and each feature can contribute to the representation of many inputs. In other words, the information is distributed across the elements of the representation.
Why is distributed representation distributed?
“Distributed representation” is so named because the representation of each data point (e.g., a word, an image, etc.) is spread, or “distributed,” across many dimensions or features, rather than being tied to a single dimension. This is in contrast to “local” or “sparse” representations such as one-hot encoding, where each data point corresponds to a single dimension in the representation space.
History:
The term “distributed representation” in the context of machine learning and artificial intelligence was first prominently used by Geoffrey Hinton, a pioneer in the field of deep learning. He started discussing these concepts in his work as early as the 1980s.
An example:
For instance, consider the case of representing words in natural language processing. A simple form of representation is one-hot encoding, where each word is represented by a unique, high-dimensional vector of 0s with a single 1 at the position that corresponds to that word’s index in the vocabulary. This type of representation is not distributed because each word is represented by a single feature (the position of the 1 in the vector).
On the other hand, in distributed representations like word embeddings (e.g., Word2Vec, GloVe), each word is represented by a dense vector where each dimension can contribute to the meaning of many different words. For example, a certain dimension could be related to the grammatical role of the word, another could be related to its sentiment, etc.
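To make the contrast concrete, here is a minimal sketch using a toy three-word vocabulary. The dense vectors are hand-made for illustration, not taken from any trained model such as Word2Vec or GloVe:

```python
import numpy as np

# Toy vocabulary; the indices and vectors below are illustrative only.
vocab = ["king", "queen", "apple"]

def one_hot(word):
    # One-hot: each word occupies exactly one dimension.
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# Dense (distributed): every dimension carries a little information
# about every word. These 3-d vectors are hand-crafted for illustration.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors of distinct words are always orthogonal: similarity 0.
print(cosine(one_hot("king"), one_hot("queen")))  # 0.0

# Dense vectors can express similarity: "king" is closer to "queen" than to "apple".
print(cosine(embeddings["king"], embeddings["queen"]))
print(cosine(embeddings["king"], embeddings["apple"]))
```

The key point: under one-hot encoding, every pair of distinct words looks equally unrelated, while dense vectors let similarity emerge from shared dimensions.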
Why is that important?
- Efficiency: Distributed representations usually have far lower dimensionality than their one-hot counterparts, making them more memory- and compute-efficient.
- Semantic richness: Distributed representations can capture semantic relationships between inputs. For instance, in the case of word embeddings, similar words have similar vectors. This property makes distributed representations more expressive and useful for many tasks, such as semantic search, machine translation, and more.
- Ability to handle unseen data: Distributed representations can help a model make reasonable inferences about inputs it has not seen before. For example, embedding models that build word vectors from subword units (such as fastText) can produce a meaningful representation even for an unfamiliar word, based on the pieces it shares with known words.
- Generalization: Because distributed representations place similar inputs close together in the representation space, models that use them can generalize from training data to test data. For instance, a model trained with distributed representations may still perform well on test data that differs somewhat from the training data.
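The semantic-search property mentioned above can be sketched as nearest-neighbor lookup in embedding space. The four embeddings below are hypothetical hand-made vectors, used only to show the mechanics:

```python
import numpy as np

# Hypothetical 3-d embeddings (hand-made for illustration, not trained).
embeddings = {
    "cat":   np.array([0.8, 0.1, 0.1]),
    "dog":   np.array([0.7, 0.2, 0.1]),
    "car":   np.array([0.1, 0.9, 0.2]),
    "truck": np.array([0.2, 0.8, 0.3]),
}

def nearest(query, k=2):
    """Rank vocabulary words by cosine similarity to a query vector."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(embeddings, key=lambda w: cos(embeddings[w], query), reverse=True)
    return ranked[:k]

# A query vector near the "animal" region retrieves the animal words,
# not the vehicles, even though no exact match exists in the vocabulary.
print(nearest(np.array([0.75, 0.15, 0.1])))  # ['cat', 'dog']
```

Real systems apply the same idea at scale, with learned embeddings and approximate nearest-neighbor indexes instead of an exhaustive sort.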
Overall, distributed representations are a fundamental concept in machine learning and are instrumental in tasks such as image recognition, natural language processing, and other areas where high-dimensional inputs need to be effectively modeled.
Is distributed representation simply feature engineering?
While distributed representation and feature engineering are related in the sense that both involve creating ways to represent data, they are not the same thing.
Feature engineering (sometimes called feature crafting) is the process of manually creating or selecting features for a machine learning model based on domain knowledge. It is more about transforming raw data into a suitable format, or creating new variables from existing ones, to improve model performance. For example, from a timestamp we can derive the hour of the day, the day of the week, and so on.
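The timestamp example can be written out directly; the feature names here are illustrative choices, not a standard schema:

```python
from datetime import datetime

def timestamp_features(ts: datetime) -> dict:
    """Hand-engineered features derived from a raw timestamp."""
    return {
        "hour": ts.hour,              # hour of day, 0-23
        "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
        "is_weekend": ts.weekday() >= 5,
    }

# June 10, 2023 is a Saturday.
print(timestamp_features(datetime(2023, 6, 10, 14, 30)))
# {'hour': 14, 'day_of_week': 5, 'is_weekend': True}
```

Each feature here is chosen by a human using domain knowledge, whereas a distributed representation would be learned from data by the model itself.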
On the other hand, distributed representation is a specific type of data representation where each element in the input contributes to many features, and each feature is associated with many inputs. This representation is typically learned by the model itself rather than being manually engineered.
Paper to read: “Distributed Representations” (Hinton, McClelland & Rumelhart, Chapter 3 of Parallel Distributed Processing, Vol. 1): http://stanford.edu/~jlmcc/papers/PDP/Chapter3.pdf