Autoencoder:
- used to learn efficient codings of unlabeled data (unsupervised learning)
- a kind of ANN with the structure below: ![[Pasted image 20240625074119.png]]
- A simple autoencoder might look like this: Input (784 nodes) -> Hidden (128 nodes) -> Bottleneck (32 nodes) -> Hidden (128 nodes) -> Output (784 nodes)
- This architecture compresses 784-dimensional data to 32 dimensions and then reconstructs it.
- training (a minimal code sketch follows at the end of this section):
    - loss function: reconstruction error
    - training data: the input data serves as both model input and target output
    - everything else is standard
- sparse autoencoder
    - used in https://transformer-circuits.pub/2023/monosemantic-features/index.html
    - many neurons in the original model are polysemantic
    - they used a sparse autoencoder to learn the original model’s “learned features” that are more monosemantic (having a single, clear meaning)
    - sparsity is enforced to encourage the network to activate only a small number of neurons in the hidden layers, particularly in the bottleneck layer
    - techniques to enforce sparsity, such as the following (a sketch of the L1 approach appears after this list):
        - L1 regularization
        - KL divergence penalty
        - …
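A minimal PyTorch sketch of the 784 -> 128 -> 32 -> 128 -> 784 autoencoder and its training setup described above. The ReLU/Sigmoid activations, the Adam optimizer, and MSE as the concrete reconstruction error are assumptions for illustration, not prescribed by the architecture itself.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 784 -> 128 -> 32 (bottleneck)
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, 32),
        )
        # Decoder: 32 -> 128 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),  # assumes inputs scaled to [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # reconstruction error

x = torch.rand(64, 784)  # dummy batch standing in for e.g. flattened MNIST images
for _ in range(10):
    recon = model(x)
    loss = loss_fn(recon, x)  # the input serves as both model input and target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```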
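And a sketch of one common way to enforce sparsity: adding an L1 penalty on the code activations to the reconstruction loss. The layer sizes and the `l1_lambda` coefficient are illustrative assumptions; the actual setup in the linked Anthropic paper differs (e.g. it trains on a transformer’s internal activations rather than raw inputs).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Code layer sizes are arbitrary here; sparse autoencoders often use overcomplete codes.
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
decoder = nn.Linear(256, 784)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)
l1_lambda = 1e-3  # assumed sparsity strength; tune per task

x = torch.rand(64, 784)  # dummy batch
for _ in range(10):
    code = encoder(x)
    recon = decoder(code)
    recon_loss = F.mse_loss(recon, x)
    l1_penalty = code.abs().mean()  # L1 regularization on activations, not weights
    loss = recon_loss + l1_lambda * l1_penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```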
Contrarian thought on AGI
It’s possible that AGI will never arrive. AI practitioners complain that benchmarks keep being pushed further whenever they make progress. This situation makes sense through the lens of a Bayesian framework: the prior changed, which changes expectations. From this, I speculate that we may never build AGI, because as we continue to push the envelope, we evolve with it and come to expect things we are not expecting now.
- People and society adjust to new realities.
GenAI’s impacts on Software engineering
- “There is a magical moment with any young technology where the boundaries between roles are porous and opportunity can be seized by anyone who is motivated, curious, and willing to work their asses off.”
- In that sense, DL is not a young technology; opportunities are limited
- the author has a humanistic bent
- the author takes the perspective of the team and the industry, which involve humans. How do you get senior engineers if there are no juniors?
- Also, a team functions better when it is diverse in levels/experience and strengths, which I agree with
- coding is one part of a software engineer’s job, probably the easiest part. Software engineering can’t be automated because systems, rather than just code, are complex and non-deterministic.
- Also, even if we narrowly focus on writing code, the quality of the generated code and the trustworthiness of GenAI are big barriers to these tools replacing human coders
- I agree with the author overall. But is there anything I disagree on?
Anthropic work on Interpretability
Previously, we made some progress matching patterns of neuron activations, called features, to human-interpretable concepts. We used a technique called “dictionary learning”, borrowed from classical machine learning, which isolates patterns of neuron activations that recur across many different contexts. In turn, any internal state of the model can be represented in terms of a few active features instead of many active neurons. Just as every English word in a dictionary is made by combining letters, and every sentence is made by combining words, every feature in an AI model is made by combining neurons, and every internal state is made by combining features.
…
The fact that manipulating these features causes corresponding changes to behavior validates that they aren’t just correlated with the presence of concepts in input text, but also causally shape the model’s behavior. In other words, the features are likely to be a faithful part of how the model internally represents the world, and how it uses these representations in its behavior.
…
The features we found represent a small subset of all the concepts learned by the model during training, and finding a full set of features using our current techniques would be cost-prohibitive (the computation required by our current approach would vastly exceed the compute used to train the model in the first place). Understanding the representations the model uses doesn’t tell us how it uses them; even though we have the features, we still need to find the circuits they are involved in. And we need to show that the safety-relevant features we have begun to find can actually be used to improve safety.
- A feature can be a concrete thing, an abstract concept or even a behavioral tendency like “sycophancy”
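A rough illustration of the dictionary-learning idea from the quoted passage, using scikit-learn (this is not Anthropic’s actual setup or scale): activation vectors are decomposed into a sparse combination of learned “feature” directions. All shapes and hyperparameters below are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Pretend "activations": 300 samples of a 32-dimensional hidden state.
activations = np.random.randn(300, 32)

dict_learner = DictionaryLearning(
    n_components=128,               # more "features" than neurons (overcomplete dictionary)
    transform_algorithm="lasso_lars",
    transform_alpha=1.0,            # larger alpha -> sparser codes
    max_iter=50,
    random_state=0,
)
codes = dict_learner.fit_transform(activations)  # sparse feature coefficients per sample
features = dict_learner.components_              # each row is one feature direction in neuron space

# Each internal state is approximated by a few active features:
#   activations[i] ≈ codes[i] @ features
print((codes != 0).sum(axis=1).mean())           # average number of active features per sample
```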