-
Weekly DL study notes: building GPT from scratch, part 2
Parallelization: By removing sequential dependencies, Transformers could be trained much more efficiently on parallel hardware like GPUs.
-
Belly of the beats and Just write me something honest
It is incredibly difficult to be honest. Even with this writing, I am tempted to say stuff that shows off my knowledge, which is why I am writing in Obsidian and not on my blog until I want to publish. So yesterday I went to the belly of the beats battle; I wasn’t physically in the…
-
Weekly DL study notes: building GPT from scratch
Updated on Jun 14, 2024. Residual Networks (ResNets). Attention.
-
DL weekly study notes: building a wavenet
Code reproduction of Andrej Karpathy’s “building makemore part 5” lecture: https://github.com/gangfang/makemore/blob/main/makemore_part5.ipynb Study notes:
-
DL weekly study notes: manual backprop i.e., w/o loss.backward()
Code reproduction of Andrej Karpathy’s “building makemore part 4” lecture: https://github.com/gangfang/makemore/blob/main/makemore_part4_manual_backprop.ipynb
-
DL weekly study notes: batch normalization, PyTorch’s APIs and distribution visualization
Updated on May 3. Code reproduction of Andrej Karpathy’s “building makemore part 3” lecture can be found at https://github.com/gangfang/makemore/blob/main/makemore_part3.ipynb. These weekly notes include some older notes written for Andrej’s “Building makemore Part 3: Activations, Gradients & BatchNorm” YT video. Principles: We want stable gradients (neither exploding nor vanishing) of non-linearity throughout the…
-
Reproducing Paper “GPT-4 Can’t Reason”
(Updated on Apr 22) Introduction: The higher-level cognitive abilities of ChatGPT have always been fascinating to me. This topic has sparked numerous debates since OpenAI’s launch, but most comments are one-sided. Recently I came across Konstantine Arkoudas’s pre-print paper GPT-4 Can’t Reason (arxiv) and was amazed by the clever scoping of the problem statement,…
-
Another step towards convergence
It strikes me that history says the opposite. ChatGPT represents another stride toward the broader trend of convergence, aligning with how the internet, industrialization, and agriculture have historically embodied the same momentum. This convergence is characterized by a gradual reduction in uncertainties that, while reducing risks and damages, also diminishes “interestingness” in human experiences. Essentially,…