Category: DL skill training
-
Weekly DL study note: GFlowNet Code Tutorial (completed)
Completed code: https://github.com/gangfang/littlegfn/blob/main/face_generator.ipynb
Pre-requisites
Flow Networks (Nothing about training yet)
IMPORTANT – MAIN IDEA OF GFN
The main idea behind GFlowNet is to interpret the DAG as a flow network, and to think of each edge as a pipe through which some amount of water, or particles, flows. We then want to find a flow where (a) flow is…
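As a quick illustration of the flow-preservation idea in the excerpt above, here is a minimal toy sketch (my own example, not code from the linked notebook; all state names and flow values are made up) that assigns flows to the edges of a tiny DAG and checks that inflow equals outflow at intermediate states and equals the reward at the terminal state:

# Toy DAG: s0 -> {s1, s2} -> s3 (terminal). Edge flows are hand-picked so
# that flow is conserved; names and numbers are illustrative only.
edge_flow = {
    ("s0", "s1"): 2.0,
    ("s0", "s2"): 3.0,
    ("s1", "s3"): 2.0,
    ("s2", "s3"): 3.0,
}
reward = {"s3": 5.0}  # the terminal state absorbs all incoming flow as reward

def inflow(state):
    return sum(f for (u, v), f in edge_flow.items() if v == state)

def outflow(state):
    return sum(f for (u, v), f in edge_flow.items() if u == state)

for s in ["s1", "s2"]:            # intermediate states: inflow == outflow
    assert abs(inflow(s) - outflow(s)) < 1e-8
for s, r in reward.items():       # terminal states: inflow == reward
    assert abs(inflow(s) - r) < 1e-8
print("flow is preserved on this toy DAG")

In an actual GFlowNet, a neural network predicts these edge flows, and training objectives such as the flow-matching loss penalize violations of exactly this conservation condition.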
-
Weekly study note: GFlowNet Code Tutorial 2022
Flow Networks (Nothing about training yet)
IMPORTANT – MAIN IDEA OF GFN
The main idea behind GFlowNet is to interpret the DAG as a flow network, and to think of each edge as a pipe through which some amount of water, or particles, flows. We then want to find a flow where (a) flow is preserved, (b) the flow…
-
Experiments with a 2M-param transformer
Jupyter notebook: https://github.com/gangfang/nanogpt/blob/main/gpt_dev.ipynb
-
Weekly DL study notes: building GPT from scratch, part 2
Parallelization: By removing sequential dependencies, Transformers could be trained much more efficiently on parallel hardware like GPUs.
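A minimal sketch of that point (my own illustration with assumed shapes, not taken from the notebook): an RNN has to step through the sequence one position at a time, while self-attention scores all positions against each other with a single batched matrix multiply:

import torch

B, T, C = 4, 8, 32                      # batch, sequence length, channels
x = torch.randn(B, T, C)

# RNN-style: an explicit Python loop, step t depends on step t-1
h = torch.zeros(B, C)
Wx, Wh = torch.randn(C, C), torch.randn(C, C)
for t in range(T):
    h = torch.tanh(x[:, t] @ Wx + h @ Wh)   # sequential dependency

# Attention-style: all T positions attended to at once, no time-step loop
q, k, v = x, x, x                            # untrained projections omitted
att = (q @ k.transpose(-2, -1)) / C**0.5     # (B, T, T) scores in one matmul
att = att.masked_fill(torch.tril(torch.ones(T, T)) == 0, float("-inf"))
att = att.softmax(dim=-1)
out = att @ v                                # (B, T, C)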
-
Weekly DL study notes: building GPT from scratch
Updated on Jun 14, 2024
Residual Networks (ResNets)
Attention
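The two topics listed above meet in the standard transformer block, where each sub-layer sits inside a residual connection. A minimal sketch with assumed layer sizes (my own simplified block, not the notebook's code):

import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block: residual connections around attention and MLP."""
    def __init__(self, n_embd=64, n_head=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
                                 nn.Linear(4 * n_embd, n_embd))

    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a                        # residual: gradients flow straight through the add
        x = x + self.mlp(self.ln2(x))    # second residual around the MLP
        return x

x = torch.randn(2, 16, 64)               # (batch, sequence, embedding)
print(Block()(x).shape)                   # torch.Size([2, 16, 64])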
-
DL weekly study notes: building a wavenet
Code reproduction of Andrej Karpathy’s “building makemore part 5” lecture: https://github.com/gangfang/makemore/blob/main/makemore_part5.ipynb Study notes:
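A minimal sketch of the core idea from that lecture as I understand it: instead of crushing the whole character context in one step, consecutive time steps are fused in pairs, layer by layer, WaveNet-style. The module below is my own simplified version, not a copy of the reproduced notebook:

import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Merge groups of n consecutive time steps into the channel dimension."""
    def __init__(self, n):
        super().__init__()
        self.n = n

    def forward(self, x):                 # x: (B, T, C)
        B, T, C = x.shape
        return x.view(B, T // self.n, C * self.n)

# Two characters are fused per layer, so an 8-character context gets combined
# gradually (8 -> 4 -> 2 -> 1) rather than all at once.
x = torch.randn(32, 8, 10)                # (batch, block_size=8, emb_dim=10)
h = FlattenConsecutive(2)(x)
print(h.shape)                            # torch.Size([32, 4, 20])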
-
DL weekly study notes: manual backprop i.e., w/o loss.backward()
Code reproduction of Andrej Karpathy’s “building makemore part 4” lecture: https://github.com/gangfang/makemore/blob/main/makemore_part4_manual_backprop.ipynb
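To make the title concrete, a small self-contained sketch (my own example, not taken from the notebook) that derives the gradient of a linear layer under an MSE loss by hand and checks it against what loss.backward() produces:

import torch

torch.manual_seed(0)
x = torch.randn(4, 3)
W = torch.randn(3, 2, requires_grad=True)
y_true = torch.randn(4, 2)

# forward
y = x @ W
loss = ((y - y_true) ** 2).mean()

# manual backprop: dloss/dy, then the chain rule through the matmul
dy = 2 * (y - y_true) / y.numel()     # gradient of the mean-squared error w.r.t. y
dW_manual = x.T @ dy                  # gradient w.r.t. W

# autograd reference
loss.backward()
print(torch.allclose(dW_manual, W.grad))   # True if the manual gradient matches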
-
DL weekly study notes: batch normalization, PyTorch’s APIs and distribution visualization
Updated on May 3. Code reproduction of Andrej Karpathy’s “building makemore part 3” lecture can be found at https://github.com/gangfang/makemore/blob/main/makemore_part3.ipynb These weekly notes include some older notes, written for Andrej’s “Building makemore Part 3: Activations, Gradients & BatchNorm” YouTube video. Principles: We want stable gradients (neither exploding nor vanishing) through the non-linearity throughout the…
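As a small illustration of the batch-normalization part (a sketch with assumed shapes; it does not reproduce the distribution visualizations from the notes), manually normalizing each feature over the batch matches PyTorch’s nn.BatchNorm1d in training mode with default affine parameters:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(64, 100) * 5 + 3          # pre-activations with a bad scale/offset

# manual batchnorm (training mode): normalize each feature over the batch
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)
x_manual = (x - mean) / torch.sqrt(var + 1e-5)

# PyTorch's API with default affine params (gamma=1, beta=0)
bn = nn.BatchNorm1d(100)
bn.train()
x_torch = bn(x)

print(torch.allclose(x_manual, x_torch, atol=1e-5))   # True
print(x_manual.mean().item(), x_manual.std().item())  # ~0 mean, ~1 std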