The Complete Mathematics of Neural Networks and Deep Learning

Published: 28 February 2021
Channel: Adam Dhalla

A complete guide to the mathematics behind neural networks and backpropagation.

In this lecture, I aim to explain the mathematics, a combination of linear algebra and optimization, that underlies the most important algorithm in data science today: the feedforward neural network.

Through a plethora of examples, geometric intuitions, and not-too-tedious proofs, I will guide you from how backpropagation works in a single neuron to how it works across an entire network, and explain why we need backpropagation in the first place.

It's a long lecture, so I encourage you to split it across several sittings: get a notebook, take some notes, and see if you can prove the theorems yourself.

As for me: I'm Adam Dhalla, a high school student from Vancouver, BC. I'm interested in how we can use algorithms from computer science to gain intuition about natural systems and environments.

My website: adamdhalla.com
I write here a lot: adamdhalla.medium.com
Contact me: [email protected]

Two good sources I recommend to supplement this lecture:

Terence Parr and Jeremy Howard's The Matrix Calculus You Need for Deep Learning: https://arxiv.org/abs/1802.01528

Michael Nielsen's online book Neural Networks and Deep Learning, specifically the chapter on backpropagation: http://neuralnetworksanddeeplearning....

ERRATA----
I'm pretty sure the Jacobians part plays twice: once things start repeating, skip ahead until you reach the part about the "Scalar Chain Rule" (00:24:00).

And here are the timestamps for each chapter mentioned in the syllabus at the beginning of the course (a short illustrative sketch follows each part's chapter list).

PART I - Introduction
--------------------------------------------------------------
00:00:52 1.1 Prerequisites
00:02:47 1.2 Agenda
00:04:59 1.3 Notation
00:07:00 1.4 Big Picture
00:10:34 1.5 Matrix Calculus Review
00:10:34 1.5.1 Gradients
00:14:10 1.5.2 Jacobians
00:24:00 1.5.3 New Way of Seeing the Scalar Chain Rule
00:27:12 1.5.4 Jacobian Chain Rule
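
As a quick companion to 1.5.4, here is a minimal NumPy sketch (my own illustration, not code from the lecture) of the Jacobian chain rule: for a composition f(g(x)), the Jacobian of the whole is the matrix product of the Jacobians of the parts, J_{f∘g}(x) = J_f(g(x)) · J_g(x).

import numpy as np

# Example maps: g takes R^3 -> R^2, f takes R^2 -> R^2
def g(x):
    return np.array([x[0] * x[1], np.sin(x[2])])

def f(u):
    return np.array([u[0] + u[1] ** 2, np.exp(u[0])])

def jacobian(func, x, eps=1e-6):
    """Numerical Jacobian of func at x via central differences."""
    x = np.asarray(x, dtype=float)
    m = func(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return J

x = np.array([1.0, 2.0, 0.5])

# Chain rule: Jacobian of f(g(x)) is J_f evaluated at g(x), times J_g at x
J_chain = jacobian(f, g(x)) @ jacobian(g, x)
J_direct = jacobian(lambda t: f(g(t)), x)

print(np.allclose(J_chain, J_direct, atol=1e-4))  # True, up to numerical error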

PART II - Forward Propagation
--------------------------------------------------------------
00:37:21 2.1 The Neuron Function
00:44:36 2.2 Weight and Bias Indexing
00:50:57 2.3 A Layer of Neurons
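
To make 2.1-2.3 concrete, here is a minimal NumPy sketch of forward propagation through a single layer of sigmoid neurons, a = sigmoid(Wx + b). The variable names are my own and may differ from the indexing used in the lecture.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(W, b, x):
    """One layer of neurons: weighted input z = W x + b, activation a = sigmoid(z)."""
    z = W @ x + b
    return sigmoid(z)

rng = np.random.default_rng(0)
x = rng.standard_normal(3)        # input vector with 3 features
W = rng.standard_normal((4, 3))   # layer of 4 neurons, each with 3 weights
b = rng.standard_normal(4)        # one bias per neuron

a = layer_forward(W, b, x)        # activations of the layer, shape (4,)
print(a)

A full forward pass just repeats this, feeding each layer's activations in as the next layer's input.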

PART III - Derivatives of Neural Networks and Gradient Descent
--------------------------------------------------------------
01:10:36 3.1 Motivation & Cost Function
01:15:17 3.2 Differentiating a Neuron's Operations
01:15:20 3.2.1 Derivative of a Binary Elementwise Function
01:31:50 3.2.2 Derivative of a Hadamard Product
01:37:20 3.2.3 Derivative of a Scalar Expansion
01:47:47 3.2.4 Derivative of a Sum
01:54:44 3.3 Derivative of a Neuron's Activation
02:10:37 3.4 Derivative of the Cost for a Simple Network (w.r.t weights)
02:33:14 3.5 Understanding the Derivative of the Cost (w.r.t weights)
02:45:38 3.6 Differentiating w.r.t the Bias
02:56:54 3.7 Gradient Descent Intuition
03:08:55 3.8 Gradient Descent Algorithm and SGD
03:25:02 3.9 Finding Derivatives of an Entire Layer (and why it doesn't work well)
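
As a companion to 3.3-3.8, here is a minimal sketch (my own notation, not the lecture's) of gradient descent on a single sigmoid neuron with squared-error cost C = ½(a − y)²: the gradients come from the chain rule through the activation, and each step updates the parameters by w ← w − η ∂C/∂w and b ← b − η ∂C/∂b. Stochastic gradient descent would simply estimate these gradients on a small random batch of examples at each step.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def neuron_gradients(w, b, x, y):
    """Gradients of C = 0.5 * (a - y)^2 for one sigmoid neuron a = sigmoid(w.x + b)."""
    z = np.dot(w, x) + b
    a = sigmoid(z)
    dC_dz = (a - y) * sigmoid_prime(z)   # chain rule: dC/da * da/dz
    return dC_dz * x, dC_dz              # dC/dw, dC/db

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), 1.0       # one training example and its target
w, b = rng.standard_normal(3), 0.0
eta = 0.5                                # learning rate

for _ in range(1000):                    # plain gradient descent on this one example
    dw, db = neuron_gradients(w, b, x, y)
    w -= eta * dw
    b -= eta * db

print(sigmoid(np.dot(w, x) + b))         # approaches the target y = 1.0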

PART IV - Backpropagation
--------------------------------------------------------------
03:32:47 4.1 The Error of a Node
03:39:09 4.2 The Four Equations of Backpropagation
03:39:12 4.2.1 Equation 1: The Error of the last Layer
03:46:41 4.2.2 Equation 2: The Error of any layer
04:03:23 4.2.3 Equation 3: The Derivative of the Cost w.r.t any bias
04:10:55 4.2.4 Equation 4: The Derivative of the Cost w.r.t any weight
04:18:25 4.2.5 Vectorizing Equation 4
04:35:24 4.3 Tying Part III and Part IV together
04:44:18 4.4 The Backpropagation Algorithm
04:58:03 4.5 Looking Forward
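
For reference, these are the four equations derived in 4.2, written in the notation of Michael Nielsen's backpropagation chapter linked above (δ^l is the error of layer l, z^l its weighted input, a^l its activation, ⊙ the Hadamard product):

\delta^L = \nabla_a C \odot \sigma'(z^L)                              (error of the last layer)
\delta^l = \big( (w^{l+1})^T \delta^{l+1} \big) \odot \sigma'(z^l)    (error of any layer)
\partial C / \partial b^l_j = \delta^l_j                              (cost w.r.t. any bias)
\partial C / \partial w^l_{jk} = a^{l-1}_k \, \delta^l_j              (cost w.r.t. any weight)

The backpropagation algorithm of 4.4 is these four equations applied layer by layer: a forward pass to compute each z^l and a^l, Equation 1 at the output, Equation 2 moving backward through the layers, and Equations 3 and 4 to read off the gradients.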

