The machine learning consultancy: https://truetheta.io
Join my email list to get educational and useful articles (and nothing else!): https://mailchi.mp/truetheta/true-the...
Want to work together? See here: https://truetheta.io/about/#want-to-w...
Part one of a six-part series on Reinforcement Learning. If you want to understand the fundamentals in a short amount of time, you're in the right place.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: https://github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCES
[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. van Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL | Deep Learning Lectur...
[3] D. Silver, Lecture 1: Introduction to Reinforcement Learning, DeepMind, 2015, • RL Course by David Silver - Lecture 1...
[4] Y. Wang, Pricing at Lyft, Lyft, 2022, https://eng.lyft.com/pricing-at-lyft-...
[5] A. Irpan, Deep Reinforcement Learning Doesn't Work Yet, 2018, https://www.alexirpan.com/2018/02/14/...
[6] D. Silver, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, DeepMind, 2017.
[7] J. Schrittwieser, et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, DeepMind, 2020.
[8] R. Roy, et al. PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning, NVIDIA, 2022.
[9] J. Degrave, et al. Magnetic control of tokamak plasmas through deep reinforcement learning, Nature, 2021.
SOURCE NOTES
[1] is my primary source for this series. It largely determined the notation and the set of topics, and it motivated much of the commentary. This series could not be what it is without the comprehensive and consistent presentation of this vast subject that the text provides. If you're interested in learning more, I highly recommend it.
In preparation, I also took DeepMind's course [2], with the intent of understanding a different perspective. The material is similar, though not identical. Notably, DeepMind's problem statement has the agent receiving observations, which it summarizes into an agent-specific state. This is a more application-ready formulation. Early drafts of this series included it, but it was cut because it complicated the topics that followed. Overall, I learned a great deal about RL from this course.
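To make the distinction concrete, here is a minimal sketch (my own illustration, not DeepMind's code) of that observation-based formulation: the agent never sees the environment state directly; it maintains its own agent state, updated from each action and observation. The update function and the history-truncation choice here are hypothetical stand-ins.

```python
def update_agent_state(agent_state, action, observation):
    """Hypothetical state-update function u: s_t = u(s_{t-1}, a_{t-1}, o_t).
    Here the agent state is simply a bounded history of
    (action, observation) pairs; a real agent might use a learned
    recurrent summary instead."""
    return (agent_state + [(action, observation)])[-3:]  # keep last 3 pairs

# Toy interaction loop: a fixed observation sequence stands in for an
# environment, and the policy is a stub that always returns action 0.
agent_state = []
for observation in ["o1", "o2", "o3", "o4"]:
    action = 0  # placeholder policy; would normally depend on agent_state
    agent_state = update_agent_state(agent_state, action, observation)

print(agent_state)  # the agent's summary of its interaction history
```

The point of the formulation is that everything the agent can condition on lives in `agent_state`, not in the (possibly hidden) environment state.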
The Maze demonstration of the value function comes from David Silver's 2015 lecture [3].
[4]-[9] are the papers and blog posts referenced at the start of the video, including a post [4] in which Lyft describes how RL is used in its pricing algorithm.
NOTES
Regarding the statement "RL won't be the same revolution that Neural Networks were. That's OK - NNs are quite a high bar": I should elaborate. I anticipate the response, "You can't separate RL and NNs, since many of the most impactful applications of RL involve NNs." Yes, comparing these technologies is fraught. In effect, I'm presuming that if widespread adoption of RL's problem statement is enabled by more performant NNs, that still counts as part of the RL migration. That isn't fair if the goal is to attribute successes to either RL or NNs. In my view, that attribution isn't important. I'm merely claiming that RL will become a primary component in many production systems. The RL-vs-NNs comparison is an accidental symptom of how I pitched the RL trend. In retrospect, I would have phrased it differently.
TIMESTAMPS
0:00 The Trend of Reinforcement Learning
2:46 A Six Part Series
3:24 A Finite Markov Decision Process and Our Goal
9:02 An Example MDP
12:49 State and Action Value Functions
15:00 An Example of a State Value Function
16:28 The Assumptions
17:58 Watch the Next Video!
CORRECTIONS
1) In the maze, I have a -16 where there should be a -14 (Thank you, rogiervdw).
Video: Reinforcement Learning, by the Book, uploaded by Mutual Information, 24 October 2022.