A Comparison of Temporal-Difference(0) and Constant-α Monte Carlo Methods on the Random Walk Task
<p>Monte Carlo (MC) and Temporal-Difference (TD) methods are both fundamental techniques in reinforcement learning; they solve the prediction problem from experience gained by interacting with the environment rather than from a model of the environment. The TD method, however, combines ideas from MC methods and Dynamic Programming (DP), which makes it differ from the MC method in its update rule, its use of bootstrapping, and its bias/variance trade-off. In practice, TD methods have also usually been found to converge faster than constant-α MC methods, although neither is proven to be faster in general. A minimal sketch of the two update rules is shown below.</p>
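<p>To make the difference in update rules concrete, here is a minimal sketch (not the full implementation used later in the post) of how each method updates a tabular state-value function from a single episode. The function names, the use of NumPy-style dictionaries/arrays for V, and the transition format are assumptions made purely for illustration.</p>
<pre>
def td0_update(V, episode, alpha=0.1, gamma=1.0):
    """TD(0): update V(s) at every step, bootstrapping from the current
    estimate V(s').  `episode` is a list of (state, reward, next_state)
    transitions; terminal states are assumed to have V = 0."""
    for s, r, s_next in episode:
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

def constant_alpha_mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Constant-α MC: wait until the episode ends, then move V(s) toward
    the actual return G observed from each visited state (every-visit)."""
    G = 0.0
    for s, r, _ in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
    return V
</pre>
<p>The key contrast is visible in the target of each update: TD(0) uses the bootstrapped one-step target r + γV(s'), available immediately after each step, while constant-α MC uses the full return G, available only once the episode has terminated.</p>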
<p>In this post, we’ll compare TD and MC, or more specifically, the TD(0) and constant-α MC methods, on a simple grid environment and on the more comprehensive Random Walk [2] environment. We hope this post helps readers interested in reinforcement learning better understand how each method updates the state-value function and how the two methods' performance differs in the same testing environment.</p>
<p>We will implement the algorithms and comparisons in Python; the libraries used in this post are as follows.</p>