A New Model of Reinforcement Learning, Algorithms

Reinforcement learning (RL) does not need prior knowledge; it can autonomously obtain an optimal policy from knowledge gained through trial and error and continuous interaction with a dynamic environment. Its capacity for self-improvement and online learning makes reinforcement learning one of the core technologies of intelligent agents. In this article, we first review the model and theory of reinforcement learning. We then present the main reinforcement learning algorithms, including Sarsa, temporal difference, Q-learning, and function approximation. Finally, we briefly introduce some applications of reinforcement learning and point out some directions for future research.

Keywords

Reinforcement learning; Sarsa; temporal difference; Q-learning; function approximation
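
As an illustration of the trial-and-error learning described in the abstract, the following is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state chain invented for this example and is not taken from the article; the function names and the values chosen for the learning rate, discount factor, and exploration rate are likewise assumptions.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0 .. n_states-1. Action 0 moves left (state 0 is a wall),
    action 1 moves right. Entering the rightmost state gives reward 1
    and ends the episode; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a] is the estimated return of taking action a in state s.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if state == goal:
                break
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: bootstrap from the best next action.
            # The goal row is never updated, so its value stays 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    q_table = q_learning_chain()
    print(greedy_policy(q_table))
```

No model of the chain is given to the agent; it only observes states and rewards while acting. With the default settings above, the learned greedy policy moves right in every non-terminal state. Replacing `max(q[next_state])` in the target with the value of the action actually chosen in the next state would turn this sketch into the on-policy Sarsa update.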
