DSAC (Distributional Soft Actor-Critic)
Paper
J. Duan, Y. Guan, S. E. Li, Y. Ren, Q. Sun, and B. Cheng, “Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2021.
J. Duan, W. Wang, L. Xiao, J. Gao, S. E. Li, C. Liu, Y. Zhang, B. Cheng, and K. Li, “DSAC-T: Distributional Soft Actor-Critic with Three Refinements,” arXiv preprint arXiv:2310.05858, 2023. GitHub: https://github.com/Jingliang-Duan/DSAC-T
Prof. Jingliang Duan proposed DSAC, an RL algorithm that tackles the value overestimation problem encountered by most model-free off-policy RL algorithms. By learning the distribution of returns rather than only a scalar Q-value, DSAC achieves state-of-the-art (SOTA) performance on most MuJoCo benchmarks, surpassing model-free RL baselines such as SAC, TD3, and PPO. Its core idea is sketched below.
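The following is a minimal PyTorch sketch of that idea, assuming a Gaussian return distribution: the critic outputs a mean and a standard deviation, and is trained by maximizing the likelihood of an entropy-regularized (soft) TD target. The actor interface, network sizes, and clamping range are illustrative assumptions, not the authors' reference implementation; see the DSAC-T repository above for the real code.

```python
import torch
import torch.nn as nn

class DistributionalCritic(nn.Module):
    """Models the return distribution Z(s, a) as a Gaussian N(mean, std^2)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)
        self.log_std_head = nn.Linear(hidden, 1)

    def forward(self, obs, act):
        h = self.net(torch.cat([obs, act], dim=-1))
        mean = self.mean_head(h)
        std = self.log_std_head(h).clamp(-5.0, 2.0).exp()  # keep std in a sane range
        return mean, std

def critic_loss(critic, target_critic, actor, batch, gamma=0.99, alpha=0.2):
    """Negative log-likelihood of a soft TD target under Z(s, a).

    Assumption: `actor(next_obs)` returns a reparameterized action and its
    log-probability, as in SAC; `batch` is a tuple of (B, 1)-shaped tensors.
    """
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act, next_logp = actor(next_obs)
        next_mean, next_std = target_critic(next_obs, next_act)
        # Sample a return from the target distribution and form the soft TD
        # target, including SAC's entropy bonus -alpha * log pi.
        z = next_mean + next_std * torch.randn_like(next_std)
        target = rew + gamma * (1.0 - done) * (z - alpha * next_logp)
    mean, std = critic(obs, act)
    # Learning the full distribution (not just its mean) is the mechanism
    # DSAC uses to counteract value overestimation.
    return -torch.distributions.Normal(mean, std).log_prob(target).mean()
```

Maximizing this likelihood updates both the mean and the variance of the predicted return, so the critic's own uncertainty tempers the bootstrapped target instead of letting optimistic errors compound.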
GOPS (General Optimal control Problem Solver)
- GOPS Open Source Website: https://gops.readthedocs.io/
- GitHub: https://github.com/Intelligent-Driving-Laboratory/GOPS
Prof. Jingliang Duan serves as a co-leader in the development of GOPS. Solving optimal control problems is a basic demand of industrial control tasks, yet existing methods such as model predictive control often suffer from heavy online computational burdens. Reinforcement learning (RL) has shown great promise in computer and board games, but it has yet to be widely adopted in industrial applications due to the lack of accessible, high-accuracy solvers. The Intelligent Driving Lab (iDLab) at Tsinghua University has therefore developed GOPS, an easy-to-use RL solver package that aims to build real-time, high-performance controllers for industrial fields.

GOPS is built with a highly modular structure that retains a flexible framework for secondary development. Considering the diversity of industrial control tasks, it also includes a conversion tool that allows Matlab/Simulink to be used for environment construction, controller design, and performance validation. To handle large-scale control problems, GOPS can automatically create various serial and parallel trainers by flexibly combining embedded buffers and samplers. It offers a variety of common approximate functions for policies and value functions, including polynomials, multilayer perceptrons, and convolutional neural networks. Additionally, constrained and robust training algorithms for special industrial control systems with state constraints and model uncertainties are integrated into GOPS.
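The condensed sketch below illustrates this modular pattern: an environment, algorithm, sampler, buffer, and trainer are built from a shared argument dictionary and composed into a training loop. The module paths, argument names, and defaults are assumptions based on the modular design described above, not a verified API; consult https://gops.readthedocs.io/ for the authoritative interface.

```python
# Sketch of a GOPS-style training script. All module paths and argument
# names below are illustrative assumptions, not GOPS's confirmed API.
from gops.create_pkg.create_env import create_env          # assumed module path
from gops.create_pkg.create_alg import create_alg          # assumed module path
from gops.create_pkg.create_sampler import create_sampler  # assumed module path
from gops.create_pkg.create_buffer import create_buffer    # assumed module path
from gops.create_pkg.create_trainer import create_trainer  # assumed module path

args = {
    "env_id": "gym_pendulum",          # which environment module to build
    "algorithm": "DSAC",               # e.g. DSAC, SAC, TD3, ...
    "policy_func_type": "MLP",         # approximator family: polynomial, MLP, CNN
    "value_func_type": "MLP",
    "trainer": "off_serial_trainer",   # serial or parallel trainer variant
    "sampler_name": "off_sampler",
    "buffer_name": "replay_buffer",
    "max_iteration": 100_000,
}

env = create_env(**args)                                # env (or a Simulink-converted model)
alg = create_alg(**args)                                # policy/value approximators + update rule
sampler = create_sampler(**args)                        # collects transitions from the environment
buffer = create_buffer(**args)                          # replay storage for off-policy training
trainer = create_trainer(alg, sampler, buffer, **args)  # combines the parts into a training loop
trainer.train()
```

Because each component is selected by name from a shared configuration, swapping an algorithm, approximator, or trainer variant is a one-line change, which is what makes the same script scale from serial prototyping to parallel training.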