Yong Song

Professor
Supervisor of Doctorate Candidates
Supervisor of Master's Candidates


Other Post:Guest Editor of International Journal of Applied Intelligence; Reviewer of IEEE Access, Advances in Mechanical Engineering, and several other journals
Gender:Male
Alma Mater:Shandong University
Education Level:Postgraduate (Doctoral)
Degree:Doctoral Degree in Engineering
Status:Employed
School/Department:School of Mechanical, Electrical & Information Engineering
Date of Employment:2001-07-01
Discipline:Control Theory and Control Engineering
Business Address:Room 307, South Part of Zhixing Building
Contact Information:0631-5688368
E-Mail:songyong@sdu.edu.cn


Goal-Conditioned Reinforcement Learning With Adaptive Intrinsic Curiosity and Universal Value Network Fitting for Robotic Manipulation


Affiliation of Author(s):School of Mechanical, Electrical & Information Engineering

Journal:IEEE Transactions on Industrial Informatics

Abstract:Hindsight experience replay (HER) has greatly increased the possibility of using deep reinforcement learning (DRL) for robotic manipulation with sparse rewards. However, there are still concerns about low learning efficiency and poor performance due to its insufficient exploration ability and bias against the initial goal introduced by HER. In this article, to solve this problem, a multigoal robotic manipulation DRL method based on adaptive intrinsic curiosity and universal value network fitting (AIC-UVNF) is proposed to further improve the exploration ability and learning performance. Specifically, this method utilizes an improved curiosity mechanism to construct a joint intrinsic reward and adaptively adjust the proportion, which can enhance exploration ability and avoid excessive pursuit of novel states. In addition, a universal value network fitting approach is proposed to incorporate the initial goal into the value function fitting process, which employs the value of the initial goal to eliminate the bias of HER in the algorithm update. Combined with the off-policy soft actor-critic method, AIC-UVNF is verified on multigoal robotic manipulation tasks. The results show that the proposed method achieves better convergence efficiency and learning performance.
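
For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of standard hindsight goal relabeling with a sparse reward. It is not the AIC-UVNF method and does not reproduce the paper's curiosity mechanism or value network fitting. The function names `sparse_reward` and `her_relabel`, the transition layout, the tolerance `tol`, and the "sample a later step" relabeling strategy are all assumptions made for illustration.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Return 0.0 if the achieved goal is within tol of the goal, else -1.0.

    The 0 / -1 convention and the tolerance are illustrative assumptions.
    """
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode, k=4, rng=None):
    """Augment an episode with hindsight transitions.

    episode: list of dicts with keys 'obs', 'action', 'achieved', 'goal',
    where 'achieved' is the goal-space outcome reached after the action.
    Returns a list of (obs, action, goal, reward) tuples: each original
    transition followed by k copies relabeled with goals reached at the
    same or a later step of the same episode.
    """
    rng = rng or random.Random(0)
    out = []
    for t, step in enumerate(episode):
        # Original transition, rewarded against the initial (commanded) goal.
        out.append((step["obs"], step["action"], step["goal"],
                    sparse_reward(step["achieved"], step["goal"])))
        # Hindsight transitions: pretend a later achieved state was the goal.
        for _ in range(k):
            future = rng.randint(t, len(episode) - 1)
            new_goal = episode[future]["achieved"]
            out.append((step["obs"], step["action"], new_goal,
                        sparse_reward(step["achieved"], new_goal)))
    return out
```

In this sketch the relabeled transitions are rewarded only against substituted goals. That is the kind of mismatch with the initial goal that the abstract refers to as the bias introduced by HER.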

All the Authors:Qiangyang Xu, Bao Pang, Rui Song, Yibin Li

First Author:Zihao Sun

Indexed by:Journal paper

Correspondence Author:Xianfeng Yuan, Yong Song*

Document Code:1747462659626520577

Discipline:Engineering

Page Number:12-27

Number of Words:10

Translation or Not:no

Date of Publication:2024-12-01

Included Journals:SCI