Korea University of Technology and Education (KOREATECH) LINK Lab

Advanced Deep Reinforcement Learning (고급심층강화학습 [240222], Spring Semester, 2025)

“Student-professor relationships are based on trust. Acts, which violate this trust, undermine the educational process. Your classmates and the professor will not tolerate violations of academic integrity.”

1. Course Schedule & Lecture Notes

[Notice - 2025.03.04]

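This notice is for students enrolled in this course. The course is intended for students who have already studied deep reinforcement learning to some extent.
In particular, we will study recent deep reinforcement learning based quadruped robots, biped (humanoid) robots, and foundation model based robots.
All students
   1) must study the lecture content in depth,
   2) must read and present the papers assigned in this course themselves, and
   3) will be evaluated in the final exam on their overall understanding of the papers covered in the course.
Prerequisite knowledge required for this course
   1) Completion of undergraduate data structures and algorithms courses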
   2) Experience setting up Python virtual environments and using a variety of packages/modules
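   3) Experience writing deep learning code in Python with TensorFlow or PyTorch
   4) Basic knowledge of deep reinforcement learning
Grades will be assigned in four groups: A+/A, B+/B, C+/C, and F. If no student is graded F, the A+/A, B+/B, and C+/C groups will be distributed at 40%, 40%, and 20%, respectively. This distribution may change after the course ends, depending on overall academic achievement.
Recommended lectures on reinforcement learning fundamentals and optimization theory for this course
   1) 혁펜하임's reinforcement learning lectures
   2) 혁펜하임's optimization lectures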

#	Date	Book Presentation	Paper Presentation	Notice
01	Mar 03 (Mon)	Substitute holiday
02	Mar 10 (Mon)	- Course introduction - Neural Networks, Loss Functions, and Optimization Lecture notes
03	Mar 17 (Mon)	- Neural Networks, Loss Functions, and Optimization
04	Mar 24 (Mon)	- Information & Entropy Lecture notes
05	Mar 31 (Mon)	- Information & Entropy	[RSS 2020] Learning Agile Robotic Locomotion Skills by Imitating Animals (사아밀러 밀러) Presentation slides [CoRL 2020] Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real (권성민) Presentation slides
06	Apr 07 (Mon)	- Manifold Learning & Autoencoder Lecture notes	[RSS 2021] RMA: Rapid Motor Adaptation for Legged Robots [IROS 2022] Adapting Rapid Motor Adaptation for Bipedal Robots (김중길) Presentation slides
07	Apr 14 (Mon)	- Manifold Learning & Autoencoder	[Science Robotics 2022] Learning Robust Perceptive Locomotion for Quadrupedal Robots in The Wild (유재민) Presentation slides [ICLR 2022] Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers (최요한) Presentation slides
08	Apr 21 (Mon)	- Variational AE (VAE) and ELBO Lecture notes	[ICRA 2023] DribbleBot: Dynamic Legged Manipulation in the Wild [CoRL 2022] GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots (석영준) Presentation slides [Science Robotics 2023] Learning Quadrupedal Locomotion on Deformable Terrain (조근우) Presentation slides
09	Apr 28 (Mon)	- Variational AE (VAE) and ELBO	[ICRA 2023] DreamWaQ: Learning Robust Quadrupedal Locomotion With Implicit Terrain Imagination (김민준) Presentation slides [IJRR 2024] Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control (유연휘) Presentation slides [CoRL 2023] ViNT: A Foundation Model for Visual Navigation
10	May 05 (Mon)	Public holiday (no class)
11	May 12 (Mon)	- DQN to PPO Lecture notes	[CoRL 2022] RT-1: Robotics Transformer for Real-World Control at Scale (조재민) Presentation slides [IROS 2022] COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems (장연재) Presentation slides
12	May 19 (Mon)	- DQN to PPO	[DeepMind 2022] GATO: A Generalist Agent (아셀) Presentation slides [CoRL 2022] DayDreamer: World Models for Physical Robot Learning (윤호영) Presentation slides	Goal of imitation learning: have an agent learn a specific task or behavior from demonstrations. Demonstration data are used to learn a mapping between observations and actions. - Reference material
13	May 26 (Mon)	- DDPG, TD3, SAC Lecture notes	[NeurIPS 2022] Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos (최대준) Presentation slides [DeepMind 2023] RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation (이태민) Presentation slides	- Deep Deterministic Policy Gradient (DDPG): Continuous Control with Deep Reinforcement Learning - Twin Delayed Deep Deterministic Policy Gradient (TD3): Addressing Function Approximation Error in Actor-Critic Methods - Soft Actor-Critic (SAC): Soft Actor-Critic Algorithms and Applications
14	Jun 02 (Mon)	- DDPG, TD3, SAC	[CoRL 2023] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (남태민) Presentation slides [ICML 2023] LIV: Language-Image Representations and Rewards for Robotic Control (김재홍)
15	Jun 09 (Mon)	- Final exam	[IROS 2023] PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training (살램 아니사) Presentation slides [ICRA 2024] RT-2-X: Open X-Embodiment: Robotic Learning Datasets and RT-X Models	- Imitation Learning: A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges
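
The week 12 note in the schedule describes imitation learning as learning a mapping from observations to actions out of demonstration data. Below is a minimal sketch of the simplest form of that idea, behavior cloning, in plain Python. The `expert_policy` function, the one-dimensional observation and action, and the linear policy class are all assumptions made for illustration. They are not taken from the course materials or from any of the listed papers.

```python
import random

def expert_policy(obs: float) -> float:
    """Hypothetical expert: a fixed linear controller (assumption for this sketch)."""
    return 2.0 * obs - 1.0

def collect_demonstrations(n: int, seed: int = 0):
    """Record (observation, action) pairs by querying the expert."""
    rng = random.Random(seed)
    observations = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(o, expert_policy(o)) for o in observations]

def behavior_cloning(demos):
    """Fit action = w * obs + b by ordinary least squares on the demonstrations."""
    n = len(demos)
    mean_o = sum(o for o, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in demos)
    var = sum((o - mean_o) ** 2 for o, _ in demos)
    w = cov / var
    b = mean_a - w * mean_o
    return w, b

demos = collect_demonstrations(100)
w, b = behavior_cloning(demos)

def cloned_policy(obs: float) -> float:
    """Learned policy: imitates the expert without ever seeing a reward."""
    return w * obs + b

print(f"learned w={w:.3f}, b={b:.3f}")
```

Because the hypothetical expert is itself linear, the fit recovers it almost exactly. With a deep network in place of the linear model, the same supervised recipe applies, but errors can compound once the cloned policy visits states the demonstrations never covered.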

Papers on Quadruped Robot Control

~~Learning Agile Robotic Locomotion Skills by Imitating Animals~~
Peng, Xue Bin, et al. - RSS, 2020
Paper link
RMA: Rapid Motor Adaptation for Legged Robots
Kumar, Ashish, et al. - RSS, 2021
Paper link
~~Learning Robust Perceptive Locomotion for Quadrupedal Robots in The Wild~~
Miki, Takahiro, et al. - Science Robotics, 2022
Paper link
~~Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers~~
Yang, Ruihan, et al. - ICLR, 2022
Paper link
~~GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots~~
Feng, Gilbert, et al. - CoRL, 2022
Paper link
DribbleBot: Dynamic Legged Manipulation in the Wild
Ji, Yandong, et al. - ICRA, 2023
Paper link
~~Learning Quadrupedal Locomotion on Deformable Terrain~~
Choi, Suyoung, et al. - Science Robotics, 2023
Paper link
~~DreamWaQ: Learning Robust Quadrupedal Locomotion With Implicit Terrain Imagination~~
Nahrendra, I Made Aswin, et al. - ICRA, 2023
Paper link

Papers on Humanoid (Bipedal) Robot Control

~~Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real~~
Xie, Zhaoming, et al. - PMLR, 2020
Paper link
~~Adapting Rapid Motor Adaptation for Bipedal Robots~~
Kumar, Ashish, et al. - IROS, 2022
Paper link
~~Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control~~
Li, Zhongyu, et al. - IJRR, 2024
Paper link

Papers on Foundation Model Based Robot Control

~~RT-1: Robotics Transformer for Real-World Control at Scale~~
Brohan, Anthony, et al. - CoRL, 2022
Paper link
~~COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems~~
Ma, Shuang, et al. - IROS, 2022
Paper link
~~GATO: A Generalist Agent~~
Reed, Scott, et al. - DeepMind, 2022
Paper link
~~DayDreamer: World Models for Physical Robot Learning~~
Wu, Philipp, et al. - CoRL, 2022
Paper link
~~Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos~~
Baker, Bowen, et al. - NeurIPS, 2022
Paper link
~~RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation~~
Google DeepMind Team - Google DeepMind, 2023
Paper link
ViNT: A Foundation Model for Visual Navigation
Shah, Dhruv, et al. - CoRL, 2023
Paper link
~~RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control~~
Brohan, Anthony, et al. - CoRL, 2023
Paper link
~~LIV: Language-Image Representations and Rewards for Robotic Control~~
Ma, Yecheng Jason, et al. - ICML, 2023
Paper link
~~PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training~~
Bonatti, Rogerio, et al. - IROS, 2023
Paper link
RT-2-X: Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration (Brohan, Anthony, et al.) - ICRA, 2024
Paper link

2. Course Information

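- Lecturer: Prof. 한연희 (Rm. 423, Engineering Building 2, Email: yhhan@koreatech.ac.kr)
- Classes: Mondays (19:00 ~ 21:50)
- Lecture Room: Rm. 121A, Engineering Building 2
- Prerequisites: Basic knowledge of machine learning and deep learning; basic coding experience with PyTorch/TensorFlow in Python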

3. Presentation Evaluation

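- BOOK: Understanding of the content (60%), quality of the presentation materials (40%) - every team member must present an equal share, and all team members receive the same score
- PAPER: Understanding of the content (50%), quality of the presentation materials (30%), presentation skills (20%)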

4. Homework Guide

- Detailed homework guidelines will be provided later
- Submit a report containing reinforcement learning code and experimental results based on the textbook

5. References

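[Main Textbooks]

- Deep Learning from Scratch 4 (밑바닥부터 시작하는 딥러닝 4): Reinforcement Learning Algorithms Learned by Implementing Them Directly in Python
- Provided as PDF in class
- Key deep reinforcement learning papers

[Supplementary Textbooks]

- Deep Reinforcement Learning in Action (심층 강화학습 인 액션): From Basic Concepts to Implementing the Latest Algorithms in Python
- Python-Based Reinforcement Learning Algorithms (파이썬 기반 강화학습 알고리듬): DP, Q-Learning, AC, DQN, TRPO, PPO, DDPG, TD3 | Exploring Imitation Learning and ESBAS
- Collection of key deep reinforcement learning papers:
- PyTorch tutorials: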

6. Logistics

- Attendance: each absence deducts two points out of 100. Five absences result not in a ten-point deduction but in failure of the course (i.e., grade 'F').
- Homework: a substantial amount of intensive homework will be assigned. Any cheating (or copying) will result in grade 'F'.
- Exam: there will be a final examination to evaluate the knowledge learned in class.

7. Lecture Evaluation

Attendance (10%), Paper Presentation (30%), Homework Reports (20%), Final Exam (40%)
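
As a rough illustration, the weights above and the attendance rule in Section 6 could be combined as in the following Python sketch. It assumes that each component is scored on a 0 to 100 scale and that the two-point deduction per absence is taken from the 10 attendance points. Neither assumption is stated on this page, and the function name `total_score` is hypothetical. Letter grades depend on the class-wide distribution described in the notice and are not computed here.

```python
from typing import Optional

# Weights from Section 7 (attendance is handled separately below).
WEIGHTS = {"paper": 0.30, "homework": 0.20, "final": 0.40}
ATTENDANCE_POINTS = 10.0   # Attendance (10%) of a 100-point total
POINTS_PER_ABSENCE = 2.0   # Section 6: two points per absence
FAIL_ABSENCES = 5          # Section 6: five absences mean grade 'F'

def total_score(absences: int, paper: float, homework: float,
                final: float) -> Optional[float]:
    """Return the total out of 100, or None when the absence rule forces an 'F'.

    Assumption: paper, homework, and final are each scored on a 0-100 scale.
    """
    if absences >= FAIL_ABSENCES:
        return None
    attendance = ATTENDANCE_POINTS - POINTS_PER_ABSENCE * absences
    return (attendance
            + WEIGHTS["paper"] * paper
            + WEIGHTS["homework"] * homework
            + WEIGHTS["final"] * final)

# One absence, with 80 / 90 / 70 on the three scored components.
print(total_score(absences=1, paper=80, homework=90, final=70))
# Five absences: the rule overrides the arithmetic.
print(total_score(absences=5, paper=100, homework=100, final=100))
```

Returning `None` for the five-absence case keeps the failure rule separate from the arithmetic, which mirrors the wording that five absences lead to failure rather than to a ten-point deduction.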