On sale for NT$590
In this course you will learn about and implement a remarkably clever new AI model called Twin-Delayed DDPG (TD3), which combines state-of-the-art techniques from the field, including continuous Double Deep Q-Learning, Policy Gradients, and Actor-Critic. The model is so powerful that, for the first time in our courses, we can solve the most challenging virtual AI applications: training an ant/spider and a half-humanoid to walk and run across a field.
https://softnshare.com/deep-reinforcement-learning/
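The course description above names TD3's main ingredients. As a rough illustration (not the course's own code), here is a minimal sketch of TD3's clipped double-Q target in PyTorch; target_actor, target_critic1, and target_critic2 are hypothetical placeholder networks.

import torch

def td3_target(reward, next_state, done, target_actor, target_critic1,
               target_critic2, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               max_action=1.0):
    # Compute the TD3 bootstrap target for a batch of transitions.
    with torch.no_grad():
        next_action = target_actor(next_state)
        # Target policy smoothing: perturb the target action with clipped noise.
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: take the minimum of the two target critics
        # to curb the overestimation bias of plain deep Q-learning.
        target_q = reward + gamma * (1.0 - done) * torch.min(
            target_critic1(next_state, next_action),
            target_critic2(next_state, next_action))
    return target_q

The "delayed" in the name refers to updating the actor (the policy-gradient step) less frequently than the two critics; that training loop is omitted here for brevity.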
There is also one related YouTube video: the YouTuber Science Experiments with Physics Engine, who has more than 800,000 followers, covers this topic in the video quoted in full below.
reinforcement learning policy: best answer from 國立陽明交通大學電子工程學系及電子研究所 on Facebook
The NCTU IBM Center has specially invited Prof. H. Jonathan Chao of the ECE Department at New York University to give a talk. Interested faculty and students are welcome to register free of charge!
Title: CFR-RL: Traffic Engineering with Reinforcement Learning in SDN
Speaker: Prof. H. Jonathan Chao (ECE Department at New York University)
Time: 2020/01/20 (Mon.) 15:00-17:00
Venue: Room 816, Engineering Building 4, NCTU
Registration: https://forms.gle/k5txEfTX6jM7PBR98
Contact: 曾紫玲, Tel: 03-5712121 ext. 54599, Email: tzuling@nctu.edu.tw
Abstract:
Traffic Engineering (TE) is one of the most important network features of Software-Defined Networking (SDN); it helps Internet Service Providers (ISPs) optimize network performance and resource utilization by configuring the routing across their backbone networks. Although TE solutions can achieve optimal or near-optimal performance by rerouting as many flows as possible, they usually ignore the negative impact of frequent rerouting, such as packets arriving out of order. To mitigate network disturbance, one promising TE approach forwards the majority of traffic flows using Equal-Cost Multi-Path (ECMP) and selectively reroutes a few critical flows using SDN to balance link utilization across the network. However, critical flow rerouting is not trivial, because the solution space for critical flow selection is immense. Moreover, it is impractical to design a heuristic algorithm for this problem based on fixed, simple rules, since rule-based heuristics cannot adapt to changes in the traffic matrix and network dynamics. In this talk, we describe a Reinforcement Learning (RL)-based scheme, called CFR-RL, that learns a policy to automatically select critical flows for each given traffic matrix. It then reroutes the selected critical flows to balance link utilization by formulating and solving a simple Linear Programming (LP) problem. Extensive evaluations show that CFR-RL outperforms the best heuristic by 7.4%-12.2% while rerouting only 10%-21.3% of total traffic.
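To make the rerouting step concrete, here is a minimal sketch of the kind of LP the abstract describes, assuming SciPy and a toy representation of the network (the exact formulation in the paper may differ): choose split ratios for the selected critical flows over candidate paths so that the maximum link utilization U is minimized.

import numpy as np
from scipy.optimize import linprog

def reroute_critical_flows(background, capacity, flows):
    # background: ECMP load already on each link
    # capacity:   capacity of each link
    # flows:      list of (demand, candidate_paths); each path is a list of link ids
    n_links = len(capacity)
    n_vars = sum(len(paths) for _, paths in flows) + 1  # one split ratio per (flow, path), plus U
    c = np.zeros(n_vars)
    c[-1] = 1.0  # objective: minimize U

    # Per-link constraint: background[e] + sum(demand * split) <= U * capacity[e]
    A_ub = np.zeros((n_links, n_vars))
    b_ub = -np.asarray(background, dtype=float)
    # Per-flow constraint: split ratios of each flow sum to 1.
    A_eq = np.zeros((len(flows), n_vars))
    b_eq = np.ones(len(flows))

    col = 0
    for i, (demand, paths) in enumerate(flows):
        for path in paths:
            for e in path:
                A_ub[e, col] = demand
            A_eq[i, col] = 1.0
            col += 1
    A_ub[:, -1] = -np.asarray(capacity, dtype=float)

    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, 1)] * (n_vars - 1) + [(0, None)])

In this sketch, the returned solution's x[:-1] holds the split ratios of the critical flows and x[-1] the achieved maximum utilization; the RL policy's only job is deciding which flows enter this LP.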
Biography:
H. Jonathan Chao is Professor of Electrical and Computer Engineering (ECE) at NYU, which he joined in January 1992. He is currently Director of the High-Speed Networking Lab and was Head of the ECE Department from 2004 to 2014. His research covers software-defined networking, network function virtualization, datacenter networks, packet processing and switching, network security, and machine learning for networking. He holds 63 patents and has published more than 265 journal and conference papers. During 2000-2001, he was Co-Founder and CTO of Coree Networks, NJ, where he led a team that implemented a multi-terabit router with carrier-class reliability. From 1985 to 1992, he was a Member of Technical Staff at Bellcore, where he was involved in network architecture design and ASIC implementation, including the world's first SONET-like Framer chip, ATM Layer chip, Sequencer chip (the first chip to handle packet scheduling), and ATM switch chip. He is a Fellow of the National Academy of Inventors (NAI) for "having demonstrated a highly prolific spirit of innovation in creating or facilitating outstanding inventions that have made a tangible impact on quality of life, economic development, and the welfare of society," and a Fellow of the IEEE for his contributions to the architecture and application of VLSI circuits in high-speed packet networks. He received the Bellcore Excellence Award in 1987 and is a co-recipient of the 2001 Best Paper Award from the IEEE Transactions on Circuits and Systems for Video Technology. He has coauthored three networking books. He worked for the Telecommunication Laboratories in Taiwan from 1977 to 1981. He received his B.S. and M.S. degrees in electronics engineering from National Chiao Tung University, Taiwan, in 1977 and 1980, respectively, and his Ph.D. degree in electrical engineering from The Ohio State University in 1985.
reinforcement learning policy: best post from Science Experiments with Physics Engine on YouTube
Using reinforcement learning, I taught a human model to walk on two legs. The algorithm used is proximal policy optimization (PPO).
Proximal Policy Optimization Algorithms
https://arxiv.org/abs/1707.06347
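As a point of reference for the linked paper, here is a minimal sketch of PPO's clipped surrogate objective, assuming PyTorch and illustrative tensor names.

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # log_probs:     log pi_theta(a|s) under the current policy
    # old_log_probs: log pi_theta_old(a|s) recorded during the rollout (detached)
    # advantages:    advantage estimates, e.g. from GAE
    ratio = torch.exp(log_probs - old_log_probs)  # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum of the two terms;
    # negate it to obtain a loss for gradient descent.
    return -torch.min(unclipped, clipped).mean()

Clipping keeps each update close to the policy that collected the data, which is what makes PPO stable enough to train locomotion behaviors like the two-legged walking shown in the video.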
Twitter:https://twitter.com/physics_engine0
BGM:
"Trick or treat" written by GT-K
"Halloween Monsters" written by ISAo.
#物理エンジンくん