A newborn robot dog learns to walk by itself after rolling around for an hour: the latest work from Andrew Ng's first PhD student


By Mingmin

Source: Qubit (ID: QbitAI)

Now, a robot dog can learn to walk after rolling around on its own for just an hour!

The gait looks quite natural:

It can even withstand being prodded with a big stick:

Even when it falls, it rolls over and stands back up:

Seen this way, training a robot dog really is no different from training an ordinary dog.

This is the latest result from UC Berkeley: it lets robots train and learn directly in the real environment, without relying on simulators.

Using this method, the researchers trained four robots in a short period of time.

For example, the robot dog at the beginning, which learned to walk in 1 hour;

There are also two robotic arms, which grasp objects at close to human level after 8 to 10 hours of real-world practice;

And a small robot with computer vision, which, after 2 hours of exploration, can roll smoothly to a designated position.

The research was led by Pieter Abbeel et al. Pieter Abbeel was Andrew Ng's first doctoral student and recently won the 2021 ACM Prize in Computing.

Currently, all software infrastructure for this approach has been open sourced.

An algorithm called "DayDreamer"

The pipeline of this method can be roughly divided into 4 steps:

The first step is to put the robot in a real environment and collect data.

The second step is to store these data in a replay buffer. Training on this historical data lets the robot "summarize experience" and make efficient use of the collected samples.

In the third step, the world model learns from the accumulated experience, and then "imagines" future trajectories from which a policy is derived.

In the fourth step, an actor-critic algorithm is used to improve the policy.

This cycle then repeats: the refined policy is applied back to the robot, and the result feels like "learning by exploring on one's own".
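The four-step loop above can be sketched in code. Everything below is a toy illustration, not the paper's implementation: the environment, the least-squares "world model", and the policy update are hypothetical stand-ins chosen only to make the collect, replay, model-learning, and actor-critic cycle concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

replay_buffer = []  # step 2: store collected experience

def collect_step(policy_params, obs):
    """Step 1: act in the (simulated-here) real environment, record data."""
    action = np.tanh(policy_params @ obs)          # linear policy stub
    next_obs = obs + 0.1 * action + 0.01 * rng.standard_normal(obs.shape)
    reward = -np.sum(next_obs ** 2)                # toy reward: stay near origin
    replay_buffer.append((obs, action, reward, next_obs))
    return next_obs

def train_world_model(buffer):
    """Step 3: fit a dynamics model to replayed experience (least-squares stub)."""
    obs = np.stack([t[0] for t in buffer])
    nxt = np.stack([t[3] for t in buffer])
    A, *_ = np.linalg.lstsq(obs, nxt, rcond=None)  # next_obs is approx. obs @ A
    return A

def improve_policy(policy_params, A):
    """Step 4: actor-critic stand-in; nudge the policy using an imagined rollout."""
    obs = rng.standard_normal(3)
    for _ in range(5):                             # short "dreamed" trajectory
        obs = obs @ A                              # roll out inside the model
    # gradient-free nudge toward actions that shrink the imagined state
    return policy_params - 0.01 * np.outer(np.sign(obs), obs)

policy = np.zeros((3, 3))
obs = rng.standard_normal(3)
for episode in range(10):                          # the cycle repeats
    for _ in range(20):
        obs = collect_step(policy, obs)
    model = train_world_model(replay_buffer)
    policy = improve_policy(policy, model)

print(len(replay_buffer))  # → 200
```

The point of the sketch is the data flow: real-world steps fill the buffer, the model is refit from the buffer, and the policy is improved against the model rather than against the robot.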

Specifically, the core component here is the World Model.

World Models is a fast unsupervised learning method proposed by David Ha et al. in 2018, and it received an oral presentation at NIPS 2018.

Its core idea is that humans form a mental model of the world based on prior experience, and our decisions and actions are made against this internal model.

For example, when a human plays baseball, the batter reacts faster than visual information can travel to the brain. The ball can still be hit correctly because the brain has already made an instinctive prediction.

Earlier, building on this "imagination-based" learning idea of the World Model, Google proposed a scalable reinforcement learning method called Dreamer.

The method proposed this time builds on Dreamer and is called DayDreamer.

(So, a daydreamer?)

Specifically, the World Model is an agent model.

It includes a visual perception component, which compresses the observed image into a low-dimensional representation vector used as model input.

There is also a memory component, which predicts future representation vectors from historical information.

Finally, it includes a decision component, which decides what action to take based on the representation vectors from the visual perception and memory components.
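A minimal sketch of these three components follows, with made-up dimensions and randomly initialized weights standing in for the learned networks; the function names and shapes here are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG, LATENT, HIDDEN, ACTION = 64, 8, 16, 2  # toy sizes, chosen arbitrarily

W_enc = rng.standard_normal((LATENT, IMG)) * 0.1             # vision component
W_h = rng.standard_normal((HIDDEN, HIDDEN + LATENT)) * 0.1   # memory component
W_act = rng.standard_normal((ACTION, HIDDEN + LATENT)) * 0.1 # decision component

def perceive(image):
    """Vision: compress a high-dimensional image into a low-dim latent vector."""
    return np.tanh(W_enc @ image)

def remember(hidden, latent):
    """Memory: update the recurrent state used to predict future latents."""
    return np.tanh(W_h @ np.concatenate([hidden, latent]))

def decide(hidden, latent):
    """Decision: choose an action from the memory state and current latent."""
    return np.tanh(W_act @ np.concatenate([hidden, latent]))

hidden = np.zeros(HIDDEN)
for _ in range(3):                    # a few perceive-remember-decide steps
    image = rng.standard_normal(IMG)  # pretend camera frame
    z = perceive(image)
    hidden = remember(hidden, z)
    action = decide(hidden, z)

print(z.shape, hidden.shape, action.shape)  # (8,) (16,) (2,)
```

Note how the decision step consumes both the latent from vision and the state from memory, mirroring the component wiring described above.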

Now, let us return to the method proposed by the UC Berkeley researchers.

It is not hard to see that the world-model learning part is a process of accumulating experience, while the behavior learning part is a process of producing actions.

The method proposed in this paper mainly solves two problems in robot training: efficiency and accuracy.

In general, the conventional way to train robots is reinforcement learning, in which the robot's behavior is adjusted through trial and error.

However, this method often requires a very large number of tests to achieve good results.

This is not only inefficient; the cost of training is also far from low.
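To see why plain trial and error is sample-hungry, consider a deliberately simple stand-in: a 5-armed bandit learned by epsilon-greedy value estimation. Even this toy problem takes thousands of pulls before the value estimates settle; a real robot pays for every such trial in wall-clock time and hardware wear. (The bandit is our illustration, not part of the paper.)

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.1, 0.3, 0.5, 0.7, 0.9])  # hidden payoff of each action
q = np.zeros(5)       # estimated value of each action
counts = np.zeros(5)  # how often each action was tried

for t in range(5000):
    if rng.random() < 0.1:            # explore a random action
        a = int(rng.integers(5))
    else:                             # exploit the current best estimate
        a = int(np.argmax(q))
    reward = true_means[a] + rng.standard_normal() * 0.5  # noisy outcome
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental mean update

print(q.round(2))
```

Five thousand noisy trials for a five-choice problem; a walking robot faces a far larger action space, which is exactly the inefficiency the article describes.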

Later, many people proposed to train robots in simulators, which can improve efficiency and reduce costs.

However, the authors of this paper argue that simulator training falls short on accuracy: only the real environment lets the robot reach its best performance.

Judging from the results, during training the robot dog took only 10 minutes to adapt its behavior.

Compared with the SAC (Soft Actor-Critic) method, the improvement is significant.

The new method also overcomes the challenges of visual localization and sparse rewards during robotic arm training, significantly outperforming other methods within a few hours of training.

Research Team

It is worth mentioning that the members of the research team behind this result are themselves quite remarkable.

Among them, Pieter Abbeel was Andrew Ng's first doctoral student.

He is now a professor of electrical engineering and computer science at UC Berkeley, director of the Berkeley Robot Learning Lab, co-director of the Berkeley AI Research lab, and formerly worked at OpenAI.

Not long ago, he received the 2021 ACM Prize in Computing for his contributions to robot learning.

He is also a co-founder of the AI robotics company Covariant.

Another author, Ken Goldberg, is also a top expert in the field of AI.

He is now a professor of engineering at UC Berkeley, and his research interests include reinforcement learning and human-computer interaction.

In 2005, he was elected as an IEEE Fellow.

Goldberg is also an artist and the founder of the UC Berkeley Art, Technology, and Culture Colloquium.

In addition, Philipp Wu, Alejandro Escontrela, and Danijar Hafner are co-first authors of the paper.

Among them, Philipp Wu is still only a senior undergraduate at UC Berkeley.

One More Thing

While watching a video of the robot dog training, we noticed that the robot dog used by the researchers is a Unitree model. The brand comes from the Chinese company Unitree Robotics (Yushu Technology), whose robotic ox also appeared at the Spring Festival Gala.

Moreover, a recently released video of a group of Unitree Go1 robot dogs being tested together has also gone viral abroad.

Paper address:

https://ift.tt/ir7xCwq

Reference link: https://ift.tt/pHL4Bu7


(Disclaimer: This article only represents the author’s point of view and does not represent the position of Sina.com.)

This article is reproduced from: http://finance.sina.com.cn/tech/csj/2022-06-30/doc-imizirav1299704.shtml
