Teaching a robot to play table tennis: can it return forehand and backhand, and handle spin? Netizens: See you at the Olympics!

Written by | Ma Xuewei

Preface

Robots can now play table tennis and have reached the level of intermediate human players!

Without further ado, let's see how it wreaks havoc on human novices.

According to reports, this robot was created by the Google DeepMind research team and won 45% (13 of 29) of its matches against human players. Notably, all of the human players were facing the robot for the first time.

While the robot lost all of its matches against the top players, it beat 100% of the beginners and 55% of the intermediate players.

Figure | Playing table tennis with a professional coach.

In response, professional table tennis coach Barney J. Reed said, "It's amazing to watch the robot compete with players of all levels and styles. Our goal is to get the robot to an intermediate level. I think this robot even exceeded my expectations."

The related research paper, titled “Achieving Human Level Competitive Robot Table Tennis”, has been published on the preprint website arXiv.

How to make a robot play table tennis?

Table tennis is currently one of the highlights of the Paris Olympics. In competition, players demonstrate extremely high physical fitness, high-speed movement, precise ball control, and superhuman reaction times.

That’s why researchers have used table tennis as a benchmark for robots since the 1980s, developing many table tennis robots and making progress on returning the ball to the opponent’s half of the table, hitting target positions, smashing, cooperative play, and many other key aspects of the game. However, before this work, no robot had played a full table tennis match against a previously unseen human opponent.

In this study, the Google DeepMind team achieved amateur human-level performance in competitive robot-versus-human table tennis through techniques such as a hierarchical and modular policy architecture, an iteratively defined task distribution, a sim-to-sim adaptation layer, domain randomization, real-time adaptation to unseen opponents, and careful hardware deployment.

Figure | Overview of the method.

1. A hierarchical, modular policy architecture built on a skill library

Low-Level Controllers (LLCs): The skill library contains a range of table tennis skills, such as forehand attack, backhand positioning, and forehand serve. Each LLC is an independent policy that specializes in a single skill; the LLCs are neural network policies trained in simulation with the MuJoCo physics engine.

Figure | The LLC training library.
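To make the skill-library idea concrete, here is a minimal sketch of what such a library could look like, assuming each LLC is a small feed-forward policy mapping an observation to joint commands. The class name, observation size, and action size are illustrative placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

class LLCPolicy(nn.Module):
    """One low-level controller: a small MLP trained for a single table tennis skill."""
    def __init__(self, obs_dim: int = 32, act_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),  # normalized joint commands
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# The skill library maps a skill name to its independently trained policy.
skill_library = {
    "forehand_attack": LLCPolicy(),
    "backhand_position": LLCPolicy(),
    "forehand_serve": LLCPolicy(),
}

obs = torch.zeros(1, 32)                        # placeholder observation
action = skill_library["forehand_attack"](obs)  # query one skill
```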

High-Level Controller (HLC): The HLC selects the most appropriate LLC based on the current game situation and the opponent's capabilities. It consists of the following modules:

Style selection policy: decides whether to use a forehand or a backhand depending on the type of incoming ball (serve or attack).

Spin Classifier: This classifier determines whether the incoming ball has topspin or backspin.

LLC Skill Descriptors: These descriptors record each LLC's performance metrics under different ball conditions, such as hit rate and ball placement.

Strategy selection module: generates a shortlist of candidate LLCs based on the skill descriptors, match statistics, and the opponent's capabilities.

LLC preference (H-values): uses a gradient bandit algorithm to learn a preference value for each LLC online, and makes the final LLC selection based on those preferences (the full selection flow is sketched after the figure below).

Figure | Once the opponent hits the ball, the HLC first applies the style policy to the current ball state to decide whether to return it with the forehand or the backhand (forehand in this example), and then chooses an LLC of that style to return the ball.
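The following sketch pulls these modules together into one selection function. The style policy, spin classifier, descriptor format, thresholds, and skill names are all stand-ins chosen for illustration; only the overall flow (style, spin, descriptor filtering, preference-weighted choice) follows the description above.

```python
import numpy as np

def hlc_select_llc(ball_state, style_policy, spin_classifier, skill_descriptors, h_values):
    style = style_policy(ball_state)       # "forehand" or "backhand"
    spin = spin_classifier(ball_state)     # "topspin" or "backspin"

    # Keep LLCs of the chosen style whose recorded hit rate for this spin is acceptable.
    candidates = [name for name, desc in skill_descriptors.items()
                  if desc["style"] == style and desc["hit_rate"][spin] > 0.5]
    if not candidates:                      # fall back to every skill of the chosen style
        candidates = [n for n, d in skill_descriptors.items() if d["style"] == style]

    # Turn learned preferences (H-values) into selection probabilities via softmax.
    prefs = np.array([h_values.get(name, 0.0) for name in candidates])
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    return np.random.choice(candidates, p=probs)

# Toy usage with hypothetical descriptors and trivial style/spin predictors.
descriptors = {
    "fh_attack": {"style": "forehand", "hit_rate": {"topspin": 0.8, "backspin": 0.6}},
    "bh_block":  {"style": "backhand", "hit_rate": {"topspin": 0.7, "backspin": 0.4}},
}
choice = hlc_select_llc(ball_state=None,
                        style_policy=lambda s: "forehand",
                        spin_classifier=lambda s: "topspin",
                        skill_descriptors=descriptors,
                        h_values={"fh_attack": 0.2})
```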

2. Techniques for zero-shot sim-to-real transfer

Iteratively defined task distribution: initial ball-state data is collected from human-vs-human play and used to train the LLCs and HLC in a simulated environment. Ball states generated when the trained policy plays in the real world are then added to the dataset, and the process is repeated to gradually refine the training task distribution.

Sim-to-sim adaptation layer: To handle the fact that topspin and backspin balls correspond to different dynamics model parameters in the simulation environment, the paper proposes two solutions: one adjusts the LLC training dataset to account for spin, while the other adds a sim-to-sim adaptation layer that uses FiLM layers to learn the mapping between topspin and backspin behavior.
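For readers unfamiliar with FiLM (feature-wise linear modulation), here is a minimal sketch of such an adaptation layer, assuming it conditions a policy's hidden features on a spin-context vector. The dimensions and the one-hot context encoding are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FiLMAdapter(nn.Module):
    """Feature-wise linear modulation: scale and shift features based on a context vector."""
    def __init__(self, feature_dim: int = 256, context_dim: int = 2):
        super().__init__()
        # The context (e.g., a topspin/backspin indicator) produces per-feature scale and shift.
        self.gamma = nn.Linear(context_dim, feature_dim)
        self.beta = nn.Linear(context_dim, feature_dim)

    def forward(self, features: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        return self.gamma(context) * features + self.beta(context)

adapter = FiLMAdapter()
features = torch.randn(1, 256)        # hidden features from a policy trained on one spin regime
context = torch.tensor([[0.0, 1.0]])  # one-hot spin context (here: backspin)
adapted = adapter(features, context)
```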

Domain randomization: during training, parameters such as observation noise, latency, table and paddle damping, and friction are randomized in the simulated environment to mimic the uncertainty of the real world.
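A minimal sketch of this idea is shown below: each training episode samples a fresh set of simulator parameters from some ranges. The parameter names and ranges here are made up for illustration and are not the values used in the paper.

```python
import numpy as np

def sample_sim_params(rng: np.random.Generator) -> dict:
    """Sample one randomized set of simulator parameters (illustrative ranges only)."""
    return {
        "observation_noise_std": rng.uniform(0.0, 0.005),  # ball-tracking noise, metres
        "observation_delay_s": rng.uniform(0.0, 0.03),     # sensing/actuation latency
        "table_damping_scale": rng.uniform(0.7, 1.3),      # scale on the nominal value
        "paddle_friction_scale": rng.uniform(0.7, 1.3),
    }

rng = np.random.default_rng(0)
for episode in range(3):
    params = sample_sim_params(rng)   # re-randomize the environment every training episode
    # env.reset(physics_overrides=params)  # hypothetical call into the simulator
```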

Figure | Zero-shot sim-to-real transfer.

3. Real-time adaptation to unseen opponents

Real-time tracking of match statistics: the HLC tracks match statistics in real time, such as the scores and errors of both the robot and its opponent, and adjusts the LLC preference values based on this data to adapt to the opponent.

Online learning of LLC preferences: using the gradient bandit algorithm, the HLC learns each LLC's preference value online and steers play toward LLCs that exploit the opponent's weaknesses (a minimal sketch of this update appears after the figure below).

Figure | Hierarchical control.
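Here is a minimal sketch of the standard gradient bandit preference update applied to LLC selection: the preference of the chosen LLC rises when the outcome beats a running baseline and falls otherwise. The step size, reward signal, and baseline handling are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

def update_preferences(h, chosen, reward, baseline, alpha=0.1):
    """Gradient bandit step: raise the chosen LLC's preference if reward beat the baseline."""
    probs = softmax(h)
    for a in range(len(h)):
        if a == chosen:
            h[a] += alpha * (reward - baseline) * (1.0 - probs[a])
        else:
            h[a] -= alpha * (reward - baseline) * probs[a]
    return h

h = np.zeros(3)                      # one preference (H-value) per candidate LLC
h = update_preferences(h, chosen=1, reward=1.0, baseline=0.4)  # e.g. the robot won the point
```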

The research team collected a small amount of human-vs-human play data to initialize the task conditions. They then used reinforcement learning (RL) to train the agent in simulation and applied a variety of techniques to deploy the policy zero-shot on real hardware. The deployed agent played against human players to generate more training task conditions, and the training-deployment cycle was repeated. As the robot improved, the game conditions became progressively more complex while remaining grounded in real-world task conditions. This hybrid sim-real cycle created an automated task curriculum that allowed the robot's skills to improve over time.
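The loop described above can be sketched in a few lines. The helper functions here are hypothetical stand-ins for training in simulation, deploying on hardware, and harvesting new ball states from real play; only the overall structure of the cycle follows the text.

```python
def train_in_simulation(task_conditions):
    """Stand-in for RL training of the LLCs and HLC in MuJoCo on the current task set."""
    return {"n_tasks": len(task_conditions)}   # placeholder for an updated policy

def deploy_and_play(policy):
    """Stand-in for zero-shot hardware deployment; returns ball states logged from real play."""
    return [{"position": None, "velocity": None, "spin": None}]

# Seed the task distribution with a small set of human-vs-human ball states (placeholder here).
task_conditions = [{"position": None, "velocity": None, "spin": None}]

for cycle in range(5):
    policy = train_in_simulation(task_conditions)   # train in simulation
    new_states = deploy_and_play(policy)            # play against humans on the real robot
    task_conditions.extend(new_states)              # grow and refine the task distribution
```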

How did the matches go?

To evaluate the agent’s skill level, the robot played competitive matches against 29 table tennis players of different skill levels—beginner, intermediate, advanced, and advanced+—as determined by a professional table tennis coach.

Across all opponents, the robot won 45% of its matches and 46% of its games. Broken down by skill level, it won all of its matches against beginners, lost all of its matches against advanced and advanced+ players, and won 55% of its matches against intermediate players. This strongly suggests that the agent has reached the level of an intermediate amateur human player.

Figure | Against all opponents, the robot won 45% of matches and 46% of games, winning 100% of matches against beginners and 55% of matches against intermediate players.

Study participants enjoyed playing with the robot, rating it highly for being “fun” and “engaging.” These ratings were consistent across skill levels and regardless of whether participants won or lost. They also overwhelmingly responded that they would “definitely” play with the robot again. When given free time to play with the robot, they played for an average of 4 minutes and 6 seconds out of the 5 minutes available.

Advanced players were able to exploit weaknesses in the robot's strategy, but they still enjoyed playing with it. In post-match interviews, they considered it a more dynamic practice partner than a ball-serving machine.

Figure | Participants enjoyed playing with the robot and rated it highly for being "fun" and "engaging."

Shortcomings and Prospects

The research team said that this robot learning system still has some limitations, such as a limited ability to respond to fast and low balls, low spin-detection accuracy, and a lack of multi-ball strategy and tactics.

Future research directions include improving the robot's ability to handle a wider variety of balls, learning more complex strategies, and improving the motion capture technology.

The research team also stated that the hierarchical policy architecture and zero-shot sim-to-real transfer method proposed in this study can be applied to other robot learning tasks. Real-time adaptation can likewise help robots cope with changing environments and tasks, and sound system design principles remain crucial for building high-performance, robust robot learning systems.
