Teaching a robot to play table tennis: can it return forehand and backhand, and handle spin? Netizens: See you at the Olympics!

Written by | Ma Xuewei

Preface

Robots can now play table tennis and have reached the level of intermediate human players!

Without further ado, let's see how it wreaks havoc on human novices.

According to reports, this robot was created by the Google DeepMind research team and won 45% (13 of 29) of its matches against human players. Notably, all of the human players were facing the robot for the first time.

While the robot lost all of its matches against the top players, it beat 100% of the beginners and 55% of the intermediate players.

Figure | Playing table tennis with a professional coach.

In response, professional table tennis coach Barney J. Reed said, "It's amazing to watch the robot compete with players of all levels and styles. Our goal is to get the robot to an intermediate level. I think this robot even exceeded my expectations."

The related research paper, titled “Achieving Human Level Competitive Robot Table Tennis”, has been published on the preprint website arXiv.

How to make a robot play table tennis?

Table tennis is currently one of the highlights of the Paris Olympics. In competition, players demonstrate extremely high physical fitness, high-speed movement, precise ball control, and superhuman reaction times.

That’s why researchers have used table tennis as a benchmark for robots since the 1980s, developing many table tennis robots and making progress on returning the ball to the opponent’s half of the table, hitting target positions, smashing, cooperative play, and many other key aspects of the game. However, before this work, no robot had played a full table tennis match against a previously unseen human opponent.

In this study, the Google DeepMind team achieved amateur human-level performance in competitive robot-versus-human table tennis through techniques such as a hierarchical and modular policy architecture, an iteratively defined task distribution, a sim-to-sim adaptation layer, domain randomization, real-time adaptation to unseen opponents, and careful hardware deployment.

Figure | Overview of the method.

1. A hierarchical, modular policy architecture built on a skill library

Low-Level Controllers (LLCs): The skill library contains a range of table tennis skills, such as forehand attack, backhand positioning, and forehand serve. Each LLC is an independent policy that specializes in a single skill; the LLCs are neural network policies trained in simulation with the MuJoCo physics engine.

Figure | The LLC training library.
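To make the skill-library idea concrete, here is a minimal sketch of what such a library could look like, assuming each LLC is a small feed-forward policy mapping an observation to joint commands. The class name, observation size, and action size are illustrative placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

class LLCPolicy(nn.Module):
    """One low-level controller: a small MLP trained for a single table tennis skill."""
    def __init__(self, obs_dim: int = 32, act_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),  # normalized joint commands
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# The skill library maps a skill name to its independently trained policy.
skill_library = {
    "forehand_attack": LLCPolicy(),
    "backhand_position": LLCPolicy(),
    "forehand_serve": LLCPolicy(),
}

obs = torch.zeros(1, 32)                        # placeholder observation
action = skill_library["forehand_attack"](obs)  # query one skill
```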

High-Level Controller (HLC): The HLC selects the most appropriate LLC based on the current game situation and the opponent's capabilities. It consists of the following modules:

Style selection policy: decides whether to use a forehand or a backhand depending on the type of incoming ball (serve or attack).

Spin Classifier: This classifier determines whether the incoming ball has topspin or backspin.

LLC Skill Descriptors: These descriptors record each LLC's performance metrics under different ball conditions, such as hit rate and ball placement.

Strategy selection module: generates a shortlist of candidate LLCs based on the skill descriptors, match statistics, and the opponent's capabilities.

LLC preference (H-values): uses a gradient bandit algorithm to learn a preference value for each LLC online, and makes the final LLC selection based on those preferences (the full selection flow is sketched after the figure below).

Figure | Once the opponent hits the ball, the HLC first applies the style policy to the current ball state to decide whether to return it with the forehand or the backhand (forehand in this example), and then chooses an LLC of that style to return the ball.
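The following sketch pulls these modules together into one selection function. The style policy, spin classifier, descriptor format, thresholds, and skill names are all stand-ins chosen for illustration; only the overall flow (style, spin, descriptor filtering, preference-weighted choice) follows the description above.

```python
import numpy as np

def hlc_select_llc(ball_state, style_policy, spin_classifier, skill_descriptors, h_values):
    style = style_policy(ball_state)       # "forehand" or "backhand"
    spin = spin_classifier(ball_state)     # "topspin" or "backspin"

    # Keep LLCs of the chosen style whose recorded hit rate for this spin is acceptable.
    candidates = [name for name, desc in skill_descriptors.items()
                  if desc["style"] == style and desc["hit_rate"][spin] > 0.5]
    if not candidates:                      # fall back to every skill of the chosen style
        candidates = [n for n, d in skill_descriptors.items() if d["style"] == style]

    # Turn learned preferences (H-values) into selection probabilities via softmax.
    prefs = np.array([h_values.get(name, 0.0) for name in candidates])
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    return np.random.choice(candidates, p=probs)

# Toy usage with hypothetical descriptors and trivial style/spin predictors.
descriptors = {
    "fh_attack": {"style": "forehand", "hit_rate": {"topspin": 0.8, "backspin": 0.6}},
    "bh_block":  {"style": "backhand", "hit_rate": {"topspin": 0.7, "backspin": 0.4}},
}
choice = hlc_select_llc(ball_state=None,
                        style_policy=lambda s: "forehand",
                        spin_classifier=lambda s: "topspin",
                        skill_descriptors=descriptors,
                        h_values={"fh_attack": 0.2})
```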

2. Techniques for zero-shot sim-to-real transfer

Iteratively defined task distribution: initial ball-state data is collected from human-vs-human play and used to train the LLCs and HLC in a simulated environment. Ball states generated when the trained policy plays in the real world are then added to the dataset, and the process is repeated to gradually refine the training task distribution.

Sim-to-sim adaptation layer: To handle the fact that topspin and backspin balls correspond to different dynamics model parameters in the simulation environment, the paper proposes two solutions: one adjusts the LLC training dataset to account for spin, while the other adds a sim-to-sim adaptation layer that uses FiLM layers to learn the mapping between topspin and backspin behavior.
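For readers unfamiliar with FiLM (feature-wise linear modulation), here is a minimal sketch of such an adaptation layer, assuming it conditions a policy's hidden features on a spin-context vector. The dimensions and the one-hot context encoding are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FiLMAdapter(nn.Module):
    """Feature-wise linear modulation: scale and shift features based on a context vector."""
    def __init__(self, feature_dim: int = 256, context_dim: int = 2):
        super().__init__()
        # The context (e.g., a topspin/backspin indicator) produces per-feature scale and shift.
        self.gamma = nn.Linear(context_dim, feature_dim)
        self.beta = nn.Linear(context_dim, feature_dim)

    def forward(self, features: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        return self.gamma(context) * features + self.beta(context)

adapter = FiLMAdapter()
features = torch.randn(1, 256)        # hidden features from a policy trained on one spin regime
context = torch.tensor([[0.0, 1.0]])  # one-hot spin context (here: backspin)
adapted = adapter(features, context)
```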

Domain randomization: during training, parameters such as observation noise, latency, table and paddle damping, and friction are randomized in the simulated environment to mimic the uncertainty of the real world.
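A minimal sketch of this idea is shown below: each training episode samples a fresh set of simulator parameters from some ranges. The parameter names and ranges here are made up for illustration and are not the values used in the paper.

```python
import numpy as np

def sample_sim_params(rng: np.random.Generator) -> dict:
    """Sample one randomized set of simulator parameters (illustrative ranges only)."""
    return {
        "observation_noise_std": rng.uniform(0.0, 0.005),  # ball-tracking noise, metres
        "observation_delay_s": rng.uniform(0.0, 0.03),     # sensing/actuation latency
        "table_damping_scale": rng.uniform(0.7, 1.3),      # scale on the nominal value
        "paddle_friction_scale": rng.uniform(0.7, 1.3),
    }

rng = np.random.default_rng(0)
for episode in range(3):
    params = sample_sim_params(rng)   # re-randomize the environment every training episode
    # env.reset(physics_overrides=params)  # hypothetical call into the simulator
```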

Figure | Zero-shot sim-to-real transfer.

3. Real-time adaptation to unseen opponents

Real-time tracking of match statistics: the HLC tracks match statistics in real time, such as the scores and errors of both the robot and its opponent, and adjusts the LLC preference values based on this data to adapt to the opponent.

Online learning of LLC preferences: using the gradient bandit algorithm, the HLC learns each LLC's preference value online and steers play toward LLCs that exploit the opponent's weaknesses (a minimal sketch of this update appears after the figure below).

Figure | Hierarchical control.
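Here is a minimal sketch of the standard gradient bandit preference update applied to LLC selection: the preference of the chosen LLC rises when the outcome beats a running baseline and falls otherwise. The step size, reward signal, and baseline handling are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

def update_preferences(h, chosen, reward, baseline, alpha=0.1):
    """Gradient bandit step: raise the chosen LLC's preference if reward beat the baseline."""
    probs = softmax(h)
    for a in range(len(h)):
        if a == chosen:
            h[a] += alpha * (reward - baseline) * (1.0 - probs[a])
        else:
            h[a] -= alpha * (reward - baseline) * probs[a]
    return h

h = np.zeros(3)                      # one preference (H-value) per candidate LLC
h = update_preferences(h, chosen=1, reward=1.0, baseline=0.4)  # e.g. the robot won the point
```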

The research team collected a small amount of human-vs-human play data to initialize the task conditions. They then used reinforcement learning (RL) to train the agent in simulation and applied a variety of techniques to deploy the policy zero-shot on real hardware. The deployed agent played against human players to generate more training task conditions, and the training-deployment cycle was repeated. As the robot improved, the game conditions became progressively more complex while remaining grounded in real-world task conditions. This hybrid sim-real cycle created an automated task curriculum that allowed the robot's skills to improve over time.
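The loop described above can be sketched in a few lines. The helper functions here are hypothetical stand-ins for training in simulation, deploying on hardware, and harvesting new ball states from real play; only the overall structure of the cycle follows the text.

```python
def train_in_simulation(task_conditions):
    """Stand-in for RL training of the LLCs and HLC in MuJoCo on the current task set."""
    return {"n_tasks": len(task_conditions)}   # placeholder for an updated policy

def deploy_and_play(policy):
    """Stand-in for zero-shot hardware deployment; returns ball states logged from real play."""
    return [{"position": None, "velocity": None, "spin": None}]

# Seed the task distribution with a small set of human-vs-human ball states (placeholder here).
task_conditions = [{"position": None, "velocity": None, "spin": None}]

for cycle in range(5):
    policy = train_in_simulation(task_conditions)   # train in simulation
    new_states = deploy_and_play(policy)            # play against humans on the real robot
    task_conditions.extend(new_states)              # grow and refine the task distribution
```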

How did the matches go?

To evaluate the agent’s skill level, the robot played competitive matches against 29 table tennis players of different skill levels—beginner, intermediate, advanced, and advanced+—as determined by a professional table tennis coach.

Across all opponents, the robot won 45% of its matches and 46% of its games. Broken down by skill level, it won all of its matches against beginners, lost all of its matches against advanced and advanced+ players, and won 55% of its matches against intermediate players. This strongly suggests that the agent has reached the level of an intermediate amateur human player.

Figure | Against all opponents, the robot won 45% of matches and 46% of games, winning 100% of matches against beginners and 55% of matches against intermediate players.

Study participants enjoyed playing with the robot, rating it highly for being “fun” and “engaging.” These ratings were consistent across skill levels and regardless of whether participants won or lost. They also overwhelmingly responded that they would “definitely” play with the robot again. When given free time to play with the robot, they played for an average of 4 minutes and 6 seconds out of the 5 minutes available.

Advanced players were able to exploit weaknesses in the robot's strategy, but they still enjoyed playing with it. In post-match interviews, they considered it a more dynamic practice partner than a ball-serving machine.

Figure | Participants enjoyed playing with the robot and rated it highly for being "fun" and "engaging."

Shortcomings and Prospects

The research team said that this robot learning system still has some limitations, such as a limited ability to respond to fast and low balls, low spin-detection accuracy, and a lack of multi-ball strategy and tactics.

Future research directions include improving the robot's ability to handle a wider variety of balls, learning more complex strategies, and improving the motion capture technology.

The research team also stated that the hierarchical policy architecture and zero-shot sim-to-real transfer method proposed in this study can be applied to other robot learning tasks. Real-time adaptation can likewise help robots cope with changing environments and tasks, and sound system design principles remain crucial for building high-performance, robust robot learning systems.
