
Google spent two years using reinforcement learning to train a fleet of 23 robots to help sort garbage

Reinforcement learning (RL) allows robots to learn complex behaviors by interacting with their environment through trial and error, improving over time.

Work at Google has explored how RL can enable robots to master complex skills such as grasping, multi-task learning, and even playing ping pong. Although reinforcement learning for robots has made great strides, we still don’t see RL-powered robots in everyday environments. The real world is complex, diverse, and constantly changing, which poses great challenges for robotic systems. Yet reinforcement learning should be an excellent tool for exactly these challenges: through practice, continuous improvement, and learning on the job, robots should be able to adapt to a changing world.

In the paper “Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators”, Google researchers explore this problem through a recent large-scale experiment: over two years, they deployed a fleet of 23 RL-enabled robots to sort waste and recycling in Google office buildings. The system combines scalable deep reinforcement learning from real-world data with bootstrapping from scripted policies and object-aware inputs from simulated training, improving generalization while retaining the benefits of end-to-end training. It was validated with 4,800 evaluation trials across 240 waste-station configurations.

Problem setting

If people don’t sort their trash properly, batches of recyclables can become contaminated and compostables can end up in landfills. In Google’s experiments, robots roamed office buildings looking for waste stations (groups of bins for recyclables, compost, and other waste). Each robot’s task is to visit the waste stations and sort their contents, moving items between bins so that all recyclables (cans, bottles) end up in the recycling bin, all compostables (cardboard containers, paper cups) in the compost bin, and everything else in the landfill bin.
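The sorting rule itself can be stated very compactly. A minimal sketch, where the item class names and bin labels are illustrative stand-ins rather than the categories used in Google's paper:

```python
# Hypothetical mapping from a detected item class to its target bin.
# Class names and bin labels are illustrative, not from the paper.
TARGET_BIN = {
    "can": "recycling",
    "bottle": "recycling",
    "cardboard_container": "compost",
    "paper_cup": "compost",
}

def route(item_class: str) -> str:
    """Return the bin an item belongs in; everything unknown goes to landfill."""
    return TARGET_BIN.get(item_class, "landfill")

print(route("can"))          # recycling
print(route("candy_wrapper"))  # landfill
```

The hard part, of course, is not this lookup but perceiving the item and physically grasping and moving it, which is what the RL system learns.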

This task is not as easy as it seems. The subtask of picking up the wide variety of items people throw in the trash is already a huge challenge. The robot must also identify the appropriate bin for each object and sort everything as quickly and efficiently as possible. In the real world, robots encounter many unique situations drawn from real office buildings.

Learning from diverse experience

Continuous learning on the job helps, but before getting there a robot needs to be bootstrapped with a basic set of skills. To this end, Google drew on four sources of experience: (1) simple hand-designed policies, which have a low success rate but provide initial experience; (2) a simulated training framework, which uses sim-to-real transfer to provide some initial skill; (3) “robot classrooms”, where robots practice continuously at representative waste stations; and (4) the real deployment environment, where robots practice in office buildings with real trash.
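One natural way to combine the four sources during training is to keep a replay buffer per source and sample training transitions from them with mixing weights. The sketch below assumes this scheme; the buffer API and the weights are illustrative, not values reported in the paper:

```python
import random

class ReplayBuffer:
    """Minimal replay buffer: stores transitions, samples uniformly."""
    def __init__(self):
        self.transitions = []
    def add(self, transition):
        self.transitions.append(transition)
    def sample(self):
        return random.choice(self.transitions)

# One buffer per experience source described above.
buffers = {
    "scripted": ReplayBuffer(),
    "sim": ReplayBuffer(),
    "classroom": ReplayBuffer(),
    "deployment": ReplayBuffer(),
}

# Illustrative mixing weights (assumed, not from the paper).
WEIGHTS = {"scripted": 0.1, "sim": 0.3, "classroom": 0.3, "deployment": 0.3}

def sample_batch(batch_size=4):
    """Draw a batch, choosing each transition's source by the mixing weights."""
    names = list(buffers)
    probs = [WEIGHTS[n] for n in names]
    batch = []
    for _ in range(batch_size):
        name = random.choices(names, weights=probs)[0]
        batch.append(buffers[name].sample())
    return batch

# Seed each buffer with a dummy transition so the sketch runs end to end.
for name, buf in buffers.items():
    buf.add({"source": name, "obs": None, "action": None, "reward": 0.0})

batch = sample_batch(8)
```

As the weights shift toward the classroom and deployment buffers, training increasingly reflects real-world experience while the bootstrap data keeps rare skills from being forgotten.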

Schematic of reinforcement learning in this large-scale application. The policy is bootstrapped with data generated by scripted policies (top left). A sim-to-real model is then trained, generating additional data in simulation (top right). At each deployment cycle, data collected in the “robot classrooms” is added (bottom right), along with data collected during deployment in office buildings (bottom left).

The reinforcement learning framework used here is based on QT-Opt, which Google previously applied to grasping diverse objects in laboratory settings and to a range of other skills. The system bootstraps from a simple scripted policy in simulation, applies reinforcement learning, and uses RetinaGAN, a CycleGAN-based transfer method, to make simulated images look more realistic.
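A distinctive feature of QT-Opt is that it has no separate policy network: actions are chosen by optimizing the learned Q-function directly with the cross-entropy method (CEM). The toy sketch below shows that action-selection loop with a stand-in quadratic Q-function; in the real system Q is a learned network conditioned on camera images, and the population sizes and iteration counts here are assumptions:

```python
import numpy as np

def q_function(state, actions):
    # Stand-in for the learned Q-network: a quadratic peaked at an
    # arbitrary "best" action, for illustration only (state is ignored).
    best = np.array([0.5, -0.2])
    return -np.sum((actions - best) ** 2, axis=1)

def cem_select_action(state, dim=2, iters=3, pop=64, elite=6):
    """Cross-entropy method: repeatedly sample actions from a Gaussian,
    score them with Q, and refit the Gaussian to the top scorers."""
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        actions = np.random.randn(pop, dim) * std + mean
        scores = q_function(state, actions)
        elites = actions[np.argsort(scores)[-elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

action = cem_select_action(state=None)  # converges toward the Q-maximum
```

Optimizing over actions at decision time is what lets QT-Opt train from large, off-policy datasets such as the multi-source experience described above.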

Then come the “robot classrooms”. While an actual office building provides the most realistic experience, the throughput of data collection there is limited: sometimes there is a lot of trash to sort, other times not much. The robots therefore gain most of their experience in the “robot classrooms”.

While some robots train in the “robot classrooms,” others learn simultaneously at 30 waste stations across three office buildings.

Sorting performance

In the end, the researchers collected 540,000 trials in the “robot classrooms” and 325,000 trials in the actual deployment environment, and the performance of the whole system improved as data accumulated. For controlled comparisons, they evaluated the final system in the “robot classrooms”, setting up scenarios based on what the robots see in real-world deployments. The final system reached an average accuracy of about 84%, with performance steadily improving as more data was added. In the real world, statistics recorded from the 2021 and 2022 deployments show that the system reduced contamination in the waste bins by 40% to 50% by weight. The paper provides deeper insight into the system design, ablation studies of various design decisions, and more detailed experimental statistics.

Conclusions and prospects for future work

The experimental results show that a system based on reinforcement learning can enable robots to handle practical tasks in real office environments. The combination of offline and online data lets robots adapt to widely varying situations in the real world. At the same time, learning in more controlled “classroom” environments, both in simulation and in the real world, provides a powerful bootstrapping mechanism to get the RL “flywheel” turning and enable that adaptation.

While important results have been achieved, much work remains: the final reinforcement learning policy does not always succeed, and more powerful models are needed to improve its performance and extend it to a broader range of tasks. Beyond that, other sources of experience, including experience from other tasks, other robots, and even Internet videos, may further complement the bootstrapping experience gained from simulation and the “classrooms.” These are challenges for future work.
