Deep Science: Robots, meet world

The Rapid Pace of Machine Learning Research
The volume of research papers published today makes it impossible for any single person to stay fully informed, particularly within the rapidly evolving field of machine learning. This field now impacts, and generates research within, nearly every industry and company. This analysis aims to highlight recent, significant discoveries – primarily in artificial intelligence – and explain their relevance.
This installment focuses on the intersection of AI, robotics, and the physical world. While many applications of these technologies involve real-world interactions, this research specifically addresses the inherent challenges arising from the limitations present on both the virtual and real sides of this interface.
The Challenge of Real-World Speed in Robotics
A consistent issue in robotics is the disparity between simulation speed and real-world execution time. While some robots demonstrate superhuman speed and agility in controlled environments, most must pause frequently to compare what they observe against their internal world model, and that deliberation slows them down considerably.
Consequently, even simple tasks like grasping and placing an object can consume several minutes. Addressing this challenge is a key focus of a Google project aimed at maximizing the value derived from each hour of real-world robot testing.
In a technical blog post, the team detailed the complexities of integrating data from multiple robots engaged in diverse tasks. They’ve developed a unified system for task assignment and evaluation, dynamically adjusting future assignments based on performance.
Essentially, success in one task enhances a robot’s ability to perform others, even if dissimilar. This mirrors human learning – proficiency in throwing a ball aids in learning to throw a dart. Optimizing real-world training remains crucial, and this work demonstrates the potential for further improvements.
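To make the idea concrete, here is a minimal sketch of the kind of performance-weighted task scheduler the post describes: robots that succeed at a task are more likely to be assigned it (or something related) again, with a little exploration mixed in. The class, task names and exploration strategy are our own illustration, not Google’s implementation.

```python
import random
from collections import defaultdict

class TaskScheduler:
    """Hypothetical sketch: assign tasks based on each robot's running success rate."""

    def __init__(self, tasks, exploration=0.1):
        self.tasks = tasks
        self.exploration = exploration
        # attempts[robot][task] and successes[robot][task] track real-world trials
        self.successes = defaultdict(lambda: defaultdict(int))
        self.attempts = defaultdict(lambda: defaultdict(int))

    def assign(self, robot):
        """Mostly exploit the robot's strongest task, occasionally explore a new one."""
        if random.random() < self.exploration:
            return random.choice(self.tasks)

        def score(task):
            a = self.attempts[robot][task]
            return self.successes[robot][task] / a if a else 0.5  # optimistic prior

        return max(self.tasks, key=score)

    def report(self, robot, task, succeeded):
        """Update the running estimate after a real-world trial."""
        self.attempts[robot][task] += 1
        if succeeded:
            self.successes[robot][task] += 1

# Usage: a small fleet running made-up trials.
scheduler = TaskScheduler(["grasp_cup", "open_drawer", "place_block"])
for trial in range(20):
    robot = f"robot_{trial % 3}"
    task = scheduler.assign(robot)
    succeeded = random.random() < 0.6  # stand-in for an actual evaluation
    scheduler.report(robot, task, succeeded)
```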
Enhancing Simulations for Realistic Robot Training
Another approach involves improving the fidelity of simulations to better reflect real-world conditions. The Allen Institute for AI’s THOR training environment, and its latest addition, ManipulaTHOR, exemplify this strategy.
Simulators like THOR provide a virtual environment where AI agents can acquire fundamental skills, such as navigating a room to locate a specific object – a surprisingly complex undertaking. These simulators balance realism with computational efficiency, allowing robots to accumulate thousands of virtual hours of experience without the constraints of physical maintenance.
ManipulaTHOR introduces a physical dimension to the simulation, enabling realistic interaction with objects like drawers. This allows researchers to explore questions like: What is the most effective way for a household robot to search for a pen? How can it open drawers without causing disruption? How should it grasp the pen and close the drawer afterward? Such scenarios are best explored through physical simulation within platforms like AI2-THOR.
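For a sense of what training in such a simulator looks like, here is a rough sketch of an object-search loop using the open-source ai2thor Python package. The action and metadata names follow the published API as we understand it, but exact parameters can differ between versions, so treat this as illustrative rather than definitive.

```python
# Rough sketch of an agent loop in AI2-THOR (pip package `ai2thor`).
# Check the project documentation before relying on exact signatures.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # a stock kitchen scene

# Wander and rotate until a pen-like object comes into view: a toy
# version of the object-search task described above.
target_type = "Pen"
for step in range(100):
    event = controller.step(action="RotateRight", degrees=30)
    visible = [
        obj for obj in event.metadata["objects"]
        if obj["objectType"] == target_type and obj["visible"]
    ]
    if visible:
        print("Found it:", visible[0]["objectId"])
        break
    controller.step(action="MoveAhead")  # otherwise keep exploring
```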
Real-World Data: Evaluating Prosthetics and Exoskeletons
However, some insights can only be gained from real-world data, particularly when evaluating the use of prosthetics or exoskeletons. Simulation data is insufficient in these cases; actual user experience is paramount.
An Army Research Laboratory project is investigating how an ankle-supporting “exoboot” can interpret complex body signals to provide adaptive assistance. In a recent study, researchers collected brain and muscle signals alongside motion tracking data.
Their goal is to create a vocabulary of body states that the boot can recognize algorithmically, eliminating the need for manual input like “I’m tired” or “I’m carrying a heavy load.” Automated understanding of these states could be the difference between a helpful tool and an unwieldy burden.
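As a rough illustration of that idea, the sketch below turns windows of synthetic muscle and motion signals into simple features and classifies them into hypothetical body states. The feature set, state labels and model are our assumptions for the sake of the example, not the ARL team’s pipeline.

```python
# Hedged sketch: windowed signal features -> body-state classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(window):
    """Summarize one window of multichannel signals (samples x channels)."""
    return np.concatenate([
        window.mean(axis=0),                           # average activation per channel
        window.std(axis=0),                            # variability per channel
        np.abs(np.diff(window, axis=0)).mean(axis=0),  # roughness / tremor proxy
    ])

rng = np.random.default_rng(0)
states = ["fresh", "fatigued", "loaded"]  # illustrative labels only

# Synthetic stand-in data: 300 windows of 200 samples across 8 channels.
X = np.array([window_features(rng.normal(size=(200, 8)) + i % 3)
              for i in range(300)])
y = np.array([states[i % 3] for i in range(300)])

clf = RandomForestClassifier(n_estimators=100).fit(X[:250], y[:250])
print("held-out accuracy:", clf.score(X[250:], y[250:]))
```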
Natural Language Communication Between Soldiers and Robots
A related ARL project focuses on developing conversational models to facilitate natural and efficient communication between soldiers and robots in the field. Effective interaction is vital, not only on the battlefield but also in critical environments like nuclear power plants.
The requirements for conversational agents in these settings differ significantly from those found in smartphones or home speakers, necessitating further research despite the substantial investments made by companies like Google, Apple, and Amazon.
Coordinated Drone Flight: Avoiding Collisions
Safe and collaborative operation is also essential for groups of robots. Coordinating a swarm of 5-10 drones to prevent collisions with each other and the surrounding environment is a challenging and ongoing problem.
A study from EPFL demonstrates that a relatively simple set of rules and observations can enable flying drones to avoid obstacles and other drones, while also anticipating their movements for coordinated flight.
For example, if drone A must choose between navigating left or right around an obstacle and observes ample space to avoid drone B to its left, it must also consider whether drone B is forced to move right around its own obstacle.
Failing to account for this could lead to a collision if drone B cannot react quickly enough. However, anticipating drone B’s maneuver allows drone A to choose a slightly less efficient path, ensuring smoother collective progress.
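The sketch below illustrates that anticipation step in miniature: drone A scores each candidate detour against drone B’s predicted position rather than its current one, and accepts a slightly longer path if the shorter one would cut things too close. It is an illustrative toy, not EPFL’s actual controller.

```python
# Toy anticipation logic: score detours against the other drone's predicted position.
def predict(position, velocity, horizon=1.0):
    """Constant-velocity prediction of where another drone will be."""
    return tuple(p + v * horizon for p, v in zip(position, velocity))

def clearance(a, b):
    return sum((pa - pb) ** 2 for pa, pb in zip(a, b)) ** 0.5

def choose_detour(candidates, drone_b_pos, drone_b_vel, min_clearance=1.5):
    """Pick the shortest detour that stays clear of B's predicted position."""
    b_future = predict(drone_b_pos, drone_b_vel)
    safe = [c for c in candidates
            if clearance(c["waypoint"], b_future) >= min_clearance]
    options = safe or candidates  # if nothing is safe, fall back to best effort
    return min(options, key=lambda c: c["extra_distance"])

# Drone B is being pushed toward A's left-hand gap, so A should prefer
# the slightly longer right-hand detour.
choice = choose_detour(
    candidates=[
        {"waypoint": (-2.0, 5.0, 2.0), "extra_distance": 0.5},  # go left
        {"waypoint": (2.0, 5.0, 2.0), "extra_distance": 1.0},   # go right
    ],
    drone_b_pos=(-4.0, 5.0, 2.0),
    drone_b_vel=(2.0, 0.0, 0.0),
)
print("chosen detour:", choice)
```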
Leveraging "Dumb" Robots for Collaborative Tasks
A Georgia Tech study explored how to deploy robots with minimal intelligence to accomplish complex or teamwork-oriented tasks. Combining real-world observation with simulation, researchers found that these simple robots, equipped with magnets, naturally form collaborative clusters capable of moving objects heavier than any single robot could manage.
This approach could prove valuable for tasks requiring minimal supervision and utilizing inexpensive, basic robot agents.
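The underlying payoff is easy to see with a toy calculation: individually weak pushers can move a load once their combined force crosses the object’s friction threshold. The numbers below are invented purely for illustration.

```python
# Toy illustration of cluster pushing; values are made up.
ROBOT_PUSH_FORCE = 0.4   # newtons per robot (hypothetical)
OBJECT_THRESHOLD = 1.0   # force needed to start the object sliding

def object_moves(cluster_size):
    return cluster_size * ROBOT_PUSH_FORCE > OBJECT_THRESHOLD

for n in range(1, 5):
    print(n, "robot(s):", "moves" if object_moves(n) else "stuck")
```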
Building a 3D Model of Cities for Autonomous Vehicles
The ultimate collaborative task may be managing the complex ecosystem of autonomous vehicles within a city. A crucial component of this is creating a detailed and up-to-date model of the urban environment.
While Google Street View cars have been collecting data for years, EPFL is developing a more comprehensive system with its ScanVan, featuring omnidirectional capture technology.
“The goal is to take advantage of a device that is able to see the full sphere surrounding it to capture every aspect of the scene within a single shot,” explained researcher Nils Hamel. Efficiently capturing and integrating 3D and RGB imagery is valuable in itself, but the team emphasizes the importance of capturing data over time to track changes in lighting, population, foliage, and traffic patterns.
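Part of what makes a single spherical capture so valuable is that every pixel of an equirectangular image maps directly to a 3D viewing ray, so one shot plus depth measurements yields a point cloud of the full surroundings. The sketch below shows that mapping; the function names and image dimensions are our own, not the ScanVan software.

```python
# Minimal sketch: equirectangular pixel -> viewing ray -> 3D point.
import math

def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector."""
    lon = (u / width) * 2 * math.pi - math.pi    # -pi..pi around the sphere
    lat = math.pi / 2 - (v / height) * math.pi   # +pi/2 (up) .. -pi/2 (down)
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def pixel_to_point(u, v, depth, width, height):
    """Scale the viewing ray by measured depth to get a 3D point."""
    ray = pixel_to_ray(u, v, width, height)
    return tuple(depth * c for c in ray)

# A pixel at the center of a 4096x2048 image, 3.2 m away, lands straight ahead.
print(pixel_to_point(2048, 1024, 3.2, 4096, 2048))
```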
Recognizing the potential for misuse as a surveillance tool, the team designed the system to obscure identifying information of individuals and vehicles from the outset.