Interlocking AI: Faster Robot Pick and Place

Robots are well-suited for repetitive “pick and place” tasks frequently found in warehouses, yet human performance still surpasses robotic capabilities in this area. Researchers at UC Berkeley are accelerating progress with a combination of machine learning models that enable a robot arm to determine its grasp and trajectory in a matter of milliseconds.
Humans perform the simple act of lifting an object and relocating it without conscious thought. It is a skill honed through years of daily practice, supported by our highly developed senses and brain function. We don’t consider inefficient movement options like sharply jerking an object upwards, then sideways, and then slowly lowering it; the paths we choose are typically efficient and few in number.
Robots, however, lack this inherent understanding and intuitive reasoning. Without an immediately apparent solution, they must assess numerous potential movement paths for lifting and transferring an object, a process that involves calculating forces, anticipating potential collisions, and determining the appropriate grip type.
While robots can execute a chosen action quickly, the initial decision-making process is time-consuming—taking several seconds at best, and potentially much longer depending on the complexity of the situation. Fortunately, researchers at UC Berkeley have developed a solution that reduces this time by approximately 99 percent.
This system employs two machine learning models working sequentially. The first model rapidly generates a range of possible paths for the robot arm, drawing upon a vast dataset of example movements. It produces multiple options, from which a second machine learning model, trained to identify the optimal path, makes a selection. This initial path requires some refinement by a dedicated motion planner, but providing the planner with a pre-defined general path significantly reduces its workload.
[Diagram showing the decision process: the first agent creates potential paths and the second selects the best. A third system optimizes the selected path.]
Without this initial step, the motion planner typically requires between 10 and 40 seconds to complete its task. However, with the “warm start” provided by the machine learning models, it rarely takes more than one-tenth of a second.
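The propose, select, refine flow described above can be illustrated with a toy sketch. This is not the Berkeley team's code: the random path generator, the shortest-path scorer, and the neighbor-averaging smoother are hypothetical stand-ins for the two learned models and the motion planner, and the 2D points stand in for real arm configurations.

```python
import math
import random


def propose_paths(start, goal, n_paths=8, n_waypoints=5, seed=0):
    """Stand-in for the first model: noisy variations on the straight line."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        path = [start]
        for i in range(1, n_waypoints + 1):
            t = i / (n_waypoints + 1)
            x = start[0] + t * (goal[0] - start[0]) + rng.uniform(-0.3, 0.3)
            y = start[1] + t * (goal[1] - start[1]) + rng.uniform(-0.3, 0.3)
            path.append((x, y))
        path.append(goal)
        paths.append(path)
    return paths


def path_length(path):
    """Total Euclidean length of a path given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def select_best(paths):
    """Stand-in for the second model: here, simply prefer the shortest path."""
    return min(paths, key=path_length)


def refine(path, iterations=20):
    """Stand-in for the motion planner: smooth interior waypoints.

    Each interior point moves to the midpoint of its neighbors, which
    never lengthens the path. The endpoints stay fixed.
    """
    path = list(path)
    for _ in range(iterations):
        for i in range(1, len(path) - 1):
            prev_pt, next_pt = path[i - 1], path[i + 1]
            path[i] = ((prev_pt[0] + next_pt[0]) / 2,
                       (prev_pt[1] + next_pt[1]) / 2)
    return path


def plan(start, goal):
    """Propose candidates, pick one as a warm start, then refine it."""
    candidates = propose_paths(start, goal)
    warm_start = select_best(candidates)
    return refine(warm_start)
```

The point of the structure is that `refine` starts from a path that is already reasonable, so it has far less work to do than a planner searching from scratch.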
This represents a performance benchmark in a controlled environment, and real-world warehouse conditions will present additional challenges. While the actual task execution speed remains a factor, even reducing the motion planning phase from several seconds to near zero can yield substantial improvements in overall efficiency.
“Every second is valuable. Current systems dedicate up to half of their operational time to motion planning, so this technique has the potential to significantly increase the number of picks per hour,” explained Ken Goldberg, the lab director and senior author of the study. He also noted that improved computer vision technology is concurrently addressing the time required for accurate environmental sensing.
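A back-of-the-envelope calculation shows what that could mean for throughput. The sketch below assumes planning takes half of each pick cycle (the upper bound Goldberg cites) and becomes 100 times faster (the roughly 99 percent reduction reported above); both inputs are illustrative assumptions, not measurements from the study, and the function name is made up for this example.

```python
def throughput_gain(planning_fraction, planning_speedup):
    """Overall speedup when only the planning share of a cycle gets faster.

    planning_fraction: share of cycle time spent planning (0 to 1).
    planning_speedup: factor by which planning time shrinks (100 = 99% cut).
    """
    remaining_time = (1 - planning_fraction) + planning_fraction / planning_speedup
    return 1 / remaining_time


# Assumed inputs: half the cycle is planning, planning becomes 100x faster.
gain = throughput_gain(0.5, 100)
print(f"Picks per hour multiply by about {gain:.2f}")  # about 1.98
```

Under those assumptions the number of picks per hour would nearly double, and the physical motion of the arm would become the dominant cost.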
Currently, robots performing pick and place operations do not match the efficiency of human workers, but incremental improvements will make them progressively more competitive, and they may ultimately surpass human capabilities. This work is particularly important because the tasks humans currently perform are often hazardous and physically demanding, yet they are essential to meeting the demands of the expanding online retail sector.
The team’s findings were published this week in the journal Science Robotics.