Situation awareness for safe robot-human cohabitation in production lines
One of the main goals of STAR is to optimize production lines in order to increase the efficiency of the manufacturing process. We start from the assumption that efficiency and safety go hand in hand in a complex environment such as a production line, in which operators, robots and automatic systems dynamically share the same physical workspace.
The aim of this module is to take advantage of modern computer vision approaches to recognize the postures and motion of workers and to locate them, as well as the items occupying the environment, within the workspace. The main output will be an “average spatial heatmap”: a probabilistic occupancy map of the production line computed from fixed RGB cameras deployed in the factory. This module feeds a path planner that dynamically indicates which areas the fleet of robots operating in the production line should avoid.
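To make the idea of a probabilistic occupancy heatmap concrete, the sketch below (a minimal illustration, not the STAR implementation; the cell size, decay factor and function names are our own assumptions) accumulates ground-plane detections into a decaying grid and normalizes it into a pseudo-probability map that a planner could consume:

```python
import numpy as np

def update_occupancy(grid, detections_xy, cell_size=0.25, decay=0.95):
    """Accumulate ground-plane detections (metres) into a decaying
    occupancy grid. Older evidence fades by `decay` at each update."""
    grid = grid * decay
    for x, y in detections_xy:
        i, j = int(y / cell_size), int(x / cell_size)
        if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
            grid[i, j] += 1.0
    return grid

def to_probability(grid):
    """Normalize the accumulated grid into a pseudo-probability map."""
    total = grid.sum()
    return grid / total if total > 0 else grid
```

In practice the decay factor trades responsiveness against stability: a value close to 1 yields a smooth long-term average, while a smaller value reacts faster to workers moving through the line.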
The solution we envision combines the following technologies:
Skeleton extraction by a convolutional neural network (CNN) for human pose detection
Dynamic object detection via a CNN
3D localization and motion tracking in the infrastructure, and estimation of human-robot distances, using the geometric calibration of fixed RGB cameras
Heterogeneous and homogeneous multi-sensor fusion, merging video analytics results from cameras deployed in the production line with data from other localization sensors.
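The fusion step above can be sketched very simply once each sensor contributes an occupancy grid over the same ground-plane raster. The fragment below (an assumption-laden sketch, not the project's fusion algorithm; the confidence-weighted averaging scheme and function name are ours) merges per-sensor grids with per-sensor confidence weights:

```python
import numpy as np

def fuse_occupancy(grids, weights=None):
    """Fuse per-sensor occupancy grids of identical shape by
    confidence-weighted averaging. `weights` defaults to uniform."""
    stacked = np.stack(grids)
    if weights is None:
        weights = np.ones(len(grids))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize confidences
    # weighted sum over the sensor axis
    return np.tensordot(w, stacked, axes=1)
```

A camera with a poor viewpoint or partial occlusion would simply receive a lower weight, so its evidence counts less in the fused map.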
More specifically, skeleton detection algorithms track human poses by detecting and estimating the position of the characteristic points defining human postures. These points are parts of the human body (feet, knees, shoulders, neck, nose, eyes). The approach, based on a neural network called “OpenPose”, produces heatmaps for joint extraction and part affinity fields that, considering all detected joints, infer the links between them and thus enable the detection of human limbs.
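The joint-extraction stage can be illustrated with a small sketch. Assuming the network has already produced one heatmap per joint (the joint list, threshold and function name below are illustrative assumptions, and real OpenPose post-processing additionally uses the affinity fields to associate joints into skeletons), each keypoint can be read off as the heatmap's peak:

```python
import numpy as np

# illustrative subset of body joints
JOINTS = ["nose", "neck", "r_shoulder", "l_shoulder"]

def keypoints_from_heatmaps(heatmaps, threshold=0.3):
    """Pick one (x, y, score) per joint as the argmax of its heatmap.
    `heatmaps` has shape (n_joints, H, W) with scores in [0, 1]."""
    keypoints = {}
    for name, hm in zip(JOINTS, heatmaps):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        score = float(hm[y, x])
        if score >= threshold:  # drop low-confidence joints
            keypoints[name] = (int(x), int(y), score)
    return keypoints
```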
Once humans or other items are detected in the video footage, they are located in the infrastructure. This absolute positioning of scene elements requires the camera to be calibrated, so that each pixel of the image space can be associated with an absolute 3D coordinate system. Once an element is detected, it is projected onto the ground, taking into account reference measurements (e.g. body height, robot dimensions). The ground projection yields an estimate of the actual 3D position and, from it, the distances between any other elements in the images.
The RGB cameras play a fundamental role here and have the advantage of being cheap, extremely robust, and stable over the long term.
We believe this AI-based technology will open the door to future optimization of collaborative human-robot approaches to safely shared workspaces. The real-time evaluation of the occupancy level of the workspace plays an important role in defining the paths robots can safely traverse.