Visual Scene Analysis Components

Safety Zones

To improve human-robot cohabitation, we developed a first software prototype that dynamically detects safety or empty zones throughout the infrastructure, based on a global situation assessment. To do this, we implement AI-based algorithms that analyze the scene from the global point of view of a static camera network already deployed in the factory.

Video analytics makes it possible to process the video streams automatically and in real time, detecting anomalies and raising an alarm immediately. To this end, the algorithms detect, track and localise elements of interest (such as people, robots and new objects occupying the scene) over time. This information will be fed into another engine that alerts the robots to any obstacles in the surrounding area, so that the system can decide whether to calculate a new robot path to the docking station or to stop the robot completely to avoid a collision.

In more detail, the Safety Zones Detection System takes video footage as input and delivers spatial heatmaps as the result of the analytics. The process combines two main components, as presented in Figure 1:

  • The elements extraction module;
  • The 3D object localisation.

Figure 1: Visual Scene Analysis Components
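
The text does not specify how the spatial heatmaps are computed. As one possible sketch, assuming the localisation step yields floor positions in metres, detections could be accumulated into a grid of floor cells, and cells that never receive a detection treated as candidate empty zones. The function names, the cell size and the floor dimensions below are illustrative assumptions, not part of the described system.

```python
import math
from typing import Iterable, List, Tuple


def occupancy_heatmap(
    positions: Iterable[Tuple[float, float]],
    width_m: float,
    depth_m: float,
    cell_m: float = 0.5,  # assumed cell size, not taken from the source
) -> List[List[int]]:
    """Count how many localised detections fall into each floor cell."""
    cols = max(1, math.ceil(width_m / cell_m))
    rows = max(1, math.ceil(depth_m / cell_m))
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Ignore detections that fall outside the monitored floor area.
        if 0 <= x < width_m and 0 <= y < depth_m:
            row = min(rows - 1, int(y / cell_m))
            col = min(cols - 1, int(x / cell_m))
            grid[row][col] += 1
    return grid


def empty_cells(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) indices of cells in which nothing was detected."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, count in enumerate(row)
        if count == 0
    ]
```

Accumulating counts over a sliding time window, rather than over the whole recording, would make such a map follow changes in the scene; that choice is also an assumption here.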

In particular, the elements extraction module combines two deep learning algorithms with a background subtraction module: one algorithm reconstructs skeletons in order to follow human gestures and poses, and the other detects and classifies non-static objects in the scene.

Background subtraction assesses the difference between the background model and the current image in order to infer which elements are moving in the scene under observation. The 3D object localisation module takes the output of the elements extraction module as input and localises the elements in a Euclidean reference system. To achieve this, calibration software was developed to establish the correspondence between camera pixels and the physical world.
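
The two steps described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the project: frames are small grayscale images stored as nested lists, the background model is a simple running average, and the calibration result is a 3x3 homography `H` that maps image pixels to a flat floor plane (so it yields 2D floor coordinates rather than a full 3D position). All function names and thresholds are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]    # grayscale image as rows of pixel intensities
Matrix3 = List[List[float]]  # 3x3 homography from calibration (assumed)


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Running-average background model: blend the new frame in slowly."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 25.0) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background model."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def ground_contact_pixel(mask: List[List[bool]]) -> Optional[Tuple[float, float]]:
    """Return (u, v) for the lowest foreground row: mean column and row index."""
    for v in range(len(mask) - 1, -1, -1):
        cols = [u for u, is_foreground in enumerate(mask[v]) if is_foreground]
        if cols:
            return (sum(cols) / len(cols), float(v))
    return None  # no moving element found


def pixel_to_floor(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map an image pixel to floor coordinates with a planar homography."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    return (x / w, y / w)
```

The lowest foreground pixel is used because, under the flat-floor assumption, it is the point most likely to touch the ground; a point higher on the object would be projected too far from the camera.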