Blog R2M

Occupational Information Databases: data sources for capacity building

The United Nations defines Capacity Building as "the process of developing and strengthening the skills, instincts, abilities, processes and resources that organizations and communities need to survive, adapt, and thrive in a fast-changing world". In the manufacturing context, this concept refers to the idea that individuals should be able to identify their strengths and abilities, and find ways to enhance them, while adapting to the ever-evolving technological landscape.

This need manifests itself at various stages of a professional career. For example, when attempting to enter the sector, it is important to have a clear idea of the requirements for a particular role and of the skills needed to perform the tasks of interest. Someone who has been employed in the sector for an extended period of time may wish to progress to a new role or even change profession completely. In that case, it is useful to know the similarities between the current and the future occupation, and to understand how to adapt to the new requirements. In a fast-changing world, continually learning new skills helps workers keep up with the latest technology, stay proficient in their jobs, and handle new challenges.

One of the questions that arises when trying to discover these capabilities and needs is where to find reliable and relevant data. A vast amount of information is spread all over the Internet, and AI-powered conversational agents and chatbots can also provide information of interest. But there is always the question of whether this information is relevant and based on reliable data, or whether it is a hallucination of the AI model.

For this reason, it is key that AI tools supporting capacity building are based on reliable and relevant data sources. One of them is the well-known occupational information database O*NET, which has been developed under the sponsorship of the US Department of Labor/Employment and Training Administration and is the primary source of occupational information in the United States. With more than 20 years since its first version, periodic updates, a well-defined data collection program that includes both workers and employers, and its use as a source for training programs and labor studies in several countries, it provides high-quality and credible occupational information.

This database holds a considerable amount of occupational information: occupations and their alternative names, together with related entities such as skills, technology skills, knowledge, and tasks. These are the main and most relevant elements of O*NET, and also the ones most used by researchers and application developers.
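To make the kind of record described above more concrete, here is a minimal Python sketch of an occupation entry and of one simple way to compare two occupations by the skills they share. It is only an illustration: the `Occupation` class, its field names, and the sample skills are assumptions made for this example and do not reproduce the actual O*NET schema or data. The comparison uses plain Jaccard similarity, chosen here for simplicity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occupation:
    """Simplified, hypothetical occupation record (not the real O*NET schema)."""
    title: str
    alternate_titles: frozenset = field(default_factory=frozenset)
    skills: frozenset = field(default_factory=frozenset)
    tasks: tuple = ()


def skill_similarity(a: Occupation, b: Occupation) -> float:
    """Jaccard similarity of two occupations' skill sets, in [0, 1]."""
    union = a.skills | b.skills
    if not union:
        return 0.0
    return len(a.skills & b.skills) / len(union)


# Made-up illustrative records; the skill lists are not taken from O*NET.
machinist = Occupation(
    title="Machinist",
    alternate_titles=frozenset({"CNC Machinist"}),
    skills=frozenset({"operation monitoring", "quality control", "mathematics"}),
)
inspector = Occupation(
    title="Quality Inspector",
    skills=frozenset({"quality control", "mathematics", "reading comprehension"}),
)

similarity = skill_similarity(machinist, inspector)
print(f"{similarity:.2f}")  # prints 0.50 (2 shared skills out of 4 distinct ones)
```

A score like this is only a rough indicator of how close two occupations are; a real application would probably also weigh how important each skill is for each occupation.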

ESCO (European Skills, Competences, Qualifications and Occupations) is the counterpart adapted to the European labour market, and it also includes information on education and training. Among many other uses, public employment services in different European countries rely on ESCO to design multilingual job profiles and to match work experience with skills. The European Commission updates ESCO regularly, including mappings to international classifications such as ISCO-08.
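The matchmaking between work experience and skills mentioned above can be pictured with a small Python sketch. It is a toy illustration under stated assumptions, not the method used by ESCO or by any employment service: the function names `skill_gap` and `rank_profiles`, the exact-match comparison of lowercased skill names, and the sample profiles are all hypothetical.

```python
def skill_gap(worker_skills, required_skills):
    """Split a role's required skills into those the worker has and those missing."""
    have = {s.strip().lower() for s in worker_skills}
    required = {s.strip().lower() for s in required_skills}
    return sorted(required & have), sorted(required - have)


def rank_profiles(worker_skills, profiles):
    """Rank job profiles by the share of required skills the worker covers."""
    ranked = []
    for title, required in profiles.items():
        covered, missing = skill_gap(worker_skills, required)
        total = len(covered) + len(missing)
        coverage = len(covered) / total if total else 0.0
        ranked.append((title, coverage, missing))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up worker and job profiles, for illustration only.
worker = ["Welding", "Blueprint reading", "Quality control"]
profiles = {
    "Welder": ["welding", "blueprint reading"],
    "Robot technician": [
        "programming", "quality control", "troubleshooting", "blueprint reading",
    ],
}

for title, coverage, missing in rank_profiles(worker, profiles):
    print(f"{title}: {coverage:.0%} covered, missing: {missing}")
```

The list of missing skills is the more useful output for capacity building, since it points to what a worker would need to learn before moving into a given role.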

In conclusion, these databases are increasingly being used to develop AI tools for the manufacturing industry: tools that help individuals understand their own skills and allow workers to train and adapt to change. They are of great importance in keeping the data and suggestions offered by AI-based software tools aligned with reliable reference sources.

In the STAR project, as part of the research on Natural Language Processing, conversational agents for capacity building, and the development of the worker training portal, several applications have been built on these databases. Specifically, both the STAR virtual interviewer and the occupational information chatbot use O*NET as their main data source, extended with different AI and NLP techniques so that users can interact with this reference information in a natural way.

By: Diego Reforgiato Recupero / University of Cagliari & R2M Solution, Antonello Meloni / University of Cagliari, Danilo Dessì / GESIS - Leibniz Institute for the Social Sciences, and Rubén Alonso / R2M Solution