Greenwashing occurs when companies exaggerate positive sustainability efforts and downplay negative impacts, which can mislead stakeholders. As part of the shared task for SwissText 2023, we collected self- and peer-reported data from companies. Participants used state-of-the-art natural language processing methods to identify greenwashing signals and provide objective assessments of companies.
Greenwashing has been high on the corporate social agenda for the past two years. Companies often publish predominantly positive information about their sustainability efforts while downplaying the negative impacts of their operations. This is possible because most environmental, social and governance (ESG) information is subjective, which gives a distorted picture of a company’s ESG performance and misleads investors. Data from external media can help address this issue, since third-party content providers usually have no interest in promoting the sustainability efforts of a particular company. By considering data from a variety of media, a more objective and even critical picture of a company can be painted.
The organizers of the shared task for SwissText 2023 are Dr. Janna Lipenkova, CEO of Equintel GmbH, Germany, Susie Xi Rao, PhD candidate and researcher at ETH Zurich, and me, Dr. Guang Lu, lecturer in Data Science at the Lucerne University of Applied Sciences and Arts. With this task, we aim to better understand the nature of greenwashing through large-scale text analysis, uncover gaps that indicate greenwashing when comparing corporate communications and external data, and analyze whether certain ESG topics modeled in terms of the Sustainable Development Goals (SDGs) are more susceptible to greenwashing.
A total of 25 master’s students worked on the shared task, all from the Applied Information and Data Science master’s program at the Lucerne University of Applied Sciences and Arts. The shared task also served as a final project in the Computational Language Technologies course (Dr. Diego Antognini, Dr. Guang Lu). The dataset comprises about 11k external and internal ESG documents covering 38 DAX companies (2021 – 2023) and the 17 UN SDGs, which serve as a blueprint for social, economic and environmental challenges. The task was divided into five phases designed to encourage participants to use state-of-the-art natural language processing (NLP) techniques: (1) data processing and exploratory text analysis (ETA), (2) text sentiment annotation with large language models (LLMs), (3) sentiment analysis of internal and external documents, (4) alignment with the SDGs, and (5) summary and report. Students completed phases 1 and 2 individually and phases 3 to 5 in groups of two to three, forming ten groups in total.
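To make the comparison of internal and external documents more concrete, here is a minimal sketch of how a sentiment gap per company and SDG could be computed once each document has a sentiment score and an SDG label. It is an illustration only, not the method used by the participants: the record layout, the company names, the score range of -1 to 1, and the function name `sentiment_gaps` are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (company, source, sdg, sentiment score in [-1, 1]).
# None of these values come from the shared-task dataset.
records = [
    ("CompanyA", "internal", "SDG 7", 0.8),
    ("CompanyA", "internal", "SDG 7", 0.6),
    ("CompanyA", "external", "SDG 7", -0.2),
    ("CompanyA", "external", "SDG 7", 0.0),
    ("CompanyB", "internal", "SDG 12", 0.5),
    ("CompanyB", "external", "SDG 12", 0.5),
]


def sentiment_gaps(rows):
    """Return {(company, sdg): mean internal sentiment - mean external sentiment}."""
    scores = defaultdict(lambda: {"internal": [], "external": []})
    for company, source, sdg, score in rows:
        scores[(company, sdg)][source].append(score)

    gaps = {}
    for key, by_source in scores.items():
        # A gap is only defined when both internal and external documents exist.
        if by_source["internal"] and by_source["external"]:
            gaps[key] = mean(by_source["internal"]) - mean(by_source["external"])
    return gaps


gaps = sentiment_gaps(records)

# List the pairs with the largest gap first.
for (company, sdg), gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{company} / {sdg}: gap = {gap:+.2f}")
```

Under these assumptions, a large positive gap means that a company's own communication on an SDG is noticeably more positive than external coverage. Such a gap is at most a signal worth inspecting and not proof of greenwashing.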
Among the representative observations, SDGs such as “Affordable and clean energy”, “Industry, innovation and infrastructure” and “Responsible consumption and production” are very relevant for companies, while “Gender equality” is the least relevant. In addition, the SDGs are of varying importance to companies in the automotive sector. It is worth noting that these conclusions are based on limited data and were also influenced by the data annotation approaches and LLMs used to detect text sentiment. Future work is therefore needed to expand the scope of the dataset and to verify the results obtained. We believe that the text analysis pipeline established in this shared task will provide a solid foundation for future related research.
Lecturer at IKM, Lucerne University of Applied Sciences and Arts