
FOSS United at the UN Workshop on Data Commons

Taking part in the "Workshop on Data Commons" by the UN Office for Digital and Emerging Technologies, co-hosted with the CDPI and The Takshashila Institution.

January 17, 2025 · 5 min read

On the 13th of January 2025, I participated in an invite-only Workshop on Data Commons organized by the United Nations Office for Digital and Emerging Technologies and co-hosted with the Centre for Digital Public Infrastructure (CDPI) and The Takshashila Institution. The workshop was meant to facilitate discussion around “Data Commons” - the need to define it, what is part of the “data commons” and what isn’t, how it can be governed, and so on.

Participants in the workshop came from diverse backgrounds - from the legal profession, from think tanks, from funding agencies, from incubators, and so on. A few of the challenges highlighted by the participants at the workshop include:

The non-zero financial cost of making Open Data available for access via the public internet. It takes money to store Open Data in a data storage facility, e.g. a database or an object store, and it costs more money to provide a way for people to access the data via the internet, e.g. a server that handles HTTP requests for data or a web application to interact with and download data.
The pervasive lack of attribution regarding the usage of Open Data. Organizations in India have worked to curate datasets that can be used to train local-language LLMs; see, for example, the AIforBharat initiative. The datasets can be downloaded from the public internet, which has led to their inclusion in models trained by the Big Tech industry without the necessary attribution.
The sheer uselessness of most Open Data. Not all data is equal, and just because data is made available publicly does not mean that it has value. A significant amount of the Open Data available at the moment is practically useless, potentially because it was made public without any specific problem/use case in mind.
The need for problem-specific data sharing. People are privately aware of data that exists in silos, e.g. in the healthcare industry, that could be used to create incredible value if made available under the right circumstances. The data might not necessarily be “open”, i.e. available to the entire public; it might only be made accessible to a limited number of people/organizations that work with the relevant data custodian/authority, facilitating the creation of meaningful “open” data.
The possibility of a two-way exchange of value, i.e. if Open Data provides value to an organization, the organization needs to financially reward the curator of the Open Data. This calls for a framework to commercialize “open” data that enables the curator of the “open” data to become sustainable.
The lifecycle of Open Data, i.e. the process of creating Open Data, understanding how the public uses it, and improving it based on what is learned.

Now, if you are familiar with the FOSS ecosystem, you will immediately see parallels between these challenges and those faced by the FOSS community. I responded to some of the points highlighted above as follows:

FOSS suffers from a pervasive lack of attribution even though FOSS licenses explicitly require it, so it is not surprising when Open Data is used without attribution. The lack of attribution happens with both small and big organizations, so it cannot be blamed on a lack of resources. A significant amount of awareness and advocacy is necessary to ensure that organizations using FOSS (and Open Data) provide the necessary acknowledgement and/or attribution.
Like Open Data, much of the publicly available software/FOSS is useless. Just like not all data is made equal, not all FOSS is made equal. But the characteristic that most or all successful FOSS projects have in common is that they solve a very specific problem for someone - it could be an individual or it could be a large organization. Solving a specific problem also means that other organizations with the same/similar problem can adopt the project to reap benefits. Most FOSS projects also start small: they solve a small part of the actual problem and slowly evolve to address more and more of the problem domain. This leads to a lifecycle that implicitly accounts for the value generated for the users by incremental changes to these FOSS projects. Similarly, curating Open Data to solve specific problems, and iterating on/expanding the Open Data as its value is realized by organizations, enables the generation of useful Open Data.
The value generated by FOSS projects as a whole, and by software in general, is greater than the sum of the value of the individual projects, because FOSS projects can work with one another. The ability to draw clear boundaries means valuable domain-specific FOSS gets created. Enabling FOSS projects to communicate with one another allows for solving higher-order problems spanning multiple domains. Similarly, adopting existing open standards for data sharing, or creating new ones, enables Open Data to be used together to solve higher-order problems.

I wanted to make the following additional points, but I wasn’t able to due to a lack of time:

While hosting Open Data has a non-zero financial cost, this cost is negligible, and we can assume that private players will identify business models to reduce it to zero. For example, hosting FOSS source code on GitHub costs nothing, as it does on several other source code hosting providers. These providers offset the cost by selling services to organizations that want to host private source code. Similarly, private players can be expected to identify business models that enable zero-cost Open Data hosting. For this reason, governments do not need to create their own Open Data platforms; they can instead publish Open Data on existing popular private platforms.
The software community is currently running a significant number of experiments to identify models of sustainability for FOSS. For example, projects are experimenting with new licenses like the Business Source License (BSL) and the Functional Source License (FSL), where the source is relicensed after a set duration under a different, usually permissive, FOSS license. Commercial OSS (COSS) and Open Core usually involve a “core” being released under a FOSS license while peripheral code, usually specific to enterprise needs, is released under a commercial license. We hope that the Open Data/Data Commons community learns from such experiments instead of having to relive them.
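
To make the delayed-relicensing idea concrete, here is a minimal sketch in Python of how one might work out which license applies to a given release on a given day. It is only an illustration: the function name, the license labels, and the two-year delay in the usage note are my own assumptions, not the terms of the BSL, the FSL, or any particular project, so always read the actual license text.

```python
from datetime import date


def effective_license(released: date, today: date, delay_years: int,
                      initial: str, converted: str) -> str:
    """Return the license that applies to one release on a given day.

    Hypothetical helper for illustration only: the release starts under
    `initial` and switches to `converted` once `delay_years` have passed.
    """
    try:
        # Change date: the same calendar day, `delay_years` later.
        change_date = released.replace(year=released.year + delay_years)
    except ValueError:
        # A Feb 29 release whose target year is not a leap year: use Feb 28.
        change_date = released.replace(year=released.year + delay_years, day=28)
    return converted if today >= change_date else initial
```

For instance, with an assumed two-year delay, `effective_license(date(2023, 1, 17), date(2025, 1, 17), 2, "source-available", "permissive")` returns `"permissive"`, while the same call for the day before returns `"source-available"`. The same shape could describe a dataset that is shared under restrictive terms at first and opened up after a fixed period.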

Overall, FOSS and Open Data are part of the broader Knowledge Commons ecosystem. As such, it is understandable that they share significant similarities. We hope the Open Data/Data Commons community continues engaging with the FOSS community so that both communities can learn from one another and enable higher-order solutions together.

P.S. You’ll see me lurking in the background of some of the pictures shared publicly by the CDPI, the UN Office for Digital and Emerging Technologies, and The Takshashila Institution.

Poruri Sai Rahul