First Open Source Copyright Lawsuit Challenges GitHub Copilot

A class-action lawsuit has been filed in a US federal court challenging the legality of GitHub Copilot and the related OpenAI Codex. The suit against GitHub, Microsoft, and OpenAI claims violation of open-source licenses and could have a wide impact in the world of artificial intelligence.

GitHub previewed Copilot, an OpenAI-powered coding assistant, in the summer of 2021 and announced its general availability in June 2022. Powered by the artificial intelligence model OpenAI Codex, the service is a cloud-based tool that assists developers in writing new code by analyzing existing code and comments.

The lawsuit was filed by Matthew Butterick, a programmer and lawyer, and the Joseph Saveri Law Firm, which specializes in antitrust and class actions. According to the plaintiffs, by training their AI systems on public repositories the defendants have violated the rights of many developers who posted code under different open-source licenses that require attribution, including the MIT, GPL, and Apache licenses.

In a previous article, Butterick questioned how the service was trained with machine learning on billions of lines of code written by human programmers, and argued that the solution should not be a new open-source license:

Some have suggested creating an open-source license that forbids AI training. But this kind of usage-based restriction has never been part of the open-source ethos. (...) By the same token, it does not make sense to hold AI systems to a different standard than we would hold human users. Widespread open-source license violations should not be shrugged off as an unavoidable cost.

Alex Champandard, artificial intelligence expert and co-founder of creative.ai, assesses the case:

Reading through the GitHub CoPilot litigation submitted; although it was pulled off quickly — it's a solid piece of work! The defendants (...) are in a very bad position. The documents show how Codex and CoPilot act like databases; they have three different examples of JS code that is recited verbatim — with mistakes — from licensed sources. (...) The documents then proceed to cast doubt on the claim of FairUse, that even if it was applicable here, it wouldn't help circumvent (a) the breach of contract, (b) the privacy issues, and (c) the DMCA.

In a Twitter thread, Giuseppe Bertone, developer advocate at Swirlds Labs, disagrees:

Developers are liable for what they use: their brain, copy from Stack Overflow, AI tools, pen & paper, etc. GitHub Copilot is just a tool - a toy, currently - like many others. Sue developers that use copyrighted code incorrectly, regardless of why and how they did it.

The lawsuit is considered the first class-action case challenging the training and output of AI systems, and its impact might extend beyond Copilot. Microsoft and GitHub are not the only companies working on ML-powered coding assistants: AWS unveiled the preview of Amazon CodeWhisperer earlier this year.

According to the Authors Alliance, the lawsuit raises important questions about how researchers can use AI to train and produce outputs using datasets based on copyrighted materials. Jeremy Daly, author of the weekly serverless newsletter Off-by-none, comments:

Who would have thought that AI-generated code that learned from public repositories would result in a lawsuit alleging "software piracy on an unprecedented scale"

Butterick created a separate website with some background information about the case. GitHub, Microsoft, and OpenAI have not yet commented on the lawsuit.
