GitHub Copilot is an AI-powered code completion tool that can generate code snippets from natural language descriptions. It is powered by OpenAI Codex, a deep learning system that has been trained on billions of lines of public code. GitHub describes it as a “copilot, not a pilot”, meaning that it is not intended to write code for you, but rather to help you write code faster and better.
However, some developers have raised concerns about the legal and ethical implications of using GitHub Copilot. One of the main issues is that GitHub Copilot may emit code that is derived from or influenced by code that is licensed under the GNU General Public License (GPL). The GPL is a copyleft license that requires any derivative work, when it is distributed, to be licensed under the same terms as the original work. This means that if you use GitHub Copilot to generate code that is based on GPL code and then distribute your project, you may be obliged to release the entire project under the GPL as well.
This could pose a problem for developers who want to use GitHub Copilot for commercial or proprietary projects, or for projects that are licensed under incompatible licenses. Take a project licensed under the MIT license, a permissive license that allows you to do almost anything with the code as long as you include the original copyright notice and license text. If you use GitHub Copilot to generate code for such a project and the generated code contains GPL code, distributing the result under MIT terms alone may violate the GPL.
How can you tell if GitHub Copilot emits GPL code? Unfortunately, there is no easy way to do so. GitHub Copilot does not provide any information about the source or license of the code it generates. It also does not guarantee that the generated code is original, correct, or free of legal issues. According to its FAQ, “it is your responsibility to use good judgment, as well as review and test the code before using it or making it available for others to use”.
One possible way to check if GitHub Copilot emits GPL code is to compare the generated code with existing public code, for example by searching the repositories hosted on GitHub itself. However, this may not be feasible or reliable: there may be millions of lines of code to compare, and subtle differences such as renamed variables or reordered statements can make a copied fragment hard to detect. Moreover, this may not cover all possible sources of GPL code, as GitHub Copilot may have been trained on other sources of public code that are not available on GitHub.
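As a rough illustration of the comparison described above, the sketch below checks whether any run of consecutive lines from a generated snippet also appears, after light normalization, in a reference corpus. The function names (`normalize`, `line_windows`, `find_verbatim_overlap`), the window size of three lines, and the small in-memory corpus are all assumptions made for illustration. A real check would need to index far more code, and it would still miss fragments that were renamed or reordered.

```python
import re


def normalize(source: str) -> list[str]:
    """Drop line comments and blank lines, and collapse whitespace.

    Deliberately crude (an assumption of this sketch): it also strips
    C preprocessor lines and any '#' or '//' that appears inside a string.
    """
    lines = []
    for raw in source.splitlines():
        line = re.sub(r"(#|//).*$", "", raw)
        line = " ".join(line.split())
        if line:
            lines.append(line)
    return lines


def line_windows(lines: list[str], size: int) -> set[tuple[str, ...]]:
    """Return every run of `size` consecutive lines as a hashable tuple."""
    return {tuple(lines[i:i + size]) for i in range(len(lines) - size + 1)}


def find_verbatim_overlap(snippet: str, corpus: dict[str, str], size: int = 3) -> list[str]:
    """Return the names of corpus entries that share at least one window
    of `size` consecutive normalized lines with the snippet."""
    snippet_windows = line_windows(normalize(snippet), size)
    hits = []
    for name, text in corpus.items():
        if snippet_windows & line_windows(normalize(text), size):
            hits.append(name)
    return sorted(hits)


# Hypothetical reference corpus: file name -> file contents.
corpus = {
    "gpl_project/util.c": "int add(int a, int b) {\n    // sum\n    return a + b;\n}\n",
    "other_project/util.c": "int sub(int a, int b) {\n  return a - b;\n}\n",
}
generated = "int add(int a, int b) {\n  return a + b;   // generated\n}\n"
print(find_verbatim_overlap(generated, corpus))
```

A hit from a check like this only says that identical lines exist somewhere in the corpus. It says nothing about which copy came first or whether the fragment is substantial enough to matter legally.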
Another possible way to check if GitHub Copilot emits GPL code is to use tools that can detect similarities or plagiarism in code, such as MOSS (Measure Of Software Similarity) or JPlag. However, these tools have limitations of their own and can produce false positives: they may not account for common patterns or idioms in programming languages, or they may flag legitimate uses of open source libraries or frameworks.
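To give a feel for what similarity tools measure, here is a minimal sketch that compares two snippets by their overlapping token k-grams and reports a Jaccard score. It is not how MOSS or JPlag are implemented; the tokenizer, the k-gram length of 4, and the function names are assumptions chosen for brevity. Every identifier and keyword is mapped to the placeholder `ID`, so renaming variables alone does not hide a match.

```python
import re

# Identifiers/keywords, integer runs, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(source: str) -> list[str]:
    """Tokenize source text, replacing names with 'ID' and numbers with 'NUM'."""
    out = []
    for tok in TOKEN.findall(source):
        if tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def kgrams(toks: list[str], k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of all runs of k consecutive tokens."""
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 4) -> float:
    """Shared k-grams divided by all distinct k-grams; 0.0 if either side is too short."""
    ga, gb = kgrams(tokens(a), k), kgrams(tokens(b), k)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


original = "total = price * qty + tax"
renamed = "sum = cost * n + fee"
unrelated = "print('hello')"

print(jaccard_similarity(original, renamed))    # same token structure
print(jaccard_similarity(original, unrelated))  # no shared k-grams
```

The first pair scores as identical even though the two lines could easily have been written independently, which is exactly the kind of false positive on common idioms that makes the output of such tools a starting point for review rather than a verdict.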
Ultimately, the surest way to avoid unknowingly including GPL code in your project is to not use GitHub Copilot at all, or to use it only for non-critical or experimental purposes. If you do use GitHub Copilot, you should always review and test the generated code carefully, and consult a lawyer if you have any doubts about its legality or compatibility with your project’s license. You should also respect the rights and wishes of the original authors of the code that GitHub Copilot may have learned from, and give them proper credit and attribution if required by their licenses.