

In a recent evaluation, we found that users accepted on average 26% of all completions shown by GitHub Copilot. We also found that on average more than 27% of developers' code files were generated by GitHub Copilot, and in certain languages like Python that goes up to 40%.

However, GitHub Copilot does not write perfect code. It is designed to generate the best code possible given the context it has access to, but it doesn't test the code it suggests, so the code may not always work, or even make sense. GitHub Copilot can only hold a very limited context, so it may not make use of helpful functions defined elsewhere in your project or even in the same file. And it may suggest old or deprecated uses of libraries and languages. When converting comments written in non-English to code, there may be performance disparities when compared to English. For suggested code, certain languages like Python, JavaScript, TypeScript, and Go might perform better compared to other programming languages. Like any other code, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted; a brief illustration of that review step follows at the end of this section. As the developer, you are always in charge.

Because Codex, the model powering GitHub Copilot, was trained on publicly available code, its training set included personal data that appeared in that code. From our internal testing, we found it to be very rare that GitHub Copilot suggestions included personal data verbatim from the training set. In some cases, the model will suggest what appears to be personal data – email addresses, phone numbers, etc. – but those suggestions are actually fictitious information synthesized from patterns in training data and therefore do not relate to any particular individual.
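
To make the review step concrete, here is a minimal sketch of the kind of suggestion a developer might need to revise. The helper names and the completion itself are hypothetical, not actual GitHub Copilot output; the point is simply that a suggestion can run correctly today while still relying on a deprecated API.

```python
from datetime import datetime, timezone

# Hypothetical suggested helper: it runs, but datetime.utcnow() is deprecated
# as of Python 3.12 and returns a naive datetime with no timezone attached.
def build_timestamp_suggested() -> str:
    return datetime.utcnow().isoformat()

# The same helper after review, rewritten against the current API so the
# result is timezone-aware and avoids the deprecated call.
def build_timestamp_reviewed() -> str:
    return datetime.now(timezone.utc).isoformat()

if __name__ == "__main__":
    print(build_timestamp_suggested())  # e.g. 2024-01-01T12:00:00.000000
    print(build_timestamp_reviewed())   # e.g. 2024-01-01T12:00:00.000000+00:00
```

Both versions behave the same today, which is exactly why this sort of issue is easy to miss without a deliberate review, a linter, or tests over the suggested code.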
