ASF Generative Tooling Guidance

Version: 1.0

Can contributions to ASF projects include AI-generated content?

The Apache-2.0 license and the Apache Individual Contributor License Agreement both remind contributors that they are responsible for disclosing any copyrighted materials in submitted contributions that are not their original creation. This is as true when using generative AI tooling as it is when using materials from public websites or code from other open-source projects.

When disclosing these materials, contributors should also identify the licensing for these materials. The ASF maintains a 3rd Party Licensing Policy that provides guidance on which licenses are acceptable, along with instructions on the treatment of 3rd Party Works.

While content generated by a non-human (e.g., machine or monkey) is generally not copyrightable, if content consists of some portions generated by AI and other portions authored by a human, the portions authored by a human may be copyrightable.

As explained in the U.S. Copyright Office Registration Guidance of March 16, 2023:

“For example, a human may select or arrange AI-generated material in a sufficiently creative way that ‘the resulting work as a whole constitutes an original work of authorship.’ Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are ‘independent of’ and do ‘not affect’ the copyright status of the AI-generated material itself.”

These portions authored by a human may simply come from the prompt the human provided or from subsequent changes they make. However, a prominent concern with generative AI tools is the risk that they reproduce portions of the materials they were trained on, some of which may be copyrightable subject matter. Thus, a recommended practice when using generative AI tooling is to use tools with features that identify any output that is similar to parts of the tool’s training data, as well as the license of that content.

Given the above, code generated in whole or in part using AI can be contributed if the contributor ensures that:

  1. The terms and conditions of the generative AI tool do not place any restrictions on use of the output that would be inconsistent with the Open Source Definition.
  2. At least one of the following conditions is met:
    1. The output is not copyrightable subject matter (and would not be even if produced by a human).
    2. No third party materials are included in the output.
    3. Any third party materials that are included in the output are being used with permission (e.g., under a compatible open-source license) of the third party copyright holders and in compliance with the applicable license terms.
  3. A contributor can obtain reasonable certainty that condition 2.2 or 2.3 is met from code scanning results, or if the AI tool itself provides sufficient information about output that may be similar to training data.
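
The way these conditions combine (condition 1, plus at least one of 2.1 to 2.3) can be sketched as a small check. This is only an illustration: the class and field names below are hypothetical, not anything the ASF defines, and each flag still has to be set by the contributor's own judgment.

```python
from dataclasses import dataclass


@dataclass
class AIContributionCheck:
    """Hypothetical self-assessment record; field names are illustrative only."""

    tool_terms_osd_compatible: bool         # condition 1
    output_not_copyrightable: bool          # condition 2.1
    no_third_party_materials: bool          # condition 2.2
    third_party_used_with_permission: bool  # condition 2.3


def may_contribute(check: AIContributionCheck) -> bool:
    """Return True when condition 1 holds and at least one of 2.1 to 2.3 holds."""
    condition_2 = (
        check.output_not_copyrightable
        or check.no_third_party_materials
        or check.third_party_used_with_permission
    )
    return check.tool_terms_osd_compatible and condition_2
```

Note that satisfying any one of 2.1 to 2.3 is enough, but none of them can compensate for tool terms that conflict with the Open Source Definition.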

When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example by including the phrase “Generated-by: ”. This makes it possible to consider future release tooling that pulls this content into a machine-parsable Tooling-Provenance file.
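
As a rough sketch of how such a token could be read back by tooling, the snippet below extracts tool names from a commit message. It assumes the token sits on its own line as “Generated-by: ” followed by a tool name; the function name and the example tool name are hypothetical, and the format of the Tooling-Provenance file itself is not specified here, so the sketch stops at extraction.

```python
import re

# Assumption: the token is a line of the form "Generated-by: <tool name>".
GENERATED_BY = re.compile(r"^Generated-by:(.*)$", re.MULTILINE | re.IGNORECASE)


def generated_by_tools(commit_message: str) -> list[str]:
    """Return the tool names from every Generated-by line, in order of appearance."""
    tools = []
    for match in GENERATED_BY.finditer(commit_message):
        tool = match.group(1).strip()
        if tool:  # ignore a token with no tool name after it
            tools.append(tool)
    return tools


# "ExampleTool 1.0" is a placeholder, not a real product.
message = "Fix tokenizer edge case\n\nGenerated-by: ExampleTool 1.0\n"
print(generated_by_tools(message))
```

A release script could run a function like this over the commit history and collect the results, though how they should be written out is left open by this guidance.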

Finally, please note that while the above seems like a reasonable set of guidelines in June 2023, this is a rapidly evolving area. Whatever we recommend to PMCs today, policies will need to be re-evaluated and updated as the area evolves.

We will continue communicating with PMC and ASF members as updates to this FAQ get discussed and merged in.

What about Documentation?

The above text applies to documentation as well. Be cautious with tools that place restrictive licensing on the generated content, and make sure the content complies with the 3rd Party Licensing Policy and its instructions on 3rd Party Works.

What about Images?

As with documentation, the above principles still apply. Because images are a non-textual form, however, the details quickly become complex. We expect this to continue to be a rapidly evolving area.

The tool we want to use may rely on other tools. What terms of use do we need to consider when following this guidance?

Don't second-guess a vendor's terms of use (TOU). Your usage of their tools is bound by the totality of the given TOU, and you are not expected to go outside the TOU text for further clarification.

What do we do if a contribution includes AI-generated content and some form of tooling has identified materials that have been copied?

Refer to the 3rd Party Licensing Policy as with any other contribution.

Can the ASF provide a list of approved generative AI Tools?

It is not in the interest of the ASF to tell developers what tools to use. You may use whatever tools you wish provided that you follow the guidance in this document.