Can freely accessible training data be used free of charge for AI models in future?

[10:10 Wed,16.October 2024 by Rudi Schmidts]

The study ‘Copyright and training of generative AI models’, at least, concludes that the training of generative AI models does not fall under the so-called TDM (text and data mining) exception. Although this exception permits automated data collection, the authors regard AI training as a different kind of data processing because it yields no new insights. Instead, AI models reproduce data similar to their training data, which, according to the authors, calls for a different legal framework.

In an interview, the study's two authors, Tim W. Dornis and Sebastian Stober, emphasise the need for an in-depth examination of both AI technology and copyright law. They criticise the fact that the European AI Act, which was developed before generative AI systems emerged, does not provide a sufficient basis for protecting copyright. While AI companies argue that their training is covered by the AI Act, the authors reject this view and call for closer legal scrutiny.

Because the AI industry, especially large corporations such as Meta and Google, benefits from training on copyrighted content free of charge, an imbalance has arisen in which the rights of authors have so far been largely ignored. Dornis and Stober argue in favour of fairer licensing models and a public debate about the value of data.

They also predict extensive legal disputes over the training of AI models in the near future, particularly in the USA and Europe, which is why answers to this sometimes paradoxical situation will ultimately have to be found.

In our view, the problem will only be solved by a serious reorganisation of copyright law, and no such reform is anywhere in sight.
