Mining the value of ‘Big Data’ in an era of Artificial Intelligence


While debate around the proposed EU Directive on Copyright in the Digital Single Market has focused on the role of online content service providers and press publishers, new exceptions for text and data mining have slipped quietly under the radar. Yet text and data mining is a touchstone issue for the interface between intellectual property rights and access to data in the digital environment.

If adopted by the European Parliament and subsequently endorsed by the Council, the proposed Directive will bring in a number of new measures designed to adapt exceptions and limitations to the digital and cross-border environment. These measures include two new exceptions to facilitate text and data mining (TDM). While this development has been closely followed by stakeholders from the scientific research community, academic libraries and scholarly publishers, it has not been the subject of mainstream debate.

What is text and data mining and why does it matter?

Text and data mining refers to the use of new technologies to carry out automated, computational analysis of information in digital form, with the aim of discovering new insights, patterns and correlations. In an era of “Big Data,” this research technique can be deployed in a wide range of sectors: from analysts wishing to predict movements in financial markets, to marketing companies assessing social media sentiment around a product launch, to medical researchers extracting new insights from scholarly articles on specific diseases.

Indeed, the prospect of medical breakthroughs has raised the emotional temperature of stakeholder discussions on the TDM exceptions. Text and data mining is also of special significance to the development of Artificial Intelligence, where machine learning may require access to large volumes of quality data. Publishers, however, have argued that TDM has to remain a managed process, subject to licensing, to avoid misuse of copyright-protected content and to prevent uncontrolled mining operations from undermining the integrity of their platforms.

What are the TDM exceptions in the proposed directive?

According to the Commission, the proposed text and data mining exceptions address the current situation where universities and other research organisations are confronted with legal uncertainty as to the extent to which they can perform TDM on content. In some cases, TDM may involve acts protected by either copyright or the sui generis database right, notably the reproduction of works and/or the extraction of contents from a database.

Article 3.1 of the proposed Directive therefore provides a mandatory exception for reproductions and extractions made by research organisations and cultural heritage institutions, in order to carry out text and data mining of works to which they have lawful access, for the purposes of scientific research. In line with European research policy, which encourages public-private partnerships, research organisations should also benefit from the exception when their research activities are carried out in the context of a public-private partnership (provided that there is no preferential access to the results of the research).

How have stakeholders reacted to the new exceptions?

The precise meaning of “lawful access” has been much debated among TDM stakeholders. Recital 9 refers to subscriptions and open access licences, while Recital 11b elaborates that “legal access” should be understood as covering access to content based on an open access policy or through contractual arrangements between rights holders and research organisations or cultural heritage institutions. “Legal access” also covers access to content that is “freely available online.”

When the Commission published its original proposal for the Directive back in 2016, the draft TDM exception provoked serious concern among technology companies active in the area of data analytics and artificial intelligence. These companies feared that an exception limited to research organisations would call into question mining operations that they had been carrying out for years, and might even expose them to retrospective litigation.

The proposed solution to this issue has been the inclusion of an additional, mandatory exception (Article 3a), which provides a broad exception for reproductions and extractions of lawfully accessible works for the purposes of text and data mining. While this second TDM exception sounds sweeping, it allows rights holders to opt out. The exception should only apply where there is lawful access to the work, including when it is published online, and insofar as the rights holders have not reserved their rights in an “appropriate manner”. Publishers will have breathed a sigh of relief that they can apparently opt out of this exception via their terms and conditions.

For their part, companies active in the field of data analytics and artificial intelligence will be reassured by the existence of an exception that should reduce legal uncertainty with respect to their mining activities. Nevertheless, this second exception may lead to further questions and possible litigation regarding the precise meaning of “legal access” and what constitutes rights holders reserving their rights in an “appropriate manner”. Moreover, it could be argued that Article 3a effectively reverses the normal presumption of the subsistence of copyright in a work, and so sits uneasily in the existing European copyright framework.

TDM will impact content creators and rights holders across all sectors

While scientific publishers (and, to a lesser extent, press publishers) have been involved in the discussions on the two new exceptions, other content sectors are likely to be affected by these exceptions in the long term. After all, TDM techniques can be applied not only to text, but also to images and sound. The film, television and music sectors should now consider carefully the implications of the exceptions for their particular business models and the potential challenges to their enforcement activities.

The interface between copyright and access to data is likely to be an ongoing source of friction between the producers of information, in all its forms, and those commercial and non-commercial interests wishing to process that data.


Francine is the Regulatory & Public Affairs Director, based in Bird & Bird's Brussels office. With 20 years' experience of working in the EU and U.S. regulatory environment, in both the private and public sectors, Francine has extensive knowledge of the European regulatory environment for the media, technology and communications sectors. At a time of enormous regulatory challenges that will define the future of the digital economy, she advises clients on how to navigate complex EU decision-making processes to achieve specific industry goals.
