CLAUDETTE: Automating Legal Evaluation of Terms of Service and Privacy Policies using Machine Learning

It is possible to teach machines to read and evaluate terms of service and privacy policies for you.

Have you ever actually read the privacy policies and terms of service you accept? If so, you’re an exception. Consumers do not read these documents. They are too long, too complex, and there are too many of them. And even if consumers did read them, they would have no way to change them.

Regulators around the world, acknowledging this problem, have put in place rules on what these documents must and must not contain. For example, the EU enacted rules on unfair contractual terms and, more recently, the General Data Protection Regulation (GDPR). The latter, applicable since 25 May 2018, makes clear what information must be presented in privacy policies, and in what form. And yet, our research has shown that, despite the substantive and procedural rules in place, online platforms largely do not abide by the norms concerning terms of service and privacy policies. Why? Among other reasons, there is simply too much for the enforcers to check. With thousands of platforms and services out there, the task is overwhelming. NGOs and public agencies may have the competence to verify terms of service and privacy policies, but lack the actual capacity to do so. Consumers have rights and civil society has its mandate, but no one has the time and resources to put them into effect. Battle lost? Not necessarily. We can use AI for this good cause.

The ambition of the CLAUDETTE Project, hosted at the Law Department of the European University Institute in Florence and supported by engineers from the University of Bologna and the University of Modena and Reggio Emilia, is to automate the legal evaluation of terms of service and privacy policies of online platforms using machine learning. The project’s philosophy is to empower consumers and civil society using artificial intelligence. Currently, artificial intelligence tools are used mostly by large corporations and the state. However, we believe that, with the efforts of academia and civil society, AI-powered tools for consumers and NGOs can and should be created. Our most technically advanced tool, described in our recent paper, CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service, can detect potentially unfair contractual clauses with 80%-90% accuracy. Such tools can be used both to increase consumers’ autonomy (telling them what they are accepting) and to increase the efficiency and effectiveness of civil society’s work by automating large parts of it.
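How might such a detector work in practice? Below is a minimal sketch of sentence-level unfair-clause detection in Python, assuming a corpus of ToS sentences hand-labeled as potentially unfair or not. The paper evaluates several learning methods; the TF-IDF plus linear SVM pipeline here is only an illustrative baseline, not the project’s actual implementation, and the example sentences are invented.

```python
# Illustrative baseline: classify ToS sentences as potentially unfair (1) or not (0).
# This is a sketch, not CLAUDETTE's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical hand-labeled training data.
sentences = [
    "We may terminate your account at any time, without notice.",  # potentially unfair
    "You can contact our support team by email.",                  # innocuous
]
labels = [1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram and bigram features
    LinearSVC(),                          # linear classifier over sparse features
)
clf.fit(sentences, labels)

# Flag clauses in an unseen terms-of-service document.
print(clf.predict(["We reserve the right to change these terms unilaterally."]))
```

In a realistic setting the training corpus would contain thousands of annotated sentences, with performance estimated by cross-validation; the 80%-90% accuracy figures refer to the project’s own models and data, not to this toy pipeline.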

Our most recent work has been an attempt to automate the analysis of privacy policies under the GDPR. This project, funded and supported by the European Consumer Organisation, has led to the publication of the report Claudette Meets GDPR: Automating the Evaluation of Privacy Policies Using Artificial Intelligence. Our findings indicate that the task can indeed be automated once a significantly larger learning dataset is created. The learning process was interrupted by major changes to privacy policies undertaken by the majority of online platforms around 25 May 2018, the date on which the GDPR became applicable. Nevertheless, the project led us to interesting conclusions.

Doctrinally, we have outlined the requirements a GDPR-compliant privacy policy should meet (comprehensive information, clear language, fair processing), as well as the ways in which these documents can be unlawful (required information is insufficient, language is unclear, or potentially unfair processing is indicated). Anyone – researchers, policy drafters, journalists – can use these “golden standards” to help them assess existing policies, or to draft new ones compliant with the GDPR.
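To give a sense of how such standards can be operationalized for machine learning, here is a hypothetical annotation scheme mirroring the three failure modes just described; the class and label names are illustrative, not the project’s actual tag set.

```python
# Hypothetical sentence-level annotation labels for privacy-policy analysis.
from enum import Enum

class PolicyIssue(Enum):
    INSUFFICIENT_INFORMATION = "required information is missing or incomplete"
    UNCLEAR_LANGUAGE = "wording is vague or ambiguous"
    PROBLEMATIC_PROCESSING = "clause indicates potentially unfair processing"

# A single sentence may exhibit several issues at once,
# so an annotation is a set of labels.
annotation = {PolicyIssue.UNCLEAR_LANGUAGE, PolicyIssue.PROBLEMATIC_PROCESSING}
```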

Empirically, we have analyzed the contents of the privacy policies of Google, Facebook (and Instagram), Amazon, Apple, Microsoft, WhatsApp, Twitter, Uber, Airbnb, Booking.com, Skyscanner, Netflix, Steam and Epic Games. Our normative study indicates that none of the analyzed privacy policies meets the requirements of the GDPR. The evaluated corpus, comprising 3,658 sentences (80,398 words), contains 401 sentences (11.0%) that we marked as containing unclear language, and 1,240 sentences (33.9%) that we marked as potentially unlawful clauses, i.e. either a “problematic processing” clause or an “insufficient information” clause (under Articles 13 and 14 of the GDPR). Hence, there is significant room for improvement on the side of business, as well as for action on the side of consumer organizations and supervisory authorities.
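As a quick sanity check, the reported percentages are simple sentence-count ratios over the corpus; the snippet below reproduces them from the raw counts given above.

```python
# Reproducing the reported corpus statistics from the raw counts.
total_sentences = 3658
unclear = 401      # sentences marked as containing unclear language
unlawful = 1240    # sentences marked as potentially unlawful clauses

print(f"unclear language: {unclear / total_sentences:.1%}")       # -> 11.0%
print(f"potentially unlawful: {unlawful / total_sentences:.1%}")  # -> 33.9%
```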

This post originally appeared on the Machine Lawyering blog of the Centre for Financial Regulation and Economic Development at the Chinese University of Hong Kong.

Facebook’s exercise of public power

In this post I want to argue that Facebook’s banning of pages and profiles and its removal of posts is an exercise of public power and, as such, should be subject to the material and procedural standards of public law and human rights.

Ok, I’m not actually going to argue quite that much. But I want to defend a weaker claim: it is not obvious that Facebook’s discretion should not be limited by fundamental rights and freedoms simply because it is a private company. The same applies to other platforms of comparable social importance, like Google, YouTube and Twitter. And to many other ‘private’ actors.

Some context: one international case, and one Polish. You probably all remember Facebook’s removal of the photo of the ‘napalm girl’ and the outcry that followed. Critics were accusing Facebook of ‘abuse of power’ and ‘censorship’, leading the company to reverse its initial decision. The critics’ arguments invoked the fact that the photo is ‘iconic’, and that Facebook’s role in news dissemination is enormous (44% of adults in the US get their news there).

In Poland, the case is of a different political colour. In recent days, a group combating hate speech and xenophobia mounted a mass-scale action of reporting extreme right-wing Facebook pages, which led to the deletion of dozens of them, including the pages of a member of parliament and of several nationwide organisations, some with hundreds of thousands of supporters and followers. This also caused an outcry, and even made it onto the national TV news on the station currently controlled by the government. The arguments invoked by the critics are essentially the same: freedom of speech, censorship, abuse of power, and so on. The difference is that this time Facebook’s decision has many supporters, who among other arguments claim that Facebook is a private company acting for profit, and that it not only is but also should be allowed to do such things.

Now, there is a clear difference between the two cases. In the case of the ‘napalm girl’, Facebook did a ‘bad’ thing. In the case of the right-wing pages, it did a ‘good’ thing. There are two reasons why that classification is widely shared. Firstly, many of the right-wing pages contained content that might violate the law on hate speech and the promotion of violence. I will deal with this soon. Secondly, there is an emotional reason. Let me dwell on that first.

It just so happens that Facebook currently has a clearly liberal and progressive agenda, and that this agenda suits many commentators, probably including you and me. However, it is not clear that this will always be so. Today Facebook enjoys considerable freedom. Today, liberal and progressive sells. But consider two thought experiments. First, imagine that Facebook had a right-wing agenda and blocked extreme-left pages. Or just liberal pages, or whatever pages suit your worldview. Would you still be so sure that what it does is perfectly legitimate? Second, imagine that the political winds change. Imagine that Trump wins the election. Imagine that there is suddenly pressure on Facebook to change course (‘or else we tax you heavily’, or ‘we grant people property rights in their personal data’, or anything else that would hurt Facebook), and that society at large approves. Will we still defend Facebook’s freedom and full discretion? Or will we then say: hey, come on, everyone uses your services, you shape how people think, you have a public responsibility and duty?

Emotions aside: in classical legal thinking, which still prevails in many continental legal traditions, including the Polish one, the world was neat and ordered. There were public bodies, allowed to do only what the law says they can do and holding the monopoly on the use of force; and private bodies, allowed to do everything that the law does not forbid them from doing, and not allowed to use physical force against each other. The 19th and 20th centuries witnessed the rise of constitutionalism, which subjected the exercise of public power by public bodies to human-rights limits and control.

Within that picture, Facebook is indeed a private company. It can do everything that the law does not forbid it from doing. It is under no direct obligation to facilitate freedom of speech, the right to associate, the right to a fair trial, and so on. However, notice three things:

  1. Factually, Facebook’s power is enormous. With billions of people using it, trusting it to provide their news, and relying on it for organisation and communication, it can easily affect the abovementioned rights and freedoms. It might be a private company, but in many senses it holds a ‘public’ position.
  2. Even assuming that Facebook just deletes what it believes is against the law, it:
    1. interprets the law by itself, without relying on any court;
    2. executes the law by itself, because it holds a factual monopoly on ‘digital force’. In the tangible world, the owner of a debate club might want to eject a speaker from his property, but would need the police to actually remove him or her. In the tangible world, one might find some banners outrageous, but destroying them would still infringe someone else’s property rights. In the digital world, where there are no ‘bodies’ and people hold no property in their digital content, this is legally fine, and factually easy, since Facebook unilaterally controls the platform.
  3. However, Facebook does more than just delete illegal content. It sets its own rules and standards, often stricter than the law. Moreover, it not only deletes content, but through its underlying algorithms it chooses what will be displayed to whom, and how often. In this sense, if we look at it as a public space, which in many senses it is (remember social media’s role in the Arab Spring and the Ukrainian Maidan?), it is the sole legislator, court and executor of the ‘law’. It does not hold public power de lege, but it holds a de facto power that perfectly imitates the power we have limited where the state is concerned.

Given all this, I think we need a debate on limiting the discretion of socially important internet platforms when it comes to policing the content displayed and allowed there. Obviously, dozens of questions arise: which platforms, who would limit them, is the market not enough, how would this affect innovation, and so on. There are other private parties who exercise other ‘public’ powers elsewhere (think of FIFA, multinational corporations, etc.). Should we regulate business at large, or particular sectors, or something else? There is much to be thought through. A lot has already been written on this; much less has been read. The questions are on the table, and I don’t have tweet-length answers.

But I simply cannot accept the claim that it is perfectly fine for Facebook to interpret and execute the law, or in fact to do whatever it wants, because it is a private company. The power it holds is public in nature, just not yet labelled so by our analogue laws. And if that does not convince you, remember: the fact that ‘our’ agenda sells might soon change. Just as with contracts: we make them while everything is fine, but we need them when something goes wrong.