Friday, 17 May 2019

Thoughts on the Harmful Digital Communications Act

With increasing pressure on social media companies, it's worth looking at the Harmful Digital Communications Act in NZ again. The HDCA came into force in 2015 with the aim to "deter, prevent, and mitigate harm" caused to individuals by digital communications and to provide victims with a means of redress. The Act introduced both civil penalties and criminal offences, giving victims different pathways to recourse depending on the type of harm experienced. Netsafe was appointed to operate the civil regime, and is tasked with receiving and resolving complaints in the first instance (analogous to the Office of the Privacy Commissioner for the Privacy Act). Netsafe also assists with the removal of content from the internet where possible, working with content hosts in New Zealand. Police are responsible for the criminal regime, which is reserved for more serious cases.

One of the main aims of the legislation was to produce social change, to make online spaces safer for New Zealanders to participate in. There was particular focus on cyber-bullying, and the impacts of online harm on young people, especially as it contributes to our growing mental health crisis. The procedures were also designed to be accessible for victims, both in terms of speed and cost. While there were concerns at the time over the chilling, suppressive effect the legislation could have on freedom of expression, many MPs said that the pressing harms being perpetrated online far outweighed those concerns; arguably, the Act has not had any tangible effect on freedom of expression in the subsequent years. The legislation has also become clearer over time as case law has been built up, with some clarity being provided around the tests and thresholds for harm.

While the legislation is relatively young, this may be an opportunity to highlight the challenges faced by the Act going into the future, and to make adjustments or corrections to minimise harm sooner.

a) In the three years since, 18, 85, and 107 people respectively were charged with offences under the HDCA each year, and 12, 42, and 53 people were convicted. The majority of cases have related to revenge pornography, while instances of hate speech, racial harassment, and incitement to commit suicide have largely gone unpursued. (Interestingly, 65-75% of people charged with HDCA offences plead guilty. Unsurprisingly, 90% of those charged have been men.)

While the principle of proportionality is important, the lack of consequences for harmful digital communications at the lower end of the scale means that the new Act has little deterrent effect, and arguably has not shifted societal attitudes or behaviours in this area. The Act requires that digital communications be directed at an individual to be deemed harmful, but is there scope to amend the Act to cover other cases where groups of people or sectors of society are suffering harm? Arguably, more harm overall is being perpetrated in cases where it affects many people at once.

b) The need to demonstrate harm has proven to be a difficult barrier, with a number of cases dismissed simply because the prosecution could not show that there was sufficient harm, especially when harm is defined ambiguously and subject to interpretation by Judges. What further guidance needs to be given about establishing harm, and what recourse can there be for legitimate victims who do suffer harm but may not meet the statutory threshold? There have been comments by lawyers that what was initially unclear has now become clearer over time, but at what social and personal cost did this clarification develop?

c) One of the aims is to provide "quick and efficient" redress, but how fast is the Netsafe/District Court process in reality? What incongruities lie between the fast, digital nature of the harm and the slow, physical nature of the recourse process, and could technology be better used to help accelerate these processes?

d) Enforcement has struggled against the prevalence of anonymous perpetrators, leaving victims without recourse. The District Court process can order parties to release the identity of the source of an anonymous communication, but how often/well is this used? Sometimes this is technically impossible (e.g. when people share an internet connection and the identity cannot be confirmed). Is this something that technology can help with?

e) Amongst these issues, it may also be worth re-investigating the notion of safe harbour – should content hosts be protected from failing to moderate harmful digital communications? Currently, as long as they respond to complaints within 48 hours, they can get away with the argument of "we are just the messenger and are not responsible for the content". Can we enforce safe harbour requirements on platforms operated by companies overseas? Weak enforceability (or in some cases, social media companies belligerently ignoring the law and saying it doesn't apply to them) challenges notions of big multinational companies taking the law seriously. Do we need to be braver and have stronger penalties?

So we come back to the purpose of the Act - Hon Amy Adams (Minister of Justice at the time) said "this bill will help stop cyber-bullies and reduce the devastating impacts their actions can have." Has it done so, sufficiently, over the course of 3+ years? The HDCA is currently under review by the Ministry of Justice, so hopefully some of these issues are already being looked at. But the review doesn't seem to have a public consultation element, so there isn't much visibility for the rest of us to see what's happening.

Monday, 22 April 2019

Paper Title: The IEEE Conference Proceedings Template Text

It all started with IEC 61499 function blocks - a way of modelling industrial systems using pictorial representations in a standardised, and therefore programmable, way. It is used widely around the world, and a lot of research effort has gone into enhancing its capabilities and making it more usable in real-world applications. The paper "Remote Web-Based Execution of IEC 61499 Function Blocks" (ID:7090220), published at the 6th Electronics, Computers, and Artificial Intelligence (ECAI) Conference held in 2014, described a prototype that integrated IEC 61499 with web technologies in a safe and secure way. The introduction of the paper suggests that this might allow for computationally expensive tasks like iterative optimisation or image processing to be executed on the cloud, with results used to control specific function blocks. The introduction also suggests that "this template, modified in MS Word 2003 and saved as 'Word 97-2003 & 6.0/95 - RTF' for the PC, provides authors with most of the formatting specifications needed for preparing electronic versions of their papers."

If the last point seems incongruous with the highly technical subject matter of the paper, that is because it comes from the IEEE Conference Proceedings Template. The authors of that paper used the template to start writing their paper, and while they deleted most of the original template text, a large chunk was simply forgotten and submitted. In total, 147 words from the introduction section of the IEEE template remain in this IEEE Xplore-published paper. The rest of the paper seemed original and interesting, yet this passage of text clearly should not have been in the final paper. How did a paper with such a large block of text from the IEEE template make it past peer-review, plagiarism checks, and PDF eXpress checks to become indexed and published on IEEE Xplore?

This started a journey into uncovering just how widespread this issue was: thousands of IEEE Xplore-published papers were discovered that contain at least some text matching the IEEE conference template. This blog article documents that journey - it covers how these papers were found, briefly describes how IEEE was informed about the issue and how they responded, and offers some opinions on the systematic failures that have allowed these errors to go unnoticed.

In most cases, I believe that the presence of template text in a paper was just a genuine mistake on the part of the authors. In many of the papers that I read, there is legitimate scientific work being reported that is of value to the academic community, and there may only be a few sentences of template text. It is not my intention to offend or embarrass any of these authors. Therefore, rather than referring to papers by their full title or authors, I mostly refer to them by their IEEE Xplore ID numbers. Readers interested in tracking these papers down can search for the ID number in IEEE Xplore to retrieve publication details.

Data Collection
The methodology was pretty simple - I used Google Scholar to search for papers that match some part of the IEEE conference template text, because Google Scholar's exact quote search seemed to be more accurate than the IEEE Xplore search. Each search used quote marks in order to get exact matches only, and results were restricted to those hosted on IEEE Xplore using a site filter. Google Scholar has an undocumented limit on the length of each search query. Empirically, this appears to be 256 characters, so after taking the site filter into account, each query can be a maximum of 232 characters. After each search, a random sample of the papers was checked to make sure that the search was accurate and that the queried text was in fact in the paper (the examples in the Table below have been manually verified). Unfortunately, since Google Scholar does not offer an API, and scraping the website is against the Terms of Service, all of the data collection was done manually. The Table below shows some of the queries that were run, and gives an indication of the scale of this problem. While this cannot be interpreted as an exact count of papers that contain template text, it is hoped that this analysis gives a sense of the scale of the problem; it is not limited to a handful of papers. Hundreds, if not thousands, of papers have some template text in them. This search was done in June 2017, so the numbers have increased since then (estimated at approximately 5-10%).
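The chunking of template text into query-sized pieces can be sketched in a few lines of Python. Note that the site filter string and the greedy word-packing strategy here are illustrative assumptions, not the exact queries used:

```python
MAX_QUERY = 256  # empirical Google Scholar query-length limit
SITE_FILTER = "site:ieeexplore.ieee.org"  # assumed filter string, for illustration
# the two quote marks and the separating space also count toward the limit
BUDGET = MAX_QUERY - len(SITE_FILTER) - 3

def make_queries(template_text: str) -> list[str]:
    """Greedily pack whole words into quoted exact-match chunks within BUDGET."""
    queries, chunk = [], ""
    for word in template_text.split():
        candidate = f"{chunk} {word}".strip()
        if chunk and len(candidate) > BUDGET:
            queries.append(f'"{chunk}" {SITE_FILTER}')
            chunk = word
        else:
            chunk = candidate
    if chunk:
        queries.append(f'"{chunk}" {SITE_FILTER}')
    return queries
```

Each resulting query is an exact-phrase search restricted to one publisher's domain, which is why papers hosted elsewhere fall outside these counts.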

There are two important caveats that prevent us from simply adding up the number of papers to find the total number of papers that match template text. Firstly, it is probable that there are some papers that appear in more than one search. In manual checks, most papers appeared in only one of these searches, meaning that the amount of template text was relatively small in most papers (usually from one of the sections or just one of the sentences), but without further analysis no strong claim can be made here. Secondly, Google Scholar's search may not be perfect, and it is possible that papers may be listed more than once in the search results or listed when there actually is no match. In some cases, the authors have hidden the text so that it is not readable to humans (e.g. making the text white or placing a figure over the text), but is still searchable by computers, leading to an erroneous listing in the search results.

Importantly, there are also reasons to suspect that these numbers may undercount the actual number of papers matching template text. Firstly, these searches only cover papers where the PDFs are text-searchable. A small proportion of conferences have uploaded their papers as scanned PDF files that are essentially images without searchable text, which may not appear in these results. This is more likely to happen for older conferences. Secondly, because exact quotes were sought, even slight changes such as an additional word or an extra space could result in a paper not being included in the search results. It should be noted that although we are primarily interested in papers published on IEEE Xplore, the IEEE conference template is widely used around the world by other publishers as well, and there are large numbers of papers published outside of the IEEE that also contain text from this template; these were excluded by the site filter used in the search query.

(Some Other) Analysis
The IEEE conference template file also includes seven references. A number of papers have failed to remove these references or re-used some of them, significantly increasing the number of citations for these papers. We can easily assess the magnitude of this issue, because two of the references in the template are not for real publications. K. Elissa's work, "Title of paper if known" (unpublished), and R. Nicole's work, "Title of paper with only first word capitalized" (published in the J. Name Stand. Abbrev.), have been cited over a thousand times by IEEE Xplore papers according to Google Scholar (1440 and 1110 respectively). There are some issues with this result, as Google Scholar's citation tracking is not perfect, but I have found IEEE Xplore papers that cite these papers directly, such as ID:5166784 and ID:5012315. Some papers only appear if the reference text is searched directly, as sometimes these placeholder references appear appended to the end of legitimate references, such as ID:6964641. Meanwhile, the other five real references have received an artificial boost to their citation counts - James Clerk Maxwell's "A treatise on electricity and magnetism" had plenty of citations anyway, but a non-negligible number of these citations, such as ID:6983343, are not genuine.
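Spotting these placeholder references mechanically is straightforward. A minimal sketch, where the marker phrases are taken from the template's own reference list as quoted above:

```python
# Phrases that appear only in the IEEE template's placeholder references.
TEMPLATE_MARKERS = [
    "Title of paper if known",
    "Title of paper with only first word capitalized",
    "J. Name Stand. Abbrev.",
]

def flag_template_references(references: list[str]) -> list[int]:
    """Return indices of reference entries containing a placeholder phrase,
    matching case-insensitively so minor reformatting doesn't hide them."""
    return [i for i, ref in enumerate(references)
            if any(marker.lower() in ref.lower() for marker in TEMPLATE_MARKERS)]
```

A check like this, run over submitted reference lists, would catch both whole placeholder entries and the cases where a placeholder has been fused onto the end of a legitimate reference.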

As far as I can tell, the current IEEE conference template was created around 2002/2003, based on the IEEEtran LaTeX class made by Michael Shell. It therefore makes sense that the earliest paper that was found with template text was from a conference in 2004 (ID:1376936), although at this point in time most papers were still scanned into IEEE Xplore and not text-searchable.

The most egregious case, ID:6263645, was literally just the IEEE conference template in full with the title changed. Even the authors section of the paper was from the template. How was this paper accepted and published? The conference website seems to suggest that only the abstracts were peer-reviewed, with full papers submitted only after authors were notified of acceptance. The conference website includes the text "Failure to present the paper at the conference will result in withdrawal of the paper from the final published Proceedings," which implies that a presentation was made, since the paper was published to IEEE Xplore. But perhaps no one checked the uploaded paper itself after the conference.

After this paper was reported to the IEEE, it was removed several months later "in accordance with IEEE policy", although evidence of the original paper is still available through secondary sources such as ResearchGate and SemanticScholar, which carry the original abstract. In fact, the website DocSlide contains a copy of the full text of the paper. It is important to note that this paper appeared in the conference schedule and proceedings table of contents, alongside legitimate papers in a legitimate conference. As stated earlier, my intention in investigating and reporting template text in conference papers is not to punish or embarrass the authors who have made these errors, as I believe that in most cases these errors were made unintentionally, and there is still scientific merit in the papers that outweighs the impact of these errors. I am not advocating for papers containing template text to be removed from IEEE Xplore. However, in cases like ID:6263645, where the whole paper is nothing but template text, it is so flagrantly against the spirit of academic publication that there is little choice but to remove the paper.

The IEEE Response
Members of our research group first notified IEEE about this in July 2017. After much searching for the correct process for reporting this type of issue, we tried to contact the IEEE Publication Services and Products Board (PSPB) Publishing Conduct Committee. However, no contact details were to be found anywhere, so we e-mailed the Managing Director of IEEE Publications. Eventually our report made its way to the Meetings, Conferences and Events (MCE) team, where the matter was placed under investigation and a slow internal process began. Every couple of months we would e-mail for an update, and be told that the investigation was ongoing and that we would be notified when it was concluded, but that they would be unable to report on each individual instance. IEEE assured us that "IEEE has been fully assessing the situation regarding this circumstance, and putting the appropriate time and resources into investigating this issue thoroughly." To my knowledge, ID:6263645 was the only paper that was removed, as it contained no original content other than the title (and I am not advocating for papers that only have a few sentences of template text to be removed).

Since our original report, in May 2018 the following text was added to the IEEE conference template page (partly in bold) and in the actual template files at the end (in red):

IEEE conference templates contain guidance text for composing and formatting conference papers. Please ensure that all template text is removed from your conference paper prior to submission to the conference. Failure to remove template text from your paper may result in your paper not being published.

This is slowly being reflected in copies of the template as it propagates throughout the world for new conferences. Will this action by the IEEE resolve the problem?

In the subsequent year or so (to April 2019), Google Scholar suggests that there are 18 papers published on IEEE Xplore that contain the above warning text. A manual check reveals that for most of these papers the authors have changed the text colour of the warning to white (which makes it invisible to humans, but not to computers), leaving four papers that visibly contain the new template text. This includes ID:8580104, which appears to be a new paper from a 2018 conference that is just the new template published in its entirety (which we have just informed the IEEE about). Maybe the new warning in the template has helped reduce the rate of incidence, but cases are still slipping through.

Systematic Failures?
The IEEE claims to publish "more than 1,500 leading-edge conference proceedings every year". While the standards of IEEE are high, it is understandable that with so many papers being published every year, some will inevitably slip through the cracks of quality control. It could even be argued that a couple of papers out of the hundreds of thousands published by IEEE each year is relatively insignificant. However, we should still seek to understand why so many papers containing template text, something which should be easily avoidable, have been published in the IEEE Xplore database.

Similarity Checks
The IEEE requires that all papers submitted for publication be checked for plagiarism. It is important to note here that the inclusion of template text in a paper is not generally intentional plagiarism; however, the method for automatically detecting template text, similarity analysis, is more commonly used for identifying plagiarism. In the case of conferences, all organisers are expected to screen their papers for plagiarism. Any papers that are not screened during manuscript submission are checked by the Intellectual Property Rights (IPR) Office before the papers are published on IEEE Xplore. The point to emphasise here is that IEEE claims that, at some point, every paper passes through a standard plagiarism check before publication.

The IEEE has its own portal, CrossCheck, which program chairs and other conference proceeding organisers can use to check for plagiarism. It is essentially an IEEE-branded front-end, with iThenticate running as the back-end engine. iThenticate is arguably the world's leading plagiarism checking service, and is also used by Turnitin, CrossRef, many universities, and others. The strength of CrossCheck in particular is that all participating organisations agree to provide full-text versions of their content, so that they can build up a large corpus of work and increase the probability of catching plagiarised text. It stands to reason that a plagiarism checking service as powerful as this should be able to detect text from the IEEE conference template and alert reviewers/editors/organisers.

However, anecdotally, I have heard that for many conferences the rule of thumb is that a paper should have an overall similarity score of less than 30%, and a similarity score with any single source of less than 7%. If the similarity scores exceed these thresholds, in most cases authors are given an opportunity to edit and reduce their similarity scores, or the paper is rejected. In paper management systems like EDAS, an alert is only generated if the similarity score exceeds a threshold; otherwise it is normally assumed that the paper doesn't have significant plagiarism and can be reviewed.

The template text problem shows an issue with this percentage-based approach - one or two sentences can easily fall below these thresholds and avoid automatic detection. In a 6-8 page conference paper, even an entire paragraph of template text may only constitute 1-2% of the overall paper. If the IEEE template appears towards the bottom of the similarity report, it may well be missed by publication volunteers and staff, if the similarity report is checked at all.

Perhaps we should recognise that not all sentences are equal, and that some matching sentences are more problematic than others. One possible solution is to develop similarity checks that use two corpora: one that contains the current collection of internet and other published sources, and another that contains privileged text that should never appear in documents passed through the similarity check. If any sentence in the paper exactly matches one in the second corpus, that should produce an alert at the top of the similarity report. Examples of passages to include in this second corpus are template text from different publishers, lorem ipsum, and other text that should never (or very rarely) appear in a published paper. A human reviewer is still required to interpret these similarity reports to ensure that false positives do not hinder or prevent the publication of good papers.
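The core of this two-corpus idea can be sketched as follows. This is a minimal illustration using a naive sentence splitter; a real implementation would need robust tokenisation and normalisation:

```python
import re

def sentences(text: str) -> set[str]:
    """Naive sentence splitter: split after ., ! or ?, then normalise
    whitespace and case so trivial reformatting doesn't defeat the match."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return {" ".join(p.lower().split()) for p in parts if p.strip()}

def privileged_matches(paper: str, privileged_corpus: list[str]) -> list[str]:
    """Return sentences from the privileged corpus (template text, lorem
    ipsum, etc.) that appear verbatim in the paper. Any hit at all should
    raise a top-of-report alert, regardless of the overall similarity score."""
    paper_sentences = sentences(paper)
    hits = []
    for source in privileged_corpus:
        hits.extend(s for s in sentences(source) if s in paper_sentences)
    return hits
```

Unlike a percentage threshold, a single matched sentence here is enough to trigger the alert, which is exactly the behaviour the template text problem calls for.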

Peer Review
Conference peer-review is generally of lower quality than journal peer-review. There are, of course, exceptions in terms of the highest level conferences and the lowest quality journals, but overall, review expectations are lower for conference publications. The shorter review periods and lower standards disincentivise reviewers from spending too much time on their reviews. Anecdotally, recruiting reviewers for conferences has become increasingly difficult as the number of publication opportunities grows.

One of the problems with the presence of template text is that there should be no cases where including the template text makes any logical sense in the context of the paper (unless it was a paper about the template text like this one could have been). If a reviewer has read the paper, then this error should be obvious. So why has peer-review failed to detect the template text?

First of all, it appears that some of the papers that are published in IEEE Xplore have not actually been peer-reviewed. In some cases, only conference abstracts are peer-reviewed, and once accepted, the subsequent paper is not reviewed at all. In these cases, the fault does not lie with the reviewers, but demonstrates that this model of publication is flawed and easily exploitable.

Where reviewers do spot template text, there is generally limited opportunity for them to inform authors. There may be a field in the paper review system to enter some comments. If the reviewer is motivated enough, then they might indicate to the authors exactly where the template text in their paper is. But in my experience, conference paper reviewers tend to provide higher-level feedback, looking at the contribution and novelty of the paper, rather than specific grammar or spelling errors. After all, these should be caught during proof-reading.

Even if the reviewer has provided the feedback to the authors that template text is in the paper, there is generally no opportunity for anyone in the process to make sure that the template text has been removed. For many conferences, there is only one round of review, and therefore reviewers do not see the papers again after camera-ready submission. Program Chairs and Publication Chairs cannot be expected to read and check every single paper. So if a paper is accepted but the authors have ignored the feedback provided by the reviewers, then chances are, it will go straight through to publication and appear in IEEE Xplore.

However, following Occam's Razor, the obvious answer here is that not all reviewers are fully reading their assigned papers. It is easy for template text to slip past the review process if no one actually reads the template text in the paper. This is perhaps an uncomfortable truth, and cannot be easily proven (or disproven).

The issues that are discussed here are symptomatic of a wider challenge in scientific peer-review. The issue of predatory open-access journals that publish papers without sufficient (or any) peer-review has been well publicised. One has to wonder if similar issues have affected conferences as well. Solving the issues of peer-review is well above my pay grade, and there is a wide range of literature on the subject across many academic disciplines.

The sheer scale of this problem indicates another major issue - the general apathy of the academic community towards this behaviour. Many of these papers have hundreds of reads, and some have even been cited. Apparently we were the first to report these issues to the IEEE. Does this mean that this isn't really a significant problem, and that no one really cares? The impact is probably relatively small: most readers access the paper for the meaningful scientific content, and are probably smart enough to ignore the template text, right? One could have said the same about the authors who published these papers in the first place.

So, maybe the impact of this template text being in published papers is negligible beyond it being a source of some amusement and entertainment. But at the same time, it can be seen as a symptom of the wider issues that face academia. There are more, and more, and more papers being published every year, and peer-review is falling apart. Automated tools that are meant to help detect misconduct are woefully insufficient. The current models of publishing research articles are exploitable. And there is always the uncomfortable question lingering in the background - how much "high quality" research output is genuinely high quality? Meanwhile, no one really has the time to figure out how to fix these issues while under the pressures of Publish or Perish.

I repeat here that this article is not accusing anyone of any intentional plagiarism or misconduct - everyone makes mistakes sometimes, and that's okay. However, a high-quality repository of academic content should have systems in place to catch mistakes and help rectify them. Over time, the problem has grown too large for the IEEE to retrospectively rectify, and realistically that's probably okay. But does this reflect the academic literature that we want to build and share, or is it just the academic literature that we deserve?

The initial instance of template text found was reported by Hammond Pearce, who then brought it to the attention of our research group, which kicked off this whole prosaic journey. This article is informed by discussions between members of the Embedded Systems Research Group, part of the Department of Electrical, Computer, and Software Engineering at the University of Auckland, New Zealand. The IEEE conference template says that "The preferred spelling of the word 'acknowledgment' in America is without an 'e' after the 'g'", but this article isn't being written in America, and the author prefers the 'e' to be in there.

Tuesday, 23 October 2018

Submission to RCEP Negotiators on Algorithmic Bias and Discrimination

In October 2018, I was asked to give a short submission to the Regional Comprehensive Economic Partnership (RCEP) negotiators on algorithmic bias and discrimination (during their Round 24 meeting in Auckland). RCEP is a trade agreement between the ASEAN countries and Australia, China, India, Japan, Korea, and New Zealand. Of particular interest to me were the provisions likely to be copied from the CPTPP around source code.

Thank you for having me today to participate in this discussion. I am a Computer Systems Engineer at the University of Auckland, using and developing artificial intelligence and machine learning algorithms for image processing and computer vision. In other words, I write software code. I’d like to speak today about algorithmic bias and discrimination, and why access to source code matters. This is important for the e-commerce chapter, but also has implications for intellectual property.

We live in a world where software is not perfect. The motto and attitude of many companies is to "move fast and break things". In software development, encountering errors and bugs is the norm, and it is expected that updates and patches have to be provided in order to correct these after products have been released. We don't trust civil engineers to build bridges or buildings in this way, yet we increasingly rely on software for so many parts of our lives. Algorithms can decide who is eligible for a loan, who gets prioritised for health services, or even which children might be removed from their homes by social workers. We need to be able to find errors and to correct them, especially when the real-world stakes are high.

With the rise of artificial intelligence, we have also seen an increase in a particular type of error - algorithmic bias and discrimination. There have been a number of well publicised cases in recent years. Computer vision algorithms for facial detection and recognition have historically had higher error rates for people of darker skin colours. An algorithm for assessing the risk of re-offending for convicted criminals in the US was found to be biased against African Americans, leading to harsher sentences. Earlier this year, Amazon decided to deactivate a system that screened potential job candidates when they realised that it was biased against female applicants. These effects are not intentional, but sometimes we just get things wrong.

The MIT Technology Review wrote last year that bias in artificial intelligence is a bigger danger to society than automated weapons systems or killer robots. There is a growing awareness that algorithmic bias is a problem, and its impacts are large because of how pervasive software is. Software spreads very quickly, and negative effects can lie dormant for a long time before they are discovered.

Without going into too much technical detail, there are two critical sources of algorithmic bias:
- Poor data that either does not reflect the real world, or encodes existing biases forever
- Poor algorithm choices or systems that artificially constrain choices, for example by selecting the wrong features or wrong output classes, or optimising towards specific goals while ignoring others
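To make the first point concrete, here is a minimal synthetic sketch (all data and numbers below are invented for illustration): a model fitted to historical decisions that were skewed against one group reproduces that skew through a proxy feature, even though group membership is never an explicit input.

```python
import random

random.seed(0)

# Each applicant is (postcode, qualified). Postcode acts as a proxy for group
# membership; in the invented history, qualified applicants from postcode "B"
# were often rejected anyway.
def historical_label(postcode: str, qualified: bool) -> bool:
    if postcode == "B" and qualified:
        return random.random() < 0.5  # half of qualified "B" applicants rejected
    return qualified

data = [("A" if random.random() < 0.5 else "B", random.random() < 0.7)
        for _ in range(10_000)]
labels = [historical_label(p, q) for p, q in data]

# "Train" the simplest possible model: the per-postcode historical approval
# rate. A rule that approves only postcodes with a rate above 50% would now
# reject every "B" applicant, qualified or not.
def approval_rate(postcode: str) -> float:
    picks = [label for (p, _), label in zip(data, labels) if p == postcode]
    return sum(picks) / len(picks)

print(f"approval rate A: {approval_rate('A'):.2f}")  # close to 0.70
print(f"approval rate B: {approval_rate('B'):.2f}")  # close to 0.35
```

The protected attribute never appears as a feature, yet the learned behaviour is discriminatory, which is why inspecting outputs alone is often not enough and access to the system itself matters.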

In both cases, there is often no way for an end user to confirm that something is wrong. We say that these systems are opaque, because we cannot see into how these algorithms work. Most research into discovering biased algorithms requires population-level data in order to reverse engineer the system, often after the system has already been deployed and harm has accrued. It is the role of governments to protect their populace from dangers such as these. Many currently do not know how to deal with this, and the black-box nature of many advanced algorithms can make this difficult.

It is therefore of concern that trade agreements may stifle this work by putting in place restrictions against inspecting source code. Such restrictions take a powerful tool away from regulators, and massively empower engineers and developers to make mistakes with real-world consequences.

As an example of how trade agreements have affected this, Article 14.17 of the CPTPP specifies that "no party shall require the transfer of, or access to, source code of software." I can understand why companies want this, to help protect their intellectual property rights. But we may have to decide which rights are more important – a company’s property rights, or the public’s rights to not be subject to mistakes, errors, biases, or discrimination that can have unforeseen and long-lasting impacts? Or in other words, the public’s right to safety.

Paragraph 2 clarifies that the prohibition is limited to "mass-market software", and that software used for critical infrastructure is exempted. Presumably this is an acknowledgement that software can have errors, and that in critical situations regulators must have the ability to inspect source code to protect people. This raises the question – what counts as critical infrastructure, and what about everything else that still has a strong impact on people’s lives?

Algorithms don’t just affect aeroplanes or nuclear power plants. We’re talking about scheduling algorithms that control shipping operations and decide which goods go where and when; social media algorithms that influence our democratic processes; resource allocation algorithms that decide who gets a life-saving organ transplant. Why are we locking the door on our ability to reduce real harm? Where software is being imported across territorial boundaries, regulators need the opportunity to check for algorithmic bias and discrimination in order to protect their populations. Please do not just copy these articles from the CPTPP; more recent trade agreements, such as the USMCA (the renegotiated NAFTA) and the EUFTA, have already recognised that this was a mistake, and have tried to correct it with more exceptions. A high quality and modern trade agreement must allow us to manage and deal with the risks and harms of algorithmic bias. Thank you very much for your time.

Protecting Privacy Rights with Alternative/Progressive Trade Agreements

As part of the Alternative and Progressive Trade Hui in 2018, I was asked to speak for a few minutes about privacy rights in the digital age, and how they can be influenced by international trade agreements.

Q: Privacy and controlling the use of our personal information for commercial purposes is increasingly at risk in the digital world. It is changing very rapidly, with the Facebooks and Googles of the world increasingly difficult to regulate and control. Their interests are also reflected in the trade and investment agreements, particularly in e-commerce chapters. How do you think this industry will develop over the next decade or so, and how would you see international agreements best structured in the face of this constant change in order to ensure people’s privacy and control of their lives is protected?

A: A lot of the threats to privacy in the coming years are enabled by advances in AI, which allow us to process a lot more data more quickly, while also doing so in a way that is opaque to humans. Our rights were not designed with these types of automated capabilities in mind.

Trade is not just about physical goods! Data and information have value and are now commoditised - privacy is what helps us keep that value to ourselves and maintain ownership. There has been an erosion of rights with commercial entities getting on board - we can't think of surveillance as being an exclusively state activity, and we need to understand how corporations are trading in and using our data.

Privacy seems to be one area where exposing the downsides of large-scale data collection and trade of that data can generate a lot of attention - e.g. NSA, Cambridge Analytica and Facebook, etc. But after each breach, we focus on individual responsibility and individual actions - delete Facebook, or don't use social media, etc. By and large, there are very few nation state actions or responses.

Governments simply do not know what is out there, lawmakers are unaware of both the risks and the opportunities to use technology to protect privacy. This is one area where states have been largely reactionary. The current Privacy Bill has been characterised as fit for 2013; it will be outdated upon arrival. There is a reliance on lobbyists in this space, funded by the types of companies that say "move fast and break things". Privacy rights are sometimes viewed as antithetical to capitalism, because they get in the way of doing business. More companies are wary of this now, but in a sense, without regulation there may not be sufficient incentive for companies to make privacy a priority and actually protect people's data. This influences our trade agreements, for example, by asking for source code to be kept secret in order to protect intellectual property. Strong privacy legislation can be seen as a trade barrier, and thus it becomes traded away in exchange for economic benefit.

At the same time, Europe is exporting their privacy standards with the GDPR - privacy is one area with contagious legislation where states often copy each other. In some ways this is good, if it means that everyone is adopting good protections. The GDPR led to a massive scramble of companies rushing to get themselves compliant - not because it was impossible before, but because they didn't need to before. So a progressive trade agreement could lift the standards in this area for everyone - it requires leadership from a state, such as we've seen in the EU. So while our privacy and our data can be at risk through trade agreements, there can also be opportunities for those trade agreements to strengthen privacy protections - it depends on how much we can convince governments to prioritise it. New Zealand can be a leader in this space and say that it’s important to us.

Trade agreements can demand performance standards over how trade is conducted, for example cross-border data transfers which are covered by TPP, RCEP, EUFTA, and others. It may be weird to think about data transfers as international trade, but there is an exchange of value there. There are some existing standards, but we could go much further to introduce stronger property rights around data, particularly around how multinationals obtain and then use and trade our data, and make sure that we can own our own data and protect it. GDPR is an example of how this can be achieved. New Zealand can make privacy a priority, and it should really be a priority for all trading nations.

[But in 30 years time none of this might matter anyway, as we head towards more complex AIs that cannot be understood or inspected by a human, which may process and trade our data in ways that we cannot foresee. How we deal with that as AI becomes more pervasive and harder to control is a different but also critical discussion.]

Thursday, 30 August 2018

Privacy and Camera-based Surveillance

This talk was prepared as part of Raising the Bar 2018, a series of talks organised by the University of Auckland to get research out into the public in different settings and contexts. A recording of the talk is also available here!

Kia ora koutou, anei taku mihi ki a koutou kua tae mai nei. Thank you very much everyone for coming along tonight. Welcome to Raising the Bar! A big thank you to the University of Auckland for putting this all together. My name is Andrew, and I’m a PhD candidate specialising in Computer Systems Engineering, working in the area of practical and ethical video analytics systems. Video analytics is a relatively new term, so you might not have heard of it before, but it’s really all in the name. Video analytics is essentially where we take a video, and we analyse it. In reality, it’s often just a nicer-sounding term for camera-based surveillance, because when we analyse video, we’re almost always looking for particular objects or things, and in many cases those things are people.

The system that we’re developing at the University of Auckland is one where we can track people in real-time across multiple cameras, so that we can have these large camera networks and see how people move and use physical spaces. We need to use artificial intelligence and machine learning, embedded systems, big data, hardware/software co-design, the internet of things, and a bunch of other buzzwordy technologies together to achieve this end goal. My degree is fundamentally an engineering degree, so the primary focus is on the application and development of the system itself, but as I continued to work away at this video analytics system, I became more and more concerned about how these systems might actually be used. Something in the back of my mind felt a bit bad about helping to create these next-generation surveillance systems, because I knew that as with most technologies, these systems can be used for good or for bad, depending on who owns and controls the system.

And so, from Edward Snowden and the NSA, to Cambridge Analytica and Facebook, information about us seems constantly at risk. Technological advancements have meant that surveillance capabilities have accelerated away from our understanding and regulations around privacy, and it’s an area fraught with complexity, differences in context, and many subjective opinions, which makes it really, really hard to figure out what the right answer is.

So tonight, I’m going to try and break things up into a few sections, and depending on how tipsy everyone is, we might try some audience interaction. We’re going to start off with an introduction to the problem space, and what has changed recently that means we might have to talk about privacy and surveillance in new ways. Then we’ll discuss privacy generally and why it matters. We’ll meander through some of the technologies that enable surveillance in new ways. Then I’d like to share some results from recent University of Auckland research on public perceptions of privacy and surveillance cameras, and the factors that we think affect how people feel about these systems. Lastly, I’ll touch on how we might be able to use technology to help protect our privacy, and what might be needed to get that technology in place. Sound good?

Problem Context
Right now, you’re probably most familiar with camera surveillance systems in law enforcement and public safety contexts. Airport immigration environments, CCTV cameras in London, and facial recognition systems in China are just a few examples of where cameras have been deployed on a massive scale, automated with the help of artificial intelligence. That Chinese example is particularly interesting, because they plan to have full coverage of the entire country with facial recognition-based tracking by 2020, including surveillance in homes through smart TVs and smartphones. I’m not sure if they’ll get there based on the current state-of-the-art technology, but that’s just quibbling about the deadline – if it’s not 2020, it might be 2025. Still scary.

But as the costs of deploying large-scale camera networks continues to fall, and the abilities of artificial intelligence and computer vision continue to rise, we’re going to see more commercial entities utilise these types of systems to gain insights into how customers use and interact with physical space. You can call it business intelligence. For example, let’s say that we have a supermarket. There are a bunch of decisions about how you set up that supermarket, how you structure the aisles, where you put the products, that are known to have strong impacts on consumer purchasing behaviour. Up until recently, most of those insights have come from stationing human market researchers with a clipboard and a pen, observing shoppers and taking notes manually. It’s a boring job, and you can only get humans to observe people some of the time, and if the shoppers know that they’re being observed then they often end up changing their behaviour. Now imagine that we can set up a camera network that observes the shoppers all the time. It can count the number of people in the shop at any time, determine which aisles are most popular, and even tell you which paths customers are taking. There are commercially available systems in place right now that can detect if a checkout queue is getting too long, and send alerts to the manager that they need to open another checkout counter. Then you can collect statistics over time and start to answer higher level questions like, which products should I put closest to the entrance and exits, how often do we need to restock certain aisles, how many staff do we need to schedule in on a weekly basis? And if you really wanted to, the technology is there to allow you to answer questions like, what items did loyal customer number 362 pick up today but not buy, so we can send them an e-mail with a special offer so they’ll buy it next time? Is this person who has just entered the supermarket at risk of shoplifting based on their criminal history? 
Do customers who look a certain way buy more stuff, and so should we get a shop assistant to go upsell to them? And there is the real potential for secondary uses of data as well – even if you are told that the surveillance camera system is there to collect shopper statistics, what if the supermarket then sells that data to the food manufacturers, or sells that data to a health insurance company, or lets the police have access to those camera feeds?

I probably should have warned you at the beginning that this might be a bit of a scary talk. Unfortunately it just comes with the territory, that in order for me to talk about this stuff, I have to scare you all a little bit with examples of how this technology can be used. We often like to pretend that technology is value-neutral, in that technology itself is not inherently good or bad, but that’s not really true, because sometimes we can definitely foresee how that technology might be used. There is no shortage of science fiction out there featuring mass surveillance of the population, whether it’s Orwell’s 1984 or Minority Report. As technology developers, I believe that we have an obligation to not just ignore those dystopian futures, or in other words “do the thing and let the lawyers worry about the consequences later”. Where we can clearly foresee bad things happening, we should be doing something about it. I’ll come back to this later in the talk.

Back to the supermarket. What is it about this scenario that makes us feel so uneasy? There can be relatively benign uses of surveillance camera technology, such as letting managers know when the queues are getting too long, but there can also be much more controlling, more invasive uses. As I hinted at earlier, one of the big factors here is that the owner of this camera surveillance system is a commercial owner, rather than the state. In a traditional sense, whether it’s the police or the national intelligence agency, if they have a camera surveillance system, you’d hope that they’re using it for the public good, to keep people safe. You may have problems with that assumption, and that’s okay. But when it comes to corporations, their incentives are clearly different, and in some senses worse. They aren’t using this camera network for your safety – they’re using it to find ways to make more money. The benefit of having the surveillance network goes to the corporation, rather than to the general public who are being observed, whose privacy is being infringed upon. We hold corporations and the state to account in different ways, and the power relationship is different. Personally, I believe that this significantly changes the discussion about privacy and how we as a populace accept surveillance cameras. But, part of the problem is that we’re all used to surveillance cameras now – even if you don’t like them, you probably still walk down Queen St where there are CCTV cameras. You can’t really avoid them if you want to participate meaningfully in society – if you need to buy groceries, you’re going to do it whether there’s a camera there or not. In a sense, the use of surveillance cameras for security and safety has desensitised us to the use of cameras for less publicly beneficial purposes, which is why we need to be vigilant.

Why Privacy?
Okay, but before we get too much further down this line of thinking, we should take a step back and answer the most basic question. Why do we care about privacy? Why does it matter? [Audience answers]

Those are all good ideas, but we should think about it even more fundamentally than that. In the broadest sense, privacy is about keeping unknown information unknown. Another way to think about this is what a breach of privacy might look like. Again in the broadest sense, a breach of privacy is where some unknown information about someone becomes known.

Now you might feel that this definition is hopelessly broad, and it is. There are many bits of information that we have no choice but to give away – if I stand here and you look at me, then your brain automatically extracts a bunch of information about my ethnicity, hair colour, height, and so on that it maybe previously did not know. There is a lot of information that we have to give away in order to function in society, such as our names, where we live, our phone numbers, etc.

And this is totally fine when we accept that privacy is not absolute, and it’s non-binary. You don’t have all privacy or no privacy all of the time. Privacy can depend on what information it is that is at risk, the specific use case of how our privacy is being protected or infringed, and other cultural or contextual factors like the type of government we have or the interface with which information is being collected. There are some situations that we could define as privacy breaches, that we are actually fine with and we think are probably okay. Let’s try to make this more concrete with some examples. If the government put surveillance cameras in your home, you would probably feel uncomfortable with that and call that a breach of privacy. But if there is a natural disaster, and the government uses drones with cameras to survey property damage in your area, then you might be more okay with that. That changes if your government is more democratic or not, more transparent or not, more trustworthy or not. Another example: a CCTV camera outside a McDonalds for public safety purposes will probably see you as you walk inside, and you might not care about it at all if you’re not a criminal. But that might change if you’re supposed to be on a diet, and your friend works at the company that monitors the surveillance cameras. I found out a few weeks ago that the CCTV cameras in central Wellington are actually monitored by a team of volunteers, not uniformed police officers, so the people behind the cameras probably operate at a different standard to what you might expect. Your feelings might change if data is being extracted from the video feed and then sold to health insurance companies who might raise your premiums if you go to McDonalds too often. It’s physically the same camera, but how it’s being used, who is in charge, and your own personal circumstances can have an impact on what privacy means.

This is all before we talk about the right to privacy. All of what we just discussed was just defining privacy, but that is separate to whether or not we actually have a right to privacy. So why is it important that everyone have a reasonable expectation of privacy? There are a lot of different arguments for why something as nebulous as privacy should be protected. It’s much easier to make a case for well-defined things, like a right to life or a right to access basic needs like water and air. But the right to privacy is sort of like the right to free speech – it’s really hard to define and there are a lot of exceptions. I think for me, my summary of many arguments is that the need for privacy is a response to imperfect trust. We know that there are bad people around, and we can’t perfectly trust everyone all the time to always act in our collective interest. There are many interpretations of what is morally and ethically right to do at any point in time. And information is power; information gives people control over others. So we need to keep some information to ourselves to prevent it from being abused or used against us, ultimately so that we can maintain some sense of feeling secure. And I think that feeling of security and being able to trust people in limited ways is inherent in allowing our society to function. If you go to a coffee shop and buy a cup of coffee, you inherently trust that the barista is going to keep up their end of the bargain and give you a cup of coffee and not orange juice or soup or poison. If you couldn’t trust them, you’d have to make your own coffee all the time, and that might be an added cost to you. But we can only trust each other so much. While you’re okay with trusting the barista to make you coffee, you probably wouldn’t just give them all your medical and financial records, because you don’t necessarily trust them to handle those in the context of your customer-barista relationship.
You need to keep some things private from others in order to maintain the appropriate social boundaries that define your relationship, with an appropriate level of trust. Maybe it’d be nice if we could all be open books and give away all of our information and be public about everything, but we just know that we can’t do that. Scarily, the day after I drafted this, I saw some news that a pregnant woman in Canada was accidentally served cleaning fluid instead of a latte because the wrong tubes had been plugged into the coffee machine, so even trusting your barista to make coffee right might be going too far.

This notion of trust and confidence is captured in our privacy legislation. The new Privacy Bill, which is currently at Select Committee, has the explicit intention of “promoting people’s confidence that their personal information is secure and will be treated properly.” Merely keeping the information secure and treating it properly would be insufficient – it is people’s confidence that is targeted by this Bill.

New Surveillance
But on the topic of legislation, one of the big problems with legislation is that it simply doesn’t keep up with the pace of technology. Here’s an example – NEC is a Japanese company that has been contracted to provide some person tracking services on Cuba Street in Wellington as part of the council’s smart city initiative. Most of us probably missed this story up here in Auckland, although there have been discussions within Auckland Council about doing the same thing up Queen St. The idea is that they want to know how many people are moving up and down a busy pedestrian route, at what times, and at what speeds, to inform pedestrian traffic management officers and so that the urban planners can have better information to work with when redesigning that space. Good intentions, good use of the technology. NEC proposed to do this in multiple ways, including the use of microphones and cameras. But it turns out that recording audio is illegal, because there are laws that prohibit the interception of private conversations, originally intended as a defence against espionage and police overreach, before video recording was cheap and ubiquitous. This is really old law in the Crimes Act that has been around for decades, and so NEC had to disable the microphones. You can’t make an audio recording of a conversation, but it seems to be legal for you to make a video recording of two people having a chat, and it’s okay for you to know that the conversation took place, which is in a sense metadata, which could be enough to infer all sorts of things, like that John Key supports John Banks enough to have a cup of tea with him. You could then watch the footage and try to figure out what they were saying by reading their lips or similar. Maybe you could even use an algorithm to do the lip reading. So back on Cuba St, the cameras are still running, collecting counts of people as they move throughout the space. 
There are privacy principles around a reasonable expectation of privacy, but even though we managed to make audio recordings of conversations illegal, video is, in a general sense, legal. The Office of the Privacy Commissioner has kept an eye on it for a long time, but the Privacy Act and Crimes Act have very different enforcement mechanisms. This is a demonstration of how the legislation might fall behind the development of technology, how the government has not protected the populace from a potential threat, and so our expectations and rights have eroded away. Oh and by the way, it turned out that the council wasn’t just interested in person counts and tracks – news articles reported that they also wanted to identify beggars and rough sleepers, and use the data to improve their efforts to get rid of homeless people on Cuba St. NEC also publicly said that they wanted to sell the data to tourism companies and retailers. So maybe not so well-intentioned after all... but apparently pretty legal. [Note: This system has recently been shut down and is no longer running in its original form]

And it’s not just that there’s a gap between technology and legislation, but that the technology is accelerating away. Think about what we might consider the status quo at a shop like Farmers. Most of the time if you see a surveillance camera in a shop, one of two things is happening behind the scenes. Either the footage is just being recorded and stored, and no one looks at it unless something bad happens, or there is a human security officer trying to watch ten camera feeds at once. With computer vision and big data architectures, a third option has become accessible to camera network owners – getting computers to automatically process the footage and then just generate statistics or alerts for human supervisors. The technology is at the point where we can go fast enough to process the footage in real-time, and this all enables surveillance networks to be implemented on much larger scales. Rather than needing one human to struggle to watch ten camera feeds at once, you can get a hundred computers to watch a hundred cameras in real-time.
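That third option has a simple shape: the computer processes every frame and only surfaces statistics or alerts to a human. Here is a hypothetical sketch, where the detector is a stub standing in for a real computer-vision model and each "frame" is just an invented pre-computed count:

```python
# Hypothetical sketch of automated monitoring: a computer watches the
# feed and only raises alerts for a human supervisor. people_in_queue()
# is a stub standing in for a real person detector.
QUEUE_ALERT_THRESHOLD = 5

def people_in_queue(frame):
    # A real system would run a computer-vision model on the frame here;
    # for illustration, each "frame" is already a count of people.
    return frame

def process_feed(frames):
    alerts = []
    for t, frame in enumerate(frames):
        count = people_in_queue(frame)
        if count > QUEUE_ALERT_THRESHOLD:
            alerts.append((t, count))  # e.g. "open another checkout"
    return alerts

print(process_feed([2, 3, 6, 8, 4]))  # [(2, 6), (3, 8)]
```

The human never watches the footage at all; the loop just runs once per camera, which is what makes these systems scale to hundreds of feeds.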

The other thing we can do is combine data from multiple sources. You may have read about people painting their faces with weird shapes to try and fool face detection systems, or people advocating for wearing masks in public. Well, our research at the University of Auckland doesn’t use facial recognition; it recognises people based on the appearance of their clothing. Other research has shown that gait or walk recognition works, because people walk in slightly different ways. When that fails, surveillers can track your phone, sometimes through the cell network, but also by tracking the MAC address reported by the wi-fi or bluetooth. All of this can be done now, and in some cases, is already commercially available. If any one of these systems fails, we can fuse together enough data from the other sources to still get a pretty good understanding of where people are.
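To illustrate why fusion is so effective, here is a naive sketch (with invented probabilities, and assuming the cues are independent) of combining several weak trackers into one confident match:

```python
# Naive sensor fusion sketch: each tracker reports the probability that
# a detection matches the same person; assuming independent cues, we
# combine their odds into one overall match probability.
def fuse(probabilities):
    p_match, p_not = 1.0, 1.0
    for p in probabilities:
        p_match *= p
        p_not *= 1.0 - p
    return p_match / (p_match + p_not)

# Clothing is ambiguous (0.6), gait is weak (0.55), but the wi-fi MAC
# address matches strongly (0.9) - together they are quite confident.
print(round(fuse([0.6, 0.55, 0.9]), 3))  # 0.943
```

No single cue is conclusive, but defeating one of them (say, by changing your clothes) barely dents the combined confidence, which is the point made below about the futility of fighting individual technologies.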

The natural response is to try and think of ways to defeat these systems as an adversary – change your clothing regularly, put your phone in flight mode when you’re not using it, take a class from the Ministry of Silly Walks. You could try to legislate against specific technologies too. But there will always be a way for technology to be developed further, to defeat those methods, and you just end up in an escalating war against technology, which probably doesn’t end well for the humans. We can stamp out one type of surveillance, and there will still be many others that can be used and exploited by unscrupulous system owners. The technology will evolve beyond the narrow definitions offered in the law. Instead, we need to ask ourselves some more principle-based questions – who actually wants these systems to exist, who is paying for the development and installation of these surveillance systems? And then we can ask a deeper question – why do they want these systems, and how do we as consumers or the electorate accept or reject these systems?

Public Perceptions
So when I told my supervisors that I wanted to do some research on privacy, their first response was “you’re doing an engineering degree though, so where are you gonna get the numbers?” So to understand why people accept or reject surveillance camera networks, I ran a survey earlier this year to understand what drives public perceptions of privacy. With a survey, now we have numbers, so I can justify putting it in my thesis!

What we wanted to know was: what makes people feel more comfortable or less comfortable about the presence of surveillance cameras and how they’re used? We know that not all surveillance cameras are necessarily bad – you can have good intentions mixed with good purposes and good system owners and maybe things will be okay – but it’s the people who are observed who should get to make a judgement of what good means. Privacy is not absolute, and the context makes a big difference, but what is it about that context that changes people’s perceptions? In contrast to previous research, our survey was designed to be a bit more subtle – rather than asking a series of questions like “do you like surveillance cameras if they are being used for public safety”, we used scenarios – short stories that provided a bit of detail about the context in which the surveillance cameras are being used.

Let’s do one as an example. The question that we asked the respondents was “how comfortable does this scenario make you feel?” “The local traffic authority wants to be able to track cars and trucks on major city streets and highways in order to learn about traffic patterns. They propose to do this by placing surveillance cameras on top of every traffic light and at certain points of highways, and running an automated algorithm that can count the vehicles automatically. The footage would not be recorded, as the algorithm just produces a report with the number of vehicles on each road at certain times.” Hold up a hand, on a scale of 1 to 5, 1 being not comfortable at all, and 5 being very comfortable – how does that scenario make you feel? If there are any gaps in the story, you should fill them in yourself with your own personal context. [Generally okay, mostly comfortable? Why do you feel comfortable about it? Why do you feel uncomfortable about it?]

You might get a sense of why even though you might be pro or anti surveillance cameras generally, you can still have different feelings towards those cameras in different contexts, and that the context has different implications for different people.

Alright, so what did our research find? I don’t have much time to go into details, so I’ll skip the statistics and just get to the end results. The first headline result was that demographics don’t matter. There have historically been arguments that demographics play a strong factor, for example, some research has shown that women tend to prefer surveillance cameras in a public safety context, because they feel safer out in public, that they won’t get attacked. But this relationship simply didn’t appear in our data. Whether it was by age, binary gender, level of education, ethnicity, country of origin, country of current residence, or occupation, there were no statistical correlations with liking surveillance cameras more or less. Even though demographic groupings have long been held to influence or predict ideology and beliefs, in this case it really didn’t seem to matter. The conclusion may seem obvious – that what you believe is more than just your demographic characteristics.
Instead, we found that the context in which the surveillance camera was used was much, much more important. Even those who self-reported as hating surveillance cameras could find some merit in using cameras after a natural disaster to maintain public safety, while those who seemed to be totally apathetic to cameras were still wary of a pervasive national-level person tracking system controlled by an intelligence agency. We distilled this down to the five most significant factors, which gives us a sense of what causes people’s perceptions to change.
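To make the demographics result a bit more concrete, here’s a toy sketch of the kind of correlation check involved. The data here is entirely made up (random numbers, not our actual survey responses), but it illustrates the point: when a demographic variable carries no information about comfort ratings, the correlation comes out near zero.

```python
import random
from statistics import mean

# Made-up survey data: comfort ratings on a 1-5 Likert scale, and a
# demographic variable coded 1-6 (e.g. age bracket). Because the two are
# generated independently, they should be uncorrelated.
random.seed(0)
n = 200
comfort = [random.randint(1, 5) for _ in range(n)]
age_group = [random.randint(1, 6) for _ in range(n)]

def pearson(x, y):
    # Pearson correlation coefficient, computed from scratch.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(comfort, age_group)
print(round(r, 3))  # near zero: the demographic tells us little about comfort
```

In the real analysis you would also compute a p-value and use tests suited to ordinal data, but the headline idea is the same: no demographic grouping moved the needle.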

The first is access – who has access to the video feed or footage, including any secondary data that has been derived from the cameras. For example, people’s perceptions might be changed if only three trusted government officials are allowed to view the footage, versus any one of ten thousand employees of a large corporation that can then onsell collected statistics to other companies.
The second is human influence – is there a person-in-the-loop, is there someone watching the footage, or is it entirely processed by computers? Generally in a public safety context, people felt better if a human is watching or the footage is recorded, but in a commercial video analytics context, people felt better if a computer processes the footage and no human ever sees it.

The third is anonymity – are the observed people in the footage personally identifiable or anonymous? Might there be personally targeted actions as a result? Generally respondents felt uncomfortable if they knew that being watched by the surveillance camera would lead directly to actions that affected them personally, like getting customised specials from the supermarket.
The fourth is data use – how will the data be used? Is the purpose in the public good and providing benefit to the observed? Are there secret secondary uses of the data? The scenario that made people the most uncomfortable wasn’t actually the one that involved an intelligence agency tracking every person in the country, which was a surprising result – it was actually the scenario where the supermarket tracked consumers and tried to sell them more stuff.

The last factor, and possibly the most important one, is trust – do we trust the owner of the surveillance camera network? Do we believe that they are competent? And this applies whether the owner is a government or a corporation; if we have a trust deficit where people simply do not believe what the owner is telling them, or we do not believe that they have good intentions, then people will feel uncomfortable.

Privacy-affirming Architectures
Okay, so a lot of the talk so far has probably been a bit scary, and we should try to address the big question of “well, what are we going to do about all this?” The first step for us was to understand what makes surveillance camera networks more okay, more comfortable, less scary. And as the prevalence of corporately-owned camera networks continues to rise, it’s really important that we consider how we can systematically put the right protections in place.

And so we have two pathways to achieving this. The first is to regulate. Governments can pass laws that protect our privacy, by requiring system owners to play by rules such as banning unconsented secondary uses of data, requiring footage to be deleted within a set timeframe if unused, requiring opt-in rather than opt-out approaches to consent, requiring transparency or reliability tests for algorithmic processing of footage, and so on. In New Zealand, we’re lucky that we have principles-based privacy legislation that is very flexible and covers a lot of cases, but there are further rights that could be extended to the populace. Then the other tricky part is actually enforcing these laws, regularly auditing these surveillance systems to ensure that they do what they say they do, and that they are compliant, and punishing those that turn out to be infringing upon the privacy rights of individuals. The GDPR in the European Union is starting down that direction, but we’re still a while away here in New Zealand. The Office of the Privacy Commissioner just doesn’t have the tools it needs to really enforce our privacy legislation right now.

But governments are slow, and they simply cannot respond to the pace of technological development that creates these threats and dangers. Legislators often aren’t expert enough in these areas, and rely on outside information that is amplified by money, which means that the information that they get is more likely to be in the interests of malicious system owners than in the interests of the general population. And to make things worse, international trade agreements seem to be tying the hands of our legislators, forcing them to weaken privacy protections at the behest of corporate lobbyists in exchange for other economic benefits. For example, the EU-Japan economic partnership agreement has conflicts with the GDPR, and they’ve given themselves three years to sort it out – but in a battle between privacy rights and the economy, which one do you think is going to win?

The other approach is to protect privacy by design. Technology developers like myself should, or must, build privacy into their products, such that privacy becomes harder to infringe upon. So one of the features of the system that I’ve designed at the university is what we call the privacy-affirming architecture. In this system, we use smart cameras, where some processing can be done at the point of image capture, such that the footage does not actually need to be stored or transmitted. This means that in a commercial context where you just want the high-level statistics about how your supermarket is being used, the footage would never be seen by a human, it would all be automatically processed and you just get the anonymised statistics out at the other end. A system like this forces the system owners to respect the privacy of individuals, because even if they wanted to be voyeuristic and spy on their customers, they can’t. It takes away one tool from malicious system operators who could otherwise abuse that source of information. It doesn’t solve all of the privacy problems, but it’s a step towards protecting the privacy rights of individuals by default.
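The core idea of the privacy-affirming architecture can be sketched in a few lines. This is my own illustrative sketch, not the actual university system; the function names are hypothetical, and the “detector” is a stand-in for a real on-device vision model. The point is structural: the raw frame never leaves the capture step, and only the anonymised aggregate does.

```python
# Hypothetical sketch of an edge camera node: each frame is processed
# locally, the footage is discarded immediately, and only anonymised
# counts are ever transmitted off the device.

def detect_people(frame):
    # Stand-in for an on-device detection model. Here a "frame" is just
    # a list of detected object labels.
    return [obj for obj in frame if obj == "person"]

def process_frame(frame):
    count = len(detect_people(frame))
    del frame  # raw footage discarded as soon as processing is done
    return {"people": count}  # only the aggregate statistic leaves the device

# Example: a "frame" containing three detected objects, two of them people.
report = process_frame(["person", "trolley", "person"])
print(report)  # {'people': 2}
```

Because the architecture only ever emits the report, a voyeuristic operator has nothing to replay: the privacy protection is enforced by the design rather than by policy.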

But the big counterargument against protecting privacy using technology is that rights protection is somewhat incompatible with capitalism, because there are real costs associated with developing privacy-affirming or privacy-conscious camera systems, and system owners are not incentivised to pay for the development of these types of systems. They would rather order a system that doesn’t have the extra privacy protecting stuff. Maybe you actually want to infringe upon people’s rights in order to improve your analytics and drive profits up. Maybe you actually want to infringe upon people’s rights to control your population better. No amount of rights-protecting technology is useful if the people responsible for implementing and owning these systems choose not to respect those rights and simply don’t buy that better technology.

So well, there is kind of a third pathway, which is about education. A more educated populace, that knows more about the way that these surveillance cameras are used, that knows more about the threats and dangers of these systems, that knows more about the potential downside of abuse by system owners, and that knows more about how we could make things better with legislation or technology or otherwise, can exert power in other ways. Whether that’s participating in the democratic process, or using market forces to tell corporations how we feel, in the same way that governments and corporations can control people, they also depend on people, who have opinions and feelings that eventually have to be respected.

And that’s why I want to do talks like these – the content can be a bit scary at times, but I’ve seen the academic papers that describe how large-scale surveillance systems can be practically achieved in the real world soon. Not in fifty years’ time or in twenty years’ time, but genuinely soon. I mean, I’ve contributed in some way towards creating one, even if my ethics have gotten in the way of me making a lot of money off it. So I want to get the word out, that this technology is coming, and if we are too complacent about it and let surveillance happen to us, before we know it we might be living in one of those science non-fiction dystopias. So the next time that you see a surveillance camera, look into it and ask yourself – do I know who owns this camera? Do I know how this footage is being processed? Do I know what they’re going to do with the data? Did I meaningfully consent to being observed? Do I trust the owner to actually do what’s best for me? Am I getting something out of this camera being here, or is the benefit entirely for someone else? And most importantly of all, how does this all make me feel?

There is a quote that is often attributed to Thomas Jefferson, even though it turns out he never actually said it, but it makes a good point so I’ll close with it here anyway: “An informed citizenry is a vital requisite for our survival as a free people”. I hope that if we can all become more informed, then we can fight against poor uses of surveillance technology, and technology in general, and keep our freedoms. Thank you for coming along to talks like this, for continuing to learn, and for keeping your minds open to new voices. Ngā mihi nui ki a koutou katoa, thank you very much for taking the time this evening to listen to me.

Sunday, 20 May 2018

Privacy Bill Submission (2018)

1. Thank you for the opportunity to provide a submission. I am currently a PhD Candidate in Computer Systems Engineering, investigating embedded vision and video analytics. As technology continues to improve, new types of applications will be enabled that allow for the greater and faster extraction and collection of data and information about individuals. As part of my research, I have sought to understand the implications of camera-based surveillance systems on privacy, how we can protect privacy using technology during system design, and the drivers of public perceptions of privacy.

2. I am happy that the Bill places specific emphasis on “promoting people’s confidence that their personal information is secure and will be treated properly”. Without a strong expectation of privacy, our society would be far more insular, and the barriers and costs of interaction would be much higher. The proposed Privacy Bill is a step in the right direction, but it is only that – a step. The proposed changes, particularly giving the Office of the Privacy Commissioner more powers to investigate privacy breaches, requiring public notification of privacy breaches, and introducing compliance notices, are sorely needed in the digital age where private information flows more freely than ever before. I am generally in favour of the proposed Bill. However, the protections given in this Bill need to be extended further to ensure that we have adequate protections for individuals and their information going into the future. The Privacy Bill also needs to become more enforceable to disincentivise non-compliance. All subsequent suggestions should be taken to be additive, i.e. that they are added on top of the existing Bill, not replacing any of the existing parts.

Information Privacy Principles
3. New Zealand is fortunate to have a set of strong Information Privacy Principles (IPPs), as elucidated in s 19 of the Bill. As new technologies are developed, along with their associated opportunities and threats, it is helpful that we can return to and apply the same set of Principles that can be used in a wide variety of circumstances. I strongly support the continuation of the use of these Principles.

4. However, IPP6 needs to be further extended to provide better protection for individuals. The “Right to Access”, as presented in the European Union’s General Data Protection Regulation (GDPR) goes further than IPP6 to allow for greater transparency. Confirmation that the agency holds information or not, and access to that information, is insufficient. I believe that agencies should also, upon request, be required to state how personal information is being stored, the specific purposes for which the information is being collected (as already included under IPP3, but available after the collection of information), whether data will be used anonymously or not, how data is being shared, and how data was acquired. Making these details available is critical for allowing individuals to understand, after data collection, where their information will go and who will have access to it. Importantly, it is also a source of evidence for individuals seeking to understand how their information has ended up somewhere unexpected.

5. There is perhaps more scope to include the findings of the Data Futures Partnership into this Bill. Their work focused on social license and improving public confidence and trust around the use of data. In particular, the specific questions that have been identified by the Partnership that should be answered about data use could be built into IPP6. Extending beyond the details included in the previous paragraph, this includes identifying what the benefits of collecting the information are, and identifying who receives those benefits, as well as stating whether there is potential for data to be sold or used for other secondary purposes that are not stated at the point of data collection.

6. I note that s 28 means that only breaches of IPP6 by public sector agencies are enforceable in a court of law. This is a positive step forward from the status quo, but it is definitely not enough. Agencies that breach the privacy of individuals need to be held accountable, more concretely than through a compliance notice. The Human Rights Tribunal may be the only recourse for most individuals seeking restitution for privacy breaches, but this process is too slow and the barriers too high for many individuals. While we may hope that these never need to be used, it is important that stronger civil penalties are eventually introduced, with adequate infrastructure to support the associated justice processes, so that privacy is taken very seriously and not treated as a secondary concern.

7. While the Commissioner has power to obtain information during investigations (s 88), in order to issue compliance notices (s 129), or to determine whether personal information can be transferred (s 194), the penalties for not co-operating with this under s 212 are worryingly weak. In some cases, without the co-operation of the Agency, it may be impossible for the Commissioner to obtain the necessary evidence for determining if a privacy breach has occurred. For example, a large company may be internally using collected data for secondary purposes that are not covered by their Privacy Statement or notified to customers. Even though the Commissioner may suspect that something is wrong, they cannot prove that anything is wrong without the co-operation of that company. The large company may well choose that they would rather pay a small fine for obstructing the investigation, than to be subject to a more public compliance notice or Tribunal hearing. Stronger penalties are required, and exemptions such as the “reasonable excuse” defence should be further limited or removed, as recommended by the Privacy Commissioner in their Report to the Minister of Justice under s 26 of the Privacy Act from 2017.
8. At the same time, giving the Office of the Privacy Commissioner more investigative powers requires sufficient oversight. It appears that there is little opportunity for appeals against requests for information, or for a complaint to be laid against the Privacy Commissioner for vexatious requests. For example, there exists the potential for a Privacy Commissioner to demand information repeatedly, or for information to be demanded that is on the borderline of the Privacy Commissioner’s scope. Appropriate checks and balances need to be in place in order to improve public confidence and trust in this system. It may be helpful to provide an intermediary ombudsman or similar oversight body to allow for appeals without having to go through the Court system.

9. In general, the Commissioner needs more powers to investigate whether appropriate privacy protections have been put in place. A step below the Compliance Notice may be a “Please Explain”-style notice that is commonly used by stock exchanges and other agencies in financial areas. This may be useful in a scenario where the Commissioner is not sure if a breach of any IPP has occurred, but there is strong potential for an IPP breach and there is public interest in determining if this is the case. For example, recent revelations that Foodstuffs are using a security product from Auror that uses facial recognition to detect shoplifters led to some public concern about the integrity of that system. In this case, I believe that there would be significant value in allowing the Privacy Commissioner to ask Foodstuffs to provide more details about the system, and for the Privacy Commissioner to determine if a subsequent investigation into an IPP breach is necessary. If the Privacy Commissioner determines that the system is actually compliant and that there are no concerns, then that can help allay the fears of the public, improving public confidence. This option gives Agencies an opportunity to co-operate with the Office of the Privacy Commissioner before the more punitive step of issuing a Compliance Notice, and gives the Privacy Commissioner an opportunity to spot potential issues and provide advice so that Agencies can rectify any issues before harm can accrue.

Anonymisation and Re-identification
10. On the protection of “anonymised” data from re-identification, I believe that the Privacy Commissioner’s proposed amendments, which include controls and penalties on the re-identification of previously anonymised data, are almost adequate. Intentionally de-anonymising data for nefarious purposes should be a criminal offence. However, identifying the intent is important, and there should be exceptions in place for those with good intentions. For example, academic researchers who discover that anonymised data can be de-anonymised should be given an opportunity to disclose that to the Agency and the Privacy Commissioner or similar regulatory body, and not suffer negative consequences as a result. Penalties should also exist for Agencies that release poorly anonymised data that can be easily re-identified, in order to incentivise Agencies to take appropriate care in anonymising and releasing that information.

11. As a final point, I urge the committee to remain steadfast in a Principles-based approach to privacy. My recent research into public perceptions of privacy in the context of surveillance cameras has shown that the context of how data is being collected, stored, and used is incredibly important for public confidence and acceptance of surveillance cameras, and this is likely applicable to other contexts. Creating specific rules that dictate how to protect privacy will lead to loopholes, non-compliance, and ultimately reduced public confidence in the efficacy of those privacy protections. One-size-fits-all privacy protections will not work – the nuances of each individual application and scenario can significantly change whether something is considered to be appropriate or not. Trust in how our privacy is protected is critical for public confidence. Our current Principles allow for flexibility so that a wide variety of applications can be considered, but also need to be further extended to provide sufficient protections for individuals in the digital age.

12. Thank you again for the opportunity to make a submission to this Bill. I would be glad to make an oral submission, and understand that all submissions will be available publicly.

Andrew Chen

Sunday, 11 March 2018

A few months of owning an e-bike

After buying the e-bike in late December, I’ve used it almost every day (skipping the days when the tropical cyclone was being a bit of a nuisance or I wasn’t in Auckland). It’s been great – absolutely worth the money that I paid for it. Having the e-bike has meant I can go way, way further than I used to – even heading out to the airport for a 50 minute ride just to try it out. Bernard Hickey asked for an honest evaluation six months after buying the e-bike. Here’s one after 2-3 months.

Hills and Distance
This is far and away the best benefit of having the e-bike. A trip that used to take me about 15 minutes between the City and Newmarket campuses now takes 7. On top of that, I arrive far less sweaty and am able to get into work straight away, rather than needing another 5 minutes to recuperate. As alluded to earlier, my typical manual bike range was probably around 5km on a good day – if everything was flat I could go much further (I biked about 20km around Mission Bay and St Heliers without any issue on my old bike), but the hills around the city really sap my energy (and I’m not that fit). With the e-bike, I did about 30km in a single trip without feeling tired at all (although I did stop for ice cream in the middle).

I managed to get some pannier/saddle bags for my bike on sale for about $60, adding 60L of carrying capacity onto the bike. I can also wear a large backpack, and now it’s actually practical for me to do a weekly supermarket trip on the bicycle. No need for a car, and I can get about 3-4 shopping bags worth of stuff back home easily. This has become particularly important because the 277 shopping centre in Newmarket has been closed for upgrades, which means Greenlane is the nearest supermarket (3.5km instead of 700m). As a bonus, the pannier bags are also great for board games, and I can get a large selection of games along to any game night. Also as a bonus, these pannier bags come with built-in waterproof covers that are also a reflective bright yellow, helping make the bike more visible at night.

It is so much easier to go to events now, rather than sitting at home and feeling like it will be a huge hassle to go anywhere. When I really did want to go to a friend’s place, I’d have to pay a ridiculous amount to Uber/Zoomy there, but now I can just ride the bike. There’s something about the freedom/liberation that comes with having access to transportation; I’m not an expert, but it’s probably something that’s well known in the social sciences. It also helps with Pokemon Go – being able to get to raids quickly means actually spending less time on it overall. I’ve also done a few trips where I’ve just gone out to random places for dinner, cycling about 10km to get to some highly reviewed place on Zomato just to try it out. I never would have bothered before if I had to take public transport there and back.

My complicated three-step bike security system seems to be working. Apart from the standard bike chain and my SmartHalo alarm, a critical step is the rear wheel lock. If it hadn’t come with the bike, I probably wouldn’t have bought one. Essentially, it’s a key-lock that puts a bar through the spokes of the rear wheel, effectively rendering it stationary. No one can ride it away, meaning that if someone wants to steal the e-bike then they will need to pick it up and lift it into a truck or whatever. It’s pretty heavy, so that’s not easy to do quickly without being noticed by someone. Also by that point, the alarm detects enough motion to go off, and the (very) loud alarm should mean they drop it and run. The weird thing about security is that you don’t really know if it’s working unless your stuff gets stolen (in which case you know that it didn’t work), but thus far it has had a 100% success rate in keeping my e-bike safe.

Assist Levels
When I first got the bike, I was told that an assist level of 2 or 3 would be enough in most cases. The way that the sensor/motor works varies between different bikes, but on mine it seems that setting the assist level essentially sets a target speed for the bike. If the bike is going below the target speed, then the motor turns on, and if the bike is going above the target speed, the motor is off. Level 1 is about 12km/h, Level 2 is about 18km/h, Level 3 is about 25km/h, Level 4 is about 30km/h, and Level 5 is uncapped (always on). When you first ride the bike, Level 3 feels fast enough. After a while, riding on the road next to cars and buses going past in close proximity at 50km/h, Level 3 doesn’t feel like enough. Since I predominantly use my bike for commuting, and I’m often trying to get there as quickly as possible, Level 5 is the most common setting that I use now. Is it bad for the battery lifespan? Probably. Does it get me where I need to go faster? Probably not, given that traffic lights are the main problem. Does it make me feel a little less anxious about being late? Yes.
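My mental model of the assist behaviour can be written down as a simple rule. To be clear, this is my inference from riding the bike, not the manufacturer’s documentation, and the speed targets are the rough figures above:

```python
# Inferred assist behaviour (my guess from riding, not the official spec):
# each assist level sets a target speed, and the motor runs whenever the
# bike is below that target.

ASSIST_TARGET_KMH = {1: 12, 2: 18, 3: 25, 4: 30, 5: float("inf")}

def motor_on(level, current_speed_kmh):
    # Motor assists only while below the level's target speed.
    return current_speed_kmh < ASSIST_TARGET_KMH[level]

print(motor_on(3, 20))  # True: below the ~25km/h target, motor assists
print(motor_on(3, 27))  # False: above target, motor cuts out
print(motor_on(5, 40))  # True: Level 5 is effectively always on
```

That “always on” behaviour at Level 5 is presumably why it chews through the battery faster; the motor never gets a rest on the flats.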

Basically, saying that e-bikes are faster is really oversimplified. The main pain for most cyclists is hills, and you definitely go faster on hills with an e-bike. The motor also really helps with acceleration, getting you up to speed quickly, which makes you feel faster. However, on most e-bikes the top speed is capped at about 32km/h, which may be fast enough in most circumstances, but is actually slower than the top speed that you can achieve on a standard bike. On the flats, I have noticed that I can get to about 35-38km/h, which is a little bit slower than on my old mountain bike. I’m pretty sure that I can put this down to the added weight on the bike, meaning that you need to generate quite a bit more force to keep accelerating after the motor has cut out. While you can get to the top speed on an e-bike more easily, it may not be as high as you’re used to. This is perhaps most important when you are on the road, where there are cars and buses going at 50km/h and you sometimes need to take the lane and might struggle to keep up.

“It’s still just a normal bike when you run out of battery”
This oft-touted benefit of using an e-bike, in comparison to buying a motorcycle or moped that stops running when it runs out of fuel, is strictly speaking true. Except that you’ve probably gotten used to riding an e-bike over the last couple of months. The e-bike is maybe 8-10kg heavier than your old non-electric bike. Your legs aren’t as strong as they used to be when you were struggling up hills every day. The combination of these two factors means that actually, riding the e-bike around without the electric assist is not that easy. Doable for short distances, but if you run out of battery far from home, don’t expect to be able to just ride it like it’s a normal bike the rest of the way.

Battery Capacity and Voltage
The battery indicator on the computer for most e-bikes is based on the voltage. There is a relationship between voltage and battery capacity – for a 36V battery, when fully charged it is probably actually outputting about 40V, and it drops down to around 30V when it is running out of charge. The bike computer typically tells you that you’re running out of battery around the 32-33V mark, while the battery itself probably waits until you’re closer to the 28 or 29V mark. Does this matter? Well, it turns out that voltage is roughly proportional to your bike’s acceleration – on a fully charged battery, it actually does feel quite a lot faster than when the battery is close to dead. By the time the battery is down to 32V, full power isn’t enough to get you up the hill anymore. So it means that you end up charging the battery often, which means putting the battery through more charge cycles, which maybe means reducing the overall lifespan of the battery (but maybe not). In practice, I’m charging the battery once it drops to about 50% capacity as shown on the indicator, which I suspect is about 32V. This also means that the real range may be a little less than what is claimed if you’re recharging the battery once it gets to about 50%.
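If you assume the discharge is linear between the rough figures above (40V full, ~30V empty for a nominal 36V pack), you can map voltage to remaining charge. Real lithium batteries have a non-linear discharge curve, so this linear interpolation is only indicative, but it shows why an indicator reading ~50% at 32V doesn’t line up with a naive voltage calculation:

```python
# Back-of-the-envelope mapping from battery voltage to remaining charge,
# using the rough figures from the post: 40V full, ~30V empty for a
# nominal 36V pack. A real battery's discharge curve is non-linear,
# so treat this as indicative only.

V_FULL, V_EMPTY = 40.0, 30.0

def charge_percent(voltage):
    # Linear interpolation between empty and full, clamped to 0-100%.
    frac = (voltage - V_EMPTY) / (V_FULL - V_EMPTY)
    return round(max(0.0, min(1.0, frac)) * 100)

print(charge_percent(40.0))  # 100
print(charge_percent(35.0))  # 50
print(charge_percent(32.0))  # 20
```

Under this naive linear model, 32V would be only ~20% charge, while the bike’s indicator shows about 50% at that point – which suggests the indicator is compensating for the non-linear curve (or being optimistic).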

So that’s a follow-up on what I’ve found out about the e-bike after riding it for almost three months. I should say that the pros definitely outweigh the cons for me, but I can also see why people end up upgrading to more powerful bikes after a few years. If you can afford it, you might like to move straight to the higher-end ones, and they might last you a bit longer. Hopefully this post is helpful for providing some information about the more practical considerations beyond shiny marketing materials, and if you have any questions feel free to get in touch on Twitter (@andrewtychen).