The promises and challenges of digital data
Digital technologies are visibly ‘disrupting’ how our societies work (Owen 2015), and this also has profound implications for those of us studying Asia. Digital tools make it possible to assess vast amounts of data, systematically explore their patterns, and visualize the results in compelling new ways. Indeed, for the case of China, digital methods have been deployed to study a wide range of intriguing issues: how censorship works (King et al. 2013), how public opinion and belief systems are polarized online (Wu 2014), what exchanges on Sina Weibo might tell us about specific social phenomena like suicide (Fu et al. 2013), and – in my own work – what the structures of China’s internet reveal about ‘Sino-phone’ web spheres and internet governance (Schneider 2015a, 2015b).
And yet the use of digital tools for research has also attracted criticism. Most recently, Daniel Allington and his colleagues have argued that the prominence of so-called ‘digital humanities’ approaches is skewing opportunities to receive research funding, that the digital methods project as a whole undermines humanist education, and that current trends to promote digital approaches to knowledge ultimately facilitate ‘the neoliberal takeover of the university’. While I sympathize with such concerns, I also believe that such arguments are getting the logic of digital research wrong. If digital technologies are used to facilitate the neoliberal project (as they at times may very well be), then this is a symptom, not the cause, of a broader problem: that universities are being eroded by the pernicious logic of accelerated capitalism. It is most certainly not in the nature of the digital to promote such a logic, and digital tools can be used to challenge, critique, and ‘hack’ that rationale, as political and digital arts projects have shown.
For China scholars, digital research methods can provide exciting windows into the topics we study. It would be premature to bar these windows out of fear that the glass might be ideologically tinted. This is not to say that digital research is unproblematic, but I do believe that the well-intended ideological criticism may at times distract from the very real practical challenges that such research faces today, particularly in the context of studying China. Three challenges strike me as particularly pressing:
1. Data access
Digital data now represent practically any aspect of contemporary social life, and yet such data remain highly restricted. Scholars do not normally have unfettered access to government databases or to the data feeds and meta-data of social media corporations. This is a particular challenge in the Chinese case, where censorship restrictions severely limit what scholars can access. In such a context, even an ostensibly trivial task like retrieving data through a company’s application programming interface (API) can become prohibitively difficult. Sina Weibo’s API, for instance, requires users to upload their full personal information, including a digital copy of their passport. Other services in digital China require a Chinese mobile number to verify credentials. Since the spring of 2016, new restrictions in China apparently prohibit the sale of anonymous prepaid SIM cards in the PRC. If this is indeed the direction in which restrictions on information and communication are headed in China, then this would effectively exclude foreign visitors without a permanent Chinese mobile phone contract from many activities in digital China, including research opportunities.
A result of such practical challenges is that researchers frequently only have access to very limited digital data. This, of course, is not solely a Chinese problem. As Mirko Schaeffer from the Utrecht Data School pointed out at a recent conference on digital disruption in Asia, the practical idiosyncrasies of digital research often confront scholars with ‘data in search of a question’. This creates a very real incentive to use data that are conveniently available and then mould research questions around those data. Such a pragmatic approach stands in stark contrast to the more accepted academic procedure of asking questions and then answering them with appropriately selected materials. The challenge of getting the right data can thus quickly turn into a challenge to academic standards.
2. The ethics of digital research
Getting the right data for digital research also raises ethical questions. For the Chinese case, it can often be prudent to collaborate with Chinese institutions to secure data access, for instance social media corporations or data research centres at universities. Considering that such institutions ultimately answer to state directives, this may mean that data-driven research by international teams feeds back into Chinese state agendas. Depending on how the research results ultimately get disseminated and used, researchers may find themselves in the awkward position of assisting the unapologetically surveillance-focused Chinese state as it further hones its analytical capacities. The ethical implications of such collaborations are hard to gauge.
Whether scholars collaborate with central Chinese stake-holders, or whether they use data that they have scraped themselves, their research is likely to touch on complicated privacy concerns. Aside from the fact that social media data may generally not be as ‘public’ as they seem, even in truly public settings, like open chat rooms or public Weibo accounts, researchers may find that their studies direct attention to specific online practices in ways that users did not originally anticipate. What, for instance, happens if research into bloggers or microbloggers inadvertently shines a spotlight on activities that previously seemed innocuous? What happens when research unintentionally politicizes such activities, or when it helps make specific users identifiable to those who object to certain types of online behaviour? Scholars with field-work experience are of course very familiar with the need to keep the identities of research subjects secret, but the use of digital data magnifies this challenge. Digital data points now make it possible to trace and identify individuals to an extent that most scholars may not be in a position to properly assess before the fact.
As important as it is to protect research subjects, it is also important for scholars to protect themselves. The Chinese government has a history of arbitrarily deciding what kind of data should be prohibited, and scholars who mine or otherwise retrieve digital data from China may unwittingly and retroactively find themselves the target of the PRC’s counter-espionage and national security laws. The fact that these laws remain extremely vague on what constitutes, for instance, a ‘state secret’, only exacerbates this problem. This provides a strong incentive for scholars to err on the side of caution, which is arguably an intentional effect (Hassid 2008).
3. Analysing the data
Even where researchers have successfully managed to retrieve ethically sound, useful data, they are still confronted with the challenge of analysing (and possibly visualizing) their data in meaningful ways. This is likely to involve computational tools like corpus analysis or network mapping software. Using such digital tools involves many non-trivial decisions that require a firm grasp of what the software can and cannot achieve. What, for instance, does it mean if particular keywords frequently appear in close proximity to other keywords? How does the number of links that a node in a network receives translate into social or political relevance? What does a piece of software actually do when it calculates various parameters for a data set? What happens when it visualizes the data, for instance by representing all the nodes in a network as red squares rather than green circles?
These are by no means banal concerns. Some are conceptual: as David Berry has argued, digital technologies change how we make sense of the world, and scholarship that uses digital tools would be well advised to ask how such tools affect the perspective of the researcher, or in other words how software changes knowledge. Other concerns are practical, and this is where scholars outside of computer science will likely find themselves outside their comfort zone: what is happening ‘under the hood’ of the digital tools we use? It may be asking too much of busy scholars in the humanities and social sciences to now learn advanced coding skills to take apart the programmes they are using, but a modicum of code literacy is arguably required to fully understand the implications of conducting research with software.
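To give a sense of what such code literacy involves, here is a minimal Python sketch of two calculations that corpus analysis and network mapping tools typically perform in some form: counting which words occur near a keyword, and counting how many links each node in a network receives. The function names, the window sizes, and the toy data are my own illustrative assumptions and do not reproduce any particular software package.

```python
from collections import Counter


def cooccurrence_counts(tokens, keyword, window=2):
    """Count the tokens that appear within `window` positions of `keyword`.

    `window` is an analytical choice made by the researcher (or by a
    tool's defaults); it is not a property of the text itself.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # do not count the keyword occurrence itself
                counts[tokens[j]] += 1
    return counts


def in_degree(edges):
    """Count incoming links per node from (source, target) pairs."""
    counts = Counter()
    for _source, target in edges:
        counts[target] += 1
    return counts


# Invented toy data, for illustration only.
tokens = "the state guides the internet and the state manages media".split()
edges = [("a", "portal"), ("b", "portal"), ("portal", "a"), ("c", "portal")]

narrow = cooccurrence_counts(tokens, "state", window=1)
wide = cooccurrence_counts(tokens, "state", window=3)
degrees = in_degree(edges)

print("window=1:", dict(narrow))
print("window=3:", dict(wide))
print("in-degree:", dict(degrees))
```

With this invented sentence, ‘internet’ never counts as a neighbour of ‘state’ at a window of one word, yet it counts twice at a window of three. The apparent finding thus depends on a parameter that the researcher, or the software’s default settings, has chosen. Similarly, the in-degree count registers that ‘portal’ receives three links, but it says nothing about whether those links express endorsement, criticism, or mere routine.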
Studying China through digital data will remain fraught with such challenges in the foreseeable future. However, these challenges are themselves instructive. They highlight how the idiosyncrasies of digital media usage affect our pursuit of knowledge. Scholars with area-studies expertise, and China scholars in particular, are well positioned to contribute their experiences in different media ecologies to such important debates. Rather than only asking how digital data shape a society like China, we should thus also ask: how does China shape digital data?
This post has also appeared on Nottingham University’s China Policy Institute blog.
Fu, King-wa, Cheng, Qijin, Wong, Paul W.C., & Yip, Paul S.F. (2013), ‘Responses to Self-Presented Suicide Attempt in Social Media’. Crisis, 34(6), 406-412.
Hassid, Jonathan (2008), ‘Controlling the Chinese Media – An Uncertain Business’. Asian Survey, 48(3), 414-430.
King, Gary, Pan, Jennifer, & Roberts, Margaret (2013), ‘How Censorship in China Allows Government Criticism but Silences Collective Expression’. American Political Science Review, 107(2), 1-18.
Owen, Taylor (2015), Disruptive Power: The Crisis of the State in the Digital Age. Oxford: Oxford University Press.
Schneider, Florian (2015a), ‘China’s Info-Web: How Beijing Governs Online Political Communication about Japan’. New Media & Society, first view.
Schneider, Florian (2015b), ‘Searching for “Digital Asia” in its Networks: Where the Spatial Turn Meets the Digital Turn’. Asiascape: Digital Asia, 2(1-2), 57-92.
Wu, Angela Xiao (2014), ‘Ideological polarization over a China-as-superpower mindset: An exploratory charting of belief systems among Chinese Internet users, 2008-2011’. International Journal of Communication, 8, 2243-2272.