Tag Archive for: big data

China’s extensive use of genetic information sounds a warning

‘Do not miss a single family in each village, do not miss a single man in each family’ (村不漏姓,姓不漏人). This is the aim of the China-wide male family screening system project (男性家族排查系统项目), according to official government documents from Gansu province obtained by the New York Times. As China increasingly relies on biometric data collection for public and national security purposes, it is time for democracies to address its role in their systems.

Under General Secretary Xi Jinping, China has been steadily expanding the surveillance of its citizens through new biometric technologies. The government’s identification of genetic resources as a national security asset has reinforced the domestic surveillance apparatus. It has also boosted China’s defence and biomedical research and, perhaps unintentionally, strengthened the protection of its citizens’ data against access by foreign powers.

In contrast, the limited discussion in democracies on genetic and biometric data, and the profound risks and opportunities it presents, poses a national security risk. Low vigilance over foreign access to genetic data creates privacy and security concerns and potentially facilitates ethically questionable research by external actors. China has recognised this lack of caution and foresight and is already working to maximise the information and capability advantage it holds.

China is known for using biometric technology to expand its surveillance and security apparatus. Apart from an increase in the number of provinces establishing male DNA databases, little has changed in the scale of the data collection since ASPI’s Genomic surveillance report in 2020, which estimated that DNA samples had been collected from 5–10% of the country’s male population. What has changed is the sophistication and strategic thinking behind this program. The documents obtained by the New York Times explain that government purchases of advanced bio-surveillance technologies are for the management and control of the people (对人员的管控) and to realise the comprehensive collection (全面采集) of samples. The work is often conducted in the name of social stability and crime fighting, but there’s no evidence to suggest that the men targeted for collection are criminal suspects.

This is part of an ongoing biometric surveillance regime that expands the government’s control over its citizens. While the collection methods remain multimodal, such as an expansion of voice- and facial-recognition systems, there’s been a shift to more invasive collection of personal identifiers that are less likely to change over time, like DNA sequences and iris patterns. Indeed, Chinese authorities have already been forcibly mass-collecting biological samples from ethnic minorities like the Tibetans since 2013 and the Uyghurs since 2016. The overall surveillance doesn’t discriminate between ethnicities, however, and is part of the mass DNA collection campaign conducted under Xi’s leadership.

Chinese Premier Li Keqiang pushed for the introduction of big data into everyday life at a state council executive meeting in 2016, and specifically mentioned health and medical data. This was soon after the release of the ‘action outline for promoting the development of big data’, which emphasised the importance of worldwide collection in the context of rapid global informatisation.

China has not only recognised the benefits that genetic data brings to the surveillance of its citizens (and foreigners), but also realised its potential as a wider national security resource, particularly for military research. The result was the new draft guidelines on managing human genetic resources that the state council promulgated in March. It contributes to the implementation of the Regulation on Human Genetic Resources following the adoption of the Biosecurity Law and Data Security Law. The genetic information of Chinese nationals is prohibited from being sent abroad and human genetic databases must be catalogued every five years.

Although the intent was probably to stop rival countries from accessing this valuable information, it resulted in heightened security of genetic data through export controls and a standardised management system. The protective legislation has, however, still failed to protect citizens’ data from being exploited by the Chinese government itself. This is particularly worrying given its growing potential uses to control the population.

China’s active and exploitative use of genetic data should serve as an impetus for Western democracies to consider genetic data as an intrinsic part of national security. As emerging technologies create more potential for the use of genetic data, so too do they increase the urgency of establishing clear direction to prevent data abuse and genetic surveillance. The US National Counterintelligence and Security Center has identified China as a primary strategic competitor due to its resourcefulness and comprehensive strategy, specifically its military–civil fusion policy and National Intelligence Law. It highlighted the risk of genomic technology being used to identify genetic vulnerabilities in a population, and the threat that data relating to people’s ancestry would be misused for surveillance and societal repression.

The data provided to foreign researchers should be restricted to civilian medical and scientific research, although China’s concept of military–civil fusion makes due diligence easier said than done. Military–civil fusion promotes acquiring intellectual property, technology and human resources from the academic and private sectors. An example is the Beijing Genomics Institute (BGI), a company that collaborates with the People’s Liberation Army and military hospitals on genetic research programs that enhance soldiers’ performance and improve ‘population quality’. It also happens to be one of the biggest providers of Covid-19 and prenatal tests worldwide.

This type of research requires a comprehensive database of the variation in the human genome, and BGI has easy access to foreign genomic data through prenatal tests. Reuters reported that online records show that at least 500 women, including women outside China, who have taken BGI’s prenatal tests have their genetic data stored in the government-funded China National GeneBank.

BGI researchers have already used the genetic data generated from prenatal tests of more than 141,000 Chinese women to study genetic associations, describing it as an ‘untapped resource’. While there’s no evidence to suggest the use of genetic data from foreigners, BGI’s storage of and access to a diverse set of human genomes provide the potential for future big-data analysis.

While there’s much discussion on China’s current technologies and capabilities, we must also prepare for their future potential. Having a comprehensive gene bank that includes diverse sets of genomic data may not seem problematic now, but the emergence of powerful artificial-intelligence tools may expose previously underappreciated vulnerabilities.

Perhaps it already has. A BGI researcher worked with China’s National University of Defence Technology to develop software to speed up the sequencing of human genomes using the Chinese supercomputer Tianhe-2. The university was blacklisted by the US as a threat to national security specifically because of its access to and ownership of Tianhe-2, which is capable of simulating nuclear explosions.

More countries need targeted legislation for genetic data export to limit the ease of access by foreign companies and governments. Clients engaged in medical tests should have standardised consent and transparency in how their genetic information is handled and stored. And they should be informed of their privacy rights.

Competition with China is difficult when the state fosters a nexus between the military and the market, and biomedical technology is expanded through a coordinated effort. As China advances in research and development, countries like Australia risk being exploited and left behind. In this instance, we are not only aiding foreign military development but are turning a blind eye to significant privacy and surveillance concerns.

Finding Australia’s asymmetric advantage in big data

It’s important to think beyond the cliché that data is the new oil. The dystopian dream of seamless data integration and the ability to ‘collect and know it all’ overlooks the complex politics of data. National borders and ambitions make for a landscape characterised by Balkanisation, conflict and contestation.

Countries know that strategic advantage is lurking somewhere in the data—the signal in the noise. However, a key question for all countries is whether that advantage lies in ensuring centralised control of data, in promoting openness and transparency, or in prioritising the use of data to serve the public interest.

It would be easy to pull up the drawbridge and regard increasing calls for digital sovereignty as calls for autarky. Mobilising Australia’s asymmetric advantage relies on our ability to lean into our democratic legacy to avoid perverse mirrors of the systems of data surveillance, or ‘dataveillance’, we frequently criticise. This means moving from a mindset of tech adoption and data dependence to one of strengthened interdependencies, international partnerships and civic engagement.

Nearly 10 years ago, CIA Chief Technology Officer Gus Hunt articulated the ‘collect it all’ theory of cybersecurity and mass dataveillance: ‘The value of any piece of information is only known when you can connect it with something else that arrives at a future point in time. Since you can’t connect dots you don’t have, it drives us into a mode of … try[ing] to collect everything and hang on to it forever.’

This collect-it-all ethos increasingly defines vast areas of national security and economic and social life. We are living in an age of big data where seamless integration is seen as a strategic necessity. Much rests on our ability to convert the potential of data into a resource. This would extend the national conception of where and how we derive and trade economic value.

It’s easy to get swept up in a mindset that sees modernisation of data infrastructure as an essential good. Although corporate and government jargonistas would have us believe they’re scaling us into the stratosphere, many societal problems remain intractable.

A recent example of the tensions common to many corporate and government data projects is the pause of the ‘battle management system’ component of the Australian Army’s Land 200 project. The chief of army described the digital command and control of forces as the service’s highest priority project: ‘[W]hen we build a network and connect all the parts of that force to that network we are greater than the sum of the individual. It is the improved quality of command and control of all aspects of our operations, so it’s not about high-end warfare, not about counterterrorism. It’s about everything we do.’

The battlefield management system was intended to connect every vehicle and soldier to a secure tracking system so that commanders would know the exact location of all their personnel and equipment, and so that every soldier would see, through moving symbols on a personal or vehicle-mounted computer screen, where everyone else was.

The cost overruns, the significant governance issues identified in a scathing audit report and the pause of the program represent common vulnerabilities in data projects.

It’s tempting to buy equipment off the shelf to save money, but we have to think about how enterprise data projects are products of unique organisational cultures, human capital and infrastructure densities.

The great promise of 360-data systems is that they allow a busy executive or commander to determine problems and opportunities by glancing at an appealing dashboard without needing to understand the shape of the data.

However, this assumes that organisations have a solid governance and hygiene strategy for managing data that may be inconsistent, poorly integrated or from questionable sources. If the right questions aren’t asked of the data, what appears on a dashboard may be meaningless or deceptive. In the quest to rationalise and simplify, a dashboard may indicate that something is wrong, but not how bad it is.
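The way a dashboard can flag a problem while discarding its magnitude is easy to illustrate. The sketch below is hypothetical (the system names and error counts are invented for illustration): a traffic-light rollup that collapses per-system error counts into a single status, making a trivial fault and a catastrophic one look identical.

```python
# Hypothetical sketch: a dashboard "status rollup" that signals that
# something is wrong, but not how bad it is.

def rollup(error_counts: dict) -> str:
    """Collapse per-system error counts into a single traffic-light status."""
    return "red" if any(n > 0 for n in error_counts.values()) else "green"

# Invented example data: 3 errors and 12,000 errors both roll up to "red".
systems = {"payroll": 3, "logistics": 12000, "hr": 0}
print(rollup(systems))  # prints "red"
```

Anyone glancing at the dashboard sees only "red"; the severity information has already been thrown away before it reaches the screen.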

Lack of data literacy from the top to the bottom of organisations is a widespread vulnerability. In an age defined by attention deficits, the promise of complex problems neatly solved on PowerPoint slides concedes too much to slick sales agents.

And all of our sophisticated tools for handling, searching, linking, sharing and analysing data in mind-numbing iterative and synergising processes may give us a misdirected sense that we’re on the path to change, when in fact we’re simply replicating the status quo.

Despite the claims of tech evangelists, the tendency towards indiscriminate corporate collection and monitoring of everything risks lowering the ability to understand complexity. Big data can construct new realities where ‘face-tuned’ algorithmic acceptability becomes the ideal.

The worst aspects of data analytics reduce diverse human lives and multidimensional social structures into data points. The functional and the mechanistic are preferred over the contradictory and paradoxical, and the critical value of weathered experience, good judgement, the obscure and the marginal can be lost.

The collect-it-all ethos, and the 360-degree view of human and ecological systems it might offer, could be endlessly exploited by the powerful. Citizens grouped into pseudo-scientific psychological profiles make rich fodder for microtargeting by rogue governments, foreign agents, charlatans and corporate executives.

There are inherent inequalities in data systems. A report commissioned by the Dutch Data Protection Authority 12 years ago estimated that the average Dutch citizen was included in 250–500 databases, and up to 1,000 for more socially active people. This can occur through over-monitoring of some groups and individuals (reinforcing over-policing, for example) and under-monitoring of those without access to data systems.

Data always presents a partial view, but even more pressing is the issue of data power asymmetry. The collection of our movement, health and demographic data is increasingly given over to global and national monopolies for control and profit extraction, rather than to further the national interest.

With fewer firms possessing the human capital and brute computer-processing power to handle complex data, we face a narrow world where rich and paradoxical human experience is nudged and reduced towards more ‘efficient’ and predictable choices. In the process, we lose a sense of a democratic data ideal—to use this resource to preserve the public good.

This translates to the setting of international norms about data, another highly contested area. When we assume that data is perfectly interchangeable, we miss the important ways that regulatory frameworks and international standards ensure the continuation of cross-border data trade.

Blockchain, for example, is meant to smooth the flow of information between jurisdictions, but when single countries insist on global compliance with their domestic cryptographic protocols, this can force the opening of backdoors that pose security risks. We should not lose sight of the importance of maintaining an international system that provides some form of common regulatory oversight over data transfers.

China has made data a matter of national security, and has placed strict limits on how it can be stored and taken beyond the ‘Great Firewall’. Although there’s an economic case for this form of data mercantilism, it’s a far cry from the early promises of a free and open internet. China has more to learn from the comparative openness of democratic systems as a source of innovation, which it has tried to gerrymander through strategic theft of intellectual property.

But within our own borders, we may not even know our own data network’s strengths and weaknesses. Many data systems function because they’ve been effectively patched over by temporary measures. Although there’s a lot of attention on modernisation of tech infrastructure, most of our tech spending is on operating and maintaining legacy systems, not on improvement.

A recent US Government Accountability Office report found that of the US$90 billion the federal government spent on information technology in 2019, nearly 80% went towards operation and maintenance of existing systems.

In Australia, a 2019 review of IT spending in the Australian public service concluded that ‘agency capital budgets are under-funded and there is strong evidence of a technology deficit across the APS, with some major legacy systems at or near end of life’. That review called for an audit of infrastructure, which might provide a strategic direction for a government otherwise relying on a passive approach to tech adoption.

A 2016 GAO report found that the US departments of commerce, defense, treasury, health and human services, and veterans affairs were still using 1980s and 1990s Microsoft operating systems that the vendor had stopped supporting more than a decade before. The same mix of Australian departments may be similarly vulnerable.

Frank assessment of Australian networks and infrastructure risks may evolve as the pivot to a risk-management approach in government information security becomes more widely understood.

It’s impossible to eliminate all data security risks. All organisations want to avoid a breach that adversely affects individuals or national revenues associated with intellectual property and collective surveillance. It is within our grasp to drive for a high standard of preventive risk reduction and to create a data regime that acts in the national public interest.

There are underexplored opportunities in Australia for genuine use of data that a top-down collect-it-all ethos might miss. There are examples of experimentation with data that build citizen engagement and participation into regulatory processes. One is the Taiwan process established by a civil-society movement at the invitation of Taiwan’s minister for digital affairs.

Australia needs to ensure that data benefits are more evenly distributed and that data harms are not socialised. If data frameworks don’t balance preservation of Australia’s democratic system and development of our skills ecosystem, we’ll end up with a digital and data dependence that compromises our sovereignty and doesn’t provide the strengthened digital interdependence we’ll need to navigate a period increasingly defined by digital geopolitics.

China’s big-data big brother

The Communist Party of China’s (CPC) decision this week to eliminate presidential term limits seems to open the door for President Xi Jinping to be not just ‘Chairman of Everything’, but also ‘Chairman Forever’. The move has been met with dismay around the world, but it has also intensified an ongoing debate among China experts over whether the biggest threat to China is too much executive power, or too little.

Where one stands on that question seems to depend largely on whether one is a political scientist, an economist or a technologist. Many political scientists and legal scholars, for example, argue against the change because they consider the model of collective leadership that the CPC established after 1979 to be one of its biggest successes. That model’s term limits and system of peer review for high-level decision-making have provided the checks necessary to prevent a repeat of Mao-era catastrophes such as the Great Leap Forward and the Cultural Revolution.

In fact, the post-1979 dispensation has often allowed for a genuine battle of ideas, particularly between the statist Young Communist League and coastal elites who favour more economic liberalisation. China may remain a closed society in many ways, but its top policymakers have shown an open-minded willingness to experiment and learn through trial and error.

Many economists, meanwhile, are less worried about excessive executive power because they think it is even more dangerous to have a government that is too weak to overhaul the country’s economic model when needed. Among the government’s current economic challenges are slower growth, spiralling debt—particularly among state-owned enterprises—and vested interests standing in the way of structural reforms.

Most economists would concede that the collective leadership model has prevented disasters. But they would argue that it has also impeded reform, and allowed the CPC to become a syndicate of corruption and cronyism, ideologically bereft and devoid of purpose.

At the end of Hu Jintao’s two-term presidency in 2013, many feared that the collective leadership model was inadequate for confronting vested economic interests, tackling inequality and delivering basic public goods. Indeed, as early as 2007, Hu’s own premier, Wen Jiabao, had concluded that China’s economic trajectory was ‘unstable, unbalanced, uncoordinated and unsustainable’.

By contrast, argue the economists, Xi has begun to turn things around by fighting for a ‘cleaner party’. His massive anti-corruption campaign has jailed thousands of party officials at all levels, and re-established the CPC’s grassroots credentials. The economists would concede that Xi’s campaign has also conveniently removed many of his potential rivals. But they would argue that his strengthened position now allows him to replace a growth model based on credit-financed debt with something more sustainable.

Of course, whether they are right about that remains to be seen. Despite Xi’s success in consolidating his power and extending his hold on it indefinitely, there is reason to doubt that he would be willing to risk a new economic model if sustainability proves to be incompatible with maintaining rapid growth.

This is where the technologists come in, by offering new ways to correct or avoid potential mistakes. In addition to supplanting the collective leadership model with one centred on the personality of a supreme leader, Xi has also significantly expanded the surveillance state. The government is increasingly using CCTV, big data and artificial intelligence to study Chinese citizens’ behaviours, hopes, fears and faces so that it can forestall dissidence and challenges to its authority.

Moreover, under Xi, the government has established online ‘social credit’ databases, which suggests that it could eventually roll out a single score for all Chinese citizens, comprising credit ratings, online behaviour, health records, expressions of party loyalty and other information.

The beauty of a big-data dictatorship is that it could sustain itself less through direct threats and punishment as a public spectacle, and more through ‘nudges’ to manipulate people’s perspectives and behaviour. And the more time Chinese citizens spend online, the more the government will be able to control what they see and do there.

Digital technologies will also allow the government to respond more quickly to public discontent, or to head it off altogether if it can discern or predict changes in public opinion. Given that many dictatorships collapse as a result of poor information, digital technologies could become an even more powerful prophylactic against bad decision-making than term limits.

If there is one thing that political scientists, economists and technologists can all agree on, it is that Xi is building the most powerful and intrusive surveillance regime in history. It remains to be seen if his approach to ‘making China great again’ will strengthen his hand or turn out to be a fatal weakness. But with China playing an ever-larger role in the global economy through its investments and infrastructure projects, the reverberations from what happens there will be felt everywhere, and for years to come. In a sense, Xi might just end up being a ‘chairman of everything forever’ after all.

The illusion of freedom in the digital age

Over the last few weeks, media around the world have been saturated with stories about how technology is destroying politics. In autocracies like China, the fear is of ultra-empowered Big Brother states, like that in George Orwell’s 1984. In democracies like the United States, the concern is that tech companies will continue to exacerbate political and social polarisation by facilitating the spread of disinformation and creating ideological ‘filter bubbles’, leading to something resembling Aldous Huxley’s Brave New World.

In fact, by bringing about a convergence between democracy and dictatorship, new technologies render both of these dystopian visions impossible. But that doesn’t mean that there is nothing to fear.

Much of the coverage of the 19th National Congress of the Communist Party of China (CPC) focused on President Xi Jinping’s consolidation of power. He is, observers warn, creating an information-age dictatorship, in which the technologies that were once expected to bring freedom to China’s 1.4 billion citizens have instead enabled him to entrench his own authority. By providing the government with highly detailed information on the needs, feelings and aspirations of ordinary Chinese, the internet allows China’s leaders to preempt discontent. In other words, they now use big data, rather than brute force, to ensure stability.

And the data are big indeed. More than 170 million face-recognition surveillance cameras track every step citizens make. An artificial-intelligence-enhanced security system can spot criminal suspects as they cycle beside a lake or purchase dumplings from a street vendor, and immediately alert the police. Data from surveillance cameras feed into China’s ‘social credit’ data bank, where the regime compiles thick files on its people’s creditworthiness, consumption patterns and overall reliability.

The CPC is also using technology to manage its own ranks, having developed dozens of apps to communicate with party members. Meanwhile, it blocks some of the empowering features of technology: by forcing all tech companies to have their servers within China, it effectively ‘in-sources’ censorship.

The impact of technology on American politics has been even more visible, but it is analysed in terms of the market, rather than the state. Among the most eye-catching stories has been the role that ‘fake news’ played in shaping last year’s presidential election. Facebook has admitted that 126 million Americans might have seen fake news during the campaign.

More recently, Special Counsel Robert Mueller, who is conducting an investigation into whether US President Donald Trump’s campaign colluded with Russia’s interference in the 2016 election, charged one-time campaign chairman Paul Manafort with 12 counts—including ‘conspiracy against the United States’—for his actions prior to the campaign. A foreign policy adviser to the Trump campaign, George Papadopoulos, was also indicted for lying to the FBI about meetings with individuals closely associated with the Russian government during the campaign, though he has already pleaded guilty and has been cooperating with investigators since the summer.

But beyond such bombshell developments is a broader anxiety about the ability of tech companies to control the information people receive. With big tech’s secret algorithms determining how we perceive the world, it is becoming increasingly difficult for people to make conscious decisions—what philosophers perceive as the basic dimension of free will.

Big tech companies, worth more than some countries’ GDP, seek to maximise profits, not social welfare. Yet, at a time when attention is supplanting money as the most valuable commodity, the impact of their decisions is far-reaching. James Williams, a Google engineer turned academic, argues that the digital age has unleashed fierce competition for our attention, and few have benefited more than Trump, who is for the internet what Ronald Reagan was for television.

At the same time, the impact of technology on politics is relatively independent of regime type. Technology is blurring the comforting distinction between open and closed societies, and between planned and free economies, ultimately making it impossible for either to exist in its ideal form.

By revealing the US National Security Agency’s massive government surveillance, Edward Snowden made clear that the state’s desire for omniscience is not limited to China. On the contrary, it is central to the idea of national security in the US.

In China, things are moving in the opposite direction. To be sure, the Chinese government is pressuring the biggest tech companies to give it a direct role in corporate decision-making—and direct access to their data. At the same time, however, the internet is changing the nature of Chinese politics and the Chinese economy, pushing both to become more responsive to consumer needs.

For example, a friend who worked for the search engine Baidu explained to me how the company tries to enhance the consumer experience of censorship, testing the ways in which people prefer to be censored. Jack Ma of tech giant Alibaba thinks that China can use big data to design perfectly calibrated state interventions that enable it to outperform free-market economies. In the coming decades, Ma believes, ‘the planned economy will get bigger and bigger’.

In the digital age, the biggest danger is not that technology will put free and autocratic societies increasingly at odds with one another. It is that the worst fears of both Orwell and Huxley will become manifest in both types of system, creating a different kind of dystopia. With many of their deepest desires being met, citizens will have the illusion of freedom and empowerment. In reality, their lives, the information they consume, and the choices they make will be determined by algorithms and platforms controlled by unaccountable corporate or government elites.

The future of intelligence analysis: computers versus the human brain?

Last month in The Strategist, Mark Gilchrist put down a wager that computers will ‘be unable to provide any greater certainty than a team of well-trained and experienced analysts who understand the true difficulty of creating order from chaos’. While I commend Mark’s bravery in predicting the future with such certainty, I suspect that, in time, he’ll lose his money. I’d also argue that his zero-sum perspective sets an impossible standard for human analysts and algorithms—whether basic or self-learning. Reducing the intelligence problem down to ‘making sense of war’s inherent unpredictability’ doesn’t do this field of endeavour any justice.

Discussing ‘intelligence’ theory and practice is made all the more difficult by the absence of any universally accepted definition. Nevertheless, talking about intelligence processes and outputs without referring to any intelligence theory leads to inherently inaccurate assumptions—a point Rod Lyon and I made last year in separate Strategist posts.

I’m firmly in Mark’s camp when it comes to the importance of qualitative analysis and the analytical ability of intelligence professionals to make assessments with incomplete datasets. But to do that work, intelligence analysts must have a clear understanding of the epistemological construction for their analysis: they must know what it means to know. Good intelligence tradecraft involves employing a range of analytical techniques to ensure that the validity and reliability of different assessments and explanations are tested.

Mark’s argument against the value of big data analytics and artificial intelligence doesn’t engage with the reality that the role of intelligence is to reduce uncertainty. Seldom is intelligence—be it secret (in the sense that it’s not publicly available) or otherwise—able to offer a decision-maker complete certainty. Rather, an intelligence assessment should be considered as an evidence-based hypothesis accompanied by an associated estimate of its probability: something is probable, likely, possible or unlikely, for example.
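The convention of attaching estimative language to a probability can be sketched in a few lines. The band boundaries below are illustrative assumptions, not an official standard; real agencies publish their own probability yardsticks, and the thresholds vary between them.

```python
# Illustrative sketch: mapping a numeric likelihood to estimative language.
# The band boundaries are assumptions for illustration, not a formal standard.

def estimative_language(p: float) -> str:
    """Return a qualitative likelihood term for a probability in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p < 0.10:
        return "highly unlikely"
    if p < 0.35:
        return "unlikely"
    if p < 0.60:
        return "possible"
    if p < 0.85:
        return "likely"
    return "almost certain"

print(estimative_language(0.7))  # prints "likely"
```

The point of such a mapping is consistency: two analysts using the same yardstick mean roughly the same thing by ‘likely’, which is exactly the kind of hedged, evidence-based hypothesis described above.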

In practice, though, the intelligence problem can be categorised as either a puzzle or a mystery.

An intelligence puzzle is a problem you can solve if you’re able to collect and collate enough information. For example, with enough information, you could locate a certain Australian citizen fighting in Syria. The challenge for the intelligence manager is that analysts now have access to an unprecedented quantity and variety of raw data. Sifting through that deluge of data in the required timeframes is now, more often than not, beyond the capacity of a single intelligence professional. With an increasing number of analysts collating data, the task of joining the dots between disparate data points is ever more difficult. Unsurprisingly, increasing the number of data collators may not result in any tangible improvement in output or outcome. Fortunately, that is the exact problem for which AI and big data analytics are best suited.
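The 'joining the dots' task is, at its core, entity resolution across disparate collection streams. A toy sketch, with invented records and field names, of how a machine links reports that share an identifier regardless of how many separate streams contributed them:

```python
from collections import defaultdict

# Hypothetical, invented records from three separate collection streams.
reports = [
    {"source": "travel", "passport": "N1234567", "detail": "border crossing"},
    {"source": "finance", "passport": "N1234567", "detail": "wire transfer"},
    {"source": "comms", "passport": "N7654321", "detail": "intercepted call"},
]

# Group reports by the shared identifier in a single pass. This is the
# collation step that scales poorly for human analysts but trivially
# for machines, however many streams feed in.
linked = defaultdict(list)
for report in reports:
    linked[report["passport"]].append(report["source"])

print(dict(linked))
```

Real systems use fuzzy matching across aliases, dates and locations rather than a single clean key, but the division of labour is the same: machines collate, analysts interpret.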

In contrast to puzzles, intelligence mysteries can’t be solved by gathering more information. Getting to the bottom of a mystery requires analysis and judgement. As highlighted by Rod Lyon, that is the realm of the subject-matter expert. It’s there that intelligence professionals earn their money, and big data analytics may help by supporting hypothesis development and testing.

Mark is again onto something when he argues that the qualitative intelligence analyst is an important component of intelligence capability—and will remain so. However, in an operating context where decision-makers and operational staff have direct access to classified single-source reporting, the intelligence profession must adapt. In the age of the data deluge, intelligence managers will need to make greater use of quantitative and qualitative analytical capabilities, especially those of subject-matter experts, from economists to data scientists.

In responding to these challenges, intelligence professionals, whether they’re in the military or in law enforcement, can ill afford to view structural and organisational barriers to innovation as reasons not to try.

In his recent report, Michael Chi made it abundantly clear that there is indeed a lot of hype around big data analytics. But underneath the hype, this growing field of science offers a whole new world of capabilities and possibilities. Yes, we'll need to develop new human resource capabilities to exploit that potential. And yes, we'll need to find ways to fuse or integrate discrete information architectures. But neither problem is insurmountable.

The next generation of big data analytics and AI capabilities won't solve the challenge of war. But they're not meant to. Their raison d'être is to reduce the uncertainty of decision-making by solving puzzles and exploring mysteries. Whether developed by AI or by intelligence professionals, intelligence brings no guarantees and must be weighed on a scale of probabilities.

Big data analytics and AI have already proven their worth as tools for the intelligence fraternity. Arguments to the contrary are reminiscent of the pointless debates between operational staff over the superiority of sigint or humint as a collection discipline; the two are mutually reinforcing.

Big Data: the devil’s in the detail


As the government’s review of the Australian Intelligence Community (AIC) picks up steam, one of the key challenges is to identify and resolve growing gaps in the AIC’s technological capabilities. One such capability is the collection and use of Big Data.

The generally accepted definition of big data casts it as a 'problem': its extreme volume, velocity and variety make collection, management and analysis challenging. The problem stems from a 'data deluge' of social media posts, photos, videos, purchases, clicks and a burgeoning wave of sensor data from smarter, interconnected appliances and accessories, known as the 'Internet of Things'. Those sources generated a staggering 4.4 trillion gigabytes of data in 2013, a figure forecast to reach 44 trillion gigabytes by 2020, which threatens to overwhelm conventional methods for storing and analysing data.

The 'promise' of big data analytics is the answer to that problem. Analytics promises not only to manage the data deluge, but also to analyse the data using algorithms that uncover hidden correlations, patterns and links of potential analytical value. Techniques to extract those insights fall under various names: 'data mining', 'data analytics', 'data science' and 'machine learning', among others. That work is expected to yield new insights into a range of puzzles, from tracking financial fraud to detecting cybersecurity incidents, through the power of parallel processing hardware, distributed software, new analytics tools and a talented workforce of multidisciplinary data scientists.
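As a toy illustration of the pattern-finding described above, a simple statistical screen can flag transactions that deviate sharply from an account's baseline. All figures are invented, and real fraud analytics are far more sophisticated than a z-score test:

```python
import statistics

# Invented transaction amounts (dollars) for a single account.
amounts = [42.0, 38.5, 51.0, 44.2, 39.9, 47.3, 1250.0, 43.8]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

# Flag anything more than two standard deviations above the mean.
flagged = [a for a in amounts if (a - mean) / stdev > 2]
print(flagged)
```

Even this crude screen surfaces the outlier; the 'promise' of big data analytics is doing the equivalent across billions of records, in near real time, with models that learn the baseline rather than assume it.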

However, to keep the big data 'promise', the AIC review needs to address the following challenges:

  1. There need to be mechanisms to ensure that data is manageable in terms of the definitional iron triangle of volume (size), velocity (of data flow), and variety (of data types and formats). While these are challenging enough in the big data context, the data analyst must consider the veracity of the datasets they select: the data’s representativeness, reliability, and accuracy. The analyst must also consider how the data will generate insights of value to the end user.
  2. The current framework for privacy protection, based on the idea of ‘notice and consent’, needs updating. Currently, the burden is placed on an individual to make an informed decision about the risks of sharing their information. This is complicated in the context of big data, as this informed decision is, in the words of former President Obama’s council of advisors, ‘defeated by exactly the positive benefits that big data enables: new, non-obvious, unexpectedly powerful uses of data’.
  3. Data-based decisions need to remain comprehensible and explicable, which requires maintaining transparency and 'interpretability'. That will become more challenging over time, as explaining the processes and reasoning behind a machine learning algorithm isn't a simple task, especially as those algorithms become more complex, merge into composite ensembles, and rely on correlative and statistical reasoning.
  4. Regardless of whether data analytics decisions are explicable, a fourth challenge is whether the public will accept algorithmic decisions. The ongoing debate about machine learning in autonomous vehicles demonstrates these concerns. As decisions of increasing importance are informed by algorithms, this challenge of ‘Algorithm Aversion’ will intensify.
  5. Big data and analytics security needs to be ensured. That isn't limited to the traditional problem of the 'honeypot' allure of big datasets; it also extends to the distributed, parallel and diverse architectural challenges of big data computing, and to the security of the analytics themselves, with research into 'adversarial examples' showing that small changes to inputs can cause serious misclassification errors in machine learning algorithms.
  6. Big data has both benefited and suffered from fever-pitch hype. According to Gartner's Hype Cycle, mainstream attention to big data began in 2011, peaked in 2013, and the category was removed from the cycle in 2015 on the grounds that big data is now 'normal' and several of its aspects are 'no longer exotic. They're common'. While there's growing consensus that big data is the 'new normal', some of the early big data promises no longer hold in reality, and the distinction between hype and reality needs to be clear to policymakers before effective big data policy can be developed.
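The 'adversarial examples' problem raised in point 5 can be demonstrated even on a trivial linear classifier: nudging each input feature a small step in the direction indicated by the model's weights flips the decision. The weights and input below are invented purely for illustration:

```python
# A toy linear classifier: score = w . x + b, class 1 if score > 0.
w = [0.9, -0.5, 0.4]
b = -0.1
x = [0.2, 0.7, 0.1]  # invented benign input, classified as class 0

def score(features):
    return sum(wi * xi for wi, xi in zip(w, features)) + b

# Adversarial perturbation (in the spirit of the fast gradient sign
# method): shift each feature slightly in the sign of its weight.
eps = 0.2
x_adv = [xi + eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

print(score(x) > 0, score(x_adv) > 0)  # the decision flips
```

The perturbed input differs from the original by only 0.2 in each feature, yet crosses the decision boundary; in deep networks the equivalent perturbations can be imperceptible to humans.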

Over the coming months, ASPI, with support from CSC Australia, will undertake an analysis of Big Data in National Security to further explore the policy issues and challenges outlined in this piece, and to stimulate policy discussions around the issue.

National security and civil liberties


It’s always good to see debate about civil liberties and security legislation here on The Strategist. With a number of changes to security legislation in the wind, it was appropriate that my colleagues Toby Feakin and Anthony Bergin took the issue up recently.

Anthony’s right that there’ll always be an acceptable level of government intrusiveness on individual privacy in the name of safety and security. For example, I don’t think anyone believes that police curbing dangerous driving on public roads is an unreasonable infringement of civil liberty. But when the nexus between government activity and public safety isn’t so obvious—as with metadata retention—or when government activity is secret, it’s reasonable to demand a higher level of justification, and a robust mechanism for accountability. Toby’s right; it’s not a simple matter of balancing security against privacy.

Neither of my colleagues addressed the core question of oversight. History shows us that, left to their own devices, intelligence and security agencies come up with all sorts of unhelpful notions. Despite various mechanisms, including constitutional protections, being in place to protect Americans' privacy, that country's NSA has overstepped the mark more than once (most recently in its offhand and probably unconstitutional treatment of the Foreign Intelligence Surveillance Court). And even the most ardent supporters of secretive state security should be troubled by recent revelations that CIA employees penetrated the computer systems of congressional committees investigating allegations of torture.

Intelligence, privacy and cyber security

What are we to do with Big Data? Edward Snowden has kick-started a public debate about the legitimate scope of intelligence in a world of digital interconnectivity, and last week Klee Aiken asked us to consider the effectiveness of intrusive surveillance programs which analyse bulk information. There’s a robust debate to be had about the nature of modern privacy. But precisely because it’s so controversial on legal grounds, Aiken maintains that the indiscriminate collection of telephone and internet data by the NSA needs to prove its value as we ponder what to reform. Along with some of the US politicians who oversee the system, he’s pessimistic that the game is worth the candle.

In response, Anthony Bergin drew attention to the shifting operational template of terrorism. Echoing a point made by the IISS's Nigel Inkster, Bergin suggested that metadata techniques will be needed to trace the development of home-grown radicals, who have become more of a threat in recent years. Future attacks are likely to be dispersed and irregular, receiving inspiration from al Qaeda through the very channels that prove most susceptible to signals intelligence.