During wars and uprisings it is often very unclear how many victims there are. The Human Rights Data Analysis Group deploys statistical techniques that are also used for counting shy species to drastically reduce the margins of uncertainty. That is important for political decisions, and for justice afterwards – if that ever happens.
From March to June 1999, between nine and twelve thousand civilians and insurgents were killed in a terror campaign by the Serbian army against the Albanian population of Kosovo. At least, that is what a 2002 report concludes, even though there were only 4,400 documented deaths. The report, prepared by the Human Rights Data Analysis Group (HRDAG), therefore aroused skepticism and criticism. Where do roughly five thousand unnoticed deaths come from?
To intervene or not to intervene
Nevertheless, HRDAG's estimate was confirmed by two surveys in 2010, once peace had been restored and extensive on-site investigations had become possible. HRDAG is a United States-based organization whose core mission is to keep the best possible records of violent conflicts. Such records often matter when the international community decides whether or not to intervene, and of course afterwards too, to identify those responsible and, if possible, put them on trial.
HRDAG does not send observers to conflict areas itself; instead it uses as many existing sources as possible and derives an overall picture from all the data combined. That is certainly not just a matter of tallying victims. Take the Syrian civil war: all kinds of aid organizations and individuals report on the very chaotic fighting. The same victims are sometimes reported by multiple sources, while the deaths of other victims are reported by no one.
Completely wrong picture
Megan Price, a researcher at HRDAG, gave a talk on this work at the third Heidelberg Laureate Forum. This is the annual meeting, in August, of the winners of the most important mathematics and computer science awards. “If you just add up the numbers of casualties that various parties report, you can get a completely wrong picture of a conflict.” That matters for far more than bookkeeping. News media that count naively may, for example, report that a conflict has flared up after another party joined the fray, or that a new development has led to a significant decrease in the weekly number of victims. This can have a major influence on the decision by Western countries to intervene, or to support one of the parties with weapons or money.
There are mathematical techniques for extracting reasonably reliable estimates from such messy data: so-called Multiple Systems Estimation (MSE). It involves a cleaning of the data that is simple in principle but very labour-intensive: de-duplication, which ensures that every report of a death can be traced back to a unique individual. Of course you don't want a database in which some people, under different names, have been killed more than once. By January 2014, HRDAG had counted 260,000 reports of a casualty in the Syrian conflict, but after de-duplication 93,000 remained. Those are the documented cases. What can you say about the actual number of casualties?
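The idea behind de-duplication can be illustrated with a minimal Python sketch. It is not HRDAG's procedure: the record fields (`name`, `date`, `place`, `source`), the placeholder names and the matching rule (identical name, date and place after normalization) are all assumptions made for illustration. Real records differ in spelling, transliteration and dates, which is what makes the work so labour-intensive.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def deduplicate(reports):
    """Group reports by a normalized (name, date, place) key.

    Returns a dict mapping each key to the list of sources that
    reported that person. The exact-match rule is a simplification.
    """
    unique = {}
    for report in reports:
        key = (
            normalize(report["name"]),
            report["date"],
            normalize(report["place"]),
        )
        unique.setdefault(key, []).append(report["source"])
    return unique


# Hypothetical reports with placeholder names, for illustration only.
reports = [
    {"name": "Zoë  Doe", "date": "2013-05-02", "place": "Town X", "source": "list A"},
    {"name": "zoe doe", "date": "2013-05-02", "place": "town x", "source": "list B"},
    {"name": "John Roe", "date": "2013-05-02", "place": "Town X", "source": "list A"},
]

victims = deduplicate(reports)
print(len(reports), "reports ->", len(victims), "unique victims")
```

Keeping the list of sources per victim, rather than discarding duplicates, preserves the information about which lists overlap. That overlap is what the estimate of the undocumented cases is built on.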
MSE uses mathematical techniques that were originally developed to estimate how many individuals of an animal species, for example hares, still live in an area. The basic principle is capture-recapture. You first catch about a hundred hares, as randomly as possible, and mark them. A few weeks later you come back and catch another hundred or so hares. If the population isn't too big, the second catch will include a few hares that have already been marked. The relative size of this overlap between the two samples provides an estimate of the size of the total population (see the illustration caption above for an explanation with formulas). Obviously this is an estimate with a margin of uncertainty, the size of which also depends on the sizes of the two samples and of their overlap (A, B and M in the illustration).
This is also how you can count the many nameless victims of a chaotic violent conflict. Such methods have previously been used to estimate the number of HIV-infected people in a population where the subject is taboo, and even the number of lesbians in a very conservative region of the United States. Practice has also shown that MSE works and provides reliable estimates with a reasonable margin of uncertainty.
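The capture-recapture calculation can be sketched in a few lines of Python. The sketch assumes that A and B stand for the sizes of the first and second sample and M for the number of marked animals in the second sample; that reading of the symbols is an assumption. It uses the classic Lincoln-Petersen estimator, A × B / M, and Chapman's variant of it together with its standard variance formula. The function names and the numbers are illustrative.

```python
import math


def lincoln_petersen(a: int, b: int, m: int) -> float:
    """Classic two-sample estimate of the population size: A * B / M."""
    if m == 0:
        raise ValueError("no overlap between the samples: estimate undefined")
    return a * b / m


def chapman(a: int, b: int, m: int):
    """Chapman's variant, which also works when M is small or zero.

    Returns the estimate and its approximate standard error.
    """
    estimate = (a + 1) * (b + 1) / (m + 1) - 1
    variance = (
        (a + 1) * (b + 1) * (a - m) * (b - m)
        / ((m + 1) ** 2 * (m + 2))
    )
    return estimate, math.sqrt(variance)


# 100 hares marked, 100 caught later, 20 of them already marked.
print(lincoln_petersen(100, 100, 20))  # 500.0

# The smaller the overlap, the larger the estimate and its uncertainty.
for overlap in (20, 5):
    estimate, std_error = chapman(100, 100, overlap)
    print(f"M={overlap}: about {estimate:.0f} hares, standard error {std_error:.0f}")
```

Both formulas assume that the two samples are independent and that every animal is equally likely to be caught, which is rarely exactly true in practice.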
Overlapping samples
Applied to a war such as the one in Syria, every organization that collects data on victims on site over a certain period of time constitutes a sample. If multiple sources independently report the same victim, this is an overlap between two or more samples. If you take more than two samples from a population, you can apply more advanced techniques that also take into account that some samples may not be independent of each other. In this way, HRDAG has been able to map the forgotten deaths of the conflict in Kosovo and of many others.
During her talk in Heidelberg, Price gave the example of a phase during the civil war in Syria when media reported that the intensity of the fighting had abated. This was based on simply counting reported casualties. HRDAG's analysis, however, showed that the overlap between the various reports had decreased, which means that a larger share of the deaths was going unreported. The calculation showed that the fighting had in fact flared up. Price: “If you rely on media reports, you can draw completely wrong conclusions about such conflicts.”
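How a falling count of documented victims can coincide with a rising estimate is easiest to see with numbers. The Python sketch below uses two hypothetical sources per period and the simple two-sample estimate (product of the list sizes divided by the overlap). All figures are invented for illustration and are not HRDAG data. With more than two sources, HRDAG applies more advanced techniques that this sketch does not attempt.

```python
def documented_and_estimated(source_a: set, source_b: set):
    """Return (documented victims, two-sample estimate of the total).

    Each source is a set of de-duplicated victim identifiers.
    Assumes the two sources are independent samples.
    """
    overlap = len(source_a & source_b)
    if overlap == 0:
        raise ValueError("sources do not overlap: estimate undefined")
    documented = len(source_a | source_b)
    estimated = len(source_a) * len(source_b) / overlap
    return documented, estimated


# Hypothetical identifiers; integers stand in for unique victims.
earlier = documented_and_estimated(set(range(0, 400)), set(range(200, 500)))
later = documented_and_estimated(set(range(0, 350)), set(range(225, 475)))

print("earlier period:", earlier)  # (500, 600.0)
print("later period:  ", later)    # (475, 700.0)
```

In the later period fewer victims are documented (475 against 500), but the two lists overlap less, so the estimated total rises from 600 to 700. A naive count would report the opposite trend.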
Unfortunately, analyses such as HRDAG's take time, time that the news media will not allow themselves; Price understands that too. Yet she argues that the media should stick more to what they are good at: reporting personal stories and giving the victims a face. Readers and viewers, meanwhile, should take all kinds of statistics and infographics on armed conflicts that editorial offices put together in a single day with a large grain of salt.