Social Evolution Forum
Seshat: What Studying War Is Good for?

 Seshat: The Global History Databank is a game-changing database construction project that may reveal there is actually one thing war is good for: historical statistics.

By publishing free-to-public data on historical societies, Seshat will soon become the world’s largest professional historical database.

The research effort is far too large for any individual or small team to take on alone. It is a collaborative project bringing together the expertise of historians, anthropologists, economists, and archaeologists from universities around the world.

The hope is that social science models (new, long-held, speculative and dogmatic alike) will in future be put to the test of historical data, letting Seshat decide which ones are right and which are wrong.

With this much data accessible to anyone, Seshat will also change the way historians write books and revolutionize the teaching of history.

To date, progress has been made on over 100 polities. These include polities from the periods of the Roman Empire and Ancient Egypt. There are hundreds of polities still to code.

The scale is huge. For each polity Seshat has over 681 variables (and counting) on government, religion, economy and society. Seshat will also make available information on resources and agricultural productivity for natural geographic areas.

Due to the size and ambition of the database, however, there is one more area on the list for development: ancient warfare.

Warfare is a group-level human behavior which, as discussed on SEF, may have played a major role in the development of group-level cooperation. Can Seshat, in addition to everything else, also provide researchers with a database to test theories relating to warfare?

Why warfare?

As Peter Turchin put the argument: The logic is very simple. Groups of people who can’t cooperate to put together an army, will be overrun by those who can. The result is that genetic and cultural traits for noncooperation will go extinct.

Is this true, or not? Can we test the claim that societies that possess universal norms (such as equality or charity) and large-scale ultrasocial state institutions (like bureaucracies, health and education systems) exist because over the last 3000 years ancestral societies used war to eliminate societies that did not possess their prototypes and antecedents?

With enough data on warfare – who fought whom, where, when, what were the consequences? – many evolutionary models, and non-evolutionary models, can be tested, and improved.

A Seshat contribution to warfare statistics would end where the Correlates of War (COW) database (1816-present) begins, with data running up to about the late 18th century.

Whilst data collection for COW was begun by the political scientist J. David Singer in 1967, the field of ancient warfare statistics has no comparable digital database. It is stuck at the stage where primary and secondary historical sources have been collected together into large compendia.

One example of the genre, Eggenberger’s “An Encyclopedia of Battles”, first published in 1967, contains descriptions of 1,560 battles. Since then, more recent multi-volume tomes have begun to make the bookshelves creak. Is the time now right for a digital approach?

While the analogue paper literature is very good, and provides superb “highlights programming”, it is not accessible for statistical analysis, and there does not yet exist even a reasonable presentation of the “full game”: all the wars, battles and sieges that have been recorded in history.

To illustrate this last point, in an experimental effort I coded over 1,800 battles and sieges, mostly using the remarkable resource that is Wikipedia. These covered only the Roman era, through the Byzantine Empire, to the Ottoman Empire up to 1700 CE.

Many of the battles and sieges did not have their own webpage (many were listed under the name of a famous general or ruler). However, the exercise suggested that all the battles and sieges recorded in history could amount to something staggering like 100,000-150,000 – a magnitude beyond the current compendia.

Wikipedia will never get all available historical data on warfare and nor will Seshat. The historical record is not detailed enough. At some point you have to say that is all the data you can get and leave it to the analysts to figure out how to account for record bias.

The take-home point is that there is a lot more data out there than is being put into war compendia. Seshat will do a more complete job of putting it all together and, unlike Wikipedia and the paper volumes, it will be an accessible, expert-checked, open-for-all resource designed by and for researchers.

(Some improvised graphics above and below show what could be done with a large number of data points.)

What variables are covered?

A Seshat warfare database would separately code warfare variables, and variables for the battles and sieges that make up the wars.

Battle and siege pages would record the most obvious information about the belligerents and consequences of interest. The siege page would also identify the besieger who initiated the siege of a city. Raids on cities would be coded with a siege template (but not as a siege).

A “war” could be considered a polity versus polity violent conflict (e.g. Roman-Sassanid, Roman-Parthian) or a collection of military events between polities (First Punic, Second Punic etc.). In the latter case, the war is nested within a meta-conflict (such as between Rome and Carthage).

We propose variables that distinguish between types of war and the cultural distance between the belligerents.
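The two-level coding described above (wars, and the battles and sieges nested inside them) can be sketched as a small data model. This is only an illustration under stated assumptions, not Seshat's actual schema: every class and field name below is hypothetical, and the First Punic War entries are there just to show the nesting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Engagement:
    """Fields shared by battle and siege pages (hypothetical names)."""
    name: str
    year: int                      # negative values stand for BCE
    belligerents: List[str]
    outcome: Optional[str] = None  # free-text consequence of interest


@dataclass
class Battle(Engagement):
    pass


@dataclass
class Siege(Engagement):
    besieger: Optional[str] = None  # who initiated the siege
    is_raid: bool = False           # raids reuse the siege template but are not sieges


@dataclass
class War:
    name: str
    polities: List[str]
    war_type: Optional[str] = None
    cultural_distance: Optional[str] = None
    meta_conflict: Optional[str] = None  # e.g. the wider Rome-Carthage conflict
    engagements: List[Engagement] = field(default_factory=list)

    def sieges(self) -> List[Siege]:
        """True sieges only; raids coded on the siege template are excluded."""
        return [e for e in self.engagements
                if isinstance(e, Siege) and not e.is_raid]


first_punic = War(
    name="First Punic War",
    polities=["Rome", "Carthage"],
    meta_conflict="Rome-Carthage",
    engagements=[
        Battle("Battle of Mylae", -260, ["Rome", "Carthage"]),
        Siege("Siege of Lilybaeum", -250, ["Rome", "Carthage"], besieger="Rome"),
    ],
)

# A war with no recorded battles or sieges can still be coded at the war level.
sparse_war = War(name="Hypothetical war", polities=["Polity A", "Polity B"])
```

Keeping the engagements as an optional list means a war can be coded whether the record of its battles and sieges is nearly complete or very sparse.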

What are the challenges of coding warfare variables on Seshat?

Accuracy is the main challenge. This is the distinguishing feature of a professional academic database. All codes have acceptable references and are checked by experts in the field.

While accuracy is of central importance, the first thing the coder and experts may notice in their historical sources is the absence of accuracy.

As a result of the exaggeration, bias, and the inconsistent and contradictory ways in which witnesses, chroniclers and writers have reported facts about historical events, modern estimates must be used. This is again why the role of the expert is important.

The database must also be constructed with the accuracy challenge in mind. Let’s take one example. In reference to army losses in battle a source might say the army was “completely routed.”

Usually this phrase is used as part of a battle description where claimed losses ranged from 25-50%. However, a coder cannot assume every author means the same percentage by “completely routed”.

This does not make data collection impossible. Seshat would provide space for written explanation and allow the coder to input values in the form of a range, or even in the form of a disagreement between experts.
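One way to picture such a field is a value that carries a range, a written note, and any competing expert estimates. The sketch below is an assumption about how this could look, not Seshat's design: the `CodedValue` class and its field names are hypothetical, and the dissenting estimate is an invented number used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CodedValue:
    """A coded variable stored as a range, with room for explanation (hypothetical)."""
    low: float
    high: float
    note: str = ""
    # Ranges proposed by experts who disagree with the main estimate.
    alternatives: List[Tuple[float, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    @property
    def disputed(self) -> bool:
        return bool(self.alternatives)

    def envelope(self) -> Tuple[float, float]:
        """Widest range covering the main estimate and every alternative."""
        lows = [self.low] + [a[0] for a in self.alternatives]
        highs = [self.high] + [a[1] for a in self.alternatives]
        return (min(lows), max(highs))


# Losses as a fraction of the army: the 25-50% range usually implied by the phrase.
losses = CodedValue(0.25, 0.50, note='Source says the army was "completely routed".')

# Invented dissenting estimate, for illustration only.
losses.alternatives.append((0.10, 0.20))
```

An analyst can then choose between the main range and the full envelope, depending on how much weight to give the disagreement.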

Some of the most inconsistent data might still be used for derivative variables: how about one for reported army size based on the logarithmic magnitude of total participants at the battle? A useful statistic based on the total size of fielded armies would be one that does not require the historical data to be exceptionally accurate (because it isn’t).
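As a minimal sketch of such a derivative variable, the function below takes the order of magnitude of the total number of participants. Using base 10 and rounding down are assumptions made for this example (the post only says "logarithmic magnitude"), the function name is hypothetical, and the army sizes are invented illustrative figures.

```python
import math


def army_size_magnitude(total_participants: int) -> int:
    """Order of magnitude (floor of log10) of the total participants at a battle."""
    if total_participants < 1:
        raise ValueError("total_participants must be at least 1")
    return math.floor(math.log10(total_participants))


# Two sources that disagree by a factor of two still give the same magnitude.
low_estimate = army_size_magnitude(40_000)   # 4
high_estimate = army_size_magnitude(80_000)  # 4

# A skirmish-sized engagement lands one magnitude lower.
small_battle = army_size_magnitude(1_500)    # 3
```

The point of the design is that large disagreements between sources often vanish at this resolution, so the variable stays usable even when the underlying figures are not exceptionally accurate.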


Join the discussion


  1. O.Voron says:

    Very interesting and ambitious project you are involved with, Edward. Have not military historians at military academies such as West Point already constructed these databases? Apparently they have not. Or they are top secret 🙂

    • Edward A L Turner says:

      databases have been created for the most recent historical conflicts. the only really big database I know about is the University of Michigan COW project which starts in 1816. here is another warfare database from Uppsala University that starts in 1946.

      we start our database 3000 years earlier. 😀 by coding conflicts at two levels – that of the war and separate battles/sieges that make up a war – our warfare coding will be flexible enough to provide interesting data whether the record of battles and sieges is almost complete or very sparse.

      so if the historical record does not provide battles and siege data we are still able to code something about the war: belligerents and alliances, war type, cultural distance, consequences for territories etc.

  2. Peter Turchin says:

    Thank you, Edward, for this post!

    For the readers of this blog: Edward is the Principal Research Assistant for Seshat. He started as a volunteer several years ago (and participated in our article that came out in PNAS last Fall). Now that we have been funded by several new grants, Edward has become a salaried employee of the Seshat project. He is responsible for social complexity and warfare variables. But, as he writes in the post, we are still developing our conceptual approaches to coding warfare. Edward has done some preliminary warfare coding of several polities, but until now this aspect of Seshat has been on a back burner. I expect that we will be going all out on coding warfare starting in the Fall, and about a year from now we will be able to start analyzing the data – and killing off those hypotheses…

  3. Ross David H says:

    Sounds fascinating (and mammoth). For those of us with an IT background/interests, can you tell us anything about how the data will be stored? I’m not sure, when you use the term “database”, if you mean what an IT guy would mean when he says that (MySQL or MongoDB or Oracle or some such), or if you mean the more general sense of “many files stored together that relate to the same general topic”.

    Or, for those who wish to access it someday, what kind of APIs will be available? Will they be made available to those who are not professionals in the field?

    Thanks for sharing your progress, it sounds exciting! The Human Genome Project of historians!