Sarah Palin does the Mediterranean!

Professor Cyprian Broodbank of the Institute of Archaeology of University College London has written a book called The Making of the Middle Sea: A History of the Mediterranean from the Beginning to the Emergence of the Classical World. It is published by Oxford University Press, who should be ashamed of themselves, because this work is uniquely and splendidly terrible. I have both read and written my share of graceless and flat academic prose. If ‘academicese’ were all that was going on here I wouldn’t complain, but this author’s prose style is incoherent, baroque, and dyslexic. The book cannot be read because it is unreadable.

Now it’s not a crime to write poorly. Lots of academics do write badly, but we can’t let that be a barrier to their writing books. After all, if they’re not going to write up their research or their pensées, then who’s going to? But in former years publishing houses gave effective editorial help to their authors. Unclear pronominal referents, run-on sentences, mismatched verbal numbers, baroque and captious stylings … all such things were trimmed away by editorial fiat, with the reader being none the wiser. In this respect either Oxford University Press or Thames and Hudson (and that ‘best of editors’, Colin Ridler[1]) have done Professor Broodbank a gross disservice. No actual editor could possibly read the following sentences and think that they were ready for publication. Listen:

1. Such ways of life based on the selective uptake of domesticates alongside wild foods only enjoyed only a brief, very patchy currency on the northern side of the Mediterranean, and a fairly marginal if longer-lasting status in the Levant.

2. In much of Mediterranean Africa, however, they became the norm, primarily thanks to the wider orientation on pastoral practices, decoupled from crop cultivation, that had arisen in the Saharan heart of North Africa.

3. As we shall see in future chapters, this lifestyle survived for millennia along Africa’s Mediterranean fringe, even as that heart was ripped out by the return of the desert.[2]

I’ve numbered these consecutive sentences for easier reference.

In sentence 1: ‘only enjoyed only a brief, very patchy …’ Only … only? No editor ever read that sentence.

In sentence 2, what does ‘they’ refer to? And the phrase ‘wider orientation on pastoral practices’: what does that mean? It looks as though Broodbank may have meant to say ‘wider orientation OF pastoral practices’, although that doesn’t really make sense either. We will never know what he intended because no editor cared enough to figure it out. Notice, too, the mismatch of nominal numbers between sentences 1 and 3: the plural ‘ways of life’ in sentence 1 has become the singular ‘this lifestyle’ (jargon police!) in sentence 3.

But the grand prize, the pièce de résistance, the big enchilada, the actual SONG OF THE FAT LADY HERSELF is in sentence 3.

‘even as that heart was ripped out by the return of the desert’.

Wait a minute!!!!  WHOSE heart?!?!?  What’s going on here?!?!  An actual editor paid by an actual corporation allowed that sentence to get away?

This book is verbal salad.  It is what Sarah Palin would write if she took up Mediterranean Studies.  It is NOT academic writing but a burlesque of it.

This example was chosen entirely at random; there are hundreds of pages of this syntactical goo.

This book was never read by anyone. I emphasize that. This book was never looked at by an editor. And I haven’t read it either; I admit that up front. Life is too short to read such graceless, distorted, and dyslexic prose. But the culprit here is not Broodbank. The blame attaches entirely to his editors, who abandoned their posts and let Broodbank go down with his ship. SpellCheck is NOT editing.

The way this book is presented (a beautiful volume with many lovely and informative pictures), combined with the way it was rushed into print, makes me think that before long we will be seeing promos for a TV series “Mediterranean: Sea of Destiny!” presented by Professor Cyprian Broodbank. Could happen. Professor Broodbank is a nice-looking young man with wonderful credentials who would look good on TV. We need more historians on TV and, if there is going to be such a series, I wish him well. At least as a TV presenter he’ll get better editorial support than he got from Oxford University Press or Thames and Hudson.

But all is not lost with the book either.  I’m looking forward to the English translation.

[1] Cyprian Broodbank, The Making of the Middle Sea: A History of the Mediterranean from the Beginning to the Emergence of the Classical World. Oxford University Press, 2013, p. 6.

[2] Ibid., p. 210.

UPenn Museum Records. A Table-Driven Parser for the Date Records.

The University of Pennsylvania Museum of Archaeology and Anthropology has made data about its extensive collections available to the general public.  This is a very important collection: the Mediterranean collection alone describes more than twenty-six thousand objects.  I’ve barely begun even to look at them to find out what they are.

You can download the comma-separated version of the Mediterranean collection, but … it’s not really comma-separated, is it? So I downloaded the .xml version of it and had a look. Here’s a typical record:

   

Each object is bounded by its own tags, and within that record are such things as <cultures>, <date_made>, <emuIRN> (which I take to be the museum’s distinct identifier for the object), a description of the iconography, <materials>, <object_name>, and <proveniences>. It even has a nice tag identifying the URL of the resource.
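To make the record structure concrete, here is a minimal Python sketch (not the PHP used for this project) that pulls a few named fields out of each record with the standard-library ElementTree module. The `<objects>`/`<object>` wrapper tags, the `record_tag` parameter, and the sample values are assumptions for illustration; only the field names come from the UPenn export.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <objects>/<object> wrapper tags and the values are
# invented for illustration; only the field names come from the UPenn export.
SAMPLE = """<objects>
  <object>
    <emuIRN>12345</emuIRN>
    <object_name>amphora</object_name>
    <date_made>101 - 98 BC</date_made>
  </object>
</objects>"""

FIELDS = ("emuIRN", "object_name", "date_made", "date_made_early", "date_made_late")


def extract_records(xml_text, record_tag="object"):
    """Return one dict per record element, keeping only the FIELDS of interest."""
    root = ET.fromstring(xml_text)
    records = []
    for rec_el in root.iter(record_tag):
        rec = {}
        for name in FIELDS:
            text = rec_el.findtext(name)  # None if the child tag is missing
            rec[name] = text.strip() if text else None
        records.append(rec)
    return records


for rec in extract_records(SAMPLE):
    print(rec)
```

For the full file of more than twenty-six thousand objects, `ET.iterparse` would avoid holding the whole tree in memory at once.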
This is pretty complete stuff. I had a look at the description lines for keywords and identified something like twelve thousand distinct keyword possibilities. Obviously I put that aside until later and began to analyze this treasure trove with a parser optimized just for converting dates. Some typical dates are as follows:

-200-100 BC
-350
-45
0 AD – 199 AD
0 AD-50 AD
0-100 AD
0-1936
0-199
0-199 ad
0-299 AD
0-399 AD
0-400 AD
0-50 AD
0-79 AD
0-99
0-99 AD
0AD-50AD

Not too bad; maybe parsable. But the <date_made> field accepts any free-form statement of dates, so this amounts to natural language parsing, because we also find entries like this:

8th cent. BC
8th century – 7th century BC
8th century BC
8th century BC – 7th century BC
8th century BC-7th century BC

Or this…

late 6th century BC
late 7th and 1st half of 6th centuries BC
late 7th century BC
late 8th century BC
late 8th to late 6th centuries BC
Late Hellenistic
Late Minoan I
Late Minoan Period
Late Roman
late1st century BC – early 1 st century AD
latter half of 4 c. AD
LC IIIB
mid 1st c. AD
Mid 1st century A.D.
mid-3rd c. BC?
no later than Archaic period?
POST 133BC
POST 141 AD
POST 168 BC
POST 168BC
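Much of the variation in lists like these is spelling noise rather than real ambiguity: en dashes versus hyphens, “A.D.” versus “AD”, “c.” and “cent.” for “century”, missing spaces. One plausible first step (a sketch under assumptions, not the rules the original PHP parser uses) is to collapse that noise before any pattern matching. The rule set and the `normalize` name are hypothetical, and “c.” is only expanded after a number so that “c.” meaning *circa* is left alone.

```python
import re

# Hypothetical clean-up rules, applied in order. These are illustrative
# assumptions, not the rules used in the original PHP parser.
RULES = [
    (re.compile("[\u2013\u2014]"), "-"),                          # en/em dash -> hyphen
    (re.compile(r"A\.D\."), "AD"),                                 # A.D. -> AD
    (re.compile(r"B\.C\."), "BC"),                                 # B.C. -> BC
    (re.compile(r"(\d)\s+(st|nd|rd|th)\b"), r"\1\2"),              # "1 st" -> "1st"
    (re.compile(r"([A-Za-z])(\d)"), r"\1 \2"),                     # "late1st" -> "late 1st"
    (re.compile(r"(\d(?:st|nd|rd|th)?)\s*(?:cent|c)\."), r"\1 century"),  # "4 c." -> "4 century"
    (re.compile(r"(\d)\s*(ad|bc)\b", re.IGNORECASE), r"\1 \2"),    # "133BC" -> "133 BC"
    (re.compile(r"\b(ad|bc)\b", re.IGNORECASE), lambda m: m.group(1).upper()),  # "ad" -> "AD"
    (re.compile(r"\s*-\s*"), "-"),                                 # no spaces around hyphens
    (re.compile(r"\s+"), " "),                                     # collapse runs of whitespace
]


def normalize(text):
    """Apply each clean-up rule in turn and return the tidied string."""
    s = text.strip()
    for pattern, repl in RULES:
        s = pattern.sub(repl, s)
    return s.strip()


for raw in ["8th cent. BC", "POST 133BC", "0-199 ad", "Mid 1st century A.D."]:
    print(repr(raw), "->", repr(normalize(raw)))
```

Entries such as “Late Minoan I” or “LC IIIB” pass through unchanged; no amount of tidying turns a period name into a year, so they would still jam.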

Youch! That stuff’s never going to parse. But I live my life by playing the percentages, so I figured I could write a parser that would get most of it; the rest would jam and could possibly be fixed by hand. My strategy was to write a parser that would read in all the Penn records and write them out again unchanged, as a pre-pass over the data. The exceptions would be the <date_made>, <date_made_early>, and <date_made_late> records. Those would also be written out unchanged, but each one would be followed by a new <SQUINCHDATE></SQUINCHDATE> record holding the cleaned-up date, with the century as well as the year if a year was given in the original.

Each <SQUINCHDATE> record would consist of two parts: the year taken from the preceding record, then the century to which it belongs, separated by a ‘;’ (semicolon). If there were two years, a terminus post quem and a terminus ante quem, they would be separated by a comma, and the two resulting centuries would likewise be separated by a comma. If, for example, the input Penn date record is

<date_made>101 – 98 BC</date_made>

I would write out these two records:

<date_made>101 – 98 BC</date_made>
<SQUINCHDATE>-101,-98;-2,-1</SQUINCHDATE>

…and also these.

<date_made_early>101 BC</date_made_early>
<SQUINCHDATE>-101;-2</SQUINCHDATE>

<date_made_late>98 BC</date_made_late>
<SQUINCHDATE>-98;-1</SQUINCHDATE>
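The year-to-century rule and the output format above can be sketched in a few lines of Python. This is an illustration, not the PHP function from this post: the names `century` and `squinchdate` are hypothetical, years are signed (negative = BC), and placing year 0 in the 1st century AD is an assumption, made because the records use “0 AD” loosely for the start of the era.

```python
def century(year):
    """Map a signed year (negative = BC) to a signed century.

    101-200 AD -> 2, 1-100 AD -> 1, 1-100 BC -> -1, 101-200 BC -> -2.
    Assumption: year 0 is placed in the 1st century AD.
    """
    if year >= 0:
        return max(year - 1, 0) // 100 + 1
    return -((-year - 1) // 100 + 1)


def squinchdate(years):
    """Format one year, or a (post quem, ante quem) pair, as a SQUINCHDATE record."""
    if not 1 <= len(years) <= 2:
        raise ValueError("expected one year or a (post quem, ante quem) pair")
    year_part = ",".join(str(y) for y in years)
    century_part = ",".join(str(century(y)) for y in years)
    return f"<SQUINCHDATE>{year_part};{century_part}</SQUINCHDATE>"


print(squinchdate([-101, -98]))  # <SQUINCHDATE>-101,-98;-2,-1</SQUINCHDATE>
print(squinchdate([-98]))        # <SQUINCHDATE>-98;-1</SQUINCHDATE>
```

The off-by-one in `year - 1` is the whole point: there is no year 0 in the historical calendar, so 100 BC still belongs to the 1st century BC and 101 BC opens the 2nd.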

If a record couldn’t be parsed, though, I would write out a ‘jam’ record instead, like so:

<date_made>early Mycenaean</date_made>
<SQUINCHDATE>JAM</SQUINCHDATE>

If a parser fails (as this parser is going to fail some of the time) then it’s important to be clear about it.
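The pre-pass itself (echo every line, append a SQUINCHDATE after each date line, and say JAM loudly on failure) might look roughly like this in Python. It assumes one XML element per line, which may not hold for the real export. `toy_parse` is a hypothetical stand-in that only understands “<number> BC”, so the real date parser would go in its place.

```python
import re

DATE_TAGS = ("date_made", "date_made_early", "date_made_late")
# Matches a whole-line date element such as <date_made>101 BC</date_made>.
DATE_LINE = re.compile(r"^\s*<(%s)>(.*)</\1>\s*$" % "|".join(DATE_TAGS))


def prepass(lines, parse):
    """Echo every line unchanged; after each date line, append a SQUINCHDATE
    line built from parse(text), or JAM if parse returns None."""
    out = []
    for line in lines:
        out.append(line)
        m = DATE_LINE.match(line)
        if m:
            result = parse(m.group(2))
            out.append("<SQUINCHDATE>%s</SQUINCHDATE>"
                       % (result if result is not None else "JAM"))
    return out


def toy_parse(text):
    """Hypothetical stand-in parser: handles only '<n> BC'."""
    m = re.fullmatch(r"(\d+) BC", text.strip())
    if not m:
        return None
    n = int(m.group(1))
    return "-%d;-%d" % (n, (n - 1) // 100 + 1)


for line in prepass(["<date_made_early>101 BC</date_made_early>",
                     "<date_made>early Mycenaean</date_made>"], toy_parse):
    print(line)
```

Passing the parser in as a function keeps the echo-and-annotate loop separate from the date grammar, so the grammar can grow without touching the pass over the file.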
This parser is not like the other parsers I’ve created in previous posts; it is a table-driven parser. To build it I set down a grammar of input token patterns covering some of the many alternative ways a date can be written. Here are a few examples:

BC <yyyy> – <yyyy>
<yyyy> – <yyyy> BC
BC <yyyy> – <yyyy> AD
BC <yyyy> – AD <yyyy>
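Here is one way to picture “table-driven”: each grammar pattern becomes a row in a table that pairs a regular expression with a small action, and the parser simply tries the rows in order. The sketch below is in Python rather than the post’s PHP, covers only the four patterns listed plus a few AD and single-year forms, and its row set and names (`TABLE`, `parse_date`) are hypothetical. Anything no row matches comes out as JAM.

```python
import re

Y = r"(\d{1,4})"          # a year of one to four digits
D = r"\s*[-\u2013]\s*"    # hyphen or en dash, optional spaces

# Hypothetical rule table: each row pairs a pattern with an action that turns
# the captured digits into signed years (negative = BC).
TABLE = [
    (rf"BC\s*{Y}{D}{Y}",       lambda a, b: (-a, -b)),  # BC yyyy - yyyy
    (rf"{Y}{D}{Y}\s*BC",       lambda a, b: (-a, -b)),  # yyyy - yyyy BC
    (rf"BC\s*{Y}{D}{Y}\s*AD",  lambda a, b: (-a, b)),   # BC yyyy - yyyy AD
    (rf"BC\s*{Y}{D}AD\s*{Y}",  lambda a, b: (-a, b)),   # BC yyyy - AD yyyy
    (rf"{Y}\s*AD{D}{Y}\s*AD",  lambda a, b: (a, b)),    # yyyy AD - yyyy AD
    (rf"{Y}{D}{Y}\s*AD",       lambda a, b: (a, b)),    # yyyy - yyyy AD
    (rf"{Y}\s*BC",             lambda a: (-a,)),        # yyyy BC
    (rf"{Y}\s*AD",             lambda a: (a,)),         # yyyy AD
]
RULES = [(re.compile(p, re.IGNORECASE), action) for p, action in TABLE]


def century(y):
    """Signed century of a signed year; year 0 is assumed to be 1st century AD."""
    if y >= 0:
        return max(y - 1, 0) // 100 + 1
    return -((-y - 1) // 100 + 1)


def parse_date(text):
    """Try each table row in order; the first full match wins, otherwise JAM."""
    s = text.strip()
    for pattern, action in RULES:
        m = pattern.fullmatch(s)
        if m:
            years = action(*(int(g) for g in m.groups()))
            return (",".join(str(y) for y in years) + ";"
                    + ",".join(str(century(y)) for y in years))
    return "JAM"


for raw in ["101 \u2013 98 BC", "0-99 AD", "BC 200 - AD 100", "Cypriot"]:
    print(raw, "->", parse_date(raw))
```

The appeal of the table is that supporting a new input form means adding a row, not another branch of hand-written control flow.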

Etc. You get the idea. The parser’s semantic actions consist only of writing out the <SQUINCHDATE> records. You can find the parser (it’s a PHP function) on Google Drive here. I also put the first 1000 lines of output of the transformed UPenn records on Google Drive.

You should know that, because most of their data was entered free-form, the UPenn records are going to be hard to work with unless dedicated people are willing to standardize some of this input by hand. This parser isn’t finished; its sole objective was to get the number of JAM records down to a minimum. You can judge for yourselves how well I succeeded. (The very first date record, ‘Cypriot’, jams.) I did a little more analysis on this and, in the first 1000 UPenn records, 4.7% jam, so the parsable rate is 95.3%.
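A jam rate like the one above can be measured directly from the pre-pass output by counting SQUINCHDATE values. The Python sketch below is an assumption about how one might do it, not the analysis actually run for this post. It takes the denominator to be all SQUINCHDATE records, and the sample lines are invented.

```python
import re

SQUINCH = re.compile(r"<SQUINCHDATE>(.*?)</SQUINCHDATE>")


def jam_rate(lines):
    """Percentage of SQUINCHDATE values in `lines` that are JAM (0.0 if none)."""
    values = [m.group(1) for line in lines for m in SQUINCH.finditer(line)]
    if not values:
        return 0.0
    return 100.0 * sum(v == "JAM" for v in values) / len(values)


# Invented sample output, for illustration only.
sample = [
    "<date_made>Cypriot</date_made>",
    "<SQUINCHDATE>JAM</SQUINCHDATE>",
    "<date_made>101 BC</date_made>",
    "<SQUINCHDATE>-101;-2</SQUINCHDATE>",
]
rate = jam_rate(sample)
print(f"jam {rate:.1f}%, parsable {100 - rate:.1f}%")
```

The parsable rate is just the complement: a 4.7% jam rate leaves 100 - 4.7 = 95.3% parsable.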

Next time I’ll present the main parser for the UPenn records (the program which calls my century function) and I’ll try to address the question of generating meaningful lat/lon pairs from their data.

Welcome to Carole Raddato!

Carole Raddato (@carolemadge on Flickr), indefatigable traveler and photographer, has been an inspiration to us since we first ran across her Mediterranean photographs a year ago. ‘If only’, we thought, ‘Carole would allow her photos to be hosted on Squinchpix.’ Fat chance, huh? Well, now we’re delighted to announce that Carole has given us permission to feature some of her photographs. The first photos went up yesterday (search for ‘carolemadge’ on Squinchpix) and there will be more soon. The first ones we’re featuring are from the Museo Ostiense, which has a priceless collection of sculpture from both Ostia itself and the necropolis in the Isola Sacra (across the river Tiber from Ostia). In the next few weeks we’ll also be posting an interview with Carole.
Amor and Psyche.  From the domus of Amor and Psyche.
Ostia (I, xiv, 5). Lazio, Rome.
Photo by Carole Raddato, all rights reserved.
Welcome to Carole!
By the way, those of you who post your photographs on Flickr and disable the download option: be aware that this is trivially defeated, and anyone can copy your photos if they want to. No, I’m not referring to the Print Screen key. If you use the right browser you can copy any Flickr photo in a few seconds, no matter what its protection is set to. I debated with myself about showing you exactly how this is done but decided against it. It’s not hard to figure out (hint: Chrome). A modification of the same technique works on SmugMug. And don’t get the idea that content scrapers aren’t exploiting this hole. They are. Anyway, this is for those of you who assume that your photos are secure on photo-sharing sites.
In the near future I’ll post about security threats and what we do about these things at SquinchPix.