PDFs in Academic Publishing

Today Erika Hall tweeted:

_

And it is awful that a paper about professional practice seems to ignore what we in industry would see as good professional practice.

But.

Colusso, Bennett, Hsieh and Munson are following best practice (or at least standard practice) in their profession. Specifically, they’re HCI academics, so their main and most prestigious publishing outlet is the Association for Computing Machinery’s (ACM) set of HCI-related conferences. It’s safe to say that the ACM is… idiosyncratic… when it comes to publishing.

Before I escaped to industry, I was an academic, in roughly the same area as Colusso and colleagues, for 10 years and a PhD student for four years before that. It’s safe to say that academic publishing is a total garbage fire.

This post is not to excuse the poorly formatted PDF or the difficult language, but to outline how a PDF similar to the one produced by Colusso et al. probably made its way into the world.

The production of the PDF and its appearance on the web is the result of satisficing a workflow for several discrete steps:

  1. collaboration among co-authors
  2. production of text with suitable references to pre-existing sources
  3. peer review
  4. print production for consumption by academics

Yes, this seems backward and is clearly stuck in 1999.

The first draft was probably written in whatever text editor the main author prefers. It could have been Pages, Word, Google Docs or something more markdown-y like Ulysses or Bear. Eventually the main author would need to share the draft in a format that their co-authors use and understand so they could provide input, either as substantial contributions or as supervisory comments. This would mean exporting the draft into whatever editing software all four authors could access and use fluently, probably Word, because for all its faults Word still has the most widely understood change tracking and editorial mark-up system of any mainstream text-production software.

At some point a draft would be produced that had all 52 references in it, correctly formatted, and appropriately referred to throughout the text. This is enormously frustrating to do by hand, so the main author would have used a tool such as EndNote, Zotero, Mendeley or any number of other reference management software packages. These tools can talk to markdown-y text editing software, but in practice you drive them as plug-ins to Word. Getting the referencing wrong can be grounds for rejection at peer review, so it’s imperative this is done correctly.

After the referenced draft was produced, the Word file would have been passed around the co-authors again for a final round of contributions and edits.

At some point, likely as late as possible, the main author would have put the text into the ACM template. The ACM template has been around more-or-less forever. Everyone hates the ACM template. It’s only available in Word or LaTeX format (don’t laugh). After checking that none of the meaningful formatting is broken, the main author would have created a PDF of the paper, and uploaded it for peer review.

Peer review is mostly a garbage fire, but ignoring that, it’s also partly responsible for why academic papers look and read the way they do. Peer review for the venues that many HCI-ish papers get submitted to is nasty, brutish and short. The reviewers, who are fellow academics, are over-worked, pressed for time, and completely uncompensated. Standardisation in presentation and structure, however janky it is, is enormously helpful in getting through the review burden. Formalism and over-precision of language are also helpful in establishing a tone of authority, which is more highly valued than clarity. And finally, especially in ACM-associated conferences, some part of peer review is also gatekeeping that the template is used correctly.

After peer review the co-authors would probably have some edits to make to the paper. The main author would likely make the edits, share the Word doc around again for a final round of comments, and then upload the final PDF to the ACM’s publishing system. Eventually the PDF makes its way onto the web.

Not so long ago, the PDF would have been compiled into a real printed book and the submission of the final PDF would have made that relatively easy.

Colusso and colleagues’ paper, for all its presentation faults, was the product of an enormous system of people, organisations and technologies that have existed for a very long time. The technical debt doesn’t just exist at the ACM but in the chain of software used all the way back to the production of the first draft. Changing the final output to be accessible in the way that modern web pages should be requires no small amount of effort, as it changes a long-established worldwide workflow. Apparently the ACM is working towards this change:

ACM is changing the archive format of its publications to separate content from presentation in the new Digital Library, enhance accessibility, and improve the flexibility and resiliency of our publications. This approach requires a new workflow that utilizes a simplified “review” format and a final “submission” format. The final “submission” is submitted to ACM’s new production platform where authors will be able to review PDF and HTML output formats before publication.

As I said at the start, this isn’t to excuse the formatting, just to contextualise it as being highly optimised for four things:

  1. collaboration among co-authors
  2. production of text with suitable references to pre-existing sources
  3. peer review
  4. print production for consumption by academics

Optimising for a new thing, consumption by people who aren’t academics, requires substantial intervention. The choke-point just before final publication is the best place to intervene but is also the place with the greatest inertia. Change will come, but slowly. Until then, know that academics, especially design academics, think that the system is broken too.


Date
January 5, 2020


I'm @bjkraal@aus.social on Mastodon