Thursday, May 3, 2018

Digitizing the Vatican Secret Archives

The Vatican Secret Archives comprise a collection of materials sought after by church historians and conspiracy theorists alike. According to rumors going back centuries, whenever the Roman Catholic Church came upon a grimoire or any other text related to magick, they didn't destroy it, but rather sent it to Rome. It would then be housed in the secret archives, where the church could authorize whoever they wanted to access it but keep it away from everyone else. This has been true in at least a couple of cases - the text that went into the Heptangle edition of the Nigromancia was apparently found there, according to the book's introduction.

A new effort underway to digitize the secret archives is going to test that theory, and if we're lucky provide grimoire magicians with a whole new set of texts that have been locked away for centuries. Since most of the documents stored in the archive are handwritten rather than printed, up until now this has proved to be a very difficult process. But this latest effort is employing machine learning in a novel way to get around some of those limitations.

The grandeur is obvious. Located within the Vatican’s walls, next door to the Apostolic Library and just north of the Sistine Chapel, the VSA houses 53 linear miles of shelving dating back more than 12 centuries. It includes gems like the papal bull that excommunicated Martin Luther and the pleas for help that Mary Queen of Scots sent to Pope Sixtus V before her execution. In size and scope, the collection is almost peerless.

That said, the VSA isn’t much use to modern scholars, because it’s so inaccessible. Of those 53 miles, just a few millimeters’ worth of pages have been scanned and made available online. Even fewer pages have been transcribed into computer text and made searchable. If you want to peruse anything else, you have to apply for special access, schlep all the way to Rome, and go through every page by hand.

But a new project could change all that. Known as In Codice Ratio, it uses a combination of artificial intelligence and optical-character-recognition (OCR) software to scour these neglected texts and make their transcripts available for the very first time. If successful, the technology could also open up untold numbers of other documents at historical archives around the world.

OCR has been used to scan books and other printed documents for years, but it’s not well suited for the material in the Secret Archives. Traditional OCR breaks words down into a series of letter-images by looking for the spaces between letters. It then compares each letter-image to the bank of letters in its memory. After deciding which letter best matches the image, the software translates the letter into computer code (ASCII) and thereby makes the text searchable. This process, however, really only works on typeset text. It’s lousy for anything written by hand—like the vast majority of old Vatican documents.
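
As a rough illustration of the letter-by-letter matching described above, here is a minimal Python sketch. It is a toy under stated assumptions, not the method used by In Codice Ratio or by any real OCR package: the tiny 3x5 letter shapes, the names `TEMPLATES`, `segment`, `score`, and `recognize`, and the sample images are all invented for this example.

```python
# Toy template bank: each letter is a tuple of rows, '#' = ink, '.' = blank.
# These 3x5 shapes are invented for illustration only.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
}


def segment(image):
    """Split a line image into letter-images at fully blank columns."""
    width = len(image[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # Treat the right edge as a blank column so the last letter is closed.
        blank = x == width or all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            glyphs.append(tuple(row[start:x] for row in image))
            start = None
    return glyphs


def score(glyph, template):
    """Count matching pixels; return -1 if the two shapes differ in size."""
    if len(glyph) != len(template) or len(glyph[0]) != len(template[0]):
        return -1
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(image):
    """Return the best-matching letter for each segment, or '?' if none fit."""
    text = []
    for glyph in segment(image):
        best = max(TEMPLATES, key=lambda ch: score(glyph, TEMPLATES[ch]))
        text.append(best if score(glyph, TEMPLATES[best]) >= 0 else "?")
    return "".join(text)


# "Typeset" sample: three letters separated by blank columns.
TYPESET = (
    "#...###.###",
    "#...#.#..#.",
    "#...#.#..#.",
    "#...#.#..#.",
    "###.###..#.",
)

# "Handwritten" sample: the same L and O joined by a stroke along the baseline.
JOINED = (
    "#...###",
    "#...#.#",
    "#...#.#",
    "#...#.#",
    "#######",
)

print(recognize(TYPESET))  # LOT
print(recognize(JOINED))   # ?
```

The second sample hints at why this approach struggles with handwriting: once a stroke connects two letters, there is no blank column left to split on, so the segmenter hands back one oversized shape that matches nothing in the bank.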

It will take years before this effort is complete, and it's not clear how many of the texts can be digitized in this way. But I have a feeling that magicians will likely be pleased with the results. There probably are a few grimoire texts stored away down there that haven't seen the light of day in a very long time. I'm hoping that the digitized archive will be made public and searchable, and when it is I'll be sure to add a link to it from Augoeides along with those to the Enochian source texts and some of the other links I've compiled here.

I can only imagine what medieval magicians would have made of the digital age and the Internet. Back then, you would have been lucky to get your hands on even one or two grimoires and you just practiced with what you had, making comparison and critical analysis impossible. Now more material is made available every day, and the issues have more to do with identifying the most useful information and weeding out the rest. But that's a far better problem to have than scarcity and inaccessibility.


Unknown said...

Even without OCR technology, there's no reason why they couldn't just scan the pages as a series of images and compile them into a digital document in the meantime. It's been/being done elsewhere and I don't understand why the Vatican's approach would be any different while the software is being refined, as it most likely always will be. It's still better than having a bunch of scribes type everything manually :P

Scott Stenwick said...

I think the idea behind using OCR is that it will make the documents searchable. Images of the pages could certainly be created and archived somewhere for preservation, but that does nothing to address the sheer number of documents in the archives that researchers have to go through to find anything useful.