Notes on scraping the Codex Arundel to preserve it

Kragen Javier Sitaker, 2017-08-22 (1 minute)

The Codex Arundel is a bound collection of Leonardo da Vinci's manuscript notes, made up of 283 sheets of paper, each with a recto and a verso side. The British Library has scanned each side at 8579×6250 pixels, divided into 256-pixel-square tiles with one pixel of overlap. Each tile is about 12 kilobytes and can be fetched with wget. This works out to 34×25 tiles per side of a page, so the total number of tiles is (* 34 25 283 2) = 481,100, and the total size of the work should be about 5.8 gigabytes (481,100 tiles × 12 kB).
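The tile arithmetic can be checked with a short Python sketch. It assumes the tiles sit on a 256-pixel grid and that the 1-pixel overlap only widens each tile's edges, as in Deep Zoom style pyramids. If the tiles instead step by 255 pixels, the count still comes out to 34×25. It also takes "12 kilobytes" as 12,000 bytes, and all names in it are illustrative.

```python
import math

WIDTH, HEIGHT = 8579, 6250   # full-resolution size of one page side, pixels
TILE = 256                   # tile edge, pixels
SHEETS = 283
SIDES_PER_SHEET = 2          # recto and verso
TILE_BYTES = 12_000          # rough average size of one tile (assumed 12 kB)


def tiles_per_side(width, height, tile=TILE):
    """Columns and rows of tiles needed to cover one page side.

    Assumes tiles are laid out on a `tile`-pixel grid and the 1-pixel
    overlap only adds a border to each tile, so it does not change the count.
    """
    return math.ceil(width / tile), math.ceil(height / tile)


cols, rows = tiles_per_side(WIDTH, HEIGHT)
pages = SHEETS * SIDES_PER_SHEET
total_tiles = cols * rows * pages
total_bytes = total_tiles * TILE_BYTES

print(cols, rows, pages, total_tiles)   # 34 25 566 481100
print(f"{total_bytes / 1e9:.1f} GB")    # 5.8 GB
```

Like the estimate in the text, this counts only the full-resolution tiles.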

When first displayed, each of the 566 page images (283 sheets × 2 sides) loads at about 1800×800 pixels, about 2.7% of the full-resolution pixel count. It would make some sense to fetch these low-resolution images first (about 160 megabytes in total) before making the possibly doomed effort to fetch the whole dataset.
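A minimal sketch of that ordering in Python follows. It assumes that bytes scale roughly with pixel count, which is how the 160-megabyte figure follows from the 5.8-gigabyte one. The URL scheme is not given here, so `low_res_url` and `tile_url` are hypothetical callables the caller must supply. `fetch_plan` is likewise an illustrative name, not part of any existing tool.

```python
import math

FULL_W, FULL_H = 8579, 6250   # full-resolution page side, pixels
LOW_W, LOW_H = 1800, 800      # size of the initially loaded image, pixels
PAGES = 283 * 2               # recto and verso of each sheet
TILE = 256                    # tile edge, pixels
FULL_TOTAL_BYTES = 5.8e9      # estimated size of all full-resolution tiles

# Assumption: file size scales roughly with pixel count.
fraction = (LOW_W * LOW_H) / (FULL_W * FULL_H)
low_res_bytes = FULL_TOTAL_BYTES * fraction


def fetch_plan(low_res_url, tile_url, pages=PAGES):
    """Yield every URL to fetch, with all low-resolution images first.

    low_res_url(page) and tile_url(page, col, row) are hypothetical
    callables standing in for the real (unspecified) URL scheme.
    """
    # Cheap pass first: one low-resolution image per page side.
    for page in range(pages):
        yield low_res_url(page)
    # Expensive pass: every full-resolution tile, page by page.
    cols, rows = math.ceil(FULL_W / TILE), math.ceil(FULL_H / TILE)
    for page in range(pages):
        for row in range(rows):
            for col in range(cols):
                yield tile_url(page, col, row)
```

Here `fraction` comes out to about 2.7% and `low_res_bytes` to about 156 MB. Printing the generated URLs one per line and piping them into `wget -i -` would fetch them in this order. If the effort is cut short, the complete low-resolution set is already on disk.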

The page for the thing is at https://www.bl.uk/manuscripts/FullDisplay.aspx?ref=Arundel_MS_263.
