Getting around digital archive walls

This is a: snip, written by Birgit Kellner 241 days ago.
Keywords: digital tricks

Some digital archives don’t offer entire books for download but only let you view one page at a time, so you have to page through the book on their website.

This is inconvenient, especially when the books in question are in the public domain anyway. I see no reason why publicly funded digital archives should operate this way. Perhaps some users only want to look at an individual page, but why not also offer a PDF (or DjVu) download for those who prefer to read offline? (Or even print?)

The workaround: find the directory that contains the page images (for instance, right-click a page image on the site in Firefox, choose the option to copy the image address, and paste it into a text editor), and then run:

wget -r -np http://thesite.com/thedirectory/

(Oh, and install wget for your operating system if you are so unfortunate as not to have it.)
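The directory is simply the copied image address with the file name cut off. A minimal sketch, assuming a hypothetical address of the form http://thesite.com/thedirectory/page0001.jpg (the host, path, and file name are placeholders, not a real archive):

```shell
#!/bin/sh
# Hypothetical image address, copied from the browser's context menu
# (placeholder host, path, and file name).
img_url="http://thesite.com/thedirectory/page0001.jpg"

# ${var%/*} removes the shortest suffix matching "/*", i.e. the file
# name after the last slash; the trailing "/" is appended again so
# wget treats the result as a directory.
dir_url="${img_url%/*}/"

echo "$dir_url"
```

The result can then be handed to the recursive download, e.g. wget -r -np "$dir_url". The -np (--no-parent) option keeps wget from following a "Parent Directory" link and wandering off into the rest of the site.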

Some sites don’t let users download entire directories (for instance, because they switch off directory listings, so a recursive wget has no links to follow). But it shouldn’t be too difficult to generate a list of all the files you want and store it in a file (say, list.txt), with one download URL per line. And then:

wget -i list.txt
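When the page images follow a numbered pattern, list.txt can be written by a short loop. A minimal sketch, assuming a hypothetical naming scheme page0001.jpg, page0002.jpg, … in one directory; the base URL, the page count of 250, and the four-digit padding are placeholders to be replaced with whatever the real image addresses show:

```shell
#!/bin/sh
# Assumption: pages are named page0001.jpg ... page0250.jpg in a
# single directory. Check two or three real image addresses first.
base="http://thesite.com/thedirectory"
pages=250

n=1
while [ "$n" -le "$pages" ]; do
  # %04d zero-pads the page number to four digits (1 -> 0001).
  printf '%s/page%04d.jpg\n' "$base" "$n"
  n=$((n + 1))
done > list.txt
```

A quick look with head -n 1 list.txt and tail -n 1 list.txt confirms the pattern before handing the file to wget -i. If the site does not pad page numbers, use %d instead of %04d.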

This should do the trick. But if the site is so malicious as to prohibit access through wget altogether, just pretend you are a browser:

wget -U firefox -i list.txt
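Real browsers send a much longer user-agent string, so a site that filters more carefully may not accept the bare word “firefox”. A minimal sketch, assuming GNU Wget, that keeps a full browser string plus a short pause between requests in a private startup file instead of repeating the flags on every call; the file name wgetrc.archive and the exact browser string are placeholders:

```shell
#!/bin/sh
# Private wgetrc; each setting mirrors a command-line flag.
# The browser string is only an example of the usual format.
cat > ./wgetrc.archive <<'EOF'
# Same as -U / --user-agent
user_agent = Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0
# Same as --wait=2: pause two seconds between retrievals
wait = 2
# Same as --random-wait: vary the pause a little
random_wait = on
EOF

# wget loads the startup file named by WGETRC (in place of ~/.wgetrc).
export WGETRC="$PWD/wgetrc.archive"
```

With WGETRC exported in the same shell, the plain wget -i list.txt picks up these settings automatically. The pause is optional, but it makes the download look less like a bulk scrape.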

Watch the files fly to your hard drive, and enjoy.


  1. Manuel Batsching    240 days ago    #

  2. Kellner    240 days ago    #