So: I finally managed to download a not too shabby copy of Jeffrey
Henning's Langmaker website from the Wayback Machine, without all the
javascript cruft. Actually, I have two copies: one of the pre-wiki site,
and one of the post-wiki site before the database broke.

I've gotten permission from Jeffrey to datamine this, and I will, but I
also got permission for us, the conlang community, to set up a read-only
archival copy of langmaker.com.

Cleanup needed!
---------------

Now, *I* don't have the time or energy to do this, but a minimal cleanup
phase is needed. It is potentially time-consuming and quite dull: every
link that starts with "http://www.langmaker.com/" will need that bit
changed to "/", so that the links will work again. Other internal links
are broken and ought to be fixed too, but those are harder to find and
can't easily be fixed with a script.
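The bulk rewrite itself is scriptable. A minimal sketch, assuming GNU
sed (`-i` behaves differently on BSD/macOS sed) and a dump directory
named `websites` (that directory name is an assumption, not something
from the dumps themselves):

```shell
# Replace absolute links to the old domain (with or without the :80
# port) with site-root-relative ones, in place, in every .html file.
# The :80 pattern must come first so it isn't left half-rewritten.
find websites -type f -name '*.html' -exec sed -i \
  -e 's|http://www\.langmaker\.com:80/|/|g' \
  -e 's|http://www\.langmaker\.com/|/|g' {} +
```

This only catches the easy, mechanical case; the other broken internal
links mentioned above still need eyes on them.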

There are tons of links that now point nowhere, thanks to GeoCities and
other similar sites being long gone; these can't really be fixed in a
way most will agree is a good method. I recommend leaving them alone.

How to generate your own copy
-----------------------------

I caved and used the ruby script in the link below to fetch the data:

  https://github.com/hartator/wayback-machine-downloader

I used the following commands:

Pre-wiki:

  wayback_machine_downloader -t 2005 http://www.langmaker.com

That is: up to and including all of 2005, if I understand correctly how
the Wayback Machine works.

The uncompressed data takes up some 138 MB in 8656 files and folders.

Post-wiki pre-broken database:

  wayback_machine_downloader -f 2006 -t 20080618214613 http://www.langmaker.com:80/

That is: from and including all of 2006, up to and including 2008-06-18
21:46:13 (unknown timezone), which is the date of the latest seemingly
unbroken copy in the Wayback Machine.

The uncompressed data takes up some 220 MB in 22205 files and folders.

Running these commands took forever and used a lot of bandwidth, so I
do not recommend that y'all repeat the procedure. I have compressed
archives of my two dumps, so for these exact date ranges you don't need
to generate anything.

Where to put the raw data?
--------------------------

Who wants the files? Where should I put them? I don't want to upload
them to something only under my control, we gotta spread 'em around for
backup purposes.
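Wherever the files end up, mirroring is easier to trust if each dump
ships as a tarball alongside a checksum, so mirrors can verify their
copies. A sketch with made-up archive names (and again assuming the
dump sits in a `websites` directory):

```shell
# Pack the dump and record a checksum that mirrors can verify.
tar -czf langmaker-prewiki.tar.gz websites/
sha256sum langmaker-prewiki.tar.gz > langmaker-prewiki.tar.gz.sha256

# On a mirror, after downloading both files:
sha256sum -c langmaker-prewiki.tar.gz.sha256
```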


K