Linguifex will gladly host a backup mirror.

On Sep 6, 2017 19:43, "kaleissin" <[log in to unmask]> wrote:

> So: I finally managed to download a not too shabby copy of Jeffrey
> Henning's Langmaker website from the Wayback Machine, without all the
> javascript cruft. Actually, I have two copies: one of the pre-wiki site,
> and one of the post-wiki site before the database broke.
>
> I've gotten permission from Jeffrey to datamine this, and I will, but I've
> also gotten permission for us, the conlang community, to set up a read-only
> archival copy of langmaker.com.
>
> Cleanup needed!
> ---------------
>
> Now, *I* don't have the time or energy to do this, but a minimal cleanup
> phase is needed. It is potentially time-consuming and quite dull:
> every link that starts with "http://www.langmaker.com/" will need to
> have that prefix changed to "/" so that the links work again. Other
> internal links are broken and ought to be fixed too, but those are
> harder to find and can't easily be fixed with a script.
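>
> Something like this should do the bulk of that rewrite, assuming the dump
> is plain HTML files and GNU sed is available (untested; the second pattern
> is there in case the :80 form of the URL shows up in the post-wiki dump):
>
> find . -type f \( -name '*.html' -o -name '*.htm' \) -print0 \
>   | xargs -0 sed -i \
>       -e 's|http://www\.langmaker\.com:80/|/|g' \
>       -e 's|http://www\.langmaker\.com/|/|g'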
>
> There are tons of links that now point nowhere thanks to GeoCities and
> similar sites being long gone. These can't really be fixed in any way
> most people would agree on, so I recommend leaving them alone.
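>
> If anyone wants to gauge the scale of the problem, a rough count is easy
> to get; this just counts the files mentioning one of the dead hosts
> (GeoCities here as an example, assuming grep is available):
>
> grep -ril 'geocities' . | wc -l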
>
> How to generate your own copy
> -----------------------------
>
> I caved and used the Ruby script in the link below to fetch the data:
>
>   https://github.com/hartator/wayback-machine-downloader
>
> I used the following commands:
>
> Pre-wiki:
>
> wayback_machine_downloader -t 2005 http://www.langmaker.com
>
> That is: up to and including all of 2005, if I understand correctly how
> the Wayback Machine works.
>
> The uncompressed data takes up some 138 MB in 8656 files and folders.
>
> Post-wiki pre-broken database:
>
> wayback_machine_downloader -f 2006 -t 20080618214613 http://www.langmaker.com:80/
>
> That is: from and including all of 2006 up to and including 2008-06-18
> 21:46:13 (unknown timezone), which is the date of the latest seemingly
> unbroken copy in the Wayback Machine.
>
> The uncompressed data takes up some 220 MB in 22205 files and folders.
>
> Running these commands took forever and used up a lot of data traffic,
> so I do not recommend that y'all repeat the procedure. I have compressed
> archives of my two dumps, so for these exact date ranges you don't need
> to generate anything.
>
> Where to put the raw data?
> --------------------------
>
> Who wants the files? Where should I put them? I don't want to upload
> them to something only under my control; we gotta spread 'em around for
> backup purposes.
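>
> Whoever ends up hosting a mirror can verify their copy against checksums;
> something along these lines, where the archive names are only placeholders
> for whatever the two dumps end up being called:
>
> sha256sum langmaker-prewiki.tar.gz langmaker-postwiki.tar.gz > SHA256SUMS
> sha256sum -c SHA256SUMS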
>
>
> K
>