All I need is the conlang word. Tab-delimited is probably easier, and a
spreadsheet is actually best. Thanks! I'll look at the texts, too. Those
should be enough - I don't need this to be entirely exhaustive, I'm running
very limited tests because I won't be running these through iterated
learning models...although...I could.....hmmmmmmmm
As far as citation goes - what do you guys think is normal for this? First
name, last name, name of the language, date accessed, site? I guess that's
it, right?
I forgot to mention: *if possible, a list of your orthographic conventions
is incredibly helpful*, especially in the case of digraphs.
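To show why this matters for my segmentation code: here's roughly what the preprocessing has to do with a digraph list before anything else can happen. (The digraphs below are made up for illustration, not taken from anyone's language.)

```python
# Sketch: greedy longest-match segmentation of a word into orthographic
# units. Without the digraph inventory, "sh" would wrongly count as two
# segments. DIGRAPHS here is a hypothetical example inventory.
DIGRAPHS = ["sh", "th", "ng"]

def segment(word, digraphs=DIGRAPHS):
    """Split a word into single letters and known digraphs, digraphs first."""
    units = []
    i = 0
    while i < len(word):
        for d in digraphs:
            if word.startswith(d, i):  # try each digraph at position i
                units.append(d)
                i += len(d)
                break
        else:
            units.append(word[i])  # no digraph matched: take one letter
            i += 1
    return units

print(segment("shath"))  # ['sh', 'a', 'th'], not ['s', 'h', 'a', 't', 'h']
```

So a one-line list like the one above is all I'd need from you.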
On Wed, Mar 9, 2011 at 12:26 AM, Tony Harris <[log in to unmask]> wrote:
> My Alurhsa vocabulary is stored in an OpenOffice spreadsheet and a MySQL
> database populated from that, so whipping out a set of vocabulary for either
> would be pretty easy. Do you just need conlang word and full English
> translation of that word, or just conlang word, or what? Tab delimited,
> comma delimited, etc?
> As for texts, I do have some texts online at http://alurhsa.org, under
> Language, Sample Texts. Or the direct link to that list of pages is
> I have a few more texts keyed in (not counting personal journals which are
> myriad but which I'm not willing to share publicly) that haven't made it
> online yet if those aren't enough.
> What additional info would you want for appropriate citation?
> On 03/08/2011 04:41 PM, Richard Littauer wrote:
>> I'm doing my dissertation here at the University of Edinburgh on the
>> evolution of word segmentation techniques used in first language
>> acquisition. I'm doing it by modelling learners and comparing their output
>> over hundreds of generations. It's pretty niche, but rather fun.
>> Part of what I've been doing is looking at lexical discreteness, that is,
>> how different words are from each other in a lexicon. This got me thinking
>> about conlangs, and how discrete the words are in created languages versus
>> natural languages. With this in mind, I've worked up some code to convert
>> linguistic corpora into parsable strings, and I hope to analyse them to see
>> if there is a difference between conlangs and natural languages. There could
>> be some pretty interesting implications to this study - I'm hoping to
>> present those at the LCC4.
>> That having all been said, I need more corpora. I have the Na'vi lexicon,
>> and the Dothraki one, and my own Llárriésh one, mostly because I am the one
>> controlling the .sql files for those. However, I'd really like it if I could
>> have a broader sample. So, *could you send me a file with your language's
>> words*, or point me to the right place? I don't need the English
>> translations. I also don't need it in list form, although that would be
>> nice. I basically just need a massive chunk of your language's words. I'll
>> factor out those that aren't unique (homophones).
>> If you don't have a list that I could use, any chance you have a few pages
>> worth of text lying around that you wouldn't mind sending over, or showing
>> me where it is? I'm not overly worried about morphology, either - I'm
>> painting broad strokes, here. Basically, if you have a substantial chunk of
>> translations in your conlang, I would appreciate it if I could see those.
>> Sound cool? Any questions? I am willing to pay in thanks. Sadly, I am
>> otherwise just a broke undergraduate student like the rest of you (were or
>> are or will be or won't be. (ha)).
>> Feel free to message me back privately, or through the list. I'll be sure to
>> cite you as a source and include the name of your language in my write-up,
>> of course. And I promise not to misuse the data - literally just going to
>> put it into numeric form, and then run some code over that.
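P.S. In case anyone wants to know what "running some code over it" actually looks like: very roughly, I deduplicate the word list and take the mean pairwise normalised edit distance as a discreteness score. A toy sketch of that idea (the word list here is invented, not anyone's conlang):

```python
# Toy sketch of a "lexical discreteness" score: mean pairwise normalised
# Levenshtein distance over a deduplicated word list. 1.0 means every pair
# of words is maximally different; lower means the lexicon is more crowded.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def discreteness(words):
    """Mean normalised edit distance over all unique word pairs."""
    unique = sorted(set(words))  # factor out homophones first
    pairs = [(a, b) for i, a in enumerate(unique) for b in unique[i + 1:]]
    return sum(levenshtein(a, b) / max(len(a), len(b)) for a, b in pairs) / len(pairs)

print(round(discreteness(["kala", "kali", "tuna", "kala"]), 3))  # prints 0.667
```

That's the whole trick - no secrets, and nothing that could embarrass anyone's lexicon.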