While not specifically TEI-oriented, we thought the following
might be of interest to some TEI folks.
In order to support research and experimentation at ARTFL and
the Digital Library Development Center (DLDC) at the University
of Chicago, we have developed a set of machine learning and
text mining extensions to PhiloLogic (http://philologic.uchicago.edu/).
Philomine is designed to work with databases currently loaded under
PhiloLogic without modification. It offers a wide set of
functions and options designed to allow users to set up and run
various machine learning and text mining experiments interactively.
These include a number of classifiers -- such as Naive Bayes,
SVMLight, Weka implementations of SVM (SMO), Information Gain -- as
well as an experimental document similarity/clustering function. We
have also included a number of feature set selection options.
Wherever possible, we try to build links from Philomine results
to more traditional search results in PhiloLogic, so the user
can click on a particular feature (word) strongly associated with
some document characteristic and examine its contexts.
Like PhiloLogic, Philomine is Open Source software and is primarily
written in Perl. This is an initial alpha release of a system that
is being actively developed, with many dependencies and more than
likely a number of installation and usability issues. Also, because
machine learning and text mining functions are resource-intensive,
we decided not to offer a live demonstration site, which we think
would swamp our research machines. Instead, we have presented a
number of sample results from the system.
See the site for an overview, a discussion of design rationale, a
function list, lots of examples, future plans and, of course, source code.
Do let us know what you think and what problems you encounter. We
always gladly receive fixed code, patches, and extensions. :-)
Contact information is on the site.
Russell Horton and Mark Olsen
DLDC and ARTFL