I cannot believe I am the first person to notice what a pain it is that
different blogging systems use different "XML export" formats, nor the
first to wonder how hard it might be to define a simple TEI-based
interchange format, so that one could take one's lovingly crafted data
out of Birdpress and put it into Slogger, or vice versa, or just store
it elsewhere, without loss of information. They both export XML, and we
all know how to do XML conversions, right?
However, there's a problem. Both the systems I've looked at more than
cursorily allow you to insert arbitrary bits of HTML tagging in their
postings. So when they export your data, they have to do something to
hide whatever sins of ill-formedness it might contain... so (in the case
of Birdpress) everything gets hidden away in a CDATA marked section or
(in the case of Slogger) all the pointy brackets get escaped as entity
references. The trouble is, of course, that buried away in this hidden
HTML there are some really useful things, like links, or pointers to
graphics, which I want to preserve in my TEI export. Drat! Do I really
have to tweak all this with horrible pattern-matching expressions if I
want to use XSLT 2.0? It would appear so...
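
For what it's worth, here is a minimal sketch of one way round the pattern matching, in Python rather than XSLT. It rests on two assumptions of mine: that an XML parser hands back the same HTML string whether it was wrapped in CDATA or escaped with entity references, and that a forgiving HTML parser can then pull out the links and graphics even when the markup is ill-formed. The `post` and `body` element names and the sample strings are made up for illustration, not taken from either system's real export format.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical export snippets: same ill-formed HTML, hidden two different ways.
CDATA_STYLE = (
    '<post><body><![CDATA[<p>See <a href="http://example.org/tei">this<br> '
    '<img src="pic.png"></p>]]></body></post>'
)
ESCAPED_STYLE = (
    '<post><body>&lt;p&gt;See &lt;a href="http://example.org/tei"&gt;this&lt;br&gt; '
    '&lt;img src="pic.png"&gt;&lt;/p&gt;</body></post>'
)


class LinkCollector(HTMLParser):
    """Collects href and src values; tolerates unclosed tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def harvest(export_xml):
    # The XML parser undoes the CDATA wrapping or the escaping for us,
    # leaving the raw HTML as the text content of the (assumed) body element.
    body = ET.fromstring(export_xml).findtext("body") or ""
    collector = LinkCollector()
    collector.feed(body)
    collector.close()
    return collector.links, collector.images


print(harvest(CDATA_STYLE))
print(harvest(ESCAPED_STYLE))
```

This only finds the links and graphics; turning them into TEI elements is still a separate job, and a real export would need its actual element names substituted in.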
Anyone been down this path already? Just thought I'd ask...