I cannot believe I am the first person to notice what a pain it is that 
different blogging systems use different "XML export" formats, nor the 
first to wonder how hard it might be to define a simple TEI-based 
interchange format, so that one could take one's lovingly crafted data 
out of Birdpress and put it into Slogger, or vice versa, or just store 
it elsewhere, without loss of information. They both export XML, and we 
all know how to do XML conversions, right?

However, there's a problem. Both the systems I've looked at more than 
cursorily allow you to insert arbitrary bits of HTML tagging in their 
postings. So when they export your data, they have to do something to 
hide whatever sins of ill-formedness it might contain... so (in the case 
of Birdpress) everything gets hidden away in a CDATA marked section or 
(in the case of Slogger) all the pointy brackets get escaped as entity 
references. The trouble is, of course, that buried away in this hidden 
HTML there are some really useful things, like links, or pointers to 
graphics, which I want to preserve in my TEI export. Drat! Do I really 
have to tweak all this with horrible pattern-matching expressions if I 
want to use XSLT 2? It would appear so...
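
For comparison, here is a rough sketch of one way around the pattern matching, in Python rather than XSLT 2. It relies on two things: any XML parser undoes both kinds of hiding (CDATA section or escaped brackets) and hands back the same string, and a forgiving HTML parser can then pick the links and image pointers out of that string even when it is ill-formed. The element name `content`, the sample fragments and the helper names are all made up for illustration; neither system's real export format is assumed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href and src values from possibly ill-formed HTML."""

    def __init__(self):
        super().__init__()
        self.links = []   # values of <a href="...">
        self.images = []  # values of <img src="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def extract_targets(export_xml, body_tag="content"):
    """Return (links, images) found in the hidden HTML of each body_tag element.

    body_tag is a hypothetical element name, not taken from any real export.
    """
    links, images = [], []
    for element in ET.fromstring(export_xml).iter(body_tag):
        # The XML parser has already undone the CDATA section or the escaping,
        # so element.text is the raw HTML string in both cases.
        collector = LinkCollector()
        collector.feed(element.text or "")
        collector.close()
        links += collector.links
        images += collector.images
    return links, images


# Hypothetical export fragments, one per hiding style; note the unclosed <a>.
cdata_style = (
    '<post><content><![CDATA[<p>See <a href="http://example.org/tei">this'
    '<br> <img src="pic.png"></p>]]></content></post>'
)
escaped_style = (
    '<post><content>&lt;p&gt;See &lt;a href="http://example.org/tei"&gt;this'
    '&lt;br&gt; &lt;img src="pic.png"&gt;&lt;/p&gt;</content></post>'
)

print(extract_targets(cdata_style))
print(extract_targets(escaped_style))
```

Both calls should report the same link and the same image, which suggests the two export styles need not be treated differently once the XML layer has been parsed. Turning the recovered targets into TEI elements is a separate step that this sketch does not attempt.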

Anyone been down this path already? Just thought I'd ask...