Ciarán --

I think Martin is basically right. I don't know of any software out
in the world, let alone Windows-based software, that will on its own
interpret the <c> element as you want. (Although perhaps some could
be configured to do so.) But running your data through an XSLT
pre-processor would likely yield quite satisfactory results.

Here is an example of one:

--------- begin program c_is_for_Ciarán.xslt ---------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  exclude-result-prefixes="#all"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="3.0">
  
  <!--
    c_is_for_Ciarán.xslt
    Copyleft 2018 Syd Bauman and the Women Writers Project, few rights reserved.
    Feel free to copy, modify, run, use this pgm pretty much however you want,
    just please leave attribution to me somewhere, and the result must be copyleft.
    
    Demo program to read in a TEI P5 document, and write out 2 similar documents:
     - one is a copy *except* that <c> elements have been summarily dropped
     - one is a copy *except* that <c> *tags* have been dropped, but the content
       has been retained.
    See the thread "<c> tag" on TEI-L that started 2018-02-24T15:22Z.
  -->

  <!-- Explicitly state we're writing out XML: -->
  <xsl:output method="xml"/>
  <!-- Anything not matched is just copied over: -->
  <!-- (Why can't I get this to work with <xsl:mode on-no-match="shallow-copy">?) -->
  <xsl:template match="@*|node()" mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="#current"/>
    </xsl:copy>
  </xsl:template>
  <!-- Get name of input file w/o ending ".xml": -->
  <xsl:param name="baseName" select="substring( document-uri(/), 1, string-length(document-uri(/)) - 4 )"/>
  <!-- (BTW, if input filename does not end in ".xml" this pgm may not work) -->

  <!-- Match the document root, and ... -->
  <xsl:template match="/">
    <!-- ... generate both output URIs -->
    <xsl:variable name="name4token_extraction" select="concat( $baseName, '_extractTokens.xml')"/>
    <xsl:variable name="name4display" select="concat( $baseName, '_display.xml')"/>
    <!-- putting output into file for token extraction ... -->
    <xsl:result-document href="{$name4token_extraction}">
      <!-- ... process all child nodes for token extraction -->
      <xsl:apply-templates select="node()" mode="extractTokens"/>
    </xsl:result-document>
    <!-- putting output into file for display ... -->
    <xsl:result-document href="{$name4display}">
      <!-- ... process all child nodes for display -->
      <xsl:apply-templates select="node()" mode="display"/>
    </xsl:result-document>
  </xsl:template>

  <!-- Remember, processing of any node other than the document root or those listed below
       just results in a copy of said node. Thus all we do here is copy <c> differently,
       and the result is two output files that are the same except where <c>s occurred in
       the input. -->
  <!-- For token extraction, drop the entire <c> element. -->
  <xsl:template match="c" mode="extractTokens"/>
  <!-- For display keep the *content* of the <c>, but drop the tags. -->
  <xsl:template match="c" mode="display">
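    <!-- Stay in the current (display) mode so that any descendants are processed consistently. -->
    <xsl:apply-templates select="node()" mode="#current"/>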
  </xsl:template>
  
</xsl:stylesheet>
--------- end program c_is_for_Ciarán.xslt ---------

BTW, I realize this is not the XSLT list, but if someone could show
me how to do the same thing using the new <xsl:mode
on-no-match="shallow-copy"/>, I'd appreciate it.
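
A possible answer, offered as an untested sketch rather than a definitive fix: as far as I understand the XSLT 3.0 spec, an <xsl:mode> declaration governs only the one mode it names (or the unnamed mode, if it has no name= attribute). So a lone <xsl:mode on-no-match="shallow-copy"/> would leave the "extractTokens" and "display" modes with the default text-only-copy behavior, which is probably why it did not work. Declaring each named mode separately should do the job. The sketch below assumes an XSLT 3.0 processor that supports <xsl:mode>; everything else is taken from the program above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  exclude-result-prefixes="#all"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="3.0">

  <xsl:output method="xml"/>

  <!-- One declaration per named mode replaces the hand-written identity template.
       (Assumption: the processor implements XSLT 3.0 <xsl:mode>.) -->
  <xsl:mode name="extractTokens" on-no-match="shallow-copy"/>
  <xsl:mode name="display" on-no-match="shallow-copy"/>

  <!-- Name of input file w/o ending ".xml", as in the original program: -->
  <xsl:param name="baseName" select="substring( document-uri(/), 1, string-length(document-uri(/)) - 4 )"/>

  <!-- The document root is matched in the unnamed mode, which is where processing starts. -->
  <xsl:template match="/">
    <xsl:result-document href="{concat( $baseName, '_extractTokens.xml')}">
      <xsl:apply-templates select="node()" mode="extractTokens"/>
    </xsl:result-document>
    <xsl:result-document href="{concat( $baseName, '_display.xml')}">
      <xsl:apply-templates select="node()" mode="display"/>
    </xsl:result-document>
  </xsl:template>

  <!-- For token extraction, drop the entire <c> element. -->
  <xsl:template match="c" mode="extractTokens"/>

  <!-- For display keep the *content* of the <c>, but drop the tags.
       mode="#current" keeps the children in display mode; without it they would
       fall into the unnamed mode, which has no shallow-copy declaration here. -->
  <xsl:template match="c" mode="display">
    <xsl:apply-templates select="node()" mode="#current"/>
  </xsl:template>

</xsl:stylesheet>
```

The design point is that every <xsl:apply-templates> has to stay inside a mode that was declared with on-no-match="shallow-copy"; any call that slips into an undeclared mode gets text-only output instead of a copy.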

> I think your best approach would be a simple XSLT
> transformation. What kind of output format do you want? What will
> you be using to display the results?