Dear all,

Following my previous email, I just want to tell you that I found a solution.
For those who might be interested in parsing TEI data within R:

library(XML)      # getNodeSet(), xmlGetAttr()
library(stringr)  # word()

# getNodeSet from //ns: gives the node lists interp, interpRef, interpPers,
# interpPers_State and interpPlace_Loc (assumed to be of equal length), then
listInterp <- character(length(interp))
for (i in seq_along(interp)) {
  ktu       <- xmlGetAttr(interp[[i]], "ana")           # KTU
  verb      <- xmlGetAttr(interpRef[[i]], "ana")        # verb.category
  character <- xmlGetAttr(interpPers[[i]], "ana")       # Character
  state     <- xmlGetAttr(interpPers_State[[i]], "ana") # State
  location  <- xmlGetAttr(interpPlace_Loc[[i]], "ana")  # Location
  # select attribute values by word position (negative = counted from the end)
  listInterp[i] <- paste(word(ktu, -1), word(verb, -2, -1), word(character, -2),
                         word(state, -1), word(location, -1))
}
listInterp <- gsub("#", "", listInterp, fixed = TRUE)  # remove the "#" prefixes
listInterp

> Result for listInterp (sample)
[1] "ktu1-3_ii_l5b-6a verb.competition contend ANT active outside" 
[2] "ktu1-3_ii_l6b verb.competition contend ANT active outside"   
[3] "ktu1-3_ii_l7 verb.emotion humiliation ANT active outside"    
[4] "ktu1-3_ii_l8 verb.emotion humiliation ANT active outside"    
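
For readers without stringr, the selection step can also be sketched in base R. This is only an illustration, not the code I ran above: select_ana is a hypothetical helper name, and the sample @ana value is a simplified ASCII version of the one in my earlier message (quoted below).

```r
# Hypothetical helper (not part of the XML or stringr packages): pick
# whitespace-separated values out of an @ana string by position and
# strip the leading "#" from each selected value.
select_ana <- function(ana, positions) {
  tokens <- strsplit(trimws(ana), "[[:space:]]+")[[1]]
  # Negative positions count from the end, as in stringr::word()
  idx <- ifelse(positions < 0, length(tokens) + positions + 1, positions)
  sub("^#", "", tokens[idx])
}

# Simplified sample value (assumption: ASCII-only variant of the real @ana)
ana <- "whatAction #ktu1-3_ii_l6b #confrontation #action"

# Keep only the second and third values
picked <- select_ana(ana, c(2, 3))
print(paste(picked, collapse = " "))
```

Selecting by position only works as long as every @ana lists its values in the same order; if the order varies, matching on the value itself would be safer.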


Best,

Vanessa

---
Vanessa Juloux | Ph.D. candidate

» Ecole Pratique des Hautes Etudes <http://www.ephe.sorbonne.fr/>  (EPHE, France), Paris Sciences et Lettres <https://www.univ-psl.fr/en> (PSL, France) Research University 
» Cultural anthropology of Ancient Near East, 
» Data coordinator & digital humanities monitoring (EPHE, PSL)
» Chair Membership and Outreach Sub-committee for Europe (American Schools of Oriental Research <http://www.asor.org/>, USA)
Mobile + WhatsApp: +33 (0) 6 98 97 02 02
Academia <https://ephe.academia.edu/VanessaJuloux>, vanessajuloux.xyz <http://vanessajuloux.xyz/>   
@vjuloux <https://twitter.com/vjuloux>, skype: mosioatunya
> On 6 Apr 2017, at 00:43, Vanessa Juloux <[log in to unmask]> wrote:
> 
> Dear all,
> 
> I am currently parsing TEI data within R, including @ana attributes in a <ref> that have several values.
> My apologies for posting this to the TEI list, but since parsing TEI data is very specific, I hope you will be able to help me: I have been looking for an answer for a few days now.
> 
> Do you know how the hierarchy of multiple attribute values works?
> 
> Example in TEI: <ref ana="whatAction #ktu1-3_ii_l6b_tḫtṣb #confrontation #action">Action, subcategory confrontation
>                                        <stage ana="whatResult #result #defeate_ofOpposition"/></ref>
> 
> The parsing in R is done with getNodeSet and xmlGetAttr functions:
> 
> interpRef <- getNodeSet(doc,"//ns:ref[contains(@ana, 'whatAction')]", ns) 
> #interpRef=paste0("'#",interpRef,"'")
> interpRef_ana <- sapply(interpRef, xmlGetAttr, "ana") # a for loop returns NULL, so collect the values with sapply
> interpRef_ana
> 
> Result:
> [1] "whatAction #ktu1-3_ii_l6b_tḫtṣb #confrontation #action"
> Do you know how I can select the relevant attribute values from @ana? I would like to have only:
> [1] "#ktu1-3_ii_l6b_tḫtṣb #confrontation"
> 
> 
> Any suggestions?
> 
> Thank you very much in advance.
> 
> Best,
> Vanessa