regex - How can I remove all spaces between two XML tags?
I have a Perl script that I want to amend to remove the spaces between two XML tags.

Example XML:

    <tag> <tag1><tag2>abc 123 def 456 ... </tag2></tag1><tag1><tag2>xyz 987 ... </tag>

I'd like to remove all occurrences of spaces between the tag2 tags. I tried the following:

    $vmodstrg =~ s/(<tag2>(.*?)<\/tag2>)/<tag2>zzzzzz<\/tag2>/g;

but this replaces the whole match with zzzzzz. How can I tell Perl to remove only the spaces within each match of tag2?
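For comparison, the substitution the question is reaching for can be written with Perl's `/e` modifier, which evaluates the replacement as code so that only the captured text is rewritten. This is a minimal sketch, assuming the tag2 elements are never nested, carry no attributes and don't span lines; the sample string is invented for illustration. The answer below explains why those assumptions make the approach fragile.

```perl
use strict;
use warnings;

# Made-up sample input in the shape of the question's example.
my $vmodstrg = '<tag><tag1><tag2>abc 123 def 456</tag2></tag1>'
             . '<tag1><tag2>xyz 987</tag2></tag1></tag>';

# /g: visit every <tag2>...</tag2> pair; /e: evaluate the replacement as code.
$vmodstrg =~ s{<tag2>(.*?)</tag2>}{
    my $text = $1;        # copy first: the inner s/// is a new match
    $text =~ s/\s+//g;    # remove all whitespace from the captured text
    "<tag2>$text</tag2>"  # put the tags back around the cleaned text
}ge;

# prints <tag><tag1><tag2>abc123def456</tag2></tag1><tag1><tag2>xyz987</tag2></tag1></tag>
print "$vmodstrg\n";
```

Copying `$1` into a lexical before running the inner substitution keeps the code independent of which match the capture variables currently refer to.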
Regular expressions are a bad tool for this job, because parsing XML requires recursion. You can do it with newer versions of regex, but at best that leads to complicated, hard-to-read regular expressions, and ones with edge cases where they'll break.
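To make the "edge cases" point concrete, here is a minimal sketch that wraps a non-recursive, `/e`-based substitution in a hypothetical helper, `strip_tag2_spaces_regex`, and feeds it three inputs it does not handle: nested tag2 elements, a tag2 with an attribute, and text that spans a line break. The inputs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strips whitespace from text between <tag2> and </tag2>
# using a plain (non-recursive) regex with the /e modifier.
sub strip_tag2_spaces_regex {
    my ($xml) = @_;
    $xml =~ s{<tag2>(.*?)</tag2>}{
        my $text = $1;        # copy first: the inner s/// is a new match
        $text =~ s/\s+//g;    # drop every run of whitespace
        "<tag2>$text</tag2>"  # value of the block is the replacement
    }ge;
    return $xml;
}

# 1. Nested <tag2>: the lazy .*? stops at the first </tag2>,
#    so the outer element's trailing " e f" keeps its spaces.
my $nested = strip_tag2_spaces_regex('<tag2>a b<tag2>c d</tag2> e f</tag2>');

# 2. An attribute on <tag2>: the literal "<tag2>" never matches.
my $with_attr = strip_tag2_spaces_regex('<tag2 id="x">g h</tag2>');

# 3. Text spanning a newline: "." does not match "\n" without /s.
my $multiline = strip_tag2_spaces_regex("<tag2>a b\nc d</tag2>");

print "$_\n" for $nested, $with_attr, $multiline;
```

Each result is wrong in a different way. The attribute and newline cases can be patched with a looser pattern and `/s`, but the nesting case needs balanced matching, which is exactly where the regex stops being readable.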
See: Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms

So use a parser instead. To remove the 'spaces between <tag2> elements':
    #!/usr/bin/env perl
    use strict;
    use warnings;

    use XML::Twig;

    #parse the data from our "DATA" filehandle.
    #you might want "parsefile('somefilename.xml')" instead.
    my $twig = XML::Twig -> parse ( \*DATA );

    #iterate over the 'text' below "tag2" anywhere in the document.
    foreach my $tag ( $twig -> get_xpath ('//tag2/#TEXT') ) {
       #modify the tag (s///r needs Perl 5.14 or later).
       $tag -> set_text($tag -> text =~ s/\s+//gr );
    }

    #set output options
    $twig -> set_pretty_print('indented_a');

    #print to STDOUT. You might want:
    #print {$output_fh} $twig -> sprint;
    $twig -> print;

    __DATA__
    <root>
      <tag2>words with spaces</tag2>
      <tag2>
        <child>wordswordswords more words </child>
      </tag2>
      <tag1>some more words spaces</tag1>
      <tag2>something here <another_child att="fish" /> </tag2>
    </root>

This outputs:
    <root>
      <tag2>wordswithspaces</tag2>
      <tag2>
        <child>wordswordswords more words </child>
      </tag2>
      <tag1>some more words spaces</tag1>
      <tag2>somethinghere<another_child att="fish" /></tag2>
    </root>

So you can see it's correctly modifying the text between <tag2> elements and leaving the other stuff untouched. And, for bonus points, it's at least as clear what it's doing as the equivalent regex would be!
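Since the question is about amending an existing script, the same idea can also be wrapped in a subroutine that takes and returns an XML string, instead of running standalone on `__DATA__`. This is a minimal sketch; the name `strip_tag2_spaces` is hypothetical, and instead of `get_xpath` it walks `descendants_or_self('tag2')` and keeps only the children for which `is_text` is true, both XML::Twig::Elt methods. It assumes XML::Twig is installed.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: takes an XML string and returns it with all whitespace
# removed from text that sits directly inside <tag2> elements.
sub strip_tag2_spaces {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);    # dies on malformed XML

    # Every <tag2>, including the root element if it happens to be one.
    for my $tag2 ( $twig->root->descendants_or_self('tag2') ) {
        # Only direct text children; child elements are left alone.
        for my $text ( grep { $_->is_text } $tag2->children ) {
            $text->set_text( $text->text =~ s/\s+//gr );
        }
    }
    return $twig->sprint;
}

print strip_tag2_spaces(
    '<root><tag2>a b c</tag2><tag2>p q<child>r s</child></tag2>'
  . '<tag1>x y</tag1></root>'
), "\n";
```

Restricting the change to direct text children matches the scope of the `get_xpath` loop in the answer: text inside `<child>` and `<tag1>` is left untouched.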