Looping over a group in a Python regex
Edit: I've gotten it to work. I had forgotten to put in the space separator for multiple edges.
I've got a Python regex that handles most of the strings I have to parse.
edge_value_pattern = re.compile(r'(?P<edge>e[0-9]+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)')
Here is an example string the regex is meant to parse:
'e0    bike-event              1 "biking" 2'
It correctly stores e0 in the edge group, bike-event in the label1 group, and "biking" in the word group. The last group, label2, is for a different variation of the string, shown below. Note that the label2 regex group behaves as expected when given a string like the one below.
'e29 e30                          "of" :: of, of'
However, the regex pattern fills in the label1 value with e30. In truth, this string does not have a label1 value; it should be None or at least an empty string. An ad-hoc solution would be to parse label1 with another regex to determine whether it's an actual label or an edge. I want to know if there is a way to modify the original regex so that the edge group takes in all the edges. E.g., the output for the above string would be:
edge = "e29 e30"
label1 = None
word = of
label2 = of, of
I tried the solution below, which I thought would translate to looping on the first group, edge (this would be trivial if I had an actual FSA), but it doesn't change the behavior of the regex.
edge_value_pattern = re.compile(r'(?P<edge>(e[0-9]+)+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)')
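The behavior described above can be reproduced with a short check. This is only a sketch: the variable names (original, attempt, results) are illustrative and not from the post, and the only input is the second sample string quoted in the question.

```python
import re

# Second sample string from the question: two edges and no label1.
line = 'e29 e30                          "of" :: of, of'

# Original pattern: the edge group accepts exactly one edge.
original = re.compile(
    r'(?P<edge>e[0-9]+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)

# Attempted fix: a repeated inner group, but with no space inside it,
# so the repetition can never continue past the first edge.
attempt = re.compile(
    r'(?P<edge>(e[0-9]+)+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)

results = {}
for name, pattern in (("original", original), ("attempt", attempt)):
    match = pattern.match(line)
    # groupdict() returns only the named groups: edge, label1, word, label2.
    results[name] = match.groupdict()
    print(name, results[name])
```

With both patterns, edge comes back as e29 and label1 as e30, which is the unwanted result the question describes.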
If you want edge to match "e29 e30", you have to put the repetition inside the group, not outside.
You did that by sticking a new group inside the edge group with a + repetition (which is fine, although you probably wanted a non-capturing group there), but you forgot to include the space inside the repeating group.
(You also left the external repeat in, and used a capturing group where you wanted a non-capturing one, but that's less serious.)
Look at this fragment:
(?P<edge>(e[0-9]+)+)
Here, the expression catches e29 as one match, and e30 as a subsequent match. So, if you add anything else to the expression, it's either going to miss e29, or fail. But add a space:
(?P<edge>(e[0-9]+ )+)
And now it's matching e29 e30 plus the trailing space as a single match, which means you can tack on the additional stuff and it will work (as long as the additional stuff is right: you still need to remove the " +" after the edge group, and I think you may need to make a couple of the other repetitions non-greedy…).
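A rough sketch of how this could look end to end is below. The two fragments are taken from the answer, with the inner group made non-capturing. The full pattern is one possible way to finish it and is an assumption, not the asker's final regex: it replaces the external " +" with " *", keeps label1 from swallowing a double quote, and relaxes the following [^"]+ to [^"]* rather than using non-greedy repetitions. The helper parse_line is hypothetical.

```python
import re

# Second sample string from the question.
line = 'e29 e30                          "of" :: of, of'

# Fragment without the space: each edge is found as a separate match.
without_space = re.compile(r'(?P<edge>(?:e[0-9]+)+)')
# Fragment with the space inside the repeated group: one match spans both edges.
with_space = re.compile(r'(?P<edge>(?:e[0-9]+ )+)')

separate = [m.group("edge") for m in without_space.finditer(line)]
combined = [m.group("edge") for m in with_space.finditer(line)]
print(separate)   # ['e29', 'e30']
print(combined)   # ['e29 e30 ']

# Assumed completion of the pattern (not from the original post).
full = re.compile(
    r'(?P<edge>(?:e[0-9]+ )+) *(?P<label1>[^ "]*)[^"]*'
    r'"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)


def parse_line(text):
    """Return the named groups as a dict, or None if the line does not match."""
    match = full.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    # The edge group includes the trailing separator space, so trim it.
    fields["edge"] = fields["edge"].rstrip()
    # An empty label1 means the line had no label; report it as None.
    fields["label1"] = fields["label1"] or None
    return fields


print(parse_line(line))
```

For the sample string this yields edge "e29 e30", label1 None, word "of" and label2 "of, of", which is the output the question asked for. Lines with a single edge followed by a label should still parse, since the repeated group stops at the first token that is not an edge.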