Looping over a group in a Python regex
Edit: I've gotten it to work. I had forgotten to put in the space separator for multiple edges.
I've got a Python regex that handles most of the strings I have to parse.
edge_value_pattern = re.compile(r'(?P<edge>e[0-9]+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)')
Here is an example string that the regex is meant to parse:
'e0 bike-event 1 "biking" 2'
It correctly stores e0 as the edge group, bike-event as the label1 group, and "biking" as the word group. The last group, label2, is for a different variation of the string, shown below. Note that the label2 regex group behaves as expected when given a string like the one below.
'e29 e30 "of" :: of, of'
However, the regex pattern fills in the label1 value with e30. In truth, this string does not have a label1 value; it should be None or at least an empty string. An ad hoc solution would be to parse label1 with another regex to determine whether it's an actual label or another edge. I want to know if there is a way to modify the original regex so that the edge group takes in all of the edges. E.g., the output for the above string would be:
edge = "e29 e30"
label1 = None
word = of
label2 = of, of
I tried the solution below, which I thought would translate to looping on the first group, edge (this would be trivial if I had an actual FSA), but it doesn't change the behavior of the regex.
edge_value_pattern = re.compile(r'(?P<edge>(e[0-9]+)+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)')
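A quick way to see this is to run both patterns on the second example string and compare the named groups. This is only a sketch using the standard `re` module; the names `original`, `attempt`, and `sample` are made up for illustration.

```python
import re

# The original pattern and the attempted variant with a nested repeating group.
original = re.compile(
    r'(?P<edge>e[0-9]+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)
attempt = re.compile(
    r'(?P<edge>(e[0-9]+)+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)

sample = 'e29 e30 "of" :: of, of'

# groupdict() returns only the named groups, so the extra unnamed
# group in the second pattern does not show up here.
original_groups = original.match(sample).groupdict()
attempt_groups = attempt.match(sample).groupdict()

print(original_groups)
print(attempt_groups)
```

In both cases edge comes back as e29 and label1 as e30, since the inner repetition has no way to step over the space between the two edges.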
If you want edge to match "e29 e30", you have to put the repetition inside the group, not outside.
You did that by sticking a new group inside the edge group with a + repetition (which is fine, although you probably wanted a non-capturing group there), but you forgot to include the space inside the repeating group.
(You also left the external repeat in, and used a capturing group where you wanted a non-capturing one, but that's less serious.)
Look at this fragment:
(?P<edge>(e[0-9]+)+)
Here, the expression catches e29 as one match, and e30 as a subsequent match. So, if you add anything else to the expression, it's either going to miss e29, or fail. But add a space:
(?P<edge>(e[0-9]+ )+)
And now it's matching e29 e30 plus the trailing space as a single match, which means you can tack on the additional stuff and it should work (as long as the additional stuff is right: you still need to remove the " +" after the group, and I think you may need to make a couple of the other repetitions non-greedy...).
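Putting that together, one possible full pattern might look like the sketch below. It is not the only way to do it: instead of making repetitions non-greedy, this version keeps quotes out of label1 and lets the filler before the opening quote be empty. The helper name `parse_edge_line` is made up, and the second test string is a hypothetical variant of the first example with a ":: bike" tail appended so that the unchanged tail of the pattern can match.

```python
import re

# Sketch: the space lives inside the repeating (non-capturing) edge group,
# the external " +" is removed, label1 may be empty and cannot contain a
# quote, and the filler before the opening quote may be empty ([^"]*).
edge_value_pattern = re.compile(
    r'(?P<edge>(?:e[0-9]+ )+)(?P<label1>[^ "]*)[^"]*"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)


def parse_edge_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = edge_value_pattern.match(line)
    if m is None:
        return None
    groups = m.groupdict()
    groups['edge'] = groups['edge'].rstrip()     # drop the trailing space
    groups['label1'] = groups['label1'] or None  # empty string becomes None
    return groups


multi_edge = parse_edge_line('e29 e30 "of" :: of, of')
# Hypothetical input: the first example with a made-up ":: bike" tail.
single_edge = parse_edge_line('e0 bike-event 1 "biking" :: bike')

print(multi_edge)
print(single_edge)
```

The edge group only swallows tokens of the form e<digits> followed by a space, so a label such as bike-event still lands in label1.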