Looping over group in a Python regex
Edit: I've gotten this to work. I had forgotten to put in the space separator for multiple edges.
I've got a Python regex that handles most of the strings I have to parse.
edge_value_pattern = re.compile(r'(?P<edge>e[0-9]+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)')
Here is an example of a string that this regex is meant to parse:
'e0 bike-event 1 "biking" 2'
It correctly stores e0 as the edge group, bike-event as the label1 group, and "biking" as the word group. The last group, label2, is for a slightly different variation of the string, shown below. Note that the label2 regex group behaves as expected when given a string like the one below.
'e29 e30 "of" :: of, of'
However, the regex pattern fills in label1 with the value e30.
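A minimal sketch that reproduces this, using the pattern and the second example string from above (the variable names m and groups are only for illustration):

```python
import re

edge_value_pattern = re.compile(
    r'(?P<edge>e[0-9]+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)

# The second edge ends up in label1 instead of in edge.
m = edge_value_pattern.match('e29 e30 "of" :: of, of')
groups = m.groupdict()
print(groups)
# {'edge': 'e29', 'label1': 'e30', 'word': 'of', 'label2': 'of, of'}
```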
In truth, this string does not have a label1 value; it should be None or at least an empty string. An ad-hoc solution would be to parse label1 with another regex to determine whether it's an actual label or another edge. I want to know if there is a way to modify the original regex so that the edge group takes in all the edges. E.g., the output for the above string would be:
edge = "e29 e30"
label1 = None
word = of
label2 = of, of
I tried the solution below, which I thought would translate to looping on the first group, edge (this would be trivial if I had an actual FSA), but it doesn't change the behavior of the regex.
edge_value_pattern = re.compile(r'(?P<edge>(e[0-9]+)+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)')
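For reference, a quick check of that attempt on the same string gives the same split, with e30 still landing in label1 (the names attempt and m2 are only for illustration):

```python
import re

attempt = re.compile(
    r'(?P<edge>(e[0-9]+)+) +(?P<label1>[^ ]*)[^"]+"(?P<word>[^"]+)"[^:]+:: (?P<label2>[^\n]+)'
)

m2 = attempt.match('e29 e30 "of" :: of, of')
print(repr(m2.group('edge')), repr(m2.group('label1')))
# 'e29' 'e30'
```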
If you want edge to match "e29 e30", you have to put the repetition inside the group, not outside.
You did that by sticking a new group inside the edge group with a + repetition (which is fine, although you probably wanted a non-capturing group there), but you forgot to include the space inside the repeating group.
(You also left the external repeat, and you used a capturing group where you wanted a non-capturing one, but those are less serious.)
Look at this fragment:
(?P<edge>(e[0-9]+)+)
Here, the expression catches e29 as one match, and then e30 as a subsequent match. So, if you add anything else to the expression, it's either going to miss e29, or fail. But add a space:
(?P<edge>(e[0-9]+ )+)
And now it's matching e29 e30 plus the trailing space as a single match, which means you can tack on the additional stuff and it will work (as long as the additional stuff is right: you still need to remove the external +, and I think you may need to make a couple of the other repetitions non-greedy…).
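To make the difference concrete, here is a small sketch that compares the two fragments and then shows one possible way to finish the full pattern. The finished pattern is an assumption, not something taken from the question: it uses a non-capturing inner group, drops the external " +", keeps quotes out of label1, and lets the filler before the opening quote be empty. The parse_edge_line helper, and its stripping of the trailing space, are likewise hypothetical.

```python
import re

line = 'e29 e30 "of" :: of, of'

# Fragment without the space: the repetition stops after the first edge.
no_space = re.match(r'(?P<edge>(e[0-9]+)+)', line)
print(repr(no_space.group('edge')))    # 'e29'

# Fragment with the space inside the repeated group: both edges in one match.
with_space = re.match(r'(?P<edge>(e[0-9]+ )+)', line)
print(repr(with_space.group('edge')))  # 'e29 e30 '

# Assumed full pattern (a sketch, not the asker's final regex).
sketch_pattern = re.compile(
    r'(?P<edge>(?:e[0-9]+ )+)'    # one or more edges, each followed by a space
    r'(?P<label1>[^ "]*)'         # optional label; may be empty
    r'[^"]*"(?P<word>[^"]+)"'     # filler, then the quoted word
    r'[^:]+:: (?P<label2>[^\n]+)'
)

def parse_edge_line(text):
    """Hypothetical helper: return the named groups, or None if nothing matches."""
    m = sketch_pattern.match(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields['edge'] = fields['edge'].rstrip()     # drop the trailing space
    fields['label1'] = fields['label1'] or None  # empty string becomes None
    return fields

print(parse_edge_line(line))
# {'edge': 'e29 e30', 'label1': None, 'word': 'of', 'label2': 'of, of'}
```

Only the second example string from the question is exercised here; other line shapes may need further adjustment.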