Skip to content
  • Miguel Pineiro Jr's avatar
    4236e5a9
    Fix regular expression clobbering in the lexer · 4236e5a9
    Miguel Pineiro Jr authored
    It is a mistake for the lexer not to make a copy of regexp literals,
    because the parser may need to shift more than one of them onto its
    stack as it searches for a matching rule.
    
    For the following demonstration, the correct output is "1b":
    
            $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
            a1
    
    The debugging output confirms that /b/ clobbered /a/, incorrectly
    modifying sub's first argument:
    
            $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
            reparse <b>
            reparse <b>
    
    One of those should have been "reparse <a>".
    
    I introduced this bug in a fix [1] for memory leaks discovered by
    Todd C. Miller [2]. This commit reverts my fix and applies a slightly
    modified version of Todd's recommendation (I omit the INDEX case
    because reg_expr is stored in a node).
    
    [1] 577a67cc
    [2] https://github.com/onetrueawk/awk/pull/156
    
    Testing with `valgrind --leak-check=full $awk "$prog" </dev/null`, the
    leaks produce something like this:
    
    ==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
    ==473055==    at 0x484586F: malloc (vg_replace_malloc.c:381)
    ==473055==    by 0x49E54FD: strdup (strdup.c:42)
    ==473055==    by 0x409288: tostring (tran.c:522)
    ==473055==    by 0x411DF5: regexpr (lex.c:557)
    ==473055==    by 0x402E99: yyparse (awkgram.tab.c:2251)
    ==473055==    by 0x402877: main (main.c:211)
    
    Where $prog is any one of these one-liners:
    
    '/abc/; /cde/'
    '"abc" ~ /cde/; /fgh/'
    'match(/abc/, /cde/)'
    'split(/abc/, a, /cde/)'
    'sub(/abc/, /cde/)'
    'gsub(/abc/, /cde/)'
    'sub(/abc/, /cde/, a)'
    'gsub(/abc/, /cde/, a)'
    4236e5a9
    Fix regular expression clobbering in the lexer
    Miguel Pineiro Jr authored
    It is a mistake for the lexer not to make a copy of regexp literals,
    because the parser may need to shift more than one of them onto its
    stack as it searches for a matching rule.
    
    For the following demonstration, the correct output is "1b":
    
            $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
            a1
    
    The debugging output confirms that /b/ clobbered /a/, incorrectly
    modifying sub's first argument:
    
            $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
            reparse <b>
            reparse <b>
    
    One of those should have been "reparse <a>".
    
    I introduced this bug in a fix [1] for memory leaks discovered by
    Todd C. Miller [2]. This commit reverts my fix and applies a slightly
    modified version of Todd's recommendation (I omit the INDEX case
    because reg_expr is stored in a node).
    
    [1] 577a67cc
    [2] https://github.com/onetrueawk/awk/pull/156
    
    Testing with `valgrind --leak-check=full $awk "$prog" </dev/null`, the
    leaks produce something like this:
    
    ==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
    ==473055==    at 0x484586F: malloc (vg_replace_malloc.c:381)
    ==473055==    by 0x49E54FD: strdup (strdup.c:42)
    ==473055==    by 0x409288: tostring (tran.c:522)
    ==473055==    by 0x411DF5: regexpr (lex.c:557)
    ==473055==    by 0x402E99: yyparse (awkgram.tab.c:2251)
    ==473055==    by 0x402877: main (main.c:211)
    
    Where $prog is any one of these one-liners:
    
    '/abc/; /cde/'
    '"abc" ~ /cde/; /fgh/'
    'match(/abc/, /cde/)'
    'split(/abc/, a, /cde/)'
    'sub(/abc/, /cde/)'
    'gsub(/abc/, /cde/)'
    'sub(/abc/, /cde/, a)'
    'gsub(/abc/, /cde/, a)'
Loading