#5 bugs in mbox parsing

closed-invalid
None
5
2002-03-25
2002-03-20
No

I don't know Perl so I can't submit any patches.

About compute_sigs in Agent.pm

I've written my own maildir to mbox converter, and razor
wasn't able to check the resulting mbox: it only ever
checked the first mail of the mbox file.

After looking at Agent.pm, I've modified my
maildir2mbox converter to create different "From "
lines (first line in mbox, starting a new message). Now
it works.

I blame this line in the function:

if ($line =~ /^From / && $mailbox[0] ne $line) {

the $mailbox[0] seems to be responsible for this
behavior. IMO it is a bug because nowhere in the mbox
specs does it say that the "From " lines must be different.
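To make the suspected failure concrete, here is a minimal sketch in Python (not the Perl from Agent.pm). It assumes that `$mailbox[0]` holds the first line of the mbox file, which is a guess from reading the condition, not something confirmed by the source. `split_mbox`, `make_mbox` and the sample "From " lines are hypothetical names and data made up for the illustration.

```python
def split_mbox(lines, compare_with_first_line):
    """Split a list of mbox lines into messages at "From " separator lines.

    compare_with_first_line=True mirrors the quoted condition
    (/^From / && $mailbox[0] ne $line), ASSUMING $mailbox[0] is the
    first line of the file. False splits on every "From " line.
    """
    messages = []
    current = []
    for line in lines:
        starts_message = line.startswith("From ")
        if compare_with_first_line:
            # A separator identical to the first line is not treated as one.
            starts_message = starts_message and line != lines[0]
        if starts_message and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)
    return messages


def make_mbox(from_lines):
    """Build a tiny in-memory mbox with one short message per "From " line."""
    lines = []
    for i, from_line in enumerate(from_lines):
        lines += [from_line, f"Subject: message {i}", "", f"body {i}", ""]
    return lines


# Hypothetical separator lines, as a converter might write them.
identical = make_mbox(["From converter Mon Mar 25 00:00:00 2002"] * 3)
distinct = make_mbox(
    [f"From converter Mon Mar 25 00:00:0{i} 2002" for i in range(3)]
)

print(len(split_mbox(identical, True)))   # 1: only the first message is seen
print(len(split_mbox(distinct, True)))    # 3
print(len(split_mbox(identical, False)))  # 3
```

Under that assumption, identical "From " lines collapse the whole mbox into a single message, which matches the behavior I saw before changing my converter.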

Ok. I think I've found another one: the mbox parsing
code looks fishy to me: The mbox spec says that "From "
starts a new message and that any other "From "s,
">From "s, ">>From "s and so on must be quoted with a
">" in front of them. Your mbox parser doesn't seem to
do the necessary un-quoting. This is only from a look
at the Perl code by a non-Perl programmer, so I could
be missing some deep magic.
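The quoting rule described above can be sketched as follows. This is a Python illustration of the rule as I read it from the mbox spec, not code taken from razor; `quote_from` and `unquote_from` are hypothetical helper names.

```python
import re

# Matches "From ", ">From ", ">>From ", ... at the start of a line.
_FROM_LIKE = re.compile(r"^>*From ")
# Matches only the quoted forms: at least one ">" before "From ".
_QUOTED_FROM = re.compile(r"^>+From ")


def quote_from(line):
    """What a writer does to body lines: prepend one ">" to From-like lines."""
    return ">" + line if _FROM_LIKE.match(line) else line


def unquote_from(line):
    """What a reader should do: strip exactly one ">" from quoted From lines."""
    return line[1:] if _QUOTED_FROM.match(line) else line


body = ["From here on it gets tricky.", ">From an earlier reply", "plain text"]
stored = [quote_from(line) for line in body]
restored = [unquote_from(line) for line in stored]

print(stored)
print(restored == body)  # True
```

Only one ">" is removed per line, so a line that was already ">From " before it was written to the mbox comes back unchanged.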

Discussion

  • Vipul Ved Prakash

    Logged In: YES
    user_id=22038

    The parsing code is correct. The only way to parse an
    mbox file is to look for "^From " and use it
    as the start of message until the next "^From "
    or EOF is encountered.

    As to your second observation, we don't want to unquote
    other From's because they obviously don't indicate
    start of a new message (and hence they are unquoted).

    If there's another issue with this code then it's not
    clear to me. Maybe you need to explain better.

    cheers,
    vipul.

     
  • Vipul Ved Prakash

    • assigned_to: nobody --> hackworth
    • status: open --> closed-invalid
     
  • Gerhard Häring

    Gerhard Häring - 2002-03-25

    Logged In: YES
    user_id=163326

    Ok, more info.

    You may want to read the mbox specification at
    http://www.qmail.org/man/man5/mbox.html again, especially
    the part about ">From quoting". The current code doesn't do
    the necessary unquoting.

    I'll try to upload a test case soon to prove that the if
    line above is indeed buggy.

     
  • Gerhard Häring

    Gerhard Häring - 2002-03-29

    Logged In: YES
    user_id=163326

    I've uploaded two files that demonstrate the problem with
    the mbox parsing. Use "razor-check -d -M $filename" on the
    two files to see that it only checks the first message for
    the mbox.doesnotwork file while it checks all three messages
    for the mbox.works file. Then just use a diff to show that
    the difference between the files is *only* in the first From
    line that separates messages in an mbox.

    As for the other bug, and a bug it is, despite that you were
    really quick to mark this report as "invalid", it's IMO an
    important one, as the lack of "From " unquoting effectively
    yields different checksums for certain messages
    depending on whether they're parsed from an mbox file or as a
    single message. All the details about "From " quoting can be
    found in the mbox "specification" I gave you the URL for.

    I hope this additional information helps you to realize that
    my bug report is not invalid and perhaps gives the necessary
    starting point for the (trivial) fixes to the source code.
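The checksum point can be illustrated with a short Python sketch. SHA-1 over the joined lines is only a stand-in here; how razor actually computes its signatures is not shown in this report, and `digest` and `unquote_from` are hypothetical helpers.

```python
import hashlib
import re


def digest(lines):
    """Stand-in checksum: SHA-1 over the lines joined with newlines."""
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def unquote_from(line):
    """Strip one ">" from ">From ", ">>From ", ... lines."""
    return line[1:] if re.match(r"^>+From ", line) else line


# The same message, once as a single file and once as stored in an mbox.
single = ["Subject: hi", "", "From now on things differ.", ""]
in_mbox = ["Subject: hi", "", ">From now on things differ.", ""]

print(digest(single) == digest(in_mbox))                             # False
print(digest(single) == digest([unquote_from(l) for l in in_mbox]))  # True
```

Without the unquoting step the two copies of the message hash differently; with it they match.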

     
