I don't know Perl so I can't submit any patches.
About compute_sigs in Agent.pm
I've written my own maildir to mbox converter and razor
wasn't able to check the mbox properly: it only ever
checked the first mail in the mbox file.
After looking at Agent.pm, I've modified my
maildir2mbox converter to create different "From "
lines (first line in mbox, starting a new message). Now
it works.
I blame this line in the function:
if ($line =~ /^From / && $mailbox[0] ne $line) {
The comparison against $mailbox[0] seems to be responsible
for this behavior. IMO it is a bug because nowhere in the mbox
specs does it say that the "From " lines must be different.
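To illustrate what I suspect is happening, here is a rough sketch in Python (not Perl, and not the actual Agent.pm code). The function name split_messages and the compare_to_first flag are made up for this example; the flag only mimics the effect I assume the "$mailbox[0] ne $line" test has, namely that a "From " line identical to the first line of the file is not treated as a separator.

```python
def split_messages(lines, compare_to_first=False):
    """Split mbox lines into messages at "From " separator lines.

    compare_to_first=True imitates the suspected behavior: a "From " line
    that is identical to the very first line is NOT treated as a separator.
    This is an assumption about the effect of the quoted Perl condition.
    """
    messages = []
    current = []
    for line in lines:
        is_separator = line.startswith("From ")
        if compare_to_first:
            is_separator = is_separator and line != lines[0]
        # Only close a message if we have already collected some lines.
        if is_separator and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)
    return messages


# Three messages whose "From " lines are all identical.
SEP = "From sender@example.com Thu Jan  1 00:00:00 1970\n"
identical = [SEP, "Subject: one\n", "\n", "a\n",
             SEP, "Subject: two\n", "\n", "b\n",
             SEP, "Subject: three\n", "\n", "c\n"]

suspected = split_messages(identical, compare_to_first=True)
expected = split_messages(identical)
```

With identical "From " lines the first call lumps everything into a single message, while the second finds all three, which matches what I saw before I changed my converter.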
Ok. I think I've found another one: the mbox parsing
code looks fishy to me: The mbox spec says that "From "
starts a new message and that any other "From "s,
">From "s, ">>From "s and so on must be quoted with a
">" in front of them. Your mbox parser doesn't seem to
do the necessary un-quoting. This is only from a look
at the Perl code from a non-Perl programmer, so I could
miss some deep magic.
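For reference, the un-quoting I have in mind is small. A minimal sketch in Python, assuming the convention where the writer adds one ">" to body lines that look like "From ", ">From ", ">>From " and so on, and the reader removes exactly one ">" again. The name unquote_from is made up for this example and is not something from razor.

```python
import re

# One or more ">" followed directly by "From " at the start of the line.
QUOTED_FROM = re.compile(r"^>+From ")


def unquote_from(line):
    """Remove exactly one leading ">" from a quoted "From " body line."""
    if QUOTED_FROM.match(line):
        return line[1:]
    return line
```

Lines that do not match the pattern, including real "From " separator lines, are returned unchanged, so this would only be applied to the body lines of a message after it has been split out of the mbox.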
Logged In: YES
user_id=22038
The parsing code is correct. The only way to parse an
mbox file is to look for "^From " and use it
as the start of message until the next "^From "
or EOF is encountered.
As to your second observation, we don't want to unquote
other From's because they obviously don't indicate
start of a new message (and hence they are unquoted).
If there's another issue with this code then it's not
clear to me. Maybe you need to explain better.
cheers,
vipul.
Logged In: YES
user_id=163326
Ok, more info.
You may want to read the mbox specification at
http://www.qmail.org/man/man5/mbox.html again, especially
the part about ">From quoting". The current code doesn't do
the necessary unquoting.
I'll try to upload a test case soon to prove that the if
line above is indeed buggy.
Logged In: YES
user_id=163326
I've uploaded two files that demonstrate the problem with
the mbox parsing. Use "razor-check -d -M $filename" on the
two files to see that it only checks the first message for
the mbox.doesnotwork file while it checks all three messages
for the mbox.works file. Then just use a diff to show that
the difference between the files is *only* in the first From
line that separates messages in an mbox.
As for the other bug, and a bug it is, even though you were
really quick to mark this report as "invalid", it's IMO an
important one, because the lack of "From " unquoting effectively
produces different checksums for certain messages
depending on whether they're parsed from an mbox file or as a
single message. All the details about "From " quoting can be
found in the mbox "specification" I gave you the URL for.
I hope this additional information helps you to realize that
my bug report is not invalid and perhaps gives the necessary
starting point for the (trivial) fixes to the source code.
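To make the checksum point concrete, a small Python sketch. SHA-1 over the body lines is used purely for illustration; I am not claiming this is how razor computes its signatures. The names body_digest and unquote_from are made up for this example.

```python
import hashlib
import re

QUOTED_FROM = re.compile(r"^>+From ")


def unquote_from(line):
    """Remove exactly one leading ">" from a quoted "From " body line."""
    return line[1:] if QUOTED_FROM.match(line) else line


def body_digest(lines):
    """Hex digest of the joined body lines (illustrative checksum only)."""
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()


# The same body, once as a single message and once as stored in an mbox,
# where the line starting with "From " has been quoted with ">".
standalone = ["Hello,\n", "From now on things change.\n"]
in_mbox = ["Hello,\n", ">From now on things change.\n"]

# Without un-quoting the two digests differ ...
raw_differs = body_digest(in_mbox) != body_digest(standalone)
# ... and after un-quoting they agree again.
unquoted_matches = (
    body_digest([unquote_from(line) for line in in_mbox])
    == body_digest(standalone)
)
```

Any message whose body contains a line beginning with "From " would be affected in the same way.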