My company is slowly rolling out OC 2007; right now there are a mix of users (some with 2005, some with 2007).
Ever since commit 370146b any IM conversation that is initiated by a user with OC 2005 results in every reply from myself getting a "This message was not delivered to <user> because one or more recipients are offline". If the other user has OC 2007 I do not see this. If I initiate the conversation I do not see this (no matter what client is on the other end).
Example conversation
Debug log
This sounds like the same thing we're seeing here. I'm not sure exactly when the problem was introduced, but we were running on old build until recently when we updated to a new version. Now, I'm seeing this error quite regularly. Our users (that aren't using pidgin) are using OC 2005, and I typically see these problem when someone iniitiates the conversation to me. (I didn't realize until I read this report that it made a difference who initiates the conversation but I just tried a quick test and that seemed to be consistent with this bug too. I was in a conversation where I'm seeing the problem, then I disconnected and initiated the conversation myself and the problem went away.)
For what it's worth, the "message was not delivered" error is not printed until quite some time (30 seconds or more) after I've sent the message. I can tell from the other person's response that they are seeing and responding to my message immediately, and then some time later the error message is printed.
This continues to be an annoyance, so I'm wondering if there is any chance of getting a fix or workaround. I'll be happy to supply debug logs, or whatever additional information would be helpful.
Sorry, missed the fact that a debug log had already been added to this report.
The log shows for the example conversation (Call-ID: d60fd4cf76...) that all our MESSAGEs are left unacknowledged by OCS. Therefore the new 60 second timeout error message will be triggered for all of them.
It could be that this was always the behaviour for OCS 2005, but until now it was never noticed. Could you try the last release version 1.11.2? Does OCS 2005 acknowledge MESSAGEs with that version? If yes, could you then run "git bisect" to figure out the commit that broke MESSAGE generation for OCS 2005?
When I added the comment on 4/18 the problem was still occurring with the very latest snapshot from git (which called itself 1.11.2). I think there have been few, if any, changes since then.
Unfortunately, it's probably not practical for me to identify the exact change that introduced the problem (a combination of lack of knowledge of git, and the difficulty of building, deploying, and testing multiple versions to isolate the problem). However, the original bug report seemed to think the problem was introduced by "commit 370146b" - does that mean anything?
As I stated in my previous response: according to the debug log OCS2005 clients don't acknowledge our MESSAGEs like they are supposed to do. The commit 370146b introduced a 60 seconds message timeout to ensure that users get notified when their message aren't delivered in the case OCS hasn't recognized that the remote party went offline. So this commit doesn't cause the incorrect behaviour of OCS2005 clients, it just makes it visible.
Unless someone with access to OCS2005 clients can prove that in an older SIPE version OS2005 clients acknowledge to MESSAGEs, i.e. something new in SIPE causes them to no longer respond correctly, then we can only close this error as "Won't fix"/"Invalid".
You don't have to install multiple version. Just the copy on your local machine needs updated, so that you can run tests against an OCS2005 client (i.e. when OCS2005 stop to respond to MESSAGEs). "git bisect" is just a built-in binary search tool. If you have a git checkout from the standard branch "mob" you just follow this approach:
$ git reset --hard 1.11.2
# Build & Test that the older version works OK
# mob HEAD is bad, current HEAD is OK
$ git bisect start origin/mob HEAD
Bisecting: 146 revisions left to test after this (roughly 7 steps)
[41662784f650545963ccc6412751cdd1382b3e50] add "603 Decline" response to undelivered message handling
# Build & test
# Test failed:
$ git bisect bad
Bisecting: 72 revisions left to test after this (roughly 6 steps)
[5d03c3a98159b7edbb04cc5fdcd665bbffca22f8] session: remove is_multiparty and focus_uri fields
# Test OK:
$ git bisect good
Bisecting: 36 revisions left to test after this (roughly 5 steps)
[0f9fb48d88af57f80804b6799375dd5531da072d] l10n: Updated Polish (pl) translation to 100%
# rinse & repeat...
# Go back to mob HEAD
$ git bisect reset
$ git merge origin/mob
Updating f87e2b6..e7fa5d3
Fast-forward
...
I'm currently running on a build from last August (1.11.0) because it works with OCS2005 and doesn't demonstrate the problem. Would that log be useful to you?
It would show if OCS2005 clients always had that problem.
I've collected a debug log that shows the working scenario using the 1.11.0 build. However, I can't seem to attach it to this bug report. (I seem to recall that we ran into this before - I think maybe we decided that only the originator of the bug can attach files.) Any suggestions? I could mail it to you. Or put it on my own web server. Or I guess I could created another bug report.
pastebin is fine for me.
I've never used pastebin. I tried to create an account and "paste" in the log file, but got "You have exceeded the maximum file size of 500 kilobytes per paste. PRO users don't have this limit!"
I've place a copy at http://psfales.freeshell.org/pidgin.log-2011-05-05 - Can you use that?
The scenario is
- Start pidign (the plugin is 1.11.0 from last August)
- autologin to OCS 2005
- sip:mark.tischler@alcatel-lucent.com initiates a conversation and says "Hi"
- I respond with "Got it"
- Wait about two minutes (no errors reported, he reports by phone that he got the message)
- Terminate pidgin
Same problem there: OCS2005 client doesn't acknowledge our MESSAGEs. Excerpt from your log (after running contrib/debug/parse_log.pl on it):
------------- NEXT MESSAGE: incoming SIP at 2011-05-05T19:28:01.537071Z
MESSAGE sip:Peter.Fales@alcatel-lucent.com:37104;maddr=135.3.63.245;transport=tcp;ms-received-cid=7202E200 SIP/2.0
...
Call-ID: 028a89a8b78e45cab789463975283479
CSeq: 2 MESSAGE
User-Agent: LCC/1.3
...
Hi
------------- NEXT MESSAGE: outgoing SIP at 2011-05-05T19:28:01.662174Z
SIP/2.0 200 OK
...
Call-ID: 028a89a8b78e45cab789463975283479
CSeq: 2 MESSAGE
...
------------- NEXT MESSAGE: outgoing SIP at 2011-05-05T19:28:08.478638Z
MESSAGE sip:mark.tischler@alcatel-lucent.com:57972;maddr=135.3.63.244;transport=tcp;ms-received-cid=71802800 SIP/2.0
...
CSeq: 1 MESSAGE
...
User-Agent: Purple/2.7.11 Sipe/1.11.0 (linux-i386; RTC/2.1)
Call-ID: 028a89a8b78e45cab789463975283479
...
Got it
------------- NEXT MESSAGE: incoming SIP at 2011-05-05T19:28:56.930178Z
BYE sip:Peter.Fales@alcatel-lucent.com:37104;maddr=135.3.63.245;transport=tcp;ms-received-cid=7202E200 SIP/2.0
...
As you can see: the remote client doesn't acknowledge our MESSAGE as it is supposed to do (see RFC3428). I fear that we can only close this report as "Invalid", because the problem is on the remote client/server side. Unless of course someone can point out that there is something wrong with the MESSAGEs created by SIPE that makes OCS2005 clients not respond to them.
So, is there some way to remove the 60 second check added by 370146b? I can understand the argument that says that OCS2005 is broken and therefore they should be the one that gets fixed, but in reality we know that's not going to happen. Unless we can make some change (even a private patch that's not in the official sipe code base) our users are going to be stuck on 1.11.0 forever.
I "worked" around the issue by syncing with mob and then modifying sipe-im.c to comment out the 2nd if block in process_message_timeout:
/*
if (found) {
gchar *alias = sipe_buddy_get_alias(sipe_private, with);
sipe_user_present_message_undelivered(sipe_private, session, -1, -1,
alias ? alias : with,
msg->body);
g_free(alias);
}
*/
Probably not optimal but its been working while I wait for the 2007 upgrade to complete.
commit 5ff8ea5 implements --enable-ocs2005-message-hack.
BTW: it would be nice to also get a log from the OCS2005 client side.
Haven't done a lot of testing here, but so far the -enable-ocs2005-message-hack seems to do the trick.
I'm not sure what you mean by "get a log from the OCS2005 client side." Isn't the log you looked at before from the client? Or, are you asking about something from the Windows client? Are there logs available in Wndows?
The logs were from the SIPE side, so if something is eaten by the OCS server what the OCS2005 client has sent, we won't see it. The FAQ on the wiki shows how to enable logging for the M$ OCS clients: <https://sourceforge.net/apps/mediawiki/sipe/index.php?title=FAQ>
I've collected the Office Communicator log files and since I can't attach them to this bug report, I've put them at http://psfales.freeshell.org/MOC2005.tgz
The process I used was
- Enable all tracing registry keys
- Remove all files from %USERPROFILE%/Tracing
- Start Office Communicator and sign in
- Exchange a few messages with another MOC user
- Exit from MOC
- Create a gzipped tarball with all the log files
Hopefully this has what you need
Thanks. The important log files are LCAPI0.log and LCIMSP0.log.
Looking at the logs both parties were using the OCS2005 client. I actually meant that one of them should be SIPE so that we can also what happens on the OCS2005 client side when it receives a MESSAGE from the SIPE user.
But keep these logs so that we have a reference for the OCS2005/OCS2005 case too.
I've added a new set of logs at http://psfales.freeshell.org/MOC-to-SIPE.tgz
Test scenario was similar to the last one, except that the person on the far end was running pidgin+sipe
Do you happen to have the --debug log from the SIPE side of this conversation too?
LCAPI0.log shows:
14:03:42.951 E7C:450 INFO :: Data Received - 135.3.63.19:5060 (To Local Address: 135.185.237.144:1594):
14:03:42.951 E7C:450 INFO :: MESSAGE sip:peter.fales@...
...
CSeq: 1 MESSAGE
User-Agent: Purple/1.7.0
Call-ID: facc7e61633843abb9ecbf3da376cfb7
...
<<<<<hello there pete>>>>>
...
14:03:42.951 E7C:450 INFO :: End of Data Received - 135.3.63.19:5060 (To Local Address: 135.185.237.144:1594)
...
14:03:42.951 E7C:450 INFO :: Sending Packet - 135.3.63.19:5060 (From Local Address: 135.185.237.144:1594):
14:03:42.951 E7C:450 INFO :: SIP/2.0 200 OK
Via: SIP/2.0/TCP 135.3.38.50;
...
Call-ID: facc7e61633843abb9ecbf3da376cfb7
CSeq: 1 MESSAGE
...
14:03:42.951 E7C:450 INFO :: End of Sending Packet - 135.3.63.19:5060 (From Local Address: 135.185.237.144:1594)
-> from the OCS2005 client side everything looks OK, i.e. the client DOES acknowledge our MESSAGEs.