Menu

#151 False "could not be delivered" errors

closed-fixed
nobody
5
2011-05-06
2011-04-01
KwikSilvr
No

My company is slowly rolling out OC 2007; right now there are a mix of users (some with 2005, some with 2007).

Ever since commit 370146b any IM conversation that is initiated by a user with OC 2005 results in every reply from myself getting a "This message was not delivered to <user> because one or more recipients are offline". If the other user has OC 2007 I do not see this. If I initiate the conversation I do not see this (no matter what client is on the other end).

Discussion

1 2 > >> (Page 1 of 2)
  • KwikSilvr

    KwikSilvr - 2011-04-14

    Example conversation

     
  • KwikSilvr

    KwikSilvr - 2011-04-14

    Debug log

     
  • Peter Fales

    Peter Fales - 2011-04-18

    This sounds like the same thing we're seeing here. I'm not sure exactly when the problem was introduced, but we were running on old build until recently when we updated to a new version. Now, I'm seeing this error quite regularly. Our users (that aren't using pidgin) are using OC 2005, and I typically see these problem when someone iniitiates the conversation to me. (I didn't realize until I read this report that it made a difference who initiates the conversation but I just tried a quick test and that seemed to be consistent with this bug too. I was in a conversation where I'm seeing the problem, then I disconnected and initiated the conversation myself and the problem went away.)

    For what it's worth, the "message was not delivered" error is not printed until quite some time (30 seconds or more) after I've sent the message. I can tell from the other person's response that they are seeing and responding to my message immediately, and then some time later the error message is printed.

     
  • Peter Fales

    Peter Fales - 2011-04-22

    This continues to be an annoyance, so I'm wondering if there is any chance of getting a fix or workaround. I'll be happy to supply debug logs, or whatever additional information would be helpful.

     
  • Stefan Becker

    Stefan Becker - 2011-05-05

    Sorry, missed the fact that a debug log had already been added to this report.

    The log shows for the example conversation (Call-ID: d60fd4cf76...) that all our MESSAGEs are left unacknowledged by OCS. Therefore the new 60 second timeout error message will be triggered for all of them.

    It could be that this was always the behaviour for OCS 2005, but until now it was never noticed. Could you try the last release version 1.11.2? Does OCS 2005 acknowledge MESSAGEs with that version? If yes, could you then run "git bisect" to figure out the commit that broke MESSAGE generation for OCS 2005?

     
  • Peter Fales

    Peter Fales - 2011-05-05

    When I added the comment on 4/18 the problem was still occurring with the very latest snapshot from git (which called itself 1.11.2). I think there have been few, if any, changes since then.

    Unfortunately, it's probably not practical for me to identify the exact change that introduced the problem (a combination of lack of knowledge of git, and the difficulty of building, deploying, and testing multiple versions to isolate the problem). However, the original bug report seemed to think the problem was introduced by "commit 370146b" - does that mean anything?

     
  • Stefan Becker

    Stefan Becker - 2011-05-05

    As I stated in my previous response: according to the debug log OCS2005 clients don't acknowledge our MESSAGEs like they are supposed to do. The commit 370146b introduced a 60 seconds message timeout to ensure that users get notified when their message aren't delivered in the case OCS hasn't recognized that the remote party went offline. So this commit doesn't cause the incorrect behaviour of OCS2005 clients, it just makes it visible.

    Unless someone with access to OCS2005 clients can prove that in an older SIPE version OS2005 clients acknowledge to MESSAGEs, i.e. something new in SIPE causes them to no longer respond correctly, then we can only close this error as "Won't fix"/"Invalid".

    You don't have to install multiple version. Just the copy on your local machine needs updated, so that you can run tests against an OCS2005 client (i.e. when OCS2005 stop to respond to MESSAGEs). "git bisect" is just a built-in binary search tool. If you have a git checkout from the standard branch "mob" you just follow this approach:

    $ git reset --hard 1.11.2
    # Build & Test that the older version works OK

    # mob HEAD is bad, current HEAD is OK
    $ git bisect start origin/mob HEAD
    Bisecting: 146 revisions left to test after this (roughly 7 steps)
    [41662784f650545963ccc6412751cdd1382b3e50] add "603 Decline" response to undelivered message handling
    # Build & test
    # Test failed:
    $ git bisect bad
    Bisecting: 72 revisions left to test after this (roughly 6 steps)
    [5d03c3a98159b7edbb04cc5fdcd665bbffca22f8] session: remove is_multiparty and focus_uri fields
    # Test OK:
    $ git bisect good
    Bisecting: 36 revisions left to test after this (roughly 5 steps)
    [0f9fb48d88af57f80804b6799375dd5531da072d] l10n: Updated Polish (pl) translation to 100%
    # rinse & repeat...

    # Go back to mob HEAD
    $ git bisect reset
    $ git merge origin/mob
    Updating f87e2b6..e7fa5d3
    Fast-forward
    ...

     
  • Peter Fales

    Peter Fales - 2011-05-05

    I'm currently running on a build from last August (1.11.0) because it works with OCS2005 and doesn't demonstrate the problem. Would that log be useful to you?

     
  • Stefan Becker

    Stefan Becker - 2011-05-05

    It would show if OCS2005 clients always had that problem.

     
  • Peter Fales

    Peter Fales - 2011-05-05

    I've collected a debug log that shows the working scenario using the 1.11.0 build. However, I can't seem to attach it to this bug report. (I seem to recall that we ran into this before - I think maybe we decided that only the originator of the bug can attach files.) Any suggestions? I could mail it to you. Or put it on my own web server. Or I guess I could created another bug report.

     
  • Stefan Becker

    Stefan Becker - 2011-05-05

    pastebin is fine for me.

     
  • Peter Fales

    Peter Fales - 2011-05-05

    I've never used pastebin. I tried to create an account and "paste" in the log file, but got "You have exceeded the maximum file size of 500 kilobytes per paste. PRO users don't have this limit!"

    I've place a copy at http://psfales.freeshell.org/pidgin.log-2011-05-05 - Can you use that?

    The scenario is

    - Start pidign (the plugin is 1.11.0 from last August)
    - autologin to OCS 2005
    - sip:mark.tischler@alcatel-lucent.com initiates a conversation and says "Hi"
    - I respond with "Got it"
    - Wait about two minutes (no errors reported, he reports by phone that he got the message)
    - Terminate pidgin

     
  • Stefan Becker

    Stefan Becker - 2011-05-06

    Same problem there: OCS2005 client doesn't acknowledge our MESSAGEs. Excerpt from your log (after running contrib/debug/parse_log.pl on it):

    ------------- NEXT MESSAGE: incoming SIP at 2011-05-05T19:28:01.537071Z
    MESSAGE sip:Peter.Fales@alcatel-lucent.com:37104;maddr=135.3.63.245;transport=tcp;ms-received-cid=7202E200 SIP/2.0
    ...
    Call-ID: 028a89a8b78e45cab789463975283479
    CSeq: 2 MESSAGE
    User-Agent: LCC/1.3
    ...
    Hi
    ------------- NEXT MESSAGE: outgoing SIP at 2011-05-05T19:28:01.662174Z
    SIP/2.0 200 OK
    ...
    Call-ID: 028a89a8b78e45cab789463975283479
    CSeq: 2 MESSAGE
    ...
    ------------- NEXT MESSAGE: outgoing SIP at 2011-05-05T19:28:08.478638Z
    MESSAGE sip:mark.tischler@alcatel-lucent.com:57972;maddr=135.3.63.244;transport=tcp;ms-received-cid=71802800 SIP/2.0
    ...
    CSeq: 1 MESSAGE
    ...
    User-Agent: Purple/2.7.11 Sipe/1.11.0 (linux-i386; RTC/2.1)
    Call-ID: 028a89a8b78e45cab789463975283479
    ...
    Got it
    ------------- NEXT MESSAGE: incoming SIP at 2011-05-05T19:28:56.930178Z
    BYE sip:Peter.Fales@alcatel-lucent.com:37104;maddr=135.3.63.245;transport=tcp;ms-received-cid=7202E200 SIP/2.0
    ...

    As you can see: the remote client doesn't acknowledge our MESSAGE as it is supposed to do (see RFC3428). I fear that we can only close this report as "Invalid", because the problem is on the remote client/server side. Unless of course someone can point out that there is something wrong with the MESSAGEs created by SIPE that makes OCS2005 clients not respond to them.

     
  • Peter Fales

    Peter Fales - 2011-05-06

    So, is there some way to remove the 60 second check added by 370146b? I can understand the argument that says that OCS2005 is broken and therefore they should be the one that gets fixed, but in reality we know that's not going to happen. Unless we can make some change (even a private patch that's not in the official sipe code base) our users are going to be stuck on 1.11.0 forever.

     
  • KwikSilvr

    KwikSilvr - 2011-05-06

    I "worked" around the issue by syncing with mob and then modifying sipe-im.c to comment out the 2nd if block in process_message_timeout:

    /*
    if (found) {
    gchar *alias = sipe_buddy_get_alias(sipe_private, with);
    sipe_user_present_message_undelivered(sipe_private, session, -1, -1,
    alias ? alias : with,
    msg->body);
    g_free(alias);
    }
    */

    Probably not optimal but its been working while I wait for the 2007 upgrade to complete.

     
  • Stefan Becker

    Stefan Becker - 2011-05-06

    commit 5ff8ea5 implements --enable-ocs2005-message-hack.

     
  • Stefan Becker

    Stefan Becker - 2011-05-06
    • status: open --> open-fixed
     
  • Stefan Becker

    Stefan Becker - 2011-05-06
    • status: open-fixed --> closed-fixed
     
  • Stefan Becker

    Stefan Becker - 2011-05-06

    BTW: it would be nice to also get a log from the OCS2005 client side.

     
  • Peter Fales

    Peter Fales - 2011-05-09

    Haven't done a lot of testing here, but so far the -enable-ocs2005-message-hack seems to do the trick.

    I'm not sure what you mean by "get a log from the OCS2005 client side." Isn't the log you looked at before from the client? Or, are you asking about something from the Windows client? Are there logs available in Wndows?

     
  • Stefan Becker

    Stefan Becker - 2011-05-09

    The logs were from the SIPE side, so if something is eaten by the OCS server what the OCS2005 client has sent, we won't see it. The FAQ on the wiki shows how to enable logging for the M$ OCS clients: <https://sourceforge.net/apps/mediawiki/sipe/index.php?title=FAQ>

     
  • Peter Fales

    Peter Fales - 2011-05-11

    I've collected the Office Communicator log files and since I can't attach them to this bug report, I've put them at http://psfales.freeshell.org/MOC2005.tgz

    The process I used was
    - Enable all tracing registry keys
    - Remove all files from %USERPROFILE%/Tracing
    - Start Office Communicator and sign in
    - Exchange a few messages with another MOC user
    - Exit from MOC
    - Create a gzipped tarball with all the log files

    Hopefully this has what you need

     
  • Stefan Becker

    Stefan Becker - 2011-05-11

    Thanks. The important log files are LCAPI0.log and LCIMSP0.log.

    Looking at the logs both parties were using the OCS2005 client. I actually meant that one of them should be SIPE so that we can also what happens on the OCS2005 client side when it receives a MESSAGE from the SIPE user.

    But keep these logs so that we have a reference for the OCS2005/OCS2005 case too.

     
  • Peter Fales

    Peter Fales - 2011-05-12

    I've added a new set of logs at http://psfales.freeshell.org/MOC-to-SIPE.tgz

    Test scenario was similar to the last one, except that the person on the far end was running pidgin+sipe

     
  • Stefan Becker

    Stefan Becker - 2011-05-12

    Do you happen to have the --debug log from the SIPE side of this conversation too?

    LCAPI0.log shows:

    14:03:42.951 E7C:450 INFO :: Data Received - 135.3.63.19:5060 (To Local Address: 135.185.237.144:1594):
    14:03:42.951 E7C:450 INFO :: MESSAGE sip:peter.fales@...
    ...
    CSeq: 1 MESSAGE

    User-Agent: Purple/1.7.0

    Call-ID: facc7e61633843abb9ecbf3da376cfb7

    ...
    <<<<<hello there pete>>>>>
    ...
    14:03:42.951 E7C:450 INFO :: End of Data Received - 135.3.63.19:5060 (To Local Address: 135.185.237.144:1594)
    ...
    14:03:42.951 E7C:450 INFO :: Sending Packet - 135.3.63.19:5060 (From Local Address: 135.185.237.144:1594):
    14:03:42.951 E7C:450 INFO :: SIP/2.0 200 OK

    Via: SIP/2.0/TCP 135.3.38.50;
    ...
    Call-ID: facc7e61633843abb9ecbf3da376cfb7

    CSeq: 1 MESSAGE

    ...
    14:03:42.951 E7C:450 INFO :: End of Sending Packet - 135.3.63.19:5060 (From Local Address: 135.185.237.144:1594)

    -> from the OCS2005 client side everything looks OK, i.e. the client DOES acknowledge our MESSAGEs.

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.