Discussion:
mutt/Exchange imap timeouts; patch for mutt_wait_filter()
John Hawkinson
2018-09-15 15:23:55 UTC
Permalink
Good morning, mutt-dev:

At the beginning of this year, our Exchange servers were updated to Exchange 2013 (15.0.1365.7), at which point I and other MIT mutt users began to suffer greatly from mutt connections to the Exchange IMAP server timing out after 5 minutes if mutt was anywhere other than the index (e.g. reading or composing a message).

This was extremely painful, and I'm super-embarassed to report that at the time, although I spent a while mucking with the "timeout" and "mail_check" parameters to no avail, somehow the "imap_keepalive" parameter escaped my notice. Reducing it from it's 300-second default to 60-seconds seems to solve the vast majority of those problems, and I am kicking myself for suffering through months of aborted connections and lost deletion state unnecessarily.

I wonder if there's any appetite for reducing the default "imap_keepalive," either uncondionally all the time (I'm not sure whether our Exchange short timeout is a site-specific configuration headache or a standard one, and I also haven't experimeted to determine the largest timeout that safely works), or adaptively in response to servers identifying themselves as:

6< * OK The Microsoft Exchange IMAP4 service is ready.
6> a0000 CAPABILITY
6< * CAPABILITY IMAP4 IMAP4rev1 AUTH=PLAIN AUTH=NTLM AUTH=GSSAPI STARTTLS UIDPLUS CHILDREN IDLE NAMESPACE LITERAL+
6< a0000 OK CAPABILITY completed.


In any case, defaults aside, it appears there are a few other places where IMAP keepalives aren't being sent. The primary one that I hit is when an external interactive viewer is launched (e.g. w3m for html email), mutt calls mutt_wait_filter() on the pid, which calls waitpid() and blocks. But instead it could call imap_wait_keepalive() on the pid, just as _mutt_system() does. This appears to solve the problem of dieing connections during external viewers. A patch for this is attached.

It does appear there are a handful of other places where waitpid() is called directly that could potentially be problems: (1) In mutt_tunnel.c; I don't use this feature, so I'm loathe to touch it. (2) In sendlib.c; as long as sending mail doesn't take too long, this shouldn't come up, and this code already uses alarm() around wait4pid(), so likely would not play nice with imap_wait_keepalive() which also uses alarm() -- at least not without careful thought.

Anyhow, patch for the simpler mutt_wait_filter() case attached.

p.s.: While running mutt under lldb before I realized "imap_timeout"'s existence, it seemed that mutt's choice to ignore SIGPIPE leads to it failing to report that the IMAP connection had closed and it can sit unresponsive to user input for many seconds, when instead it should be able to report closed connections earlier from SIGPIPE. I did not run this down, and presumably closed connections will be much more rare in my life going forward.


Thanks.

--***@mit.edu
John Hawkinson


From 998aee796ba1a5459e28aea047ff7b5452836e51 Mon Sep 17 00:00:00 2001
From: John Hawkinson <***@mit.edu>
Date: Sat, 15 Sep 2018 00:40:34 -0400
Subject: [PATCH 1/1] mutt_wait_filter(): Use imap_wait_keepalive() not
waitpid()

Otherwise we do not maintain IMAP connections while an external viewer
is open (e.g. HTML email), and servers with short timeouts can close
the connection when we come back.
---
filter.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/filter.c b/filter.c
index 331d5425..67dcf72f 100644
--- a/filter.c
+++ b/filter.c
@@ -23,6 +23,10 @@
#include "mutt.h"
#include "mutt_curses.h"

+#ifdef USE_IMAP
+# include "imap.h"
+#endif
+
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
@@ -183,7 +187,12 @@ int mutt_wait_filter (pid_t pid)
{
int rc;

- waitpid (pid, &rc, 0);
+#ifndef USE_IMAP
+ /* wait for the (first) child process to finish */
+ waitpid (pid, &rc, 0);
+#else
+ rc = imap_wait_keepalive (pid);
+#endif
mutt_unblock_signals_system (1);
rc = WIFEXITED (rc) ? WEXITSTATUS (rc) : -1;
--
2.12.2
Kevin J. McCarthy
2018-09-15 22:51:47 UTC
Permalink
Post by John Hawkinson
I wonder if there's any appetite for reducing the default
"imap_keepalive," either uncondionally all the time [...] or
adaptively in response to servers identifying themselves
I feel your pain, but 5 minutes is way below the RFC specified minimum.
I think likely the admins of the server dropped the timeout value to
reduce load. Sniffing server identifiers and adjusting configs based on
that wouldn't be somewhere I want to take the code.
Post by John Hawkinson
In any case, defaults aside, it appears there are a few other places
where IMAP keepalives aren't being sent. The primary one that I hit is
when an external interactive viewer is launched (e.g. w3m for html
email)
Peeking in the function myself, in the "normal" case where the mailcap
contains a '%s', Mutt calls mutt_system(), which would send keepalives.
However, if the mailcap entry is missing a '%s' filename specifier, I
see Mutt instead pipes the message on stdin uses the filter code.

The patch looks fine, but I wonder if we need to apply this hammer to
every filter. Filters are (supposed to be) short lived and produce
output back into Mutt for display; I think this may be the only case in
the code where we a using it to launch an external program for
interaction.

Perhaps it would be better to just adjust the "is_pipe" case for
mutt_view_attachment(). Please give me a few days to look into that
first.

Thank you,

-Kevin
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
John Hawkinson
2018-09-15 23:33:21 UTC
Permalink
Thanks, Kevin.
Post by Kevin J. McCarthy
Post by John Hawkinson
I wonder if there's any appetite for reducing the default
...
Post by Kevin J. McCarthy
I feel your pain, but 5 minutes is way below the RFC specified minimum.
I think likely the admins of the server dropped the timeout value to
reduce load.
After extended conversations they claim not (however they also claim the timeout is 30 minutes, so clearly there are some issues), and of course, it's worth noting that over-aggressive timeouts can have the effect of increasing load, since they force [some] users to poll faster and restart connections. But anyhow, your position is reasonable and it's not watch the patch is about.

(It would be nice to hear from other Exchange 2013 -or-newer mutt users who can state with confidence that they don't have these problems with the default imap_keepalive of 300 seconds aka 5 minutes.)
Post by Kevin J. McCarthy
Peeking in the function myself, in the "normal" case where the mailcap
contains a '%s', Mutt calls mutt_system(), which would send keepalives.
However, if the mailcap entry is missing a '%s' filename specifier, I
see Mutt instead pipes the message on stdin uses the filter code.
It seems to be slightly more than that, like if "use_pager" is true; but this code's logic is sufficiently complex that I might be misreading it. It seems ripe for refactoring, but that's probably a bad idea too.
Post by Kevin J. McCarthy
The patch looks fine, but I wonder if we need to apply this hammer to
every filter. Filters are (supposed to be) short lived and produce
output back into Mutt for display; I think this may be the only case in
the code where we a using it to launch an external program for
interaction.
It seems to me that while filters are supposed to be short-lived, they may not always be, and in the cases where they are short-lived, this won't make much difference. (Counterargument: messing with signals is dicey and should be avoided any more than is strictly necessary).
Post by Kevin J. McCarthy
Perhaps it would be better to just adjust the "is_pipe" case for
mutt_view_attachment(). Please give me a few days to look into that
first.
Err, the "use_pipe" case. I'm not sure how you'd go about doing that without putting low-level signal code around relatively high-level stuff in mutt_view_attachment, or complicating the abstraction further, but I have no objection if you can come up with something better.

Certainly I'm not in any rush -- having suffered for the better part of a year and with a fix for the big problem in my config file and the small problem in my build tree, I'm happy to wait :)

--***@mit.edu
John Hawkinson
Kevin J. McCarthy
2018-09-16 01:33:29 UTC
Permalink
Post by John Hawkinson
Post by Kevin J. McCarthy
Peeking in the function myself, in the "normal" case where the mailcap
contains a '%s', Mutt calls mutt_system(), which would send keepalives.
However, if the mailcap entry is missing a '%s' filename specifier, I
see Mutt instead pipes the message on stdin uses the filter code.
It seems to be slightly more than that, like if "use_pager" is true;
Yup, but the use_pager case is for copiousoutput entries, which are
rendered in the pager. Here I was concerned with external viewers.
Post by John Hawkinson
Post by Kevin J. McCarthy
The patch looks fine, but I wonder if we need to apply this hammer to
every filter. Filters are (supposed to be) short lived and produce
output back into Mutt for display; I think this may be the only case in
the code where we a using it to launch an external program for
interaction.
It seems to me that while filters are supposed to be short-lived, they
may not always be, and in the cases where they are short-lived, this
won't make much difference. (Counterargument: messing with signals is
dicey and should be avoided any more than is strictly necessary).
Yeah, I don't like to add in the extra timeouts all over the place if
it's not really necessary. But maybe you're right, I'll think about it
when I have more time this week.
Post by John Hawkinson
Post by Kevin J. McCarthy
Perhaps it would be better to just adjust the "is_pipe" case for
mutt_view_attachment(). Please give me a few days to look into that
first.
Err, the "use_pipe" case.
Sorry yes, (!use_pager && use_pipe) was what I was thinking. I'm a bit
short on time today, so am just replying quickly. :-)
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Kevin J. McCarthy
2018-09-18 03:01:22 UTC
Permalink
Post by Kevin J. McCarthy
Yeah, I don't like to add in the extra timeouts all over the place if
it's not really necessary. But maybe you're right, I'll think about it
when I have more time this week.
I've pushed up a fix in 4350694b that targets just this one case. Thank
you again for the bug report and patch.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
John Hawkinson
2018-09-20 02:02:05 UTC
Permalink
I encountered another related case today, and even though it touches mutt_wait_filter(), I don't believe my patch would have solved it (at least, I just checked it out and

Open an attachment menu and hit '|' and pipe the attachment to, say
grep FOOBAR
and let mutt sit at the "Press any key to continue..." prompt for more than the IMAP timeout.

IMAP connection times out. I guess the problem is mutt_any_key_to_continue().

I wonder how many of these problems there are :(

--***@mit.edu
John Hawkinson
Post by Kevin J. McCarthy
I've pushed up a fix in 4350694b that targets just this one case. Thank
you again for the bug report and patch.
Kevin J. McCarthy
2018-09-20 16:17:16 UTC
Permalink
Post by John Hawkinson
Open an attachment menu and hit '|' and pipe the attachment to, say
grep FOOBAR
and let mutt sit at the "Press any key to continue..." prompt for more than the IMAP timeout.
IMAP connection times out. I guess the problem is
mutt_any_key_to_continue().
I wonder how many of these problems there are :(
Yes, Mutt does not try to send keepalives everywhere. Transient prompts
are one such case. There are a few others, but I consider these all
minor and don't particularly have the inclination to put keepalive
code in them.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Loading...