Discussion:
[PATCH] Add ~M pattern to match mime Content-Types.
Ammon Riley
2018-04-26 19:15:18 UTC
Permalink
Hi,

I'm scratching an itch, here. I want to limit email messages to those with
pdf attachments, or text/calendar, or images, etc. Is a merge request
on gitlab preferable to a patch like this?

Cheers,
Ammon

---
doc/manual.xml.head | 2 ++
doc/muttrc.man.head | 6 ++++++
mutt.h | 1 +
pattern.c | 30 +++++++++++++++++++++++++++++-
4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/doc/manual.xml.head b/doc/manual.xml.head
index 0188803..3596c1b 100644
--- a/doc/manual.xml.head
+++ b/doc/manual.xml.head
@@ -5199,6 +5199,8 @@ shows several ways to select messages.
<row><entry>%L <emphasis>GROUP</emphasis></entry><entry>message either
originated or received by any member of
<emphasis>GROUP</emphasis></entry></row>
<row><entry>~l</entry><entry>messages addressed to a known mailing
list</entry></row>
<row><entry>~m
[<emphasis>MIN</emphasis>]-[<emphasis>MAX</emphasis>]</entry><entry>messages
in the range <emphasis>MIN</emphasis> to <emphasis>MAX</emphasis>
*)</entry></row>
+<row><entry>~M <emphasis>EXPR</emphasis></entry><entry>messages which
contain a mime Content-Type matching <emphasis>EXPR</emphasis></entry></row>
+<row><entry>=M <emphasis>STRING</emphasis></entry><entry>messages which
contain a mime Content-Type containing
<emphasis>STRING</emphasis></entry></row>
<row><entry>~n
[<emphasis>MIN</emphasis>]-[<emphasis>MAX</emphasis>]</entry><entry>messages
with a score in the range <emphasis>MIN</emphasis> to
<emphasis>MAX</emphasis> *)</entry></row>
<row><entry>~N</entry><entry>new messages</entry></row>
<row><entry>~O</entry><entry>old messages</entry></row>
diff --git a/doc/muttrc.man.head b/doc/muttrc.man.head
index 28591f0..b72e31d 100644
--- a/doc/muttrc.man.head
+++ b/doc/muttrc.man.head
@@ -575,6 +575,12 @@ messages either originated or received by any member
of \fIGROUP\fP
~m \fIMIN\fP-\fIMAX\fP
message in the range \fIMIN\fP to \fIMAX\fP
.TP
+~M \fIEXPR\fP
+messages which contain a mime Content-Type matching \fIEXPR\fP
+.TP
+=M \fISTRING\fP
+messages which contain a mime Content-Type containing \fISTRING\fP
+.TP
~n \fIMIN\fP-\fIMAX\fP
messages with a score in the range \fIMIN\fP to \fIMAX\fP
.TP
diff --git a/mutt.h b/mutt.h
index 4fe7ce4..7c5bec2 100644
--- a/mutt.h
+++ b/mutt.h
@@ -253,6 +253,7 @@ enum
MUTT_PGP_KEY,
MUTT_XLABEL,
MUTT_MIMEATTACH,
+ MUTT_MIMETYPE,

/* Options for Mailcap lookup */
MUTT_EDIT,
diff --git a/pattern.c b/pattern.c
index 285cbfa..2c70d5c 100644
--- a/pattern.c
+++ b/pattern.c
@@ -25,6 +25,7 @@
#include "keymap.h"
#include "mailbox.h"
#include "copy.h"
+#include "mime.h"

#include <string.h>
#include <stdlib.h>
@@ -76,6 +77,7 @@ Flags[] =
{ 'l', MUTT_LIST, 0, NULL },
{ 'L', MUTT_ADDRESS, 0, eat_regexp },
{ 'm', MUTT_MESSAGE, 0, eat_range },
+ { 'M', MUTT_MIMETYPE, MUTT_FULL_MSG, eat_regexp },
{ 'n', MUTT_SCORE, 0, eat_range },
{ 'N', MUTT_NEW, 0, NULL },
{ 'O', MUTT_OLD, 0, NULL },
@@ -1138,6 +1140,29 @@ static int match_threadchildren(struct pattern_t
*pat, pattern_exec_flag flags,
return 0;
}

+static int match_content_type(const pattern_t* pat, BODY *b)
+{
+ char buffer[STRING];
+ if (b == 0)
+ return 0;
+
+ if (snprintf(buffer, STRING, "%s/%s", TYPE (b), b->subtype) >= STRING)
+ buffer[STRING-1] = '\0';
+
+ if (b->subtype != 0 && (patmatch (pat, buffer) == 0))
+ return 1;
+ if (match_content_type (pat, b->parts))
+ return 1;
+ if (match_content_type (pat, b->next))
+ return 1;
+ return 0;
+}
+
+static int match_mime_content_type(const pattern_t *pat, CONTEXT *ctx,
HEADER *hdr)
+{
+ mutt_parse_mime_message(ctx, hdr);
+ return match_content_type(pat, hdr->content);
+}

/* Sets a value in the pattern_cache_t cache entry.
* Normalizes the "true" value to 2. */
@@ -1158,7 +1183,6 @@ static int is_pattern_cache_set (int cache_entry)
return cache_entry != 0;
}

-
/*
* flags: MUTT_MATCH_FULL_ADDRESS - match both personal and machine address
* cache: For repeated matches against the same HEADER, passing in
non-NULL will
@@ -1338,6 +1362,10 @@ mutt_pattern_exec (struct pattern_t *pat,
pattern_exec_flag flags, CONTEXT *ctx,
return (pat->not ^ (count >= pat->min && (pat->max == MUTT_MAXRANGE
||
count <= pat->max)));
}
+ case MUTT_MIMETYPE:
+ if (!ctx)
+ return 0;
+ return (pat->not ^ match_mime_content_type (pat, ctx, h));
case MUTT_UNREFERENCED:
return (pat->not ^ (h->thread && !h->thread->child));
}
--
2.8.0.rc3
Kevin J. McCarthy
2018-04-26 21:30:37 UTC
Permalink
Post by Ammon Riley
I'm scratching an itch, here. I want to limit email messages to those with
pdf attachments, or text/calendar, or images, etc. Is a merge request
on gitlab preferable to a patch like this?
Here is fine. I think it gives more people a change to look at it.
Would you mind resending the patch as an attachment, I'm having trouble
applying it because of line wrapping.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-04-26 21:59:33 UTC
Permalink
See attached!
Post by Kevin J. McCarthy
Post by Ammon Riley
I'm scratching an itch, here. I want to limit email messages to those
with
Post by Ammon Riley
pdf attachments, or text/calendar, or images, etc. Is a merge request
on gitlab preferable to a patch like this?
Here is fine. I think it gives more people a change to look at it.
Would you mind resending the patch as an attachment, I'm having trouble
applying it because of line wrapping.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Kevin J. McCarthy
2018-04-26 23:23:07 UTC
Permalink
Post by Ammon Riley
See attached!
I have few minor comments below. Otherwise it seems okay.

It seems like a useful addition, but I'd like to hear from others before
using one of our dwindling pattern letters. Does anyone else have
feedback for/against?
Post by Ammon Riley
diff --git a/doc/manual.xml.head b/doc/manual.xml.head
+<row><entry>~M <emphasis>EXPR</emphasis></entry><entry>messages which contain a mime Content-Type matching <emphasis>EXPR</emphasis></entry></row>
+<row><entry>=M <emphasis>STRING</emphasis></entry><entry>messages
which contain a mime Content-Type containing
<emphasis>STRING</emphasis></entry></row>
There is no need to add '=M' documentation. This is covered below in
the documentation where it says:
You can force Mutt to treat EXPR as a simple string instead of a
regular expression by using = instead of ~ in the pattern name."

The =b/=B/=h are explicity mentioned because of their IMAP behavior.
Post by Ammon Riley
diff --git a/doc/muttrc.man.head b/doc/muttrc.man.head
+=M \fISTRING\fP
+messages which contain a mime Content-Type containing \fISTRING\fP
ditto
Post by Ammon Riley
diff --git a/pattern.c b/pattern.c
+static int match_content_type(const pattern_t* pat, BODY *b)
+{
+ char buffer[STRING];
+ if (b == 0)
+ return 0;
We more commonly compare to NULL or just !b in the mutt source code.
Post by Ammon Riley
+
+ if (snprintf(buffer, STRING, "%s/%s", TYPE (b), b->subtype) >= STRING)
+ buffer[STRING-1] = '\0';
+
+ if (b->subtype != 0 && (patmatch (pat, buffer) == 0))
+ return 1;
I don't believe subtype should be NULL. Is the check necessary?
Post by Ammon Riley
+ if (match_content_type (pat, b->parts))
+ return 1;
+ if (match_content_type (pat, b->next))
+ return 1;
+ return 0;
+}
+
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-04-27 00:14:27 UTC
Permalink
Post by Kevin J. McCarthy
Post by Ammon Riley
diff --git a/doc/manual.xml.head b/doc/manual.xml.head
+<row><entry>~M <emphasis>EXPR</emphasis></entry><entry>messages which contain a mime Content-Type matching <emphasis>EXPR</emphasis></entry></row>
+<row><entry>=M <emphasis>STRING</emphasis></entry><entry>messages
which contain a mime Content-Type containing
<emphasis>STRING</emphasis></entry></row>
There is no need to add '=M' documentation. This is covered below in
You can force Mutt to treat EXPR as a simple string instead of a
regular expression by using = instead of ~ in the pattern name."
The =b/=B/=h are explicity mentioned because of their IMAP behavior.
I did copy the =b/=B. I hadn't considered IMAP for this feature, as I'm
not using it. Since we have to parse the message to match content-type,
how would this behave under IMAP? Would it work on the server, or
does it have to be local? If it can work on the server, then perhaps I
should distinguish that -- I can imagine an IMAP user might not want to
download large PDF-containing messages while performing this limit.

I'll remove the =M lines for now.
Post by Kevin J. McCarthy
Post by Ammon Riley
diff --git a/pattern.c b/pattern.c
+static int match_content_type(const pattern_t* pat, BODY *b)
+{
+ char buffer[STRING];
+ if (b == 0)
+ return 0;
We more commonly compare to NULL or just !b in the mutt source code.
Fixed.
Post by Kevin J. McCarthy
Post by Ammon Riley
+ if (snprintf(buffer, STRING, "%s/%s", TYPE (b), b->subtype) >= STRING)
+ buffer[STRING-1] = '\0';
+
+ if (b->subtype != 0 && (patmatch (pat, buffer) == 0))
+ return 1;
I don't believe subtype should be NULL. Is the check necessary?
Looks like mutt_parse_content_type() always sets subtype to
a non-NULL value. Check has been dropped (and it was
in the wrong place anyway, since by this point it's baked
into buffer).

Updated patch attached.

A
Kevin J. McCarthy
2018-04-27 00:34:16 UTC
Permalink
Post by Ammon Riley
Post by Kevin J. McCarthy
The =b/=B/=h are explicity mentioned because of their IMAP behavior.
I did copy the =b/=B. I hadn't considered IMAP for this feature, as I'm
not using it. Since we have to parse the message to match content-type,
how would this behave under IMAP? Would it work on the server, or
does it have to be local? If it can work on the server, then perhaps I
should distinguish that -- I can imagine an IMAP user might not want to
download large PDF-containing messages while performing this limit.
It will download the message. I'll have to check myself if there is
some way to do it server-side.
Post by Ammon Riley
Updated patch attached.
Your new patch was too fast. :-) I realized I forgot to include one
other comment, below. I have to run, but I'll take another closer look
at the revised patch later tonight.
Post by Ammon Riley
diff --git a/pattern.c b/pattern.c
+static int match_content_type(const pattern_t* pat, BODY *b)
+{
+ char buffer[STRING];
+ if (!b)
+ return 0;
+
+ if (snprintf(buffer, STRING, "%s/%s", TYPE (b), b->subtype) >= STRING)
+ buffer[STRING-1] = '\0';
snprintf (unlike strncpy) will always add the terminating null byte.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-04-27 01:00:00 UTC
Permalink
Post by Kevin J. McCarthy
Your new patch was too fast. :-) I realized I forgot to include one
other comment, below. I have to run, but I'll take another closer look
at the revised patch later tonight.
3rd time's the charm, right?
Post by Kevin J. McCarthy
snprintf (unlike strncpy) will always add the terminating null byte.
Fixed. I should have remembered that, as I've had to deal with
the nonsense of Microsoft's _snprintf() *not* doing it.

A
Kevin J. McCarthy
2018-04-27 02:57:00 UTC
Permalink
Post by Ammon Riley
3rd time's the charm, right?
I think so - this version looks good. I'll give a few days for others
to chime in.

There don't appear to be any applicable IMAP SEARCH extensions for this
pattern. I suppose we could fetch BODYSTRUCTURE, and use that, but
since we don't currently fetch/parse that in Mutt, it would be a large
amount of work for a one-off search feature. I don't know that it's
worth it.

The code is a bit funny about subtype. There *are* other checks for
NULL in a few places, but not uniformly. So far, I can't find a path
that would lead to an unset subtype. It may be that at one time, all
the paths weren't covered. Or, of course, it may be that I'm missing
something! However, I really dislike voodoo-checks, so let's leave it
out.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-04-27 18:05:28 UTC
Permalink
Post by Kevin J. McCarthy
Post by Ammon Riley
3rd time's the charm, right?
I think so - this version looks good. I'll give a few days for others
to chime in.
There don't appear to be any applicable IMAP SEARCH extensions for this
pattern. I suppose we could fetch BODYSTRUCTURE, and use that, but
since we don't currently fetch/parse that in Mutt, it would be a large
amount of work for a one-off search feature. I don't know that it's
worth it.
I see that imap/TODO mentions using BODYSTRUCTURE in the
view-attachments menu. So it seems like if the work were done, it
could be used in more than one place.

However, looking at imap_read_headers(), I see that it fetches the
Content-Type header. There's a comment near the bottom that
reads "content built as a side-effect of mutt_read_rfc822_header",
which does indeed parse Content-Type headers, and builds the
part structure. It's not clear to me if the IMAP server returns *all*
of the Content-Type headers. If so, it seems like we already have
all of the same info that requesting BODYSTRUCTURE would
give us, and ~M won't pull the full body of the email locally.
Perhaps someone can clarify that? If I get the chance, I'll
try and configure mutt to access an IMAP server and check...

A
Kevin J. McCarthy
2018-04-27 19:09:32 UTC
Permalink
Post by Ammon Riley
Post by Kevin J. McCarthy
There don't appear to be any applicable IMAP SEARCH extensions for this
pattern. I suppose we could fetch BODYSTRUCTURE, and use that, but
since we don't currently fetch/parse that in Mutt, it would be a large
amount of work for a one-off search feature. I don't know that it's
worth it.
I see that imap/TODO mentions using BODYSTRUCTURE in the
view-attachments menu. So it seems like if the work were done, it
could be used in more than one place.
It could be. One of the limitations of Mutt is that it downloads
everything when pulling a message down. That's on the TODO list because
it could theoretically allow attachments on demand. However, this is
much more complicated then it appears because of a variety of issues
with how Mutt works.
Post by Ammon Riley
However, looking at imap_read_headers(), I see that it fetches the
Content-Type header. There's a comment near the bottom that
reads "content built as a side-effect of mutt_read_rfc822_header",
which does indeed parse Content-Type headers, and builds the
part structure. It's not clear to me if the IMAP server returns *all*
of the Content-Type headers.
No, mutt_read_rfc822_header() is dealing with the email header. The
IMAP server is only returning a content-type for the email header, not
the various parts.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-04-27 19:13:54 UTC
Permalink
Post by Kevin J. McCarthy
Post by Ammon Riley
However, looking at imap_read_headers(), I see that it fetches the
Content-Type header. There's a comment near the bottom that
reads "content built as a side-effect of mutt_read_rfc822_header",
which does indeed parse Content-Type headers, and builds the
part structure. It's not clear to me if the IMAP server returns *all*
of the Content-Type headers.
No, mutt_read_rfc822_header() is dealing with the email header. The
IMAP server is only returning a content-type for the email header, not
the various parts.
Then in that case, I'll amend the documentation for ~M to specify that
the search is done locally, so that IMAP users know it could potentially
be expensive.

A
Kevin J. McCarthy
2018-04-27 19:18:20 UTC
Permalink
Post by Ammon Riley
Post by Kevin J. McCarthy
Post by Ammon Riley
However, looking at imap_read_headers(), I see that it fetches the
Content-Type header. There's a comment near the bottom that
reads "content built as a side-effect of mutt_read_rfc822_header",
which does indeed parse Content-Type headers, and builds the
part structure. It's not clear to me if the IMAP server returns *all*
of the Content-Type headers.
No, mutt_read_rfc822_header() is dealing with the email header. The
IMAP server is only returning a content-type for the email header, not
the various parts.
Then in that case, I'll amend the documentation for ~M to specify that
the search is done locally, so that IMAP users know it could potentially
be expensive.
I was thinking about it a bit, and I think it would be good to label
the slower patterns with "***)" and add an explanation below.

However, if you'd like I'll be glad to do that after pushing your commit
up.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-04-27 19:57:01 UTC
Permalink
Post by Kevin J. McCarthy
Post by Ammon Riley
Post by Kevin J. McCarthy
Post by Ammon Riley
However, looking at imap_read_headers(), I see that it fetches the
Content-Type header. There's a comment near the bottom that
reads "content built as a side-effect of mutt_read_rfc822_header",
which does indeed parse Content-Type headers, and builds the
part structure. It's not clear to me if the IMAP server returns *all*
of the Content-Type headers.
No, mutt_read_rfc822_header() is dealing with the email header. The
IMAP server is only returning a content-type for the email header, not
the various parts.
Then in that case, I'll amend the documentation for ~M to specify that
the search is done locally, so that IMAP users know it could potentially
be expensive.
I was thinking about it a bit, and I think it would be good to label
the slower patterns with "***)" and add an explanation below.
However, if you'd like I'll be glad to do that after pushing your commit
up.
That sounds better. I'll let you do that, as I don't know off the top of
my head which are slow. :)

Thanks!

A
Omen Wild
2018-04-27 20:17:32 UTC
Permalink
Post by Kevin J. McCarthy
I was thinking about it a bit, and I think it would be good to label
the slower patterns with "***)" and add an explanation below.
One thing I've wondered, but have not read the code to verify, are the
operations run in descending order by speed? As in, for '~bfoo ~sbar' run
the subject match first as it's likely to be faster and will limit how
many bodies need to be searched.

Thanks
--
Your mission (should you decide to accept it) is to make CMS look like UNIX.
Kevin J. McCarthy
2018-04-27 20:37:40 UTC
Permalink
Post by Omen Wild
One thing I've wondered, but have not read the code to verify, are the
operations run in descending order by speed? As in, for '~bfoo ~sbar' run
the subject match first as it's likely to be faster and will limit how
many bodies need to be searched.
No, they are performed in the order you specify. It would be best to
put the ~sbar before the ~bfoo.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Kevin J. McCarthy
2018-05-03 17:21:08 UTC
Permalink
Post by Kevin J. McCarthy
Post by Ammon Riley
3rd time's the charm, right?
I think so - this version looks good. I'll give a few days for others
to chime in.
I pushed this up a couple days ago, but forgot to email the list.
Thanks again for the patch.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Ammon Riley
2018-05-03 17:42:07 UTC
Permalink
Post by Kevin J. McCarthy
Post by Kevin J. McCarthy
Post by Ammon Riley
3rd time's the charm, right?
I think so - this version looks good. I'll give a few days for others
to chime in.
I pushed this up a couple days ago, but forgot to email the list.
Thanks again for the patch.
You're more than welcome -- I've been using mutt for over 15 years, so
really, thank *you*!

A

Derek Martin
2018-05-01 16:59:27 UTC
Permalink
Post by Kevin J. McCarthy
Post by Ammon Riley
Post by Kevin J. McCarthy
The =b/=B/=h are explicity mentioned because of their IMAP behavior.
I did copy the =b/=B. I hadn't considered IMAP for this feature, as I'm
not using it. Since we have to parse the message to match content-type,
how would this behave under IMAP? Would it work on the server, or
does it have to be local? If it can work on the server, then perhaps I
should distinguish that -- I can imagine an IMAP user might not want to
download large PDF-containing messages while performing this limit.
It will download the message. I'll have to check myself if there is
some way to do it server-side.
Post by Ammon Riley
Updated patch attached.
Your new patch was too fast. :-) I realized I forgot to include one
other comment, below. I have to run, but I'll take another closer look
at the revised patch later tonight.
Post by Ammon Riley
diff --git a/pattern.c b/pattern.c
+static int match_content_type(const pattern_t* pat, BODY *b)
+{
+ char buffer[STRING];
+ if (!b)
+ return 0;
+
+ if (snprintf(buffer, STRING, "%s/%s", TYPE (b), b->subtype) >= STRING)
+ buffer[STRING-1] = '\0';
snprintf (unlike strncpy) will always add the terminating null byte.
Indeed. And snprintf() is generally better, but you MUST check the
return code to see that whatever you're snprintf()ing actually fit in
the buffer, and do something appropriate if it did not.

FWIW, I've said this before, but I think anywhere strncpy() is used,
code that does the above should replace it. This avoids silent
truncation (which strncpy() does), and checking the return is required
to make sure snprintf() actually did something, and you're not
operating on uninitialized memory (or whatever).
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.
Derek Martin
2018-05-01 17:54:12 UTC
Permalink
Post by Derek Martin
FWIW, I've said this before, but I think anywhere strncpy() is used,
code that does the above should replace it. This avoids silent
truncation (which strncpy() does), and checking the return is required
to make sure snprintf() actually did something, and you're not
operating on uninitialized memory (or whatever).
Sorry, this was badly worded. Checking the return value is required
to ensure that snprintf() wrote all the data (rc < size, NOT <= size),
assuming you want that. If you actually want it to silently truncate,
which you may if for example you're formatting the index, then you can
ignore it. That may or may not cause an error if compiled with -Wall
-Werror flags... the glibc folks have been doing a lot of that lately.
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.
Kevin J. McCarthy
2018-05-01 18:15:51 UTC
Permalink
Post by Derek Martin
Post by Derek Martin
FWIW, I've said this before, but I think anywhere strncpy() is used,
code that does the above should replace it. This avoids silent
truncation (which strncpy() does), and checking the return is required
to make sure snprintf() actually did something, and you're not
operating on uninitialized memory (or whatever).
Sorry, this was badly worded. Checking the return value is required
to ensure that snprintf() wrote all the data (rc < size, NOT <= size),
assuming you want that. If you actually want it to silently truncate,
which you may if for example you're formatting the index, then you can
ignore it. That may or may not cause an error if compiled with -Wall
-Werror flags... the glibc folks have been doing a lot of that lately.
Hi Derek,

I haven't forgotten or ignored your past emails, and this is still on my
list. I've been thinking about a couple ways to deal with this, that
I'll try after this release. The nice thing about the static buffers is
the speed, but I've been thinking of allocating a pool of BUFFER of
various sizes and using that for the critical parts of the code.
--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Derek Martin
2018-05-01 18:48:00 UTC
Permalink
On Tue, May 01, 2018 at 11:15:51AM -0700, Kevin J. McCarthy wrote:
[re: strncpy vs. snprintf...]
Post by Kevin J. McCarthy
Post by Derek Martin
Sorry, this was badly worded. Checking the return value is required
to ensure that snprintf() wrote all the data (rc < size, NOT <= size),
assuming you want that. If you actually want it to silently truncate,
which you may if for example you're formatting the index, then you can
ignore it. That may or may not cause an error if compiled with -Wall
-Werror flags... the glibc folks have been doing a lot of that lately.
Hi Derek,
I haven't forgotten or ignored your past emails, and this is still on my
list. I've been thinking about a couple ways to deal with this, that
I'll try after this release. The nice thing about the static buffers is
the speed, but I've been thinking of allocating a pool of BUFFER of
various sizes and using that for the critical parts of the code.
Great. That seems like a reasonable approach, though unless you're
doing a lot of these in a loop, I expect in most cases the UI I/O (and
human perception) will be the bottleneck, not the memory allocation,
and using malloc()/realloc() directly should be fine/imperceptible.
Note that since snprintf returns the exact size needed (with a modern
libc), you need at most one realloc() in all cases. If you want to be
sure it works with older libc's then the code is slightly more
complicated, requiring realloc() in a loop.

I imagine what you really want is something like a mutt_snprintf()
wrapper that allocs/reallocs the space dynamically as needed. The man
page actually has example code that does exactly that (using
vsnprintf), which could be used as a template. You'd have a few
implementation options:

- have an additional parameter which indicates whether or not
truncation is desired, and skip the realloc() when true.

- Or you could always realloc() when needed, and skip specifying the
size. Or both--when truncate is false the size can be ignored.
If truncate is not desired, the passed-in size could be used as a
hint for the initial string allocation.

- If you can safely assume a modern libc, you could initially use a
local char (size 1) to try the snprintf, which should basically
always fail but give you the size of the buffer you need to
allocate for the message to fit.

- If you can not assume a modern libc then alternatively you'd want
to use an initial size that's typical of the case, perhaps
implicitly assumed (say 30-100 chars or whatever), or based on the
size of the format string, or the passed-in size as above.

Lots of options... any of them should work. Just depends on what sort
of interface you'd prefer.
--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.
Ammon Riley
2018-05-01 18:16:04 UTC
Permalink
Post by Derek Martin
Post by Derek Martin
FWIW, I've said this before, but I think anywhere strncpy() is used,
code that does the above should replace it. This avoids silent
truncation (which strncpy() does), and checking the return is required
to make sure snprintf() actually did something, and you're not
operating on uninitialized memory (or whatever).
Sorry, this was badly worded. Checking the return value is required
to ensure that snprintf() wrote all the data (rc < size, NOT <= size),
assuming you want that. If you actually want it to silently truncate,
which you may if for example you're formatting the index, then you can
ignore it.
In the case of this specific patch, I think it's safe to ignore, and let it
silently truncate. Per RFC 4288 (superseded by RFC 6838), the type
and subtype are restricted to 127 characters maximum, and should
be 64 characters or less. So STRING is long enough to hold the
maximums, including the "/" and terminating NULL, and won't
truncate any legal values.

Cheers,
Ammon
Loading...