safe_rename() and verifying the result of link(2)

Discussion:

Kevin J. McCarthy

2018-07-27 01:39:30 UTC

Gitlab ticket #61 reported an infinite loop (creating files) when using
maildir over sshfs. The merits of doing this are a separate discussion
- what I'm interested in is the behavior of safe_rename().

In this case, link(2) reports success. But to be extra careful,
safe_rename() performs a lstat() on the source and dest file and
compares the results. If the stats don't match, safe_rename() sets
errno=EEXIST, saying essentially to the caller the dest file already
existed.

The maildir/mh code, has a "forever" loop that increments a counter in
the dest filename and keeps trying upon EEXIST until the safe_rename()
succeeds. (see _maildir_commit_message() for an example).

Now, the problem is that sshfs is cheating, and is not actually creating
a hard link. Hence the lstat double-check fails, and mutt loops
creating files "forever".

My question is about the merit of performing the lstat double-check.
Does mutt need to be doing this? Does this work around some strange bug
with certain filesystems or NFS?

Would another possibility be to skip the lstat double check, or to fail
without setting EEXIST if the lstat check doesn't match?

Thanks for any advice on this.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Derek Martin

2018-07-27 18:07:42 UTC

Permalink

Post by Kevin J. McCarthy
My question is about the merit of performing the lstat double-check.
Does mutt need to be doing this? Does this work around some strange bug
with certain filesystems or NFS?

I can't immediately think of a way that NFS is likely to cause
(additional) problems here... I think the likely reason to do this is
to attempt to prevent a race where the source file is replaced at the
same time as the rename happens (creating a separate file) in which
case you should not unlink. The trouble is, if that's the purpose,
there's still a race between the stat and the unlink...

Post by Kevin J. McCarthy
Would another possibility be to skip the lstat double check, or to fail
without setting EEXIST if the lstat check doesn't match?

You could fail, but you need to set errno to *something*... I'm not
sure what would be appropriate. Another option is to do the lstat of
both, but save the original stat from the source, and then stat BOTH.
If the second stat of the source and the stat of the target match, OR
the first and second stat of the source match, unlink. Either way
though, you still have a race.

Post by Kevin J. McCarthy
Thanks for any advice on this.

Hope that helps.

--
Derek D. Martin http://www.pizzashack.org/ GPG Key ID: 0xDFBEAD02
-=-=-=-=-
This message is posted from an invalid address. Replying to it will result in
undeliverable mail due to spam prevention. Sorry for the inconvenience.

Kevin J. McCarthy

2018-07-28 03:42:52 UTC

Permalink

Post by Derek Martin

Thank you Derek. I'm leaning towards just pulling out the
compare_stat() double check. I think the link() call is about as atomic
as we can get for the rename, and if it returns 0 we should accept that
it worked. Trying to fix other race conditions is... as you say, racy
;-)

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Vincent Lefevre

2018-08-03 05:57:48 UTC

Permalink

sshfs has bugs by default. When I use it, I need the -o workaround=rename
option, otherwise some operations fail (I don't remember what exactly).
Has this been tried here, with Mutt?

--
Vincent Lefèvre <***@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Kevin J. McCarthy

2018-08-14 21:30:57 UTC

Permalink

Post by Vincent Lefevre

sshfs has bugs by default. When I use it, I need the -o workaround=rename
option, otherwise some operations fail (I don't remember what exactly).
Has this been tried here, with Mutt?

The reporter in #61 said "I disabled hardlinks for my filesystem and
that fixes the problem". Not sure if that's the same thing you tried.

I think if the link() returns 0, it's safe for the code to assume all
went well. So, the fix I plan to make is remove the compare_stat call
afterwards, but keep the stat calls for now:

lib.c:
@@ -517,12 +517,21 @@ int safe_rename (const char *src, const char *target)
* did already exist.
*/

+#if 0
+ /*
+ * Remove this check, because it causes problems with maildir on
+ * filesystems that don't properly support hard links, such as
+ * sshfs. The filesystem creates the link, but the resulting file
+ * is given a different inode number by the sshfs layer. This results in an
+ * infinite loop of created files.
+ */
if (compare_stat (&ssb, &tsb) == -1)
{
dprint (1, (debugfile, "safe_rename: stat blocks for %s and %s diverge; pretending EEXIST.\n", src, target));
errno = EEXIST;
return -1;
}
+#endif

/*
* Unlink the original link. Should we really ignore the return

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Vincent Lefevre

2018-08-21 09:30:03 UTC

Permalink

Post by Kevin J. McCarthy
I think if the link() returns 0, it's safe for the code to assume all
went well.

The Linux link(2) man page says:

BUGS
On NFS filesystems, the return code may be wrong in case the NFS server
performs the link creation and dies before it can say so. Use stat(2)
to find out if the link got created.

But as I understand it, this means that the link got created, but the
system thinks that it wasn't, and link() returns a non-zero value.

So, I also think that if link() returns 0, then all went well.

Post by Kevin J. McCarthy
So, the fix I plan to make is remove the compare_stat call
@@ -517,12 +517,21 @@ int safe_rename (const char *src, const char *target)
* did already exist.
*/
+#if 0
+ /*
+ * Remove this check, because it causes problems with maildir on
+ * filesystems that don't properly support hard links, such as
+ * sshfs. The filesystem creates the link, but the resulting file
+ * is given a different inode number by the sshfs layer. This results in an
+ * infinite loop of created files.
+ */
if (compare_stat (&ssb, &tsb) == -1)
{
dprint (1, (debugfile, "safe_rename: stat blocks for %s and %s diverge; pretending EEXIST.\n", src, target));
errno = EEXIST;
return -1;
}
+#endif
/*
* Unlink the original link. Should we really ignore the return

Perhaps, but the sshfs(1) man page says:

-o disable_hardlink
link(2) will return with errno set to ENOSYS. Hard links
don't currently work perfectly on sshfs, and this confuses
some programs. If that happens try disabling hard links with
this option.

If hard links don't work perfectly (can there be more serious errors
than the inode number?), it may be better to disable them.

Kevin J. McCarthy

2018-08-21 14:07:15 UTC

Permalink

Post by Vincent Lefevre
But as I understand it, this means that the link got created, but the
system thinks that it wasn't, and link() returns a non-zero value.
So, I also think that if link() returns 0, then all went well.

That was my reading too. :-)

Post by Vincent Lefevre
[...]
If hard links don't work perfectly (can there be more serious errors
than the inode number?), it may be better to disable them.

I agree. My goal here isn't to "support" sshfs as much as it is to
prevent a disastrous link-creating infinite loop in Mutt; and to do so
without making Mutt less safe.

I don't like removing 20-year old safety checks, but I think it's okay
to do so for the case where link() returns 0. I've been slow to act in
order to give others a chance to chime in, and I greatly appreciate you
doing so.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Cameron Simpson

2018-08-21 21:13:21 UTC

Permalink

Post by Kevin J. McCarthy

That was my reading too. :-)

For what it's worth, that is my reading too.

Cheers,
Cameron Simpson <***@cskk.id.au>

Kevin J. McCarthy

2018-08-21 22:37:23 UTC

Permalink

Post by Cameron Simpson

Post by Kevin J. McCarthy

That was my reading too. :-)

For what it's worth, that is my reading too.

Thanks Cameron and everyone. I've just pushed the patch up to master.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Derek Martin

2018-08-22 15:04:12 UTC

Permalink

Post by Kevin J. McCarthy

Post by Cameron Simpson

Post by Kevin J. McCarthy

That was my reading too. :-)

For what it's worth, that is my reading too.

Thanks Cameron and everyone. I've just pushed the patch up to master.

I just saw all the new activity in this thread, and I got to thinking
about this some more. I think both of the reasons mentioned here have
legitimacy. Though as I previously pointed out if you stat() after the
link() and don't get a match (on a local file system), it's possible
that this is because in between the two calls, someone else replaced
the file (it's somewhat unlikely, but this is the nature of race
conditions). But I don't think there's a better way to effect the
desired outcome, since there's no atomic way to do a link() and also
stat() on the new link.

I actually think that the MH folder code is at fault here, not the
stat() call in safe_rename(), despite potential issues with the latter.
If safe_rename() fails with EEXIST, what logic suggests retrying
forever is a good idea? I can think of no reasons, but I can think of
one that says it is not: The user is unlikely to be in the process of
removing/renaming the file that Mutt is trying to rename, and probably
has no reason to imagine a need to do so, so the infinite loop seems
inevitable if safe_rename() fails.

Of course if the previous version of safe_rename() is retained, the
sshfs users may get an error when one has not actually occured, but
sshfs is a hack, so you get what you get...

Kevin J. McCarthy

2018-08-22 18:12:39 UTC

Permalink

Post by Derek Martin
I actually think that the MH folder code is at fault here, not the
stat() call in safe_rename(), despite potential issues with the latter.
If safe_rename() fails with EEXIST, what logic suggests retrying
forever is a good idea?

Thank you Derek. I first thought about changing the mh.c code, but the
code is actually incrementing a counter and regenerating time() as part
of the filename for each retry (take a quick look at
_maildir_commit_message()). I didn't want to put an arbitrary limit,
because Murphy's Law says eventually someone would hit it, and logically
there is no need.

Steffen's cautions apply to dotlock code, which is a different case and
is not affected by this change.

I'm inclined to keep the commit unless a regression appears or someone
can show proof the change is unsafe.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Derek Martin

2018-08-23 05:08:19 UTC

Permalink

Post by Kevin J. McCarthy

What I was actually suggesting is to not call out the EEXIST error
case differently from other error cases. Making that change has no
other effect on the retry logic in that function. But it does
eliminate the illogical retry forever if the message file already
exists for some reason.

Post by Kevin J. McCarthy
Steffen's cautions apply to dotlock code, which is a different case and
is not affected by this change.

It's fundamentally the same thing though. The mechanism for dotlock
works like this:

- create a secure temporary file (with O_EXCL).
This ensures that the file we're opening for the lock has not been
subverted by another process, potentially an attacker.
- stat the file
- link the file to canonical name
If the link succeeds, we have the lock, but the rc from link is
unreliable, so...
- stat the file again using the new link
Here, we compare the inode and/or make sure the link count has
increased, to ensure we're really dealing with the same file...
- write the PID to the lock file
- unlink the temporary file

Exact details may vary slightly, but that's the essence of it. This
is almost exactly what _maildir_commit_message() (and safe_rename())
does, for largely the same reasons, though the purpose of the file is
different.

Post by Kevin J. McCarthy
I'm inclined to keep the commit unless a regression appears or someone
can show proof the change is unsafe.

So, I just read safe_rename(), and I think there are a few problems
here:

- The stat comparison is supposed to catch the case where the return
value from link() incorrectly indicates failure, but it does not do
that because it only happens when link() succeeds. That defeats the
purpose of doing the check at all.

- If you fixed that, your change would obviously again neuter that
check.

- link() can follow symlinks on some systems. But safe_rename() uses
lstat() (which does not follow symlinks) to get the struct stats
for original and new links. If you fix this check, it should
probably use stat() for the original, and lstat() for the target.
This will fix safe_rename() for the case where the OS follows
symlinks, and catch the case where the new target is a hard link to
the original symlink (rather than the file). The only likely
scenario this would happen in real life is if someone was trying to
execute a symlink attack on the user's message file, which seems a
bit far-fetched, but it would be more correct.

- Minor issue of comment on unlink not checking the return value, but
it actually does...

FWIW the version I have handy is a bit out of date so it's possible my
comments aren't valid any longer, but I'd be surprised if this
function had changed much. It's also very late here, and I could be
misreading the code. =8^)

Kevin J. McCarthy

2018-08-23 17:11:22 UTC

Permalink

Post by Derek Martin
What I was actually suggesting is to not call out the EEXIST error
case differently from other error cases. Making that change has no
other effect on the retry logic in that function. But it does
eliminate the illogical retry forever if the message file already
exists for some reason.

The calling function (e.g. _maildir_commit_message()) is not trying to
safe_rename() to the same filename over and over though. It is trying
to find an available target filename, and keeps modifying the target
filename using an incremented Counter and time(). So it makes sense for
it to retry if the target filename already existed. Or am I
misunderstanding your point?

Post by Derek Martin

Post by Kevin J. McCarthy
Steffen's cautions apply to dotlock code, which is a different case and
is not affected by this change.

It's fundamentally the same thing though. The mechanism for dotlock
[...]
If the link succeeds, we have the lock, but the rc from link is
unreliable, so...

This is the important question. Under what circumstances is the rc from
the link() unreliable? Are you saying a "0" success return value should
not be trusted? Because the crux of my justification for the change is
that the link() returned success and there is no need for a double
check.

The man page indicates a __failure__ code from link() can't be trusted
for NFS (if the server happened to die but created the link), but
nothing is mentioned about a success return code being unreliable. I
haven't been working on this kind of stuff for years though, so I rely
on you all to tell me when I make stupid changes. I really need to know
if this is a stupid change.

Post by Derek Martin
- The stat comparison is supposed to catch the case where the return
value from link() incorrectly indicates failure, but it does not do
that because it only happens when link() succeeds. That defeats the
purpose of doing the check at all.

No, the stat comparison was a double check for if the link __succeeds__.
There is already failure handling code - reverting to trying a rename()
for certain errno values. It was merely Vincent's hypothesis that the
check was in the wrong place, but I don't believe that is correct.

Your point about stat vs lstat sounds important, but let's first resolve
whether the double-check needs to be restored or not.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Derek Martin

2018-08-23 19:50:31 UTC

Permalink

[Sorry, I know this is long...]

Post by Kevin J. McCarthy
The calling function (e.g. _maildir_commit_message()) is not trying to
safe_rename() to the same filename over and over though. It is trying
to find an available target filename, and keeps modifying the target
filename using an incremented Counter and time(). So it makes sense for
it to retry if the target filename already existed. Or am I
misunderstanding your point?

OK. I haven't looked that closely at it (clearly).

In that case, it basically is analogous to mkstemp, i.e. it tries to
create a file with a particlar pattern, fails if it can't create it
safely, and retries with a different pattern. But those functions
will fail after TMP_MAX retries, and I think this code should do the
same. That prevents the DoS (infinite loop) and lets Mutt alert the
user that something funny is going on... This should never happen
unless the user is being attacked in some fashion.

On my system, TMP_MAX is defined as 238328. I think that's plenty of
retries! And since it's defined by the system, it's not exactly
arbitrary... =8^)

Post by Kevin J. McCarthy

Post by Derek Martin

Post by Kevin J. McCarthy
Steffen's cautions apply to dotlock code, which is a different case and
is not affected by this change.

It's fundamentally the same thing though. The mechanism for dotlock
[...]
If the link succeeds, we have the lock, but the rc from link is
unreliable, so...

This is the important question. Under what circumstances is the rc from
the link() unreliable? Are you saying a "0" success return value should
not be trusted?

I read it as you do too, but 1. I think it's ambiguous enough that I
would not be 100% confident in that interpretation, and 2. this is for
Linux, and it's not clear that other platforms wouldn't have different
semantics (or even different file systems on Linux). I will note
that, for example, GnuPG always ignores the return value from link()
(common/dotlock.c:1070):

/* We ignore the return value of link() because it is unreliable. */
(void) link (h->tname, h->lockname);

if (stat (h->tname, &sb))
{
saveerrno = errno;
my_error_1 ("lock not made: Oops: stat of tmp file failed: %s\n",
strerror (errno));
/* In theory this might be a severe error: It is possible
that link succeeded but stat failed due to changed
permissions. We can't do anything about it, though. */
my_set_errno (saveerrno);
return -1;
}

Post by Kevin J. McCarthy

No, the stat comparison was a double check for if the link __succeeds__.

That's not usually how it's done, as far as I can tell/remember. I
think I understand why, and I would refer you again to the GnuPG code
above as an example. It always ignores the link return value and does
the stat comparison.

Remember, the reason safe_rename() exists is because rename() has a
race condition on NFS--it is required to be atomic by the spec but on
NFS it is not. Therefore rename should be used only as a last resort,
and only if we're pretty sure link() actually failed.

If your link() return value is reliable, there's no reason to do the
stat comparison. If only the error case is unreliable, then on error
the stat comparison is NECESSARY, to determine if link() actually
succeeded despite the error. If neither is reliable, you should just
always ignore link()'s RV and always do the stat comparison.

So again, I think safe_rename() is wrong, regarless, and the change
you made actually has no practical effect. If it were doing the right
thing, your change would prevent it from catching the case where

Post by Kevin J. McCarthy
There is already failure handling code - reverting to trying a rename()
for certain errno values. It was merely Vincent's hypothesis that the
check was in the wrong place, but I don't believe that is correct.

Note that if the link() actually succeeded but incorrectly reported a
failure, the rename will also fail, in the sense that there will be
two links to the file, where the old link will not be removed. This
can be demonstrated on a local file system with the following short
program:

-=-=-=-=-
$ cat rename.c
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdlib.h>

char *old = "linkfile";
char *new = "linktarg";

int main(int argc, char **argv)
{
int fd;
// Show status of both files
system("ls -li linkfile linktarg");

// create the "old" file
printf("creating %s...\n", old);
if ((fd = open(old, O_CREAT, 0644)) == 1){
fprintf(stderr, "creating %s failed\n", old);
return 1;
}
close(fd);
system("ls -li linkfile linktarg");

// On a local file system, this should succeed
if (link(old, new) == -1) fprintf(stderr, "link fail\n");
else printf("link(%s, %s) succeeded.\n", old, new);
system("ls -li linkfile linktarg");

// simulate rename after incorrect link fail
if (rename(old, new) == -1) fprintf(stderr, "rename fail\n");
else printf("rename(%s, %s) succeeded.\n", old, new);
system("ls -li linkfile linktarg");

return 0;
}

$ ./rn
ls: cannot access 'linkfile': No such file or directory
ls: cannot access 'linktarg': No such file or directory
creating linkfile...
ls: cannot access 'linktarg': No such file or directory
27026462 -rw-r--r-- 1 demartin staff 0 Aug 23 14:50 linkfile
link(linkfile, linktarg) succeeded.
27026462 -rw-r--r-- 2 demartin staff 0 Aug 23 14:50 linkfile
27026462 -rw-r--r-- 2 demartin staff 0 Aug 23 14:50 linktarg
rename(linkfile, linktarg) succeeded.
27026462 -rw-r--r-- 2 demartin staff 0 Aug 23 14:50 linkfile
27026462 -rw-r--r-- 2 demartin staff 0 Aug 23 14:50 linktarg

So, I think this further supports what I'm saying. To restate my
various points more clearly, the algorithm for safe_rename() should be
this:

-=-=-=-
try a regular link and get its return value.
stat the old and new files, and compare. if the same, the link worked!
if they are different {
try rename.
if rename fails {
return failure.
}
}
unlink(old), ignore ENOENT, but report other errors.
-=-=-=-

The reason for the last is, the unlink() serves to clean up BOTH the
case where the link succeeded, AND the case where rename() succeeded
but didn't clean up the original link. However normally if rename
succeeds, the old link will get cleaned up, so unlink will fail in
that case, with ENOENT.

Alternatively, if you're convinced that the link() success status is
reliable, you can do the stat comparison conditionally, only if it
returns failure. Again, the reason to do this is to determine if the
link actually succeeded, but brokenly reported failure due to NFS
quirks.

The former (ignoring link's return value and always doing the stat
comparison) is probably safest for most users. The latter will
apparently be less problematic for SSHFS-like quirks, but as we saw
there's an SSHFS option that makes it work without this behavior.

Post by Kevin J. McCarthy
Your point about stat vs lstat sounds important, but let's first resolve
whether the double-check needs to be restored or not.

Sure. Also note that rename() will possibly have different symlink
semantics, depending on which link() symlink behavior is in effect on
your platform. That may make this even more complicated.

Isn't POSIX fun? =8^)

Vincent Lefevre

2018-08-24 09:35:50 UTC

Permalink

Post by Derek Martin
The former (ignoring link's return value and always doing the stat
comparison) is probably safest for most users. The latter will
apparently be less problematic for SSHFS-like quirks, but as we saw
there's an SSHFS option that makes it work without this behavior.

You have no proof of that. If I understand correctly, you suggest to
ignore link's return value in all cases because of a potential race
condition on some old system. But then, if you use the SSHFS option
to disable hard links, so that Mutt will use rename() instead, then
you also need to consider race conditions it can introduce... If
you think that rename() is at least as good as the link() stuff,
then safe_rename is useless: you can just use rename(). :-)

Derek Martin

2018-08-24 19:33:16 UTC

Permalink

Post by Vincent Lefevre

You have no proof of that.

No proof of what? I think I can prove that everything I said is
documented, or follows logically from things that are documented. I
think mostly, I've already done so, though perhaps some of the sources
need to be referenced.

Post by Vincent Lefevre
If I understand correctly, you suggest to ignore link's return value
in all cases because of a potential race condition on some old
system.

You do not understand correctly.

The main issue is, and always was, the unreliability of the return
value of link() over NFS, which the man page on Linux indicates is a
current problem. There's also ideficiency of clarity about how that
unreliability manifests on Linux, and a general lack of info about the
problem from a portability perspective.

The purpose of safe_rename() (according to its comments) is to attempt
to provide an equivalent option that is more reliable over NFS. But
there's a reliable test using stat(), irrespective of the reliability
of the return value of link(), to determine if link() actually worked,
which is known to work over NFS. So we should ignore the link()
return value and use the reliable stat() test, always.

The exception to this is SSHFS, which (IIUC) depending on its
configuration, fakes link() by copying the file. Doing that would
obviously make the stat() check ALWAYS fail. I would propose that we
ignore SSHFS since it's a hack for which we know that semantics that
POSIX requires are not supported. And as you say, it seems to have a
workaround that works well enough considering it's a hack.

But, in the event someone thinks SSHFS should be supported as a
first-class file system, the alternative I suggested should
work better for SSHFS because the stat() test will always fail. The
alternative may fail randomly if the NFS server crashes, but a) that
will be far more rare than always, and b) SSHFS is a hack so who
cares?

Is that clear?

Post by Vincent Lefevre
But then, if you use the SSHFS option to disable hard links, so that
Mutt will use rename() instead, then you also need to consider race
conditions it can introduce...

1. No we don't, because SSHFS is a hack and who cares? The user
should be prepared to deal with its inability to provide semantics
that POSIX requires.

2. I've previously said that rename() is NOT as good as link(), and
should be used only as a last resort. The algorithm (both
versions) I described take that into account.

That said, in practice, I actually think it would be fine to eliminate
safe_rename() entirely and just use rename(). The problem that it
tries to solve should be sufficiently rare that I think it's
reasonable to make the user deal with it manually. And as I've said
many times in the (mostly distant) past, reading your mail over NFS is
stupid anyway, precisely because of problems like these. Probably
many people care about Mutt working correctly over NFS (as much as
possible)... I am not one of them.

Kevin J. McCarthy

2018-08-27 01:09:19 UTC

Permalink

Post by Derek Martin
Alternatively, if you're convinced that the link() success status is
reliable, you can do the stat comparison conditionally, only if it
returns failure. Again, the reason to do this is to determine if the
link actually succeeded, but brokenly reported failure due to NFS
quirks.

I've been thinking, researching, and rereading the thread. Everywhere I
can actually find a concrete description of the problem with link()'s
return value, the explanation has been a failure code that was in fact a
success (because of NFS).

So, I'm convinced trusting a 0 return value is acceptable, and am going
to keep the current committed behavior for that case.

I'm not too excited about changing the -1 return value case, because no
issues have been reported in the 20-year old function related to that.
But I admit it's documented, both historically and in the man page, and
so it deserves handling. Your sample code and the man page for rename()
also both mention the problem of leaving the oldpath if it is a hard
link to newpath.

So, I will attempt and post a patch for that fix in a new thread.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA

Steffen Nurpmeso

2018-08-23 23:16:42 UTC

Permalink

Derek Martin wrote in <***@bladeshadow.org>:
|On Wed, Aug 22, 2018 at 11:12:39AM -0700, Kevin J. McCarthy wrote:
|> On Wed, Aug 22, 2018 at 10:04:12AM -0500, Derek Martin wrote:
...
|> Steffen's cautions apply to dotlock code, which is a different case and
|> is not affected by this change.
|
|It's fundamentally the same thing though. The mechanism for dotlock
|works like this:
|
| - create a secure temporary file (with O_EXCL).
| This ensures that the file we're opening for the lock has not been
| subverted by another process, potentially an attacker.
| - stat the file

This does not happen for the traditional BSD code.

| - link the file to canonical name
| If the link succeeds, we have the lock, but the rc from link is
| unreliable, so...
| - stat the file again using the new link
| Here, we compare the inode and/or make sure the link count has
| increased, to ensure we're really dealing with the same file...

No, instead stat(2) is called on the temporary file, and if that
has a link count of 2 then we have won the race on the lock file.

| - write the PID to the lock file
| - unlink the temporary file
|
|Exact details may vary slightly, but that's the essence of it. This
|is almost exactly what _maildir_commit_message() (and safe_rename())
|does, for largely the same reasons, though the purpose of the file is
|different.

I do not know.

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

Vincent Lefevre

2018-08-21 23:02:29 UTC

Permalink

Post by Kevin J. McCarthy
I don't like removing 20-year old safety checks, but I think it's okay
to do so for the case where link() returns 0.

I was also hesitant, but after thinking more about it:

* That's an unusual "safety check" (I doubt other software does
the same kind of check after link() with return value 0).

* It is not clear whether this has ever been usuful (nothing in
comment or in the commit log).

* It might even have been a mistake. I suspect that the goal was
for NFS, but then the test should have been for non-zero return
values instead of zero.

Steffen Nurpmeso

2018-08-22 13:35:16 UTC

Permalink

Vincent Lefevre wrote in <***@zira.vinc17.org>:
|On 2018-08-21 07:07:15 -0700, Kevin J. McCarthy wrote:
|> I don't like removing 20-year old safety checks, but I think it's okay
|> to do so for the case where link() returns 0.
|
|I was also hesitant, but after thinking more about it:
|
|* That's an unusual "safety check" (I doubt other software does
| the same kind of check after link() with return value 0).

Quite the opposite is true. The original BSD code for creating
dotlock files does this, always.

|* It is not clear whether this has ever been usuful (nothing in
| comment or in the commit log).

According to some statement on a list you are also subscribed to
this NFS problem has been fixed "at the early 90s". The BSD code
i refer to came up later as far as i now (without having digged to
thoroughly).

|* It might even have been a mistake. I suspect that the goal was
| for NFS, but then the test should have been for non-zero return
| values instead of zero.

The code i refer to was not about renaming some file, but about
savely creating a temporary dotlock file, of course, where several
processes may create the very same file, and it must be ensured
that it is you who won the race. Since i have no overview over
Unix systems but *BSD and Linux (2.0.x, 2.2.x, 2.4.x, then
a decade not, then 4.x), and Solaris via OpenCSW.org, and an old
UnixWare VM but which i have not used since my old machine died,
i would not dare to change this. Maybe Christos Zoulas of NetBSD
would know more, but then i think i am not the right one to ask.

I for one will not change the code for the MUA i maintain, because
one of the (very far) future goals is (at least partial) support
of elder Unix from before Y2K (from before 1999-01-11, when
i first used Linux, to be exact).

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

Vincent Lefevre

2018-08-24 09:11:57 UTC

Permalink

Post by Steffen Nurpmeso
|* It is not clear whether this has ever been usuful (nothing in
| comment or in the commit log).
According to some statement on a list you are also subscribed to
this NFS problem has been fixed "at the early 90s".

This is quite surprising. Even the discussion at

https://www.experts-exchange.com/questions/10078625/atomic-locking-over-NFS-with-link-2-stat-2.html

in 1998 doesn't mention this problem (it just mentions the case where
the link(2) call failed though the link was created).

Several old documents say not to use the return value of the link(2)
call, but do not say whether this is only due to the case where
link(2) fails but the link was created, or the other way round.

Steffen Nurpmeso

2018-08-24 16:54:16 UTC

Permalink

Vincent Lefevre wrote in <***@zira.vinc17.org>:
|On 2018-08-22 15:35:16 +0200, Steffen Nurpmeso wrote:
|> Vincent Lefevre wrote in <***@zira.vinc17.org>:
|>|* It is not clear whether this has ever been usuful (nothing in
|>| comment or in the commit log).
|>
|> According to some statement on a list you are also subscribed to
|> this NFS problem has been fixed "at the early 90s".
|
|This is quite surprising. Even the discussion at
|
| https://www.experts-exchange.com/questions/10078625/atomic-locking-over-\
| NFS-with-link-2-stat-2.html
|
|in 1998 doesn't mention this problem (it just mentions the case where
|the link(2) call failed though the link was created).
|
|Several old documents say not to use the return value of the link(2)
|call, but do not say whether this is only due to the case where
|link(2) fails but the link was created, or the other way round.

Oh, wait! This was false rememberance, i referred to a message
from Casper Dik of Oracle who wrote on 2015-12-31

/* Create a unique file. O_EXCL does not really work over NFS so we follow
* - make a mostly unique filename and try to create it
* - link the unique filename to our target
* - get the link count of the target
* - unlink the mostly unique filename
* - if the link count was 2, then we are ok; else we've failed */

The problem of not being able to create a file with O_EXCL was, I think,
fixed in NFSv3 (if not, certainly in NFSv4)

Casper

so this was not about link but about O_EXCL. About a year later
(2016-11-02) there was a pair of message in between Stèphane and
Jörg about links via NFS, as in "IIRC there were issues with ln on
NFS for instance." and "Could you please explain what you have in
mind? I would like to understand whether there really is a NFS
problem or whether there is just a NFS bug in Linux", but nothing
more than that.

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)

Vincent Lefevre

2018-08-31 15:55:41 UTC

Permalink

Post by Steffen Nurpmeso
Oh, wait! This was false rememberance, i referred to a message
from Casper Dik of Oracle who wrote on 2015-12-31

The problem of not being able to create a file with O_EXCL was, I think,
fixed in NFSv3 (if not, certainly in NFSv4)
Casper
so this was not about link but about O_EXCL.

Yes, a well-known issue.

Post by Steffen Nurpmeso
About a year later
(2016-11-02) there was a pair of message in between Stèphane and
Jörg about links via NFS, as in "IIRC there were issues with ln on
NFS for instance." and "Could you please explain what you have in
mind? I would like to understand whether there really is a NFS
problem or whether there is just a NFS bug in Linux", but nothing
more than that.

Perhaps it could just be the one already mentioned:

On NFS filesystems, the return code may be wrong in case the NFS server
performs the link creation and dies before it can say so. Use stat(2)
to find out if the link got created.

which, I suppose, is impossible to solve.

Steffen Nurpmeso

2018-09-02 00:33:46 UTC

Permalink

Vincent Lefevre wrote in <***@cventin.lip.ens-lyon.fr>:
|On 2018-08-24 18:54:16 +0200, Steffen Nurpmeso wrote:
|> Oh, wait! This was false rememberance, i referred to a message
|> from Casper Dik of Oracle who wrote on 2015-12-31
|>
|>>/* Create a unique file. O_EXCL does not really work over NFS so \
|>>we follow
|>> * the following trick (inspired by S.R. van den Berg):
|>> * - make a mostly unique filename and try to create it
|>> * - link the unique filename to our target
|>> * - get the link count of the target
|>> * - unlink the mostly unique filename
|>> * - if the link count was 2, then we are ok; else we've failed */
|>
|> The problem of not being able to create a file with O_EXCL was, \
|> I think,
|> fixed in NFSv3 (if not, certainly in NFSv4)
|>
|> Casper
|>
|> so this was not about link but about O_EXCL.
|
|Yes, a well-known issue.

Well. RFC 1094 from 1989, section 2.2.10, Create File, says

The file "name" is created in the directory given by "dir". The
initial attributes of the new file are given by "attributes". A
reply "status" of NFS_OK indicates that the file was created, and
reply "file" and reply "attributes" are its file handle and
attributes. Any other reply "status" means that the operation failed
and no file was created.

Notes: This routine should pass an exclusive create flag, meaning
"create the file only if it is not already there".

So well-known yes, but i have no context of knowledge. Maybe
i should ask or look in some NFS sources.

|> About a year later
|> (2016-11-02) there was a pair of message in between Stèphane and
|> Jörg about links via NFS, as in "IIRC there were issues with ln on
|> NFS for instance." and "Could you please explain what you have in
|> mind? I would like to understand whether there really is a NFS
|> problem or whether there is just a NFS bug in Linux", but nothing
|> more than that.
|
|Perhaps it could just be the one already mentioned:
|
| On NFS filesystems, the return code may be wrong in case the NFS server
| performs the link creation and dies before it can say so. Use stat(2)
| to find out if the link got created.
|
|which, I suppose, is impossible to solve.

..The network connection can break at any time. RFC 1094 says

Also, most server failures occur between operations, not
between the receipt of an operation and the response. Finally,

A nice encouragement.

although actual server failures may be rare, in complex networks,
failures of any network, router, or bridge may be indistinguishable
from a server failure.

One could also go and look locally whether the file exists or not.
No, i mean, it is hard to say and depends on the context. If you
are linking/xy over a hundreds files in order an additional stat
is a problem except (maybe) for the last file, if you are only
doing one it may very well provide confidence. Of course it is
still racy, some other process may surely have had its rights to
overwrite/remove/move away the target in the meantime. Maybe
doing a stat on the target directory in order to verify that the
server is still alive would be it. Since: which software is
actually flexible enough to create a (real) batch if batching as
above would be possible?

A nice Sunday i wish.

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)