Message ID | CAOg9mSTDxoLfHceXOJAwNZ6mhwZs4o=d9sQUjWcAQsQbUPiE4w@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
I added a signal handler to my test program today, and now I see the giant difference between wait_for_completion_interruptible_timeout, which is what you get when you use the -intr mount option, and wait_for_completion_killable_timeout, which is what you get when you don't, so I retract what I said about the -intr mount option not being relevant <g>... It also seems like it would be easy (and correct) to modify the patch a little so that -EINTR would still be returned on the off-chance that the interrupt was caught before any data was written... -Mike On Thu, Mar 3, 2016 at 5:25 PM, Mike Marshall <hubcap@omnibond.com> wrote: > Here is what I have come up with to try and make our return to > interrupted writes more acceptable... in my tests it seems to > work. My test involved running the client-core with a tiny IO > buffer size (4k) and a C program with a large write buffer(32M). > That way there was plenty of time for me to fire off the C program > and hit Ctrl-C while the write was chugging along. > > The return value from do_readv_writev always matches the size > of the file written by the aborted C program in my tests. > > I changed the C program around so that sometimes I ran it with > (O_CREAT | O_RDWR) on open and sometimes with (O_CREAT | O_RDWR | O_APPEND) > and it seemed to do the right thing. I didn't try to set up a > signal handler so that the signal wasn't fatal to the process, > I guess that would be a test to actually see and verify the correct > short return code to write... > > Do you all think this looks like it should work in principle? > > BTW: in the distant past someone else attempted to solve this problem > the "nfs intr" way - we have an intr mount option, and that's why there's > all that sweating over whether or not stuff is "interruptible" in > waitqueue.c... I'm not sure if our intr mount option is relevant anymore > given the way the op handling code has evolved... > > diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c > index 6f2e0f7..4349c9d 100644 > --- a/fs/orangefs/file.c > +++ b/fs/orangefs/file.c > @@ -180,21 +180,54 @@ populate_shared_memory: > } > > if (ret < 0) { > - /* > - * don't write an error to syslog on signaled operation > - * termination unless we've got debugging turned on, as > - * this can happen regularly (i.e. ctrl-c) > - */ > - if (ret == -EINTR) > + if (ret == -EINTR) { > + /* > + * We can't return EINTR if any data was written, > + * it's not POSIX. It is minimally acceptable > + * to give a partial write, the way NFS does. > + * > + * It would be optimal to return all or nothing, > + * but if a userspace write is bigger than > + * an IO buffer, and the interrupt occurs > + * between buffer writes, that would not be > + * possible. > + */ > + switch (new_op->op_state - OP_VFS_STATE_GIVEN_UP) { > + /* > + * If the op was waiting when the interrupt > + * occurred, then the client-core did not > + * trigger the write. > + */ > + case OP_VFS_STATE_WAITING: > + ret = 0; > + break; > + /* > + * If the op was in progress when the interrupt > + * occurred, then the client-core was able to > + * trigger the write. > + */ > + case OP_VFS_STATE_INPROGR: > + ret = total_size; > + break; > + default: > + gossip_err("%s: unexpected op state :%d:.\n", > + __func__, > + new_op->op_state); > + ret = 0; > + break; > + } > gossip_debug(GOSSIP_FILE_DEBUG, > - "%s: returning error %ld\n", __func__, > - (long)ret); > - else > + "%s: got EINTR, state:%d: %p\n", > + __func__, > + new_op->op_state, > + new_op); > + } else { > gossip_err("%s: error in %s handle %pU, returning %zd\n", > __func__, > type == ORANGEFS_IO_READ ? > "read from" : "write to", > handle, ret); > + } > if (orangefs_cancel_op_in_progress(new_op)) > return ret; > > > On Sat, Jan 23, 2016 at 6:35 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> On Sat, Jan 23, 2016 at 2:46 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >>> >>> What should we get? -EINTR, despite having written some data? >> >> No, that's not acceptable. >> >> Either all or nothing (which is POSIX) or the NFS 'intr' mount >> behavior (partial write return, -EINTR only when nothing was written >> at all). And, like NFS, a mount option might be a good thing. >> >> And of course, for the usual reasons, fatal signals are special in >> that for them we generally say "screw posix, nobody sees the return >> value anyway", but even there the filesystem might as well still >> return the partial return value (just to not introduce yet another >> special case). >> >> In fact, I think that with our "fatal signals interrupt" behavior, >> nobody should likely use the "intr" mount option on NFS. Even if the >> semantics may be "better", there are likely simply just too many >> programs that don't check the return value of "write()" at all, much >> less handle partial writes correctly. >> >> (And yes, our "screw posix" behavior wrt fatal signals is strictly >> wrong even _despite_ the fact that nobody sees the return value - >> other processes can still obviously see that the whole write wasn't >> done. But blocking on a fatal signal is _so_ annoying that it's one of >> those things where we just say "posix was wrong on this one, and if we >> squint a bit we look _almost_ like we're compliant"). >> >> Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c index 6f2e0f7..4349c9d 100644 --- a/fs/orangefs/file.c +++ b/fs/orangefs/file.c @@ -180,21 +180,54 @@ populate_shared_memory: } if (ret < 0) { - /* - * don't write an error to syslog on signaled operation - * termination unless we've got debugging turned on, as - * this can happen regularly (i.e. ctrl-c) - */ - if (ret == -EINTR) + if (ret == -EINTR) { + /* + * We can't return EINTR if any data was written, + * it's not POSIX. It is minimally acceptable + * to give a partial write, the way NFS does. + * + * It would be optimal to return all or nothing, + * but if a userspace write is bigger than + * an IO buffer, and the interrupt occurs + * between buffer writes, that would not be + * possible. + */ + switch (new_op->op_state - OP_VFS_STATE_GIVEN_UP) { + /* + * If the op was waiting when the interrupt + * occurred, then the client-core did not + * trigger the write. + */ + case OP_VFS_STATE_WAITING: + ret = 0; + break; + /* + * If the op was in progress when the interrupt + * occurred, then the client-core was able to + * trigger the write. + */ + case OP_VFS_STATE_INPROGR: + ret = total_size; + break; + default: + gossip_err("%s: unexpected op state :%d:.\n", + __func__, + new_op->op_state); + ret = 0; + break; + } gossip_debug(GOSSIP_FILE_DEBUG, - "%s: returning error %ld\n", __func__, - (long)ret); - else + "%s: got EINTR, state:%d: %p\n", + __func__, + new_op->op_state, + new_op); + } else { gossip_err("%s: error in %s handle %pU, returning %zd\n", __func__, type == ORANGEFS_IO_READ ? "read from" : "write to", handle, ret); + } if (orangefs_cancel_op_in_progress(new_op)) return ret;