Message ID | 20201215190127.1765595-1-david.edmondson@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: report errno when flock fcntl fails | expand |
15.12.2020 22:01, David Edmondson wrote: > When a call to fcntl(2) for the purpose of manipulating file locks > fails, report the error returned by fcntl. > > Signed-off-by: David Edmondson<david.edmondson@oracle.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Am 15.12.2020 um 20:01 hat David Edmondson geschrieben: > When a call to fcntl(2) for the purpose of manipulating file locks > fails, report the error returned by fcntl. > > Signed-off-by: David Edmondson <david.edmondson@oracle.com> Is appending "Resource temporarily unavailable" in the common case (a file locked by another process) actually an improvement for the message or more confusing? It would be good to mention the motivation for the change in the commit message. Either way, this breaks some qemu-iotests cases whose reference output needs to be updated. Kevin
On Wednesday, 2020-12-16 at 12:29:40 +01, Kevin Wolf wrote: > Am 15.12.2020 um 20:01 hat David Edmondson geschrieben: >> When a call to fcntl(2) for the purpose of manipulating file locks >> fails, report the error returned by fcntl. >> >> Signed-off-by: David Edmondson <david.edmondson@oracle.com> > > Is appending "Resource temporarily unavailable" in the common case (a > file locked by another process) actually an improvement for the message > or more confusing? It would be good to mention the motivation for the > change in the commit message. Distinguishing between the common case and others is at least a start when trying to figure out why it failed. We have potentially useful information to assist in diagnosis, why throw it away? In the case I encountered the failure appears to have been caused by SELinux misconfiguration. > Either way, this breaks some qemu-iotests cases whose reference output > needs to be updated. Will fix and send a v2. dme.
Am 16.12.2020 um 12:38 hat David Edmondson geschrieben: > On Wednesday, 2020-12-16 at 12:29:40 +01, Kevin Wolf wrote: > > > Am 15.12.2020 um 20:01 hat David Edmondson geschrieben: > >> When a call to fcntl(2) for the purpose of manipulating file locks > >> fails, report the error returned by fcntl. > >> > >> Signed-off-by: David Edmondson <david.edmondson@oracle.com> > > > > Is appending "Resource temporarily unavailable" in the common case (a > > file locked by another process) actually an improvement for the message > > or more confusing? It would be good to mention the motivation for the > > change in the commit message. > > Distinguishing between the common case and others is at least a start > when trying to figure out why it failed. We have potentially useful > information to assist in diagnosis, why throw it away? I agree in principle, just when I saw the result, it felt more confusing than helpful. Maybe the best option would be to include the errno only if it's different from EAGAIN? Then the qemu-iotests reference output would probably stay unchanged, too. > In the case I encountered the failure appears to have been caused by > SELinux misconfiguration. Ah, so you got EPERM? It can useful information in the future, let's include it in the commit message. > > Either way, this breaks some qemu-iotests cases whose reference output > > needs to be updated. > > Will fix and send a v2. Thanks! Kevin
On Wednesday, 2020-12-16 at 12:57:46 +01, Kevin Wolf wrote: > Am 16.12.2020 um 12:38 hat David Edmondson geschrieben: >> On Wednesday, 2020-12-16 at 12:29:40 +01, Kevin Wolf wrote: >> >> > Am 15.12.2020 um 20:01 hat David Edmondson geschrieben: >> >> When a call to fcntl(2) for the purpose of manipulating file locks >> >> fails, report the error returned by fcntl. >> >> >> >> Signed-off-by: David Edmondson <david.edmondson@oracle.com> >> > >> > Is appending "Resource temporarily unavailable" in the common case (a >> > file locked by another process) actually an improvement for the message >> > or more confusing? It would be good to mention the motivation for the >> > change in the commit message. >> >> Distinguishing between the common case and others is at least a start >> when trying to figure out why it failed. We have potentially useful >> information to assist in diagnosis, why throw it away? > > I agree in principle, just when I saw the result, it felt more confusing > than helpful. Maybe the best option would be to include the errno only > if it's different from EAGAIN? Then the qemu-iotests reference output > would probably stay unchanged, too. This is a pretty low-level error report even today - unless you understand the implementation then it doesn't shout "the file is being shared". Given that, I don't see any value in eliding the EAGAIN message, as I have to know that the lack of the errno string means that it was EAGAIN, meaning that another process holds a lock. >> In the case I encountered the failure appears to have been caused by >> SELinux misconfiguration. > > Ah, so you got EPERM? Yes. > It can useful information in the future, let's include it in the commit > message. Okay. dme.
Am 16.12.2020 um 14:06 hat David Edmondson geschrieben: > On Wednesday, 2020-12-16 at 12:57:46 +01, Kevin Wolf wrote: > > > Am 16.12.2020 um 12:38 hat David Edmondson geschrieben: > >> On Wednesday, 2020-12-16 at 12:29:40 +01, Kevin Wolf wrote: > >> > >> > Am 15.12.2020 um 20:01 hat David Edmondson geschrieben: > >> >> When a call to fcntl(2) for the purpose of manipulating file locks > >> >> fails, report the error returned by fcntl. > >> >> > >> >> Signed-off-by: David Edmondson <david.edmondson@oracle.com> > >> > > >> > Is appending "Resource temporarily unavailable" in the common case (a > >> > file locked by another process) actually an improvement for the message > >> > or more confusing? It would be good to mention the motivation for the > >> > change in the commit message. > >> > >> Distinguishing between the common case and others is at least a start > >> when trying to figure out why it failed. We have potentially useful > >> information to assist in diagnosis, why throw it away? > > > > I agree in principle, just when I saw the result, it felt more confusing > > than helpful. Maybe the best option would be to include the errno only > > if it's different from EAGAIN? Then the qemu-iotests reference output > > would probably stay unchanged, too. > > This is a pretty low-level error report even today - unless you > understand the implementation then it doesn't shout "the file is being > shared". Hm, certainly true for raw_apply_lock_bytes(), which shouldn't normally fail, so I agree that we don't need to simplify messages there. From raw_check_lock_bytes(), you get messages like 'Failed to get "write" lock', which I think is fairly obvious? I'm not saying that it's perfect and couldn't be improved, of course, but it doesn't feel horrible. > Given that, I don't see any value in eliding the EAGAIN message, as I > have to know that the lack of the errno string means that it was EAGAIN, > meaning that another process holds a lock. When you debug an error, you'll look at the code anyway. And a normal user won't know what EAGAIN or "Resource temporarily unavailable" means even if it's displayed. Temporarily sounds more like it will go away by itself, not that I have to shut down a different process first. I wonder if anyone else has an opinion on this? If people think that displaying it is preferable or it doesn't matter much, we can add it despite my concerns. Kevin
16.12.2020 16:53, Kevin Wolf wrote: > Am 16.12.2020 um 14:06 hat David Edmondson geschrieben: >> On Wednesday, 2020-12-16 at 12:57:46 +01, Kevin Wolf wrote: >> >>> Am 16.12.2020 um 12:38 hat David Edmondson geschrieben: >>>> On Wednesday, 2020-12-16 at 12:29:40 +01, Kevin Wolf wrote: >>>> >>>>> Am 15.12.2020 um 20:01 hat David Edmondson geschrieben: >>>>>> When a call to fcntl(2) for the purpose of manipulating file locks >>>>>> fails, report the error returned by fcntl. >>>>>> >>>>>> Signed-off-by: David Edmondson <david.edmondson@oracle.com> >>>>> >>>>> Is appending "Resource temporarily unavailable" in the common case (a >>>>> file locked by another process) actually an improvement for the message >>>>> or more confusing? It would be good to mention the motivation for the >>>>> change in the commit message. >>>> >>>> Distinguishing between the common case and others is at least a start >>>> when trying to figure out why it failed. We have potentially useful >>>> information to assist in diagnosis, why throw it away? >>> >>> I agree in principle, just when I saw the result, it felt more confusing >>> than helpful. Maybe the best option would be to include the errno only >>> if it's different from EAGAIN? Then the qemu-iotests reference output >>> would probably stay unchanged, too. >> >> This is a pretty low-level error report even today - unless you >> understand the implementation then it doesn't shout "the file is being >> shared". > > Hm, certainly true for raw_apply_lock_bytes(), which shouldn't normally > fail, so I agree that we don't need to simplify messages there. > > From raw_check_lock_bytes(), you get messages like 'Failed to get > "write" lock', which I think is fairly obvious? I'm not saying that it's > perfect and couldn't be improved, of course, but it doesn't feel > horrible. > >> Given that, I don't see any value in eliding the EAGAIN message, as I >> have to know that the lack of the errno string means that it was EAGAIN, >> meaning that another process holds a lock. > > When you debug an error, you'll look at the code anyway. And a normal > user won't know what EAGAIN or "Resource temporarily unavailable" means > even if it's displayed. Temporarily sounds more like it will go away by > itself, not that I have to shut down a different process first. > > I wonder if anyone else has an opinion on this? If people think that > displaying it is preferable or it doesn't matter much, we can add it > despite my concerns. > [a bit offtopic, but related] I think, there are two kinds of errors: 1. Simple errors, when user (or in our case, most probably some vm management tool developer) understand the error, understand what he is doing wrong and don't report the problem to Qemu developers. These errors should look well, do not contain information which may confuse the user. 2. Errors that should not happen. Often the reason is some bug, sometimes, it's actually type [1] error, but user didn't understand what's going wrong.. These errors are reported to Qemu developers.. For these errors it doesn't matter how they look for the user, user doesn't understand what's going on anyway. These errors should contain as much information as possible.. And it's a pity, that in pursuit of well-formed error messages for [1], we loose information, and have to deal with well-formed, but not informative error messages, wasting time on debugging. Even such simple thing like determine, from what line of code the error comes not always obvious, due to different reasons. I'd prefer that error message always contain the call-stack with a function parameters, not saying about errno :)
diff --git a/block/file-posix.c b/block/file-posix.c index 9804681d5c..f866fc9742 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -836,7 +836,7 @@ static int raw_apply_lock_bytes(BDRVRawState *s, int fd, if ((perm_lock_bits & bit) && !(locked_perm & bit)) { ret = qemu_lock_fd(fd, off, 1, false); if (ret) { - error_setg(errp, "Failed to lock byte %d", off); + error_setg_errno(errp, -ret, "Failed to lock byte %d", off); return ret; } else if (s) { s->locked_perm |= bit; @@ -844,7 +844,7 @@ static int raw_apply_lock_bytes(BDRVRawState *s, int fd, } else if (unlock && (locked_perm & bit) && !(perm_lock_bits & bit)) { ret = qemu_unlock_fd(fd, off, 1); if (ret) { - error_setg(errp, "Failed to unlock byte %d", off); + error_setg_errno(errp, -ret, "Failed to unlock byte %d", off); return ret; } else if (s) { s->locked_perm &= ~bit; @@ -857,7 +857,7 @@ static int raw_apply_lock_bytes(BDRVRawState *s, int fd, if ((shared_perm_lock_bits & bit) && !(locked_shared_perm & bit)) { ret = qemu_lock_fd(fd, off, 1, false); if (ret) { - error_setg(errp, "Failed to lock byte %d", off); + error_setg_errno(errp, -ret, "Failed to lock byte %d", off); return ret; } else if (s) { s->locked_shared_perm |= bit; @@ -866,7 +866,7 @@ static int raw_apply_lock_bytes(BDRVRawState *s, int fd, !(shared_perm_lock_bits & bit)) { ret = qemu_unlock_fd(fd, off, 1); if (ret) { - error_setg(errp, "Failed to unlock byte %d", off); + error_setg_errno(errp, -ret, "Failed to unlock byte %d", off); return ret; } else if (s) { s->locked_shared_perm &= ~bit; @@ -890,9 +890,9 @@ static int raw_check_lock_bytes(int fd, uint64_t perm, uint64_t shared_perm, ret = qemu_lock_fd_test(fd, off, 1, true); if (ret) { char *perm_name = bdrv_perm_names(p); - error_setg(errp, - "Failed to get \"%s\" lock", - perm_name); + error_setg_errno(errp, -ret, + "Failed to get \"%s\" lock", + perm_name); g_free(perm_name); return ret; } @@ -905,9 +905,9 @@ static int raw_check_lock_bytes(int fd, uint64_t perm, uint64_t shared_perm, ret = qemu_lock_fd_test(fd, off, 1, true); if (ret) { char *perm_name = bdrv_perm_names(p); - error_setg(errp, - "Failed to get shared \"%s\" lock", - perm_name); + error_setg_errno(errp, -ret, + "Failed to get shared \"%s\" lock", + perm_name); g_free(perm_name); return ret; }
When a call to fcntl(2) for the purpose of manipulating file locks fails, report the error returned by fcntl. Signed-off-by: David Edmondson <david.edmondson@oracle.com> --- block/file-posix.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)