Message ID | 20210203124156.425775-1-ptikhomirov@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fcntl: make F_GETOWN(EX) return 0 on dead owner task | expand |
On Wed, Feb 03, 2021 at 03:41:56PM +0300, Pavel Tikhomirov wrote: > Currently there is no way to differentiate the file with alive owner > from the file with dead owner but pid of the owner reused. That's why > CRIU can't actually know if it needs to restore file owner or not, > because if it restores owner but actual owner was dead, this can > introduce unexpected signals to the "false"-owner (which reused the > pid). Hi! Thanks for the patch. You know I manage to forget the fowner internals. Could you please enlighten me -- when owner is set with some pid we do f_setown_ex __f_setown f_modown filp->f_owner.pid = get_pid(pid); Thus pid get refcount incremented. Then the owner exits but refcounter should be still up and running and pid should not be reused, no? Or I miss something obvious? The patch itself looks ok on a first glance.
On 2/3/21 10:32 PM, Cyrill Gorcunov wrote: > On Wed, Feb 03, 2021 at 03:41:56PM +0300, Pavel Tikhomirov wrote: >> Currently there is no way to differentiate the file with alive owner >> from the file with dead owner but pid of the owner reused. That's why >> CRIU can't actually know if it needs to restore file owner or not, >> because if it restores owner but actual owner was dead, this can >> introduce unexpected signals to the "false"-owner (which reused the >> pid). > > Hi! Thanks for the patch. You know I manage to forget the fowner internals. > Could you please enlighten me -- when owner is set with some pid we do > > f_setown_ex > __f_setown > f_modown > filp->f_owner.pid = get_pid(pid); > > Thus pid get refcount incremented. Hi, and yes you are right about refcount is held. Then the owner exits but refcounter > should be still up and running and pid should not be reused, no? Or > I miss something obvious? AFAICS if pid is held only by 1) fowner refcount and by 2) single process (without threads, group and session for simplicity), on process exit we go through: do_exit exit_notify release_task __exit_signal __unhash_process detach_pid __change_pid free_pid idr_remove So pid is removed from idr, and after that alloc_pid can reuse pid numbers even if old pid structure is still alive and is still held by fowner. Also I've added criu-zdtm test which reproduces the problem: https://src.openvz.org/projects/OVZ/repos/criu/commits/e25904c35dbc535f6837e55da58ca0f5a5caf4b3#test/zdtm/static/file_fown_reuse.c Hope this answers your question, Thanks! > > The patch itself looks ok on a first glance. >
On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote: > > AFAICS if pid is held only by 1) fowner refcount and by 2) single process > (without threads, group and session for simplicity), on process exit we go > through: > > do_exit > exit_notify > release_task > __exit_signal > __unhash_process > detach_pid > __change_pid > free_pid > idr_remove > > So pid is removed from idr, and after that alloc_pid can reuse pid numbers > even if old pid structure is still alive and is still held by fowner. ... > Hope this answers your question, Thanks! Yeah, indeed, thanks! So the change is sane still I'm a bit worried about backward compatibility, gimme some time I'll try to refresh my memory first, in a couple of days or weekend (though here are a number of experienced developers CC'ed maybe they reply even faster).
On Wed, Feb 03, 2021 at 03:41:56PM +0300, Pavel Tikhomirov wrote: > Currently there is no way to differentiate the file with alive owner > from the file with dead owner but pid of the owner reused. That's why > CRIU can't actually know if it needs to restore file owner or not, > because if it restores owner but actual owner was dead, this can > introduce unexpected signals to the "false"-owner (which reused the > pid). > > Let's change the api, so that F_GETOWN(EX) returns 0 in case actual > owner is dead already. > > Cc: Jeff Layton <jlayton@kernel.org> > Cc: "J. Bruce Fields" <bfields@fieldses.org> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: Cyrill Gorcunov <gorcunov@gmail.com> > Cc: Andrei Vagin <avagin@gmail.com> > Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> I can't imagine a scenario where we could break some backward compatibility with this change, so Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
On Thu, 2021-02-04 at 01:17 +0300, Cyrill Gorcunov wrote: > On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote: > > > > AFAICS if pid is held only by 1) fowner refcount and by 2) single process > > (without threads, group and session for simplicity), on process exit we go > > through: > > > > do_exit > > exit_notify > > release_task > > __exit_signal > > __unhash_process > > detach_pid > > __change_pid > > free_pid > > idr_remove > > > > So pid is removed from idr, and after that alloc_pid can reuse pid numbers > > even if old pid structure is still alive and is still held by fowner. > ... > > Hope this answers your question, Thanks! > > Yeah, indeed, thanks! So the change is sane still I'm > a bit worried about backward compatibility, gimme some > time I'll try to refresh my memory first, in a couple > of days or weekend (though here are a number of experienced > developers CC'ed maybe they reply even faster). I always find it helpful to refer to the POSIX spec [1] for this sort of thing. In this case, it says: F_GETOWN If fildes refers to a socket, get the process ID or process group ID specified to receive SIGURG signals when out-of-band data is available. Positive values shall indicate a process ID; negative values, other than -1, shall indicate a process group ID; the value zero shall indicate that no SIGURG signals are to be sent. If fildes does not refer to a socket, the results are unspecified. In the event that the PID is reused, the kernel won't send signals to the replacement task, correct? Assuming that's the case, then this patch looks fine to me too. I'll plan to pick it for linux-next later today, and we can hopefully get this into v5.12. [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html#tag_16_122
On 2/8/21 3:31 PM, Jeff Layton wrote: > On Thu, 2021-02-04 at 01:17 +0300, Cyrill Gorcunov wrote: >> On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote: >>> >>> AFAICS if pid is held only by 1) fowner refcount and by 2) single process >>> (without threads, group and session for simplicity), on process exit we go >>> through: >>> >>> do_exit >>> exit_notify >>> release_task >>> __exit_signal >>> __unhash_process >>> detach_pid >>> __change_pid >>> free_pid >>> idr_remove >>> >>> So pid is removed from idr, and after that alloc_pid can reuse pid numbers >>> even if old pid structure is still alive and is still held by fowner. >> ... >>> Hope this answers your question, Thanks! >> >> Yeah, indeed, thanks! So the change is sane still I'm >> a bit worried about backward compatibility, gimme some >> time I'll try to refresh my memory first, in a couple >> of days or weekend (though here are a number of experienced >> developers CC'ed maybe they reply even faster). > > I always find it helpful to refer to the POSIX spec [1] for this sort of > thing. In this case, it says: > > F_GETOWN > If fildes refers to a socket, get the process ID or process group ID > specified to receive SIGURG signals when out-of-band data is available. > Positive values shall indicate a process ID; negative values, other than > -1, shall indicate a process group ID; the value zero shall indicate > that no SIGURG signals are to be sent. If fildes does not refer to a > socket, the results are unspecified. > > In the event that the PID is reused, the kernel won't send signals to > the replacement task, correct? Correct. Looks like only places to send signal to owner are send_sigio() and send_sigurg() (at least nobody else dereferences fown->pid_type). And in both places we lookup for task to send signal to with pid_task() or do_each_pid_task() (similar to what I do in patch) and will not find any task if pid was reused. Thus no signal would be sent. > Assuming that's the case, then this patch > looks fine to me too. I'll plan to pick it for linux-next later today, > and we can hopefully get this into v5.12. > > [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html#tag_16_122 > Thanks for finding it!
On Mon, 2021-02-08 at 15:57 +0300, Pavel Tikhomirov wrote: > > On 2/8/21 3:31 PM, Jeff Layton wrote: > > On Thu, 2021-02-04 at 01:17 +0300, Cyrill Gorcunov wrote: > > > On Thu, Feb 04, 2021 at 12:35:42AM +0300, Pavel Tikhomirov wrote: > > > > > > > > AFAICS if pid is held only by 1) fowner refcount and by 2) single process > > > > (without threads, group and session for simplicity), on process exit we go > > > > through: > > > > > > > > do_exit > > > > exit_notify > > > > release_task > > > > __exit_signal > > > > __unhash_process > > > > detach_pid > > > > __change_pid > > > > free_pid > > > > idr_remove > > > > > > > > So pid is removed from idr, and after that alloc_pid can reuse pid numbers > > > > even if old pid structure is still alive and is still held by fowner. > > > ... > > > > Hope this answers your question, Thanks! > > > > > > Yeah, indeed, thanks! So the change is sane still I'm > > > a bit worried about backward compatibility, gimme some > > > time I'll try to refresh my memory first, in a couple > > > of days or weekend (though here are a number of experienced > > > developers CC'ed maybe they reply even faster). > > > > I always find it helpful to refer to the POSIX spec [1] for this sort of > > thing. In this case, it says: > > > > F_GETOWN > > If fildes refers to a socket, get the process ID or process group ID > > specified to receive SIGURG signals when out-of-band data is available. > > Positive values shall indicate a process ID; negative values, other than > > -1, shall indicate a process group ID; the value zero shall indicate > > that no SIGURG signals are to be sent. If fildes does not refer to a > > socket, the results are unspecified. > > > > In the event that the PID is reused, the kernel won't send signals to > > the replacement task, correct? > > Correct. Looks like only places to send signal to owner are send_sigio() > and send_sigurg() (at least nobody else dereferences fown->pid_type). > And in both places we lookup for task to send signal to with pid_task() > or do_each_pid_task() (similar to what I do in patch) and will not find > any task if pid was reused. Thus no signal would be sent. > Thanks for confirming it. I queued it up for linux-next (with a small amendment to the changelog), and it should be there later today or tomorrow. It might not hurt to roll up a manpage patch too if you have the cycles. It'd be nice to spell out what a 0 return means there. > > Assuming that's the case, then this patch > > looks fine to me too. I'll plan to pick it for linux-next later today, > > and we can hopefully get this into v5.12. > > > > [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html#tag_16_122 > > > > Thanks for finding it! > No problem. That site is worth bookmarking for this sort of thing... ;)
diff --git a/fs/fcntl.c b/fs/fcntl.c index 05b36b28f2e8..483ef8861376 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -148,11 +148,15 @@ void f_delown(struct file *filp) pid_t f_getown(struct file *filp) { - pid_t pid; + pid_t pid = 0; read_lock(&filp->f_owner.lock); - pid = pid_vnr(filp->f_owner.pid); - if (filp->f_owner.pid_type == PIDTYPE_PGID) - pid = -pid; + rcu_read_lock(); + if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) { + pid = pid_vnr(filp->f_owner.pid); + if (filp->f_owner.pid_type == PIDTYPE_PGID) + pid = -pid; + } + rcu_read_unlock(); read_unlock(&filp->f_owner.lock); return pid; } @@ -200,11 +204,14 @@ static int f_setown_ex(struct file *filp, unsigned long arg) static int f_getown_ex(struct file *filp, unsigned long arg) { struct f_owner_ex __user *owner_p = (void __user *)arg; - struct f_owner_ex owner; + struct f_owner_ex owner = {}; int ret = 0; read_lock(&filp->f_owner.lock); - owner.pid = pid_vnr(filp->f_owner.pid); + rcu_read_lock(); + if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) + owner.pid = pid_vnr(filp->f_owner.pid); + rcu_read_unlock(); switch (filp->f_owner.pid_type) { case PIDTYPE_PID: owner.type = F_OWNER_TID;
Currently there is no way to differentiate the file with alive owner from the file with dead owner but pid of the owner reused. That's why CRIU can't actually know if it needs to restore file owner or not, because if it restores owner but actual owner was dead, this can introduce unexpected signals to the "false"-owner (which reused the pid). Let's change the api, so that F_GETOWN(EX) returns 0 in case actual owner is dead already. Cc: Jeff Layton <jlayton@kernel.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> --- fs/fcntl.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)