diff mbox series

nfsd: Add support for mapping sticky bit into NFS4 ACL

Message ID 20240912221523.23648-1-pali@kernel.org (mailing list archive)
State New
Headers show
Series nfsd: Add support for mapping sticky bit into NFS4 ACL | expand

Commit Message

Pali Rohár Sept. 12, 2024, 10:15 p.m. UTC
Currently the directory restricted deletion flag (sticky bit, SVTX) is not
mapped into NFS4 ACL by knfsd. Which means that the NFS4 ACL client is not
aware of the fact that it cannot delete some files if the sticky bit is set
on the server on parent directory. If the client copies sticky bit
directory recursively to some other storage which implements the NFS4-like
ACL (e.g. to NTFS filesystem or to SMB server) then the directory's
restricted deletion flag is completely lost.

This change aims to improve interoperability of the POSIX directory
restricted deletion flag (sticky bit, SVTX) and NFS4 ACL structure in
knfsd. It transparently maps POSIX directory's sticky bit into NFS4 ACL
and vice-versa similarly like it already maps POSIX ACL into NFS4 ACL.
It covers NFS4 GETATTR ACL, NFS4 SETATTR ACL, NFS4 CREATE+OPEN operations.
When creating a new object over NFS4, it is possible to optionally specify
NFS4 ACL, so this is covered too.

For client SETATTR ACL, CREATE with ACL and OPEN-create with ACL
operations, detection pattern for restricted deletion flag is ACE entry
INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE together with check that
nobody except the OWNER@ has DELETE_CHILD permission. Note that the OWNER@
does not have to have DELETE_CHILD permission in case it does not have
write access to directory.

For client GETATTR ACL operation, when restricted deletion flag is present
in inode, following ACE entries are prepended before any other ACEs:

  ALLOW OWNER@ DELETE_CHILD                                 (1)
  DENY EVERYONE@ DELETE_CHILD
  INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
  INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
  INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
  ...
  INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
  INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE

ACE entry (1) is present only when user OWNER@ has write access (can modify
directory, including removing child entries), because it is possible to
have sticky bit set also on read-only directory.

ACE entry (3) is present only when user OWNER@ does not have write access
(cannot modify directory) and it is required to override effect of the last
ACE entry which allows child entry OWNER@ to remove entry itself. This is
needed for example for POSIX mode 1577.

ACE entry (4) is present only when anybody else except the directory owner
has no write access to the directory. This is the case when sticky bit is
set but nobody can use it because of missing directory write access. So
this explicit DENY covers this edge case.

ACE entries (ACL user1...userN) are for POSIX users which do not have write
access to directory and therefore they cannot remove directory entries
which they own. This is again needed for overriding the effect of the last
ACE entry.

When restricted deletion flag is not present then nothing is added.

This is probably the best approximation of the directory restricted
deletion flag. It covers directory's OWNER@ grant access, child OWNER@
grant access and also restrictions for all other users. It also covers the
situation when OWNER@ or some POSIX user does not have write access to the
directory, and also covers situation when nobody except directory owner has
write access to the directory.

What this does not cover is the restriction when some POSIX group does not
have directory write access, and another POSIX group has directory write
access. This is probably not possible to express in NFS4 ACL language
without ability to evaluate if user is member of some group or not. NFS4
ACL in this case says for the owner of the file that the delete operation
is allowed.

The whole change is only about the mapping of the sticky bit into NFS4 ACL
and only for NFS4 GET and SET ACL operations. It does not change any access
permission checks.

Signed-off-by: Pali Rohár <pali@kernel.org>
---
 fs/nfsd/acl.h      |   2 +-
 fs/nfsd/nfs4acl.c  | 242 ++++++++++++++++++++++++++++++++++++++++++---
 fs/nfsd/nfs4proc.c |  31 +++++-
 3 files changed, 255 insertions(+), 20 deletions(-)

Comments

Chuck Lever Sept. 17, 2024, 3:39 p.m. UTC | #1
On Fri, Sep 13, 2024 at 12:15:23AM +0200, Pali Rohár wrote:
> Currently the directory restricted deletion flag (sticky bit, SVTX) is not
> mapped into NFS4 ACL by knfsd. Which means that the NFS4 ACL client is not
> aware of the fact that it cannot delete some files if the sticky bit is set
> on the server on parent directory. If the client copies sticky bit
> directory recursively to some other storage which implements the NFS4-like
> ACL (e.g. to NTFS filesystem or to SMB server) then the directory's
> restricted deletion flag is completely lost.
> 
> This change aims to improve interoperability of the POSIX directory
> restricted deletion flag (sticky bit, SVTX) and NFS4 ACL structure in
> knfsd. It transparently maps POSIX directory's sticky bit into NFS4 ACL
> and vice-versa similarly like it already maps POSIX ACL into NFS4 ACL.
> It covers NFS4 GETATTR ACL, NFS4 SETATTR ACL, NFS4 CREATE+OPEN operations.
> When creating a new object over NFS4, it is possible to optionally specify
> NFS4 ACL, so this is covered too.
> 
> For client SETATTR ACL, CREATE with ACL and OPEN-create with ACL
> operations, detection pattern for restricted deletion flag is ACE entry
> INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE together with check that
> nobody except the OWNER@ has DELETE_CHILD permission. Note that the OWNER@
> does not have to have DELETE_CHILD permission in case it does not have
> write access to directory.
> 
> For client GETATTR ACL operation, when restricted deletion flag is present
> in inode, following ACE entries are prepended before any other ACEs:
> 
>   ALLOW OWNER@ DELETE_CHILD                                 (1)
>   DENY EVERYONE@ DELETE_CHILD
>   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
>   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
>   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
>   ...
>   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
>   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> 
> ACE entry (1) is present only when user OWNER@ has write access (can modify
> directory, including removing child entries), because it is possible to
> have sticky bit set also on read-only directory.
> 
> ACE entry (3) is present only when user OWNER@ does not have write access
> (cannot modify directory) and it is required to override effect of the last
> ACE entry which allows child entry OWNER@ to remove entry itself. This is
> needed for example for POSIX mode 1577.
> 
> ACE entry (4) is present only when anybody else except the directory owner
> has no write access to the directory. This is the case when sticky bit is
> set but nobody can use it because of missing directory write access. So
> this explicit DENY covers this edge case.
> 
> ACE entries (ACL user1...userN) are for POSIX users which do not have write
> access to directory and therefore they cannot remove directory entries
> which they own. This is again needed for overriding the effect of the last
> ACE entry.
> 
> When restricted deletion flag is not present then nothing is added.
> 
> This is probably the best approximation of the directory restricted
> deletion flag. It covers directory's OWNER@ grant access, child OWNER@
> grant access and also restrictions for all other users. It also covers the
> situation when OWNER@ or some POSIX user does not have write access to the
> directory, and also covers situation when nobody except directory owner has
> write access to the directory.
> 
> What this does not cover is the restriction when some POSIX group does not
> have directory write access, and another POSIX group has directory write
> access. This is probably not possible to express in NFS4 ACL language
> without ability to evaluate if user is member of some group or not. NFS4
> ACL in this case says for the owner of the file that the delete operation
> is allowed.
> 
> The whole change is only about the mapping of the sticky bit into NFS4 ACL
> and only for NFS4 GET and SET ACL operations. It does not change any access
> permission checks.
> 
> Signed-off-by: Pali Rohár <pali@kernel.org>
> ---
>  fs/nfsd/acl.h      |   2 +-
>  fs/nfsd/nfs4acl.c  | 242 ++++++++++++++++++++++++++++++++++++++++++---
>  fs/nfsd/nfs4proc.c |  31 +++++-
>  3 files changed, 255 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
> index 4b7324458a94..e7e7909bf03a 100644
> --- a/fs/nfsd/acl.h
> +++ b/fs/nfsd/acl.h
> @@ -48,6 +48,6 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
>  int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
>  		struct nfs4_acl **acl);
>  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> -			 struct nfsd_attrs *attr);
> +			 struct nfsd_attrs *attr, int *dir_sticky);
>  
>  #endif /* LINUX_NFS4_ACL_H */
> diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> index 96e786b5e544..6a53772c2a13 100644
> --- a/fs/nfsd/nfs4acl.c
> +++ b/fs/nfsd/nfs4acl.c
> @@ -46,6 +46,7 @@
>  #define NFS4_ACL_TYPE_DEFAULT	0x01
>  #define NFS4_ACL_DIR		0x02
>  #define NFS4_ACL_OWNER		0x04
> +#define NFS4_ACL_DIR_STICKY	0x08
>  
>  /* mode bit translations: */
>  #define NFS4_READ_MODE (NFS4_ACE_READ_DATA)
> @@ -73,7 +74,7 @@ mask_from_posix(unsigned short perm, unsigned int flags)
>  		mask |= NFS4_READ_MODE;
>  	if (perm & ACL_WRITE)
>  		mask |= NFS4_WRITE_MODE;
> -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
>  		mask |= NFS4_ACE_DELETE_CHILD;
>  	if (perm & ACL_EXECUTE)
>  		mask |= NFS4_EXECUTE_MODE;
> @@ -89,7 +90,7 @@ deny_mask_from_posix(unsigned short perm, u32 flags)
>  		mask |= NFS4_READ_MODE;
>  	if (perm & ACL_WRITE)
>  		mask |= NFS4_WRITE_MODE;
> -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
>  		mask |= NFS4_ACE_DELETE_CHILD;
>  	if (perm & ACL_EXECUTE)
>  		mask |= NFS4_EXECUTE_MODE;
> @@ -110,7 +111,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
>  {
>  	u32 write_mode = NFS4_WRITE_MODE;
>  
> -	if (flags & NFS4_ACL_DIR)
> +	if ((flags & NFS4_ACL_DIR) && !(flags | NFS4_ACL_DIR_STICKY))
>  		write_mode |= NFS4_ACE_DELETE_CHILD;
>  	*mode = 0;
>  	if ((perm & NFS4_READ_MODE) == NFS4_READ_MODE)
> @@ -123,7 +124,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
>  
>  static short ace2type(struct nfs4_ace *);
>  static void _posix_to_nfsv4_one(struct posix_acl *, struct nfs4_acl *,
> -				unsigned int);
> +				unsigned int, kuid_t);
>  
>  int
>  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> @@ -142,8 +143,11 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
>  	if (IS_ERR(pacl))
>  		return PTR_ERR(pacl);
>  
> -	/* allocate for worst case: one (deny, allow) pair each: */
> -	size += 2 * pacl->a_count;
> +	/*
> +	 * allocate for worst case: one (deny, allow) pair for each
> +	 * plus for restricted deletion flag (sticky bit): 4 + one for each
> +	 */
> +	size += 2 * pacl->a_count + (4 + pacl->a_count);
>  
>  	if (S_ISDIR(inode->i_mode)) {
>  		flags = NFS4_ACL_DIR;
> @@ -155,6 +159,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
>  
>  		if (dpacl)
>  			size += 2 * dpacl->a_count;
> +
> +		/* propagate restricted deletion flag (sticky bit) */
> +		if (inode->i_mode & S_ISVTX)
> +			flags |= NFS4_ACL_DIR_STICKY;
>  	}
>  
>  	*acl = kmalloc(nfs4_acl_bytes(size), GFP_KERNEL);
> @@ -164,10 +172,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
>  	}
>  	(*acl)->naces = 0;
>  
> -	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT);
> +	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT, inode->i_uid);
>  
>  	if (dpacl)
> -		_posix_to_nfsv4_one(dpacl, *acl, flags | NFS4_ACL_TYPE_DEFAULT);
> +		_posix_to_nfsv4_one(dpacl, *acl, (flags | NFS4_ACL_TYPE_DEFAULT) & ~NFS4_ACL_DIR_STICKY, inode->i_uid);
>  
>  out:
>  	posix_acl_release(dpacl);
> @@ -231,7 +239,7 @@ summarize_posix_acl(struct posix_acl *acl, struct posix_acl_summary *pas)
>  /* We assume the acl has been verified with posix_acl_valid. */
>  static void
>  _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> -						unsigned int flags)
> +		    unsigned int flags, kuid_t owner_uid)
>  {
>  	struct posix_acl_entry *pa, *group_owner_entry;
>  	struct nfs4_ace *ace;
> @@ -243,9 +251,149 @@ _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
>  	BUG_ON(pacl->a_count < 3);
>  	summarize_posix_acl(pacl, &pas);
>  
> -	pa = pacl->a_entries;
>  	ace = acl->aces + acl->naces;
>  
> +	/*
> +	 * Translate restricted deletion flag (sticky bit, SVTX) set on the
> +	 * directory to following NFS4 ACEs prepended before any other ACEs:
> +	 *
> +	 *   ALLOW OWNER@ DELETE_CHILD                                 (1)
> +	 *   DENY EVERYONE@ DELETE_CHILD
> +	 *   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> +	 *   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> +	 *   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> +	 *   ...
> +	 *   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> +	 *   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> +	 *
> +	 *   (1) - only if user-owner has write access
> +	 *   (3) - only if user-owner does not have write access
> +	 *   (4) - only if non-user-owner does not have write access
> +	 *   (ACL user1) - only if user1 does not have write access
> +	 *   (ACL userN) - only if userN does not have write access
> +	 *
> +	 * These ACEs describe behavior of set restricted deletion flag (sticky
> +	 * bit) on directory as they allow only owner of individual child entries
> +	 * and owner of the directory to delete individual child entries.
> +	 * Everyone else is denied to remove child entries in this directory.
> +	 *
> +	 * For deleting entry in NFS4 ACL model it is needed either DELETE_CHILD
> +	 * permission (access mask) from the parent directory or DELETE permission
> +	 * (access mask) on the child. Just one of these two permissions is enough.
> +	 * So if there is explicit DENY DELETE_CHILD on the parent together with
> +	 * explicit ALLOW DELETE on the child, it means that deleting is allowed
> +	 * (evaluation of permissions is independent in NFS4 ACL model). OWNER@
> +	 * for inherited ACEs means owner of the child entry and not the owner
> +	 * of the parent from which was ACE inherited.
> +	 *
> +	 * This translation is imperfect just for a case when some group
> +	 * (including group-owner and others-group) does not have write access
> +	 * to directory. Handled is only the edge case (via rule 4) when
> +	 * everyone else except owner has no write access to the directory.
> +	 * This information is not present in NFS4 ACL because it looks like that
> +	 * this is not possible to express in this ACL model. So for ACL consumer
> +	 * it could look like that owner of the file can delete its own file even
> +	 * when group or other mode bits of the directory do not allow it.
> +	 */
> +	if (flags & NFS4_ACL_DIR_STICKY) {
> +		/*
> +		 * Explicitly allow directory owner to delete child entries
> +		 * if directory owner has write access
> +		 */
> +		if (pas.owner & ACL_WRITE) {
> +			ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> +			ace->flag = 0;
> +			ace->access_mask = NFS4_ACE_DELETE_CHILD;
> +			ace->whotype = NFS4_ACL_WHO_OWNER;
> +			ace++;
> +			acl->naces++;
> +		}
> +
> +		/*
> +		 * And then deny deleting child entries for all other users
> +		 * which do not have explicit DELETE permission granted by
> +		 * last rule in this section.
> +		 */
> +		ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> +		ace->flag = 0;
> +		ace->access_mask = NFS4_ACE_DELETE_CHILD;
> +		ace->whotype = NFS4_ACL_WHO_EVERYONE;
> +		ace++;
> +		acl->naces++;
> +
> +		/*
> +		 * Do not grant directory owner to delete child entries (by the
> +		 * last rule in this section) if it does not have write access.
> +		 */
> +		if (!(pas.owner & ACL_WRITE)) {
> +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> +			ace->flag = NFS4_INHERITANCE_FLAGS |
> +				    NFS4_ACE_INHERIT_ONLY_ACE |
> +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> +			ace->access_mask = NFS4_ACE_DELETE;
> +			ace->whotype = NFS4_ACL_WHO_NAMED;
> +			ace->who_uid = owner_uid;
> +			ace++;
> +			acl->naces++;
> +		}
> +
> +		if (!(pas.users & ACL_WRITE) && !(pas.group & ACL_WRITE) &&
> +		    !(pas.groups & ACL_WRITE) && !(pas.other & ACL_WRITE)) {
> +			/*
> +			 * Do not grant child owner who is not directory owner
> +			 * (handled by the first rule in this section) to
> +			 * delete own child entries if there is no possible
> +			 * directory write permission (checked for named users,
> +			 * group-owner, named groups and others-groups).
> +			 * This handles special edge case when only directory
> +			 * owner has write access to directory.
> +			 */
> +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> +			ace->flag = NFS4_INHERITANCE_FLAGS |
> +				    NFS4_ACE_INHERIT_ONLY_ACE |
> +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> +			ace->access_mask = NFS4_ACE_DELETE;
> +			ace->whotype = NFS4_ACL_WHO_OWNER;
> +			ace++;
> +			acl->naces++;
> +		} else {
> +			/*
> +			 * Do not grant individual named users to delete child
> +			 * entries (by the last rule in this section) if user
> +			 * does not have write access to directory.
> +			 */
> +			for (pa = pacl->a_entries + 1; pa->e_tag == ACL_USER; pa++) {
> +				if (pa->e_perm & pas.mask & ACL_WRITE)
> +					continue;
> +				ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> +				ace->flag = NFS4_INHERITANCE_FLAGS |
> +					    NFS4_ACE_INHERIT_ONLY_ACE |
> +					    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> +				ace->access_mask = NFS4_ACE_DELETE;
> +				ace->whotype = NFS4_ACL_WHO_NAMED;
> +				ace->who_uid = pa->e_uid;
> +				ace++;
> +				acl->naces++;
> +			}
> +		}
> +
> +		/*
> +		 * Above rules filtered out users which do not have write access
> +		 * to the directory. Now allow child-owner to delete its own
> +		 * directory entries.
> +		 */
> +		ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> +		ace->flag = NFS4_INHERITANCE_FLAGS |
> +			    NFS4_ACE_INHERIT_ONLY_ACE |
> +			    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> +		ace->access_mask = NFS4_ACE_DELETE;
> +		ace->whotype = NFS4_ACL_WHO_OWNER;
> +		ace++;
> +		acl->naces++;
> +	}
> +
> +	pa = pacl->a_entries;
> +
>  	/* We could deny everything not granted by the owner: */
>  	deny = ~pas.owner;
>  	/*
> @@ -517,7 +665,8 @@ posix_state_to_acl(struct posix_acl_state *state, unsigned int flags)
>  
>  	pace = pacl->a_entries;
>  	pace->e_tag = ACL_USER_OBJ;
> -	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags);
> +	/* owner is never affected by restricted deletion flag, so clear NFS4_ACL_DIR_STICKY */
> +	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags & ~NFS4_ACL_DIR_STICKY);
>  
>  	for (i=0; i < state->users->n; i++) {
>  		pace++;
> @@ -691,9 +840,14 @@ static void process_one_v4_ace(struct posix_acl_state *state,
>  
>  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
>  		struct posix_acl **pacl, struct posix_acl **dpacl,
> +		int *dir_sticky,
>  		unsigned int flags)
>  {
>  	struct posix_acl_state effective_acl_state, default_acl_state;
> +	bool dir_allow_nonowner_delete_child = false;
> +	bool dir_deny_everyone_delete_child = false;
> +	bool dir_allow_child_owner_delete = false;
> +	unsigned int eflags = 0;
>  	struct nfs4_ace *ace;
>  	int ret;
>  
> @@ -705,6 +859,28 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
>  		goto out_estate;
>  	ret = -EINVAL;
>  	for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
> +		/*
> +		 * Process and parse ACE entry INHERIT_ONLY NO_PROPAGATE DELETE
> +		 * for detecting restricted deletion flag (sticky bit). Allow
> +		 * SYNCHRONIZE access mask to be present as NFS4 clients can
> +		 * include this access mask together with any other one.
> +		 *
> +		 * It needs to be done before validation as NFS4_SUPPORTED_FLAGS
> +		 * does not contain NFS4_ACE_NO_PROPAGATE_INHERIT_ACE and this
> +		 * ACE must not throw error.
> +		 */
> +		if ((flags & NFS4_ACL_DIR) &&
> +		    !(ace->flag & ~(NFS4_SUPPORTED_FLAGS|NFS4_ACE_NO_PROPAGATE_INHERIT_ACE)) &&
> +		    (ace->flag & NFS4_INHERITANCE_FLAGS) &&
> +		    (ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> +		    (ace->flag & NFS4_ACE_NO_PROPAGATE_INHERIT_ACE) &&
> +		    (ace->access_mask & ~NFS4_ACE_SYNCHRONIZE) == NFS4_ACE_DELETE) {
> +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> +			    ace->whotype == NFS4_ACL_WHO_OWNER)
> +				dir_allow_child_owner_delete = true;
> +			continue;
> +		}
> +
>  		if (ace->type != NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
>  		    ace->type != NFS4_ACE_ACCESS_DENIED_ACE_TYPE)
>  			goto out_dstate;
> @@ -725,6 +901,38 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
>  
>  		if (!(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE))
>  			process_one_v4_ace(&effective_acl_state, ace);
> +
> +		/*
> +		 * Process and parse ACE entry with DELETE_CHILD access mask
> +		 * for detecting restricted deletion flag (sticky bit).
> +		 */
> +		if ((flags & NFS4_ACL_DIR) &&
> +		    !(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> +		    (ace->access_mask & NFS4_ACE_DELETE_CHILD)) {
> +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> +			    !dir_deny_everyone_delete_child &&
> +			    ace->whotype != NFS4_ACL_WHO_OWNER)
> +				dir_allow_nonowner_delete_child = true;
> +			else if (ace->type == NFS4_ACE_ACCESS_DENIED_ACE_TYPE &&
> +				 ace->whotype == NFS4_ACL_WHO_EVERYONE)
> +				dir_deny_everyone_delete_child = true;
> +		}
> +	}
> +
> +	/*
> +	 * Recognize restricted deletion flag (sticky bit) from directory ACL
> +	 * if ACEs on directory allow only owner of directory child entry to
> +	 * delete entry itself.
> +	 *
> +	 * This is relaxed check for rules generated by _posix_to_nfsv4_one().
> +	 * Relaxed check of restricted deletion flag is for security reasons
> +	 * and means that permissions would be more stricter, to prevent
> +	 * granting more access than what was specified in NFS4 ACL packet.
> +	 */
> +	if (flags & NFS4_ACL_DIR) {
> +		*dir_sticky = !dir_allow_nonowner_delete_child && dir_allow_child_owner_delete;
> +		if (*dir_sticky)
> +			eflags |= NFS4_ACL_DIR_STICKY;
>  	}
>  
>  	/*
> @@ -750,7 +958,7 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
>  			default_acl_state.other = effective_acl_state.other;
>  	}
>  
> -	*pacl = posix_state_to_acl(&effective_acl_state, flags);
> +	*pacl = posix_state_to_acl(&effective_acl_state, flags | eflags);
>  	if (IS_ERR(*pacl)) {
>  		ret = PTR_ERR(*pacl);
>  		*pacl = NULL;
> @@ -776,19 +984,23 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
>  }
>  
>  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> -			 struct nfsd_attrs *attr)
> +			 struct nfsd_attrs *attr, int *dir_sticky)
>  {
>  	int host_error;
>  	unsigned int flags = 0;
>  
> -	if (!acl)
> +	if (!acl) {
> +		if (type == NF4DIR)
> +			*dir_sticky = -1;
>  		return nfs_ok;
> +	}
>  
>  	if (type == NF4DIR)
>  		flags = NFS4_ACL_DIR;
>  
>  	host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl,
> -					     &attr->na_dpacl, flags);
> +					     &attr->na_dpacl, dir_sticky,
> +					     flags);
>  	if (host_error == -EINVAL)
>  		return nfserr_attrnotsupp;
>  	else
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 0f67f4a7b8b2..56aeb745d108 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -259,7 +259,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		return nfserrno(host_err);
>  
>  	if (is_create_with_attrs(open))
> -		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
> +		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs, NULL);
>  
>  	inode_lock_nested(inode, I_MUTEX_PARENT);
>  
> @@ -791,6 +791,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	};
>  	struct svc_fh resfh;
>  	__be32 status;
> +	int dir_sticky;
>  	dev_t rdev;
>  
>  	fh_init(&resfh, NFS4_FHSIZE);
> @@ -804,7 +805,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (status)
>  		return status;
>  
> -	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs);
> +	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs, &dir_sticky);
>  	current->fs->umask = create->cr_umask;
>  	switch (create->cr_type) {
>  	case NF4LNK:
> @@ -848,6 +849,11 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		break;
>  
>  	case NF4DIR:
> +		if (dir_sticky == 1) {
> +			/* Set directory sticky bit deduced from the ACL attr. */
> +			create->cr_iattr.ia_valid |= ATTR_MODE;
> +			create->cr_iattr.ia_mode |= S_ISVTX;
> +		}
>  		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
>  		status = nfsd_create(rqstp, &cstate->current_fh,
>  				     create->cr_name, create->cr_namelen,
> @@ -1144,6 +1150,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	struct inode *inode;
>  	__be32 status = nfs_ok;
>  	bool save_no_wcc;
> +	int dir_sticky;
>  	int err;
>  
>  	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
> @@ -1165,10 +1172,26 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  
>  	inode = cstate->current_fh.fh_dentry->d_inode;
>  	status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG,
> -				   setattr->sa_acl, &attrs);
> -
> +				   setattr->sa_acl, &attrs, &dir_sticky);
>  	if (status)
>  		goto out;
> +
> +	if (S_ISDIR(inode->i_mode) && dir_sticky != -1 && !!(inode->i_mode & S_ISVTX) != dir_sticky) {
> +		/*
> +		 * Set directory sticky bit deduced from the ACL attr.
> +		 * Do not clear sticky bit if it was explicitly set in MODE attr
> +		 * but was not deduced from ACL attr because clients can send
> +		 * both MODE and ACL attrs where sticky bit is only in MODE attr.
> +		 */
> +		if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> +			attrs.na_iattr->ia_mode = inode->i_mode;
> +		if (dir_sticky)
> +			attrs.na_iattr->ia_mode |= S_ISVTX;
> +		else if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> +			attrs.na_iattr->ia_mode &= ~S_ISVTX;
> +		attrs.na_iattr->ia_valid |= ATTR_MODE;
> +	}
> +
>  	save_no_wcc = cstate->current_fh.fh_no_wcc;
>  	cstate->current_fh.fh_no_wcc = true;
>  	status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, NULL);
> -- 
> 2.20.1
> 

Hi Pali -

Apologies for the delayed response.

Being somewhat un-expert in things ACL, I'm not sure if this is the
correct approach, or if it's right for the POSIX ACL-only
implementation we have in Linux. I'm going to research this a bit
and get back to you.
Pali Rohár Sept. 17, 2024, 4:10 p.m. UTC | #2
On Tuesday 17 September 2024 11:39:58 Chuck Lever wrote:
> On Fri, Sep 13, 2024 at 12:15:23AM +0200, Pali Rohár wrote:
> > Currently the directory restricted deletion flag (sticky bit, SVTX) is not
> > mapped into NFS4 ACL by knfsd. Which means that the NFS4 ACL client is not
> > aware of the fact that it cannot delete some files if the sticky bit is set
> > on the server on parent directory. If the client copies sticky bit
> > directory recursively to some other storage which implements the NFS4-like
> > ACL (e.g. to NTFS filesystem or to SMB server) then the directory's
> > restricted deletion flag is completely lost.
> > 
> > This change aims to improve interoperability of the POSIX directory
> > restricted deletion flag (sticky bit, SVTX) and NFS4 ACL structure in
> > knfsd. It transparently maps POSIX directory's sticky bit into NFS4 ACL
> > and vice-versa similarly like it already maps POSIX ACL into NFS4 ACL.
> > It covers NFS4 GETATTR ACL, NFS4 SETATTR ACL, NFS4 CREATE+OPEN operations.
> > When creating a new object over NFS4, it is possible to optionally specify
> > NFS4 ACL, so this is covered too.
> > 
> > For client SETATTR ACL, CREATE with ACL and OPEN-create with ACL
> > operations, detection pattern for restricted deletion flag is ACE entry
> > INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE together with check that
> > nobody except the OWNER@ has DELETE_CHILD permission. Note that the OWNER@
> > does not have to have DELETE_CHILD permission in case it does not have
> > write access to directory.
> > 
> > For client GETATTR ACL operation, when restricted deletion flag is present
> > in inode, following ACE entries are prepended before any other ACEs:
> > 
> >   ALLOW OWNER@ DELETE_CHILD                                 (1)
> >   DENY EVERYONE@ DELETE_CHILD
> >   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> >   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> >   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> >   ...
> >   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> >   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> > 
> > ACE entry (1) is present only when user OWNER@ has write access (can modify
> > directory, including removing child entries), because it is possible to
> > have sticky bit set also on read-only directory.
> > 
> > ACE entry (3) is present only when user OWNER@ does not have write access
> > (cannot modify directory) and it is required to override effect of the last
> > ACE entry which allows child entry OWNER@ to remove entry itself. This is
> > needed for example for POSIX mode 1577.
> > 
> > ACE entry (4) is present only when anybody else except the directory owner
> > has no write access to the directory. This is the case when sticky bit is
> > set but nobody can use it because of missing directory write access. So
> > this explicit DENY covers this edge case.
> > 
> > ACE entries (ACL user1...userN) are for POSIX users which do not have write
> > access to directory and therefore they cannot remove directory entries
> > which they own. This is again needed for overriding the effect of the last
> > ACE entry.
> > 
> > When restricted deletion flag is not present then nothing is added.
> > 
> > This is probably the best approximation of the directory restricted
> > deletion flag. It covers directory's OWNER@ grant access, child OWNER@
> > grant access and also restrictions for all other users. It also covers the
> > situation when OWNER@ or some POSIX user does not have write access to the
> > directory, and also covers situation when nobody except directory owner has
> > write access to the directory.
> > 
> > What this does not cover is the restriction when some POSIX group does not
> > have directory write access, and another POSIX group has directory write
> > access. This is probably not possible to express in NFS4 ACL language
> > without ability to evaluate if user is member of some group or not. NFS4
> > ACL in this case says for the owner of the file that the delete operation
> > is allowed.
> > 
> > The whole change is only about the mapping of the sticky bit into NFS4 ACL
> > and only for NFS4 GET and SET ACL operations. It does not change any access
> > permission checks.
> > 
> > Signed-off-by: Pali Rohár <pali@kernel.org>
> > ---
> >  fs/nfsd/acl.h      |   2 +-
> >  fs/nfsd/nfs4acl.c  | 242 ++++++++++++++++++++++++++++++++++++++++++---
> >  fs/nfsd/nfs4proc.c |  31 +++++-
> >  3 files changed, 255 insertions(+), 20 deletions(-)
> > 
> > diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
> > index 4b7324458a94..e7e7909bf03a 100644
> > --- a/fs/nfsd/acl.h
> > +++ b/fs/nfsd/acl.h
> > @@ -48,6 +48,6 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
> >  int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> >  		struct nfs4_acl **acl);
> >  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> > -			 struct nfsd_attrs *attr);
> > +			 struct nfsd_attrs *attr, int *dir_sticky);
> >  
> >  #endif /* LINUX_NFS4_ACL_H */
> > diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> > index 96e786b5e544..6a53772c2a13 100644
> > --- a/fs/nfsd/nfs4acl.c
> > +++ b/fs/nfsd/nfs4acl.c
> > @@ -46,6 +46,7 @@
> >  #define NFS4_ACL_TYPE_DEFAULT	0x01
> >  #define NFS4_ACL_DIR		0x02
> >  #define NFS4_ACL_OWNER		0x04
> > +#define NFS4_ACL_DIR_STICKY	0x08
> >  
> >  /* mode bit translations: */
> >  #define NFS4_READ_MODE (NFS4_ACE_READ_DATA)
> > @@ -73,7 +74,7 @@ mask_from_posix(unsigned short perm, unsigned int flags)
> >  		mask |= NFS4_READ_MODE;
> >  	if (perm & ACL_WRITE)
> >  		mask |= NFS4_WRITE_MODE;
> > -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> > +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
> >  		mask |= NFS4_ACE_DELETE_CHILD;
> >  	if (perm & ACL_EXECUTE)
> >  		mask |= NFS4_EXECUTE_MODE;
> > @@ -89,7 +90,7 @@ deny_mask_from_posix(unsigned short perm, u32 flags)
> >  		mask |= NFS4_READ_MODE;
> >  	if (perm & ACL_WRITE)
> >  		mask |= NFS4_WRITE_MODE;
> > -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> > +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
> >  		mask |= NFS4_ACE_DELETE_CHILD;
> >  	if (perm & ACL_EXECUTE)
> >  		mask |= NFS4_EXECUTE_MODE;
> > @@ -110,7 +111,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
> >  {
> >  	u32 write_mode = NFS4_WRITE_MODE;
> >  
> > -	if (flags & NFS4_ACL_DIR)
> > +	if ((flags & NFS4_ACL_DIR) && !(flags | NFS4_ACL_DIR_STICKY))
> >  		write_mode |= NFS4_ACE_DELETE_CHILD;
> >  	*mode = 0;
> >  	if ((perm & NFS4_READ_MODE) == NFS4_READ_MODE)
> > @@ -123,7 +124,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
> >  
> >  static short ace2type(struct nfs4_ace *);
> >  static void _posix_to_nfsv4_one(struct posix_acl *, struct nfs4_acl *,
> > -				unsigned int);
> > +				unsigned int, kuid_t);
> >  
> >  int
> >  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > @@ -142,8 +143,11 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> >  	if (IS_ERR(pacl))
> >  		return PTR_ERR(pacl);
> >  
> > -	/* allocate for worst case: one (deny, allow) pair each: */
> > -	size += 2 * pacl->a_count;
> > +	/*
> > +	 * allocate for worst case: one (deny, allow) pair for each
> > +	 * plus for restricted deletion flag (sticky bit): 4 + one for each
> > +	 */
> > +	size += 2 * pacl->a_count + (4 + pacl->a_count);
> >  
> >  	if (S_ISDIR(inode->i_mode)) {
> >  		flags = NFS4_ACL_DIR;
> > @@ -155,6 +159,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> >  
> >  		if (dpacl)
> >  			size += 2 * dpacl->a_count;
> > +
> > +		/* propagate restricted deletion flag (sticky bit) */
> > +		if (inode->i_mode & S_ISVTX)
> > +			flags |= NFS4_ACL_DIR_STICKY;
> >  	}
> >  
> >  	*acl = kmalloc(nfs4_acl_bytes(size), GFP_KERNEL);
> > @@ -164,10 +172,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> >  	}
> >  	(*acl)->naces = 0;
> >  
> > -	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT);
> > +	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT, inode->i_uid);
> >  
> >  	if (dpacl)
> > -		_posix_to_nfsv4_one(dpacl, *acl, flags | NFS4_ACL_TYPE_DEFAULT);
> > +		_posix_to_nfsv4_one(dpacl, *acl, (flags | NFS4_ACL_TYPE_DEFAULT) & ~NFS4_ACL_DIR_STICKY, inode->i_uid);
> >  
> >  out:
> >  	posix_acl_release(dpacl);
> > @@ -231,7 +239,7 @@ summarize_posix_acl(struct posix_acl *acl, struct posix_acl_summary *pas)
> >  /* We assume the acl has been verified with posix_acl_valid. */
> >  static void
> >  _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> > -						unsigned int flags)
> > +		    unsigned int flags, kuid_t owner_uid)
> >  {
> >  	struct posix_acl_entry *pa, *group_owner_entry;
> >  	struct nfs4_ace *ace;
> > @@ -243,9 +251,149 @@ _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> >  	BUG_ON(pacl->a_count < 3);
> >  	summarize_posix_acl(pacl, &pas);
> >  
> > -	pa = pacl->a_entries;
> >  	ace = acl->aces + acl->naces;
> >  
> > +	/*
> > +	 * Translate restricted deletion flag (sticky bit, SVTX) set on the
> > +	 * directory to following NFS4 ACEs prepended before any other ACEs:
> > +	 *
> > +	 *   ALLOW OWNER@ DELETE_CHILD                                 (1)
> > +	 *   DENY EVERYONE@ DELETE_CHILD
> > +	 *   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> > +	 *   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> > +	 *   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> > +	 *   ...
> > +	 *   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> > +	 *   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> > +	 *
> > +	 *   (1) - only if user-owner has write access
> > +	 *   (3) - only if user-owner does not have write access
> > +	 *   (4) - only if non-user-owner does not have write access
> > +	 *   (ACL user1) - only if user1 does not have write access
> > +	 *   (ACL userN) - only if userN does not have write access
> > +	 *
> > +	 * These ACEs describe behavior of set restricted deletion flag (sticky
> > +	 * bit) on directory as they allow only owner of individual child entries
> > +	 * and owner of the directory to delete individual child entries.
> > +	 * Everyone else is denied to remove child entries in this directory.
> > +	 *
> > +	 * For deleting entry in NFS4 ACL model it is needed either DELETE_CHILD
> > +	 * permission (access mask) from the parent directory or DELETE permission
> > +	 * (access mask) on the child. Just one of these two permissions is enough.
> > +	 * So if there is explicit DENY DELETE_CHILD on the parent together with
> > +	 * explicit ALLOW DELETE on the child, it means that deleting is allowed
> > +	 * (evaluation of permissions is independent in NFS4 ACL model). OWNER@
> > +	 * for inherited ACEs means owner of the child entry and not the owner
> > +	 * of the parent from which was ACE inherited.
> > +	 *
> > +	 * This translation is imperfect just for a case when some group
> > +	 * (including group-owner and others-group) does not have write access
> > +	 * to directory. Handled is only the edge case (via rule 4) when
> > +	 * everyone else except owner has no write access to the directory.
> > +	 * This information is not present in NFS4 ACL because it looks like that
> > +	 * this is not possible to express in this ACL model. So for ACL consumer
> > +	 * it could look like that owner of the file can delete its own file even
> > +	 * when group or other mode bits of the directory do not allow it.
> > +	 */
> > +	if (flags & NFS4_ACL_DIR_STICKY) {
> > +		/*
> > +		 * Explicitly allow directory owner to delete child entries
> > +		 * if directory owner has write access
> > +		 */
> > +		if (pas.owner & ACL_WRITE) {
> > +			ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> > +			ace->flag = 0;
> > +			ace->access_mask = NFS4_ACE_DELETE_CHILD;
> > +			ace->whotype = NFS4_ACL_WHO_OWNER;
> > +			ace++;
> > +			acl->naces++;
> > +		}
> > +
> > +		/*
> > +		 * And then deny deleting child entries for all other users
> > +		 * which do not have explicit DELETE permission granted by
> > +		 * last rule in this section.
> > +		 */
> > +		ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > +		ace->flag = 0;
> > +		ace->access_mask = NFS4_ACE_DELETE_CHILD;
> > +		ace->whotype = NFS4_ACL_WHO_EVERYONE;
> > +		ace++;
> > +		acl->naces++;
> > +
> > +		/*
> > +		 * Do not grant directory owner to delete child entries (by the
> > +		 * last rule in this section) if it does not have write access.
> > +		 */
> > +		if (!(pas.owner & ACL_WRITE)) {
> > +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > +			ace->flag = NFS4_INHERITANCE_FLAGS |
> > +				    NFS4_ACE_INHERIT_ONLY_ACE |
> > +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > +			ace->access_mask = NFS4_ACE_DELETE;
> > +			ace->whotype = NFS4_ACL_WHO_NAMED;
> > +			ace->who_uid = owner_uid;
> > +			ace++;
> > +			acl->naces++;
> > +		}
> > +
> > +		if (!(pas.users & ACL_WRITE) && !(pas.group & ACL_WRITE) &&
> > +		    !(pas.groups & ACL_WRITE) && !(pas.other & ACL_WRITE)) {
> > +			/*
> > +			 * Do not grant child owner who is not directory owner
> > +			 * (handled by the first rule in this section) to
> > +			 * delete own child entries if there is no possible
> > +			 * directory write permission (checked for named users,
> > +			 * group-owner, named groups and others-groups).
> > +			 * This handles special edge case when only directory
> > +			 * owner has write access to directory.
> > +			 */
> > +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > +			ace->flag = NFS4_INHERITANCE_FLAGS |
> > +				    NFS4_ACE_INHERIT_ONLY_ACE |
> > +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > +			ace->access_mask = NFS4_ACE_DELETE;
> > +			ace->whotype = NFS4_ACL_WHO_OWNER;
> > +			ace++;
> > +			acl->naces++;
> > +		} else {
> > +			/*
> > +			 * Do not grant individual named users to delete child
> > +			 * entries (by the last rule in this section) if user
> > +			 * does not have write access to directory.
> > +			 */
> > +			for (pa = pacl->a_entries + 1; pa->e_tag == ACL_USER; pa++) {
> > +				if (pa->e_perm & pas.mask & ACL_WRITE)
> > +					continue;
> > +				ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > +				ace->flag = NFS4_INHERITANCE_FLAGS |
> > +					    NFS4_ACE_INHERIT_ONLY_ACE |
> > +					    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > +				ace->access_mask = NFS4_ACE_DELETE;
> > +				ace->whotype = NFS4_ACL_WHO_NAMED;
> > +				ace->who_uid = pa->e_uid;
> > +				ace++;
> > +				acl->naces++;
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * Above rules filtered out users which do not have write access
> > +		 * to the directory. Now allow child-owner to delete its own
> > +		 * directory entries.
> > +		 */
> > +		ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> > +		ace->flag = NFS4_INHERITANCE_FLAGS |
> > +			    NFS4_ACE_INHERIT_ONLY_ACE |
> > +			    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > +		ace->access_mask = NFS4_ACE_DELETE;
> > +		ace->whotype = NFS4_ACL_WHO_OWNER;
> > +		ace++;
> > +		acl->naces++;
> > +	}
> > +
> > +	pa = pacl->a_entries;
> > +
> >  	/* We could deny everything not granted by the owner: */
> >  	deny = ~pas.owner;
> >  	/*
> > @@ -517,7 +665,8 @@ posix_state_to_acl(struct posix_acl_state *state, unsigned int flags)
> >  
> >  	pace = pacl->a_entries;
> >  	pace->e_tag = ACL_USER_OBJ;
> > -	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags);
> > +	/* owner is never affected by restricted deletion flag, so clear NFS4_ACL_DIR_STICKY */
> > +	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags & ~NFS4_ACL_DIR_STICKY);
> >  
> >  	for (i=0; i < state->users->n; i++) {
> >  		pace++;
> > @@ -691,9 +840,14 @@ static void process_one_v4_ace(struct posix_acl_state *state,
> >  
> >  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> >  		struct posix_acl **pacl, struct posix_acl **dpacl,
> > +		int *dir_sticky,
> >  		unsigned int flags)
> >  {
> >  	struct posix_acl_state effective_acl_state, default_acl_state;
> > +	bool dir_allow_nonowner_delete_child = false;
> > +	bool dir_deny_everyone_delete_child = false;
> > +	bool dir_allow_child_owner_delete = false;
> > +	unsigned int eflags = 0;
> >  	struct nfs4_ace *ace;
> >  	int ret;
> >  
> > @@ -705,6 +859,28 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> >  		goto out_estate;
> >  	ret = -EINVAL;
> >  	for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
> > +		/*
> > +		 * Process and parse ACE entry INHERIT_ONLY NO_PROPAGATE DELETE
> > +		 * for detecting restricted deletion flag (sticky bit). Allow
> > +		 * SYNCHRONIZE access mask to be present as NFS4 clients can
> > +		 * include this access mask together with any other one.
> > +		 *
> > +		 * It needs to be done before validation as NFS4_SUPPORTED_FLAGS
> > +		 * does not contain NFS4_ACE_NO_PROPAGATE_INHERIT_ACE and this
> > +		 * ACE must not throw error.
> > +		 */
> > +		if ((flags & NFS4_ACL_DIR) &&
> > +		    !(ace->flag & ~(NFS4_SUPPORTED_FLAGS|NFS4_ACE_NO_PROPAGATE_INHERIT_ACE)) &&
> > +		    (ace->flag & NFS4_INHERITANCE_FLAGS) &&
> > +		    (ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> > +		    (ace->flag & NFS4_ACE_NO_PROPAGATE_INHERIT_ACE) &&
> > +		    (ace->access_mask & ~NFS4_ACE_SYNCHRONIZE) == NFS4_ACE_DELETE) {
> > +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > +			    ace->whotype == NFS4_ACL_WHO_OWNER)
> > +				dir_allow_child_owner_delete = true;
> > +			continue;
> > +		}
> > +
> >  		if (ace->type != NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> >  		    ace->type != NFS4_ACE_ACCESS_DENIED_ACE_TYPE)
> >  			goto out_dstate;
> > @@ -725,6 +901,38 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> >  
> >  		if (!(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE))
> >  			process_one_v4_ace(&effective_acl_state, ace);
> > +
> > +		/*
> > +		 * Process and parse ACE entry with DELETE_CHILD access mask
> > +		 * for detecting restricted deletion flag (sticky bit).
> > +		 */
> > +		if ((flags & NFS4_ACL_DIR) &&
> > +		    !(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> > +		    (ace->access_mask & NFS4_ACE_DELETE_CHILD)) {
> > +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > +			    !dir_deny_everyone_delete_child &&
> > +			    ace->whotype != NFS4_ACL_WHO_OWNER)
> > +				dir_allow_nonowner_delete_child = true;
> > +			else if (ace->type == NFS4_ACE_ACCESS_DENIED_ACE_TYPE &&
> > +				 ace->whotype == NFS4_ACL_WHO_EVERYONE)
> > +				dir_deny_everyone_delete_child = true;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Recognize restricted deletion flag (sticky bit) from directory ACL
> > +	 * if ACEs on directory allow only owner of directory child entry to
> > +	 * delete entry itself.
> > +	 *
> > +	 * This is relaxed check for rules generated by _posix_to_nfsv4_one().
> > +	 * Relaxed check of restricted deletion flag is for security reasons
> > +	 * and means that permissions would be more stricter, to prevent
> > +	 * granting more access than what was specified in NFS4 ACL packet.
> > +	 */
> > +	if (flags & NFS4_ACL_DIR) {
> > +		*dir_sticky = !dir_allow_nonowner_delete_child && dir_allow_child_owner_delete;
> > +		if (*dir_sticky)
> > +			eflags |= NFS4_ACL_DIR_STICKY;
> >  	}
> >  
> >  	/*
> > @@ -750,7 +958,7 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> >  			default_acl_state.other = effective_acl_state.other;
> >  	}
> >  
> > -	*pacl = posix_state_to_acl(&effective_acl_state, flags);
> > +	*pacl = posix_state_to_acl(&effective_acl_state, flags | eflags);
> >  	if (IS_ERR(*pacl)) {
> >  		ret = PTR_ERR(*pacl);
> >  		*pacl = NULL;
> > @@ -776,19 +984,23 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> >  }
> >  
> >  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> > -			 struct nfsd_attrs *attr)
> > +			 struct nfsd_attrs *attr, int *dir_sticky)
> >  {
> >  	int host_error;
> >  	unsigned int flags = 0;
> >  
> > -	if (!acl)
> > +	if (!acl) {
> > +		if (type == NF4DIR)
> > +			*dir_sticky = -1;
> >  		return nfs_ok;
> > +	}
> >  
> >  	if (type == NF4DIR)
> >  		flags = NFS4_ACL_DIR;
> >  
> >  	host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl,
> > -					     &attr->na_dpacl, flags);
> > +					     &attr->na_dpacl, dir_sticky,
> > +					     flags);
> >  	if (host_error == -EINVAL)
> >  		return nfserr_attrnotsupp;
> >  	else
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 0f67f4a7b8b2..56aeb745d108 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -259,7 +259,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		return nfserrno(host_err);
> >  
> >  	if (is_create_with_attrs(open))
> > -		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
> > +		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs, NULL);
> >  
> >  	inode_lock_nested(inode, I_MUTEX_PARENT);
> >  
> > @@ -791,6 +791,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	};
> >  	struct svc_fh resfh;
> >  	__be32 status;
> > +	int dir_sticky;
> >  	dev_t rdev;
> >  
> >  	fh_init(&resfh, NFS4_FHSIZE);
> > @@ -804,7 +805,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	if (status)
> >  		return status;
> >  
> > -	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs);
> > +	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs, &dir_sticky);
> >  	current->fs->umask = create->cr_umask;
> >  	switch (create->cr_type) {
> >  	case NF4LNK:
> > @@ -848,6 +849,11 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  		break;
> >  
> >  	case NF4DIR:
> > +		if (dir_sticky == 1) {
> > +			/* Set directory sticky bit deduced from the ACL attr. */
> > +			create->cr_iattr.ia_valid |= ATTR_MODE;
> > +			create->cr_iattr.ia_mode |= S_ISVTX;
> > +		}
> >  		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
> >  		status = nfsd_create(rqstp, &cstate->current_fh,
> >  				     create->cr_name, create->cr_namelen,
> > @@ -1144,6 +1150,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  	struct inode *inode;
> >  	__be32 status = nfs_ok;
> >  	bool save_no_wcc;
> > +	int dir_sticky;
> >  	int err;
> >  
> >  	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
> > @@ -1165,10 +1172,26 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >  
> >  	inode = cstate->current_fh.fh_dentry->d_inode;
> >  	status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG,
> > -				   setattr->sa_acl, &attrs);
> > -
> > +				   setattr->sa_acl, &attrs, &dir_sticky);
> >  	if (status)
> >  		goto out;
> > +
> > +	if (S_ISDIR(inode->i_mode) && dir_sticky != -1 && !!(inode->i_mode & S_ISVTX) != dir_sticky) {
> > +		/*
> > +		 * Set directory sticky bit deduced from the ACL attr.
> > +		 * Do not clear sticky bit if it was explicitly set in MODE attr
> > +		 * but was not deduced from ACL attr because clients can send
> > +		 * both MODE and ACL attrs where sticky bit is only in MODE attr.
> > +		 */
> > +		if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> > +			attrs.na_iattr->ia_mode = inode->i_mode;
> > +		if (dir_sticky)
> > +			attrs.na_iattr->ia_mode |= S_ISVTX;
> > +		else if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> > +			attrs.na_iattr->ia_mode &= ~S_ISVTX;
> > +		attrs.na_iattr->ia_valid |= ATTR_MODE;
> > +	}
> > +
> >  	save_no_wcc = cstate->current_fh.fh_no_wcc;
> >  	cstate->current_fh.fh_no_wcc = true;
> >  	status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, NULL);
> > -- 
> > 2.20.1
> > 
> 
> Hi Pali -
> 
> Apologies for the delayed response.
> 
> Being somewhat un-expert in things ACL, I'm not sure if this is the
> correct approach, or if it's right for the POSIX ACL-only
> implementation we have in Linux. I'm going to research this a bit
> and get back to you.
> 
> -- 
> Chuck Lever

Hello Chuck, thank you reply.

Just to note that this does not affect POSIX ACL-only storage on Linux
server. Everything is same as before (it is POSIX ACL-only and
everything is evaluated via POSIX ACLs).

This change is just what Linux NFS4 server provides to NFS4 clients,
i.e. not for POSIX clients. As Linux NFS4 server already maps POSIX-ACL
into NFS4-ACL format, I think that it makes sense to also map
POSIX-sticky bit into NFS4-ACL format. It allows better interop for NFS4
clients which use NFS4 ACLs.

What is this change trying to achieve is to allow Linux server to serve
POSIX things (like sticky bit) for NFS4-ACL clients which are not POSIX
aware, and serve this sticky bit in NFS4-ACL language. And same for
opposite direction, to translate NFS4 ACL rules which describe sticky
bit into the POSIX sticky bit in mode.

I hope that this is more clear now.

Pali
Chuck Lever Sept. 20, 2024, 7:19 p.m. UTC | #3
On Tue, Sep 17, 2024 at 06:10:50PM +0200, Pali Rohár wrote:
> On Tuesday 17 September 2024 11:39:58 Chuck Lever wrote:
> > On Fri, Sep 13, 2024 at 12:15:23AM +0200, Pali Rohár wrote:
> > > Currently the directory restricted deletion flag (sticky bit, SVTX) is not
> > > mapped into NFS4 ACL by knfsd. Which means that the NFS4 ACL client is not
> > > aware of the fact that it cannot delete some files if the sticky bit is set
> > > on the server on parent directory. If the client copies sticky bit
> > > directory recursively to some other storage which implements the NFS4-like
> > > ACL (e.g. to NTFS filesystem or to SMB server) then the directory's
> > > restricted deletion flag is completely lost.
> > > 
> > > This change aims to improve interoperability of the POSIX directory
> > > restricted deletion flag (sticky bit, SVTX) and NFS4 ACL structure in
> > > knfsd. It transparently maps POSIX directory's sticky bit into NFS4 ACL
> > > and vice-versa similarly like it already maps POSIX ACL into NFS4 ACL.
> > > It covers NFS4 GETATTR ACL, NFS4 SETATTR ACL, NFS4 CREATE+OPEN operations.
> > > When creating a new object over NFS4, it is possible to optionally specify
> > > NFS4 ACL, so this is covered too.
> > > 
> > > For client SETATTR ACL, CREATE with ACL and OPEN-create with ACL
> > > operations, detection pattern for restricted deletion flag is ACE entry
> > > INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE together with check that
> > > nobody except the OWNER@ has DELETE_CHILD permission. Note that the OWNER@
> > > does not have to have DELETE_CHILD permission in case it does not have
> > > write access to directory.
> > > 
> > > For client GETATTR ACL operation, when restricted deletion flag is present
> > > in inode, following ACE entries are prepended before any other ACEs:
> > > 
> > >   ALLOW OWNER@ DELETE_CHILD                                 (1)
> > >   DENY EVERYONE@ DELETE_CHILD
> > >   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> > >   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> > >   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> > >   ...
> > >   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> > >   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> > > 
> > > ACE entry (1) is present only when user OWNER@ has write access (can modify
> > > directory, including removing child entries), because it is possible to
> > > have sticky bit set also on read-only directory.
> > > 
> > > ACE entry (3) is present only when user OWNER@ does not have write access
> > > (cannot modify directory) and it is required to override effect of the last
> > > ACE entry which allows child entry OWNER@ to remove entry itself. This is
> > > needed for example for POSIX mode 1577.
> > > 
> > > ACE entry (4) is present only when anybody else except the directory owner
> > > has no write access to the directory. This is the case when sticky bit is
> > > set but nobody can use it because of missing directory write access. So
> > > this explicit DENY covers this edge case.
> > > 
> > > ACE entries (ACL user1...userN) are for POSIX users which do not have write
> > > access to directory and therefore they cannot remove directory entries
> > > which they own. This is again needed for overriding the effect of the last
> > > ACE entry.
> > > 
> > > When restricted deletion flag is not present then nothing is added.
> > > 
> > > This is probably the best approximation of the directory restricted
> > > deletion flag. It covers directory's OWNER@ grant access, child OWNER@
> > > grant access and also restrictions for all other users. It also covers the
> > > situation when OWNER@ or some POSIX user does not have write access to the
> > > directory, and also covers situation when nobody except directory owner has
> > > write access to the directory.
> > > 
> > > What this does not cover is the restriction when some POSIX group does not
> > > have directory write access, and another POSIX group has directory write
> > > access. This is probably not possible to express in NFS4 ACL language
> > > without ability to evaluate if user is member of some group or not. NFS4
> > > ACL in this case says for the owner of the file that the delete operation
> > > is allowed.
> > > 
> > > The whole change is only about the mapping of the sticky bit into NFS4 ACL
> > > and only for NFS4 GET and SET ACL operations. It does not change any access
> > > permission checks.
> > > 
> > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > ---
> > >  fs/nfsd/acl.h      |   2 +-
> > >  fs/nfsd/nfs4acl.c  | 242 ++++++++++++++++++++++++++++++++++++++++++---
> > >  fs/nfsd/nfs4proc.c |  31 +++++-
> > >  3 files changed, 255 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
> > > index 4b7324458a94..e7e7909bf03a 100644
> > > --- a/fs/nfsd/acl.h
> > > +++ b/fs/nfsd/acl.h
> > > @@ -48,6 +48,6 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
> > >  int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > >  		struct nfs4_acl **acl);
> > >  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> > > -			 struct nfsd_attrs *attr);
> > > +			 struct nfsd_attrs *attr, int *dir_sticky);
> > >  
> > >  #endif /* LINUX_NFS4_ACL_H */
> > > diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> > > index 96e786b5e544..6a53772c2a13 100644
> > > --- a/fs/nfsd/nfs4acl.c
> > > +++ b/fs/nfsd/nfs4acl.c
> > > @@ -46,6 +46,7 @@
> > >  #define NFS4_ACL_TYPE_DEFAULT	0x01
> > >  #define NFS4_ACL_DIR		0x02
> > >  #define NFS4_ACL_OWNER		0x04
> > > +#define NFS4_ACL_DIR_STICKY	0x08
> > >  
> > >  /* mode bit translations: */
> > >  #define NFS4_READ_MODE (NFS4_ACE_READ_DATA)
> > > @@ -73,7 +74,7 @@ mask_from_posix(unsigned short perm, unsigned int flags)
> > >  		mask |= NFS4_READ_MODE;
> > >  	if (perm & ACL_WRITE)
> > >  		mask |= NFS4_WRITE_MODE;
> > > -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> > > +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
> > >  		mask |= NFS4_ACE_DELETE_CHILD;
> > >  	if (perm & ACL_EXECUTE)
> > >  		mask |= NFS4_EXECUTE_MODE;
> > > @@ -89,7 +90,7 @@ deny_mask_from_posix(unsigned short perm, u32 flags)
> > >  		mask |= NFS4_READ_MODE;
> > >  	if (perm & ACL_WRITE)
> > >  		mask |= NFS4_WRITE_MODE;
> > > -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> > > +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
> > >  		mask |= NFS4_ACE_DELETE_CHILD;
> > >  	if (perm & ACL_EXECUTE)
> > >  		mask |= NFS4_EXECUTE_MODE;
> > > @@ -110,7 +111,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
> > >  {
> > >  	u32 write_mode = NFS4_WRITE_MODE;
> > >  
> > > -	if (flags & NFS4_ACL_DIR)
> > > +	if ((flags & NFS4_ACL_DIR) && !(flags | NFS4_ACL_DIR_STICKY))
> > >  		write_mode |= NFS4_ACE_DELETE_CHILD;
> > >  	*mode = 0;
> > >  	if ((perm & NFS4_READ_MODE) == NFS4_READ_MODE)
> > > @@ -123,7 +124,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
> > >  
> > >  static short ace2type(struct nfs4_ace *);
> > >  static void _posix_to_nfsv4_one(struct posix_acl *, struct nfs4_acl *,
> > > -				unsigned int);
> > > +				unsigned int, kuid_t);
> > >  
> > >  int
> > >  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > > @@ -142,8 +143,11 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > >  	if (IS_ERR(pacl))
> > >  		return PTR_ERR(pacl);
> > >  
> > > -	/* allocate for worst case: one (deny, allow) pair each: */
> > > -	size += 2 * pacl->a_count;
> > > +	/*
> > > +	 * allocate for worst case: one (deny, allow) pair for each
> > > +	 * plus for restricted deletion flag (sticky bit): 4 + one for each
> > > +	 */
> > > +	size += 2 * pacl->a_count + (4 + pacl->a_count);
> > >  
> > >  	if (S_ISDIR(inode->i_mode)) {
> > >  		flags = NFS4_ACL_DIR;
> > > @@ -155,6 +159,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > >  
> > >  		if (dpacl)
> > >  			size += 2 * dpacl->a_count;
> > > +
> > > +		/* propagate restricted deletion flag (sticky bit) */
> > > +		if (inode->i_mode & S_ISVTX)
> > > +			flags |= NFS4_ACL_DIR_STICKY;
> > >  	}
> > >  
> > >  	*acl = kmalloc(nfs4_acl_bytes(size), GFP_KERNEL);
> > > @@ -164,10 +172,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > >  	}
> > >  	(*acl)->naces = 0;
> > >  
> > > -	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT);
> > > +	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT, inode->i_uid);
> > >  
> > >  	if (dpacl)
> > > -		_posix_to_nfsv4_one(dpacl, *acl, flags | NFS4_ACL_TYPE_DEFAULT);
> > > +		_posix_to_nfsv4_one(dpacl, *acl, (flags | NFS4_ACL_TYPE_DEFAULT) & ~NFS4_ACL_DIR_STICKY, inode->i_uid);
> > >  
> > >  out:
> > >  	posix_acl_release(dpacl);
> > > @@ -231,7 +239,7 @@ summarize_posix_acl(struct posix_acl *acl, struct posix_acl_summary *pas)
> > >  /* We assume the acl has been verified with posix_acl_valid. */
> > >  static void
> > >  _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> > > -						unsigned int flags)
> > > +		    unsigned int flags, kuid_t owner_uid)
> > >  {
> > >  	struct posix_acl_entry *pa, *group_owner_entry;
> > >  	struct nfs4_ace *ace;
> > > @@ -243,9 +251,149 @@ _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> > >  	BUG_ON(pacl->a_count < 3);
> > >  	summarize_posix_acl(pacl, &pas);
> > >  
> > > -	pa = pacl->a_entries;
> > >  	ace = acl->aces + acl->naces;
> > >  
> > > +	/*
> > > +	 * Translate restricted deletion flag (sticky bit, SVTX) set on the
> > > +	 * directory to following NFS4 ACEs prepended before any other ACEs:
> > > +	 *
> > > +	 *   ALLOW OWNER@ DELETE_CHILD                                 (1)
> > > +	 *   DENY EVERYONE@ DELETE_CHILD
> > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> > > +	 *   ...
> > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> > > +	 *   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> > > +	 *
> > > +	 *   (1) - only if user-owner has write access
> > > +	 *   (3) - only if user-owner does not have write access
> > > +	 *   (4) - only if non-user-owner does not have write access
> > > +	 *   (ACL user1) - only if user1 does not have write access
> > > +	 *   (ACL userN) - only if userN does not have write access
> > > +	 *
> > > +	 * These ACEs describe behavior of set restricted deletion flag (sticky
> > > +	 * bit) on directory as they allow only owner of individual child entries
> > > +	 * and owner of the directory to delete individual child entries.
> > > +	 * Everyone else is denied to remove child entries in this directory.
> > > +	 *
> > > +	 * For deleting entry in NFS4 ACL model it is needed either DELETE_CHILD
> > > +	 * permission (access mask) from the parent directory or DELETE permission
> > > +	 * (access mask) on the child. Just one of these two permissions is enough.
> > > +	 * So if there is explicit DENY DELETE_CHILD on the parent together with
> > > +	 * explicit ALLOW DELETE on the child, it means that deleting is allowed
> > > +	 * (evaluation of permissions is independent in NFS4 ACL model). OWNER@
> > > +	 * for inherited ACEs means owner of the child entry and not the owner
> > > +	 * of the parent from which was ACE inherited.
> > > +	 *
> > > +	 * This translation is imperfect just for a case when some group
> > > +	 * (including group-owner and others-group) does not have write access
> > > +	 * to directory. Handled is only the edge case (via rule 4) when
> > > +	 * everyone else except owner has no write access to the directory.
> > > +	 * This information is not present in NFS4 ACL because it looks like that
> > > +	 * this is not possible to express in this ACL model. So for ACL consumer
> > > +	 * it could look like that owner of the file can delete its own file even
> > > +	 * when group or other mode bits of the directory do not allow it.
> > > +	 */
> > > +	if (flags & NFS4_ACL_DIR_STICKY) {
> > > +		/*
> > > +		 * Explicitly allow directory owner to delete child entries
> > > +		 * if directory owner has write access
> > > +		 */
> > > +		if (pas.owner & ACL_WRITE) {
> > > +			ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> > > +			ace->flag = 0;
> > > +			ace->access_mask = NFS4_ACE_DELETE_CHILD;
> > > +			ace->whotype = NFS4_ACL_WHO_OWNER;
> > > +			ace++;
> > > +			acl->naces++;
> > > +		}
> > > +
> > > +		/*
> > > +		 * And then deny deleting child entries for all other users
> > > +		 * which do not have explicit DELETE permission granted by
> > > +		 * last rule in this section.
> > > +		 */
> > > +		ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > +		ace->flag = 0;
> > > +		ace->access_mask = NFS4_ACE_DELETE_CHILD;
> > > +		ace->whotype = NFS4_ACL_WHO_EVERYONE;
> > > +		ace++;
> > > +		acl->naces++;
> > > +
> > > +		/*
> > > +		 * Do not grant directory owner to delete child entries (by the
> > > +		 * last rule in this section) if it does not have write access.
> > > +		 */
> > > +		if (!(pas.owner & ACL_WRITE)) {
> > > +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > +			ace->flag = NFS4_INHERITANCE_FLAGS |
> > > +				    NFS4_ACE_INHERIT_ONLY_ACE |
> > > +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > +			ace->access_mask = NFS4_ACE_DELETE;
> > > +			ace->whotype = NFS4_ACL_WHO_NAMED;
> > > +			ace->who_uid = owner_uid;
> > > +			ace++;
> > > +			acl->naces++;
> > > +		}
> > > +
> > > +		if (!(pas.users & ACL_WRITE) && !(pas.group & ACL_WRITE) &&
> > > +		    !(pas.groups & ACL_WRITE) && !(pas.other & ACL_WRITE)) {
> > > +			/*
> > > +			 * Do not grant child owner who is not directory owner
> > > +			 * (handled by the first rule in this section) to
> > > +			 * delete own child entries if there is no possible
> > > +			 * directory write permission (checked for named users,
> > > +			 * group-owner, named groups and others-groups).
> > > +			 * This handles special edge case when only directory
> > > +			 * owner has write access to directory.
> > > +			 */
> > > +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > +			ace->flag = NFS4_INHERITANCE_FLAGS |
> > > +				    NFS4_ACE_INHERIT_ONLY_ACE |
> > > +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > +			ace->access_mask = NFS4_ACE_DELETE;
> > > +			ace->whotype = NFS4_ACL_WHO_OWNER;
> > > +			ace++;
> > > +			acl->naces++;
> > > +		} else {
> > > +			/*
> > > +			 * Do not grant individual named users to delete child
> > > +			 * entries (by the last rule in this section) if user
> > > +			 * does not have write access to directory.
> > > +			 */
> > > +			for (pa = pacl->a_entries + 1; pa->e_tag == ACL_USER; pa++) {
> > > +				if (pa->e_perm & pas.mask & ACL_WRITE)
> > > +					continue;
> > > +				ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > +				ace->flag = NFS4_INHERITANCE_FLAGS |
> > > +					    NFS4_ACE_INHERIT_ONLY_ACE |
> > > +					    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > +				ace->access_mask = NFS4_ACE_DELETE;
> > > +				ace->whotype = NFS4_ACL_WHO_NAMED;
> > > +				ace->who_uid = pa->e_uid;
> > > +				ace++;
> > > +				acl->naces++;
> > > +			}
> > > +		}
> > > +
> > > +		/*
> > > +		 * Above rules filtered out users which do not have write access
> > > +		 * to the directory. Now allow child-owner to delete its own
> > > +		 * directory entries.
> > > +		 */
> > > +		ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> > > +		ace->flag = NFS4_INHERITANCE_FLAGS |
> > > +			    NFS4_ACE_INHERIT_ONLY_ACE |
> > > +			    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > +		ace->access_mask = NFS4_ACE_DELETE;
> > > +		ace->whotype = NFS4_ACL_WHO_OWNER;
> > > +		ace++;
> > > +		acl->naces++;
> > > +	}
> > > +
> > > +	pa = pacl->a_entries;
> > > +
> > >  	/* We could deny everything not granted by the owner: */
> > >  	deny = ~pas.owner;
> > >  	/*
> > > @@ -517,7 +665,8 @@ posix_state_to_acl(struct posix_acl_state *state, unsigned int flags)
> > >  
> > >  	pace = pacl->a_entries;
> > >  	pace->e_tag = ACL_USER_OBJ;
> > > -	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags);
> > > +	/* owner is never affected by restricted deletion flag, so clear NFS4_ACL_DIR_STICKY */
> > > +	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags & ~NFS4_ACL_DIR_STICKY);
> > >  
> > >  	for (i=0; i < state->users->n; i++) {
> > >  		pace++;
> > > @@ -691,9 +840,14 @@ static void process_one_v4_ace(struct posix_acl_state *state,
> > >  
> > >  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > >  		struct posix_acl **pacl, struct posix_acl **dpacl,
> > > +		int *dir_sticky,
> > >  		unsigned int flags)
> > >  {
> > >  	struct posix_acl_state effective_acl_state, default_acl_state;
> > > +	bool dir_allow_nonowner_delete_child = false;
> > > +	bool dir_deny_everyone_delete_child = false;
> > > +	bool dir_allow_child_owner_delete = false;
> > > +	unsigned int eflags = 0;
> > >  	struct nfs4_ace *ace;
> > >  	int ret;
> > >  
> > > @@ -705,6 +859,28 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > >  		goto out_estate;
> > >  	ret = -EINVAL;
> > >  	for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
> > > +		/*
> > > +		 * Process and parse ACE entry INHERIT_ONLY NO_PROPAGATE DELETE
> > > +		 * for detecting restricted deletion flag (sticky bit). Allow
> > > +		 * SYNCHRONIZE access mask to be present as NFS4 clients can
> > > +		 * include this access mask together with any other one.
> > > +		 *
> > > +		 * It needs to be done before validation as NFS4_SUPPORTED_FLAGS
> > > +		 * does not contain NFS4_ACE_NO_PROPAGATE_INHERIT_ACE and this
> > > +		 * ACE must not throw error.
> > > +		 */
> > > +		if ((flags & NFS4_ACL_DIR) &&
> > > +		    !(ace->flag & ~(NFS4_SUPPORTED_FLAGS|NFS4_ACE_NO_PROPAGATE_INHERIT_ACE)) &&
> > > +		    (ace->flag & NFS4_INHERITANCE_FLAGS) &&
> > > +		    (ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> > > +		    (ace->flag & NFS4_ACE_NO_PROPAGATE_INHERIT_ACE) &&
> > > +		    (ace->access_mask & ~NFS4_ACE_SYNCHRONIZE) == NFS4_ACE_DELETE) {
> > > +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > > +			    ace->whotype == NFS4_ACL_WHO_OWNER)
> > > +				dir_allow_child_owner_delete = true;
> > > +			continue;
> > > +		}
> > > +
> > >  		if (ace->type != NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > >  		    ace->type != NFS4_ACE_ACCESS_DENIED_ACE_TYPE)
> > >  			goto out_dstate;
> > > @@ -725,6 +901,38 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > >  
> > >  		if (!(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE))
> > >  			process_one_v4_ace(&effective_acl_state, ace);
> > > +
> > > +		/*
> > > +		 * Process and parse ACE entry with DELETE_CHILD access mask
> > > +		 * for detecting restricted deletion flag (sticky bit).
> > > +		 */
> > > +		if ((flags & NFS4_ACL_DIR) &&
> > > +		    !(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> > > +		    (ace->access_mask & NFS4_ACE_DELETE_CHILD)) {
> > > +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > > +			    !dir_deny_everyone_delete_child &&
> > > +			    ace->whotype != NFS4_ACL_WHO_OWNER)
> > > +				dir_allow_nonowner_delete_child = true;
> > > +			else if (ace->type == NFS4_ACE_ACCESS_DENIED_ACE_TYPE &&
> > > +				 ace->whotype == NFS4_ACL_WHO_EVERYONE)
> > > +				dir_deny_everyone_delete_child = true;
> > > +		}
> > > +	}
> > > +
> > > +	/*
> > > +	 * Recognize restricted deletion flag (sticky bit) from directory ACL
> > > +	 * if ACEs on directory allow only owner of directory child entry to
> > > +	 * delete entry itself.
> > > +	 *
> > > +	 * This is relaxed check for rules generated by _posix_to_nfsv4_one().
> > > +	 * Relaxed check of restricted deletion flag is for security reasons
> > > +	 * and means that permissions would be more stricter, to prevent
> > > +	 * granting more access than what was specified in NFS4 ACL packet.
> > > +	 */
> > > +	if (flags & NFS4_ACL_DIR) {
> > > +		*dir_sticky = !dir_allow_nonowner_delete_child && dir_allow_child_owner_delete;
> > > +		if (*dir_sticky)
> > > +			eflags |= NFS4_ACL_DIR_STICKY;
> > >  	}
> > >  
> > >  	/*
> > > @@ -750,7 +958,7 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > >  			default_acl_state.other = effective_acl_state.other;
> > >  	}
> > >  
> > > -	*pacl = posix_state_to_acl(&effective_acl_state, flags);
> > > +	*pacl = posix_state_to_acl(&effective_acl_state, flags | eflags);
> > >  	if (IS_ERR(*pacl)) {
> > >  		ret = PTR_ERR(*pacl);
> > >  		*pacl = NULL;
> > > @@ -776,19 +984,23 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > >  }
> > >  
> > >  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> > > -			 struct nfsd_attrs *attr)
> > > +			 struct nfsd_attrs *attr, int *dir_sticky)
> > >  {
> > >  	int host_error;
> > >  	unsigned int flags = 0;
> > >  
> > > -	if (!acl)
> > > +	if (!acl) {
> > > +		if (type == NF4DIR)
> > > +			*dir_sticky = -1;
> > >  		return nfs_ok;
> > > +	}
> > >  
> > >  	if (type == NF4DIR)
> > >  		flags = NFS4_ACL_DIR;
> > >  
> > >  	host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl,
> > > -					     &attr->na_dpacl, flags);
> > > +					     &attr->na_dpacl, dir_sticky,
> > > +					     flags);
> > >  	if (host_error == -EINVAL)
> > >  		return nfserr_attrnotsupp;
> > >  	else
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index 0f67f4a7b8b2..56aeb745d108 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -259,7 +259,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >  		return nfserrno(host_err);
> > >  
> > >  	if (is_create_with_attrs(open))
> > > -		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
> > > +		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs, NULL);
> > >  
> > >  	inode_lock_nested(inode, I_MUTEX_PARENT);
> > >  
> > > @@ -791,6 +791,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  	};
> > >  	struct svc_fh resfh;
> > >  	__be32 status;
> > > +	int dir_sticky;
> > >  	dev_t rdev;
> > >  
> > >  	fh_init(&resfh, NFS4_FHSIZE);
> > > @@ -804,7 +805,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  	if (status)
> > >  		return status;
> > >  
> > > -	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs);
> > > +	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs, &dir_sticky);
> > >  	current->fs->umask = create->cr_umask;
> > >  	switch (create->cr_type) {
> > >  	case NF4LNK:
> > > @@ -848,6 +849,11 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  		break;
> > >  
> > >  	case NF4DIR:
> > > +		if (dir_sticky == 1) {
> > > +			/* Set directory sticky bit deduced from the ACL attr. */
> > > +			create->cr_iattr.ia_valid |= ATTR_MODE;
> > > +			create->cr_iattr.ia_mode |= S_ISVTX;
> > > +		}
> > >  		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
> > >  		status = nfsd_create(rqstp, &cstate->current_fh,
> > >  				     create->cr_name, create->cr_namelen,
> > > @@ -1144,6 +1150,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  	struct inode *inode;
> > >  	__be32 status = nfs_ok;
> > >  	bool save_no_wcc;
> > > +	int dir_sticky;
> > >  	int err;
> > >  
> > >  	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
> > > @@ -1165,10 +1172,26 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  
> > >  	inode = cstate->current_fh.fh_dentry->d_inode;
> > >  	status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG,
> > > -				   setattr->sa_acl, &attrs);
> > > -
> > > +				   setattr->sa_acl, &attrs, &dir_sticky);
> > >  	if (status)
> > >  		goto out;
> > > +
> > > +	if (S_ISDIR(inode->i_mode) && dir_sticky != -1 && !!(inode->i_mode & S_ISVTX) != dir_sticky) {
> > > +		/*
> > > +		 * Set directory sticky bit deduced from the ACL attr.
> > > +		 * Do not clear sticky bit if it was explicitly set in MODE attr
> > > +		 * but was not deduced from ACL attr because clients can send
> > > +		 * both MODE and ACL attrs where sticky bit is only in MODE attr.
> > > +		 */
> > > +		if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> > > +			attrs.na_iattr->ia_mode = inode->i_mode;
> > > +		if (dir_sticky)
> > > +			attrs.na_iattr->ia_mode |= S_ISVTX;
> > > +		else if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> > > +			attrs.na_iattr->ia_mode &= ~S_ISVTX;
> > > +		attrs.na_iattr->ia_valid |= ATTR_MODE;
> > > +	}
> > > +
> > >  	save_no_wcc = cstate->current_fh.fh_no_wcc;
> > >  	cstate->current_fh.fh_no_wcc = true;
> > >  	status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, NULL);
> > > -- 
> > > 2.20.1
> > > 
> > 
> > Hi Pali -
> > 
> > Apologies for the delayed response.
> > 
> > Being somewhat un-expert in things ACL, I'm not sure if this is the
> > correct approach, or if it's right for the POSIX ACL-only
> > implementation we have in Linux. I'm going to research this a bit
> > and get back to you.
> > 
> > -- 
> > Chuck Lever
> 
> Hello Chuck, thank you reply.
> 
> Just to note that this does not affect POSIX ACL-only storage on Linux
> server. Everything is same as before (it is POSIX ACL-only and
> everything is evaluated via POSIX ACLs).
> 
> This change is just what Linux NFS4 server provides to NFS4 clients,
> i.e. not for POSIX clients. As Linux NFS4 server already maps POSIX-ACL
> into NFS4-ACL format, I think that it makes sense to also map
> POSIX-sticky bit into NFS4-ACL format. It allows better interop for NFS4
> clients which use NFS4 ACLs.
> 
> What is this change trying to achieve is to allow Linux server to serve
> POSIX things (like sticky bit) for NFS4-ACL clients which are not POSIX
> aware, and serve this sticky bit in NFS4-ACL language. And same for
> opposite direction, to translate NFS4 ACL rules which describe sticky
> bit into the POSIX sticky bit in mode.

The fundamental claim from your patch description is that:

> > > the NFS4 ACL client is not
> > > aware of the fact that it cannot delete some files if the
> > > sticky bit is set on the server on parent directory.

POSIX-based clients are in fact aware of this additional constraint
because they can see the set of mode bits returned by GETATTR.

So can non-POSIX clients for that matter; although they might not
natively understand what that bit means, their NFS client can impart
that meaning.

I can find no spec mandate or guidance that requires this mapping,
nor can I find any other NFS server implementations that add it.
If this is indeed a valuable innovation, a standard that recommends
or requires implementation of this feature would be the place to
begin.

What RFC 8881 does say is on point:

> 6.3.1.1. Server Considerations

> The server uses the algorithm described in Section 6.2.1 to
> determine whether an ACL allows access to an object. However, the
> ACL might not be the sole determiner of access.

A list of examples follows. The spirit of this text seems to be that
a file object's ACL need not reflect every possible security policy
that a server might use to determine whether an operation may
proceed.
Pali Rohár Sept. 20, 2024, 7:29 p.m. UTC | #4
On Friday 20 September 2024 15:19:59 Chuck Lever wrote:
> On Tue, Sep 17, 2024 at 06:10:50PM +0200, Pali Rohár wrote:
> > On Tuesday 17 September 2024 11:39:58 Chuck Lever wrote:
> > > On Fri, Sep 13, 2024 at 12:15:23AM +0200, Pali Rohár wrote:
> > > > Currently the directory restricted deletion flag (sticky bit, SVTX) is not
> > > > mapped into NFS4 ACL by knfsd. Which means that the NFS4 ACL client is not
> > > > aware of the fact that it cannot delete some files if the sticky bit is set
> > > > on the server on parent directory. If the client copies sticky bit
> > > > directory recursively to some other storage which implements the NFS4-like
> > > > ACL (e.g. to NTFS filesystem or to SMB server) then the directory's
> > > > restricted deletion flag is completely lost.
> > > > 
> > > > This change aims to improve interoperability of the POSIX directory
> > > > restricted deletion flag (sticky bit, SVTX) and NFS4 ACL structure in
> > > > knfsd. It transparently maps POSIX directory's sticky bit into NFS4 ACL
> > > > and vice-versa similarly like it already maps POSIX ACL into NFS4 ACL.
> > > > It covers NFS4 GETATTR ACL, NFS4 SETATTR ACL, NFS4 CREATE+OPEN operations.
> > > > When creating a new object over NFS4, it is possible to optionally specify
> > > > NFS4 ACL, so this is covered too.
> > > > 
> > > > For client SETATTR ACL, CREATE with ACL and OPEN-create with ACL
> > > > operations, detection pattern for restricted deletion flag is ACE entry
> > > > INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE together with check that
> > > > nobody except the OWNER@ has DELETE_CHILD permission. Note that the OWNER@
> > > > does not have to have DELETE_CHILD permission in case it does not have
> > > > write access to directory.
> > > > 
> > > > For client GETATTR ACL operation, when restricted deletion flag is present
> > > > in inode, following ACE entries are prepended before any other ACEs:
> > > > 
> > > >   ALLOW OWNER@ DELETE_CHILD                                 (1)
> > > >   DENY EVERYONE@ DELETE_CHILD
> > > >   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> > > >   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> > > >   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> > > >   ...
> > > >   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> > > >   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> > > > 
> > > > ACE entry (1) is present only when user OWNER@ has write access (can modify
> > > > directory, including removing child entries), because it is possible to
> > > > have sticky bit set also on read-only directory.
> > > > 
> > > > ACE entry (3) is present only when user OWNER@ does not have write access
> > > > (cannot modify directory) and it is required to override effect of the last
> > > > ACE entry which allows child entry OWNER@ to remove entry itself. This is
> > > > needed for example for POSIX mode 1577.
> > > > 
> > > > ACE entry (4) is present only when anybody else except the directory owner
> > > > has no write access to the directory. This is the case when sticky bit is
> > > > set but nobody can use it because of missing directory write access. So
> > > > this explicit DENY covers this edge case.
> > > > 
> > > > ACE entries (ACL user1...userN) are for POSIX users which do not have write
> > > > access to directory and therefore they cannot remove directory entries
> > > > which they own. This is again needed for overriding the effect of the last
> > > > ACE entry.
> > > > 
> > > > When restricted deletion flag is not present then nothing is added.
> > > > 
> > > > This is probably the best approximation of the directory restricted
> > > > deletion flag. It covers directory's OWNER@ grant access, child OWNER@
> > > > grant access and also restrictions for all other users. It also covers the
> > > > situation when OWNER@ or some POSIX user does not have write access to the
> > > > directory, and also covers situation when nobody except directory owner has
> > > > write access to the directory.
> > > > 
> > > > What this does not cover is the restriction when some POSIX group does not
> > > > have directory write access, and another POSIX group has directory write
> > > > access. This is probably not possible to express in NFS4 ACL language
> > > > without ability to evaluate if user is member of some group or not. NFS4
> > > > ACL in this case says for the owner of the file that the delete operation
> > > > is allowed.
> > > > 
> > > > The whole change is only about the mapping of the sticky bit into NFS4 ACL
> > > > and only for NFS4 GET and SET ACL operations. It does not change any access
> > > > permission checks.
> > > > 
> > > > Signed-off-by: Pali Rohár <pali@kernel.org>
> > > > ---
> > > >  fs/nfsd/acl.h      |   2 +-
> > > >  fs/nfsd/nfs4acl.c  | 242 ++++++++++++++++++++++++++++++++++++++++++---
> > > >  fs/nfsd/nfs4proc.c |  31 +++++-
> > > >  3 files changed, 255 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
> > > > index 4b7324458a94..e7e7909bf03a 100644
> > > > --- a/fs/nfsd/acl.h
> > > > +++ b/fs/nfsd/acl.h
> > > > @@ -48,6 +48,6 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
> > > >  int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > > >  		struct nfs4_acl **acl);
> > > >  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> > > > -			 struct nfsd_attrs *attr);
> > > > +			 struct nfsd_attrs *attr, int *dir_sticky);
> > > >  
> > > >  #endif /* LINUX_NFS4_ACL_H */
> > > > diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> > > > index 96e786b5e544..6a53772c2a13 100644
> > > > --- a/fs/nfsd/nfs4acl.c
> > > > +++ b/fs/nfsd/nfs4acl.c
> > > > @@ -46,6 +46,7 @@
> > > >  #define NFS4_ACL_TYPE_DEFAULT	0x01
> > > >  #define NFS4_ACL_DIR		0x02
> > > >  #define NFS4_ACL_OWNER		0x04
> > > > +#define NFS4_ACL_DIR_STICKY	0x08
> > > >  
> > > >  /* mode bit translations: */
> > > >  #define NFS4_READ_MODE (NFS4_ACE_READ_DATA)
> > > > @@ -73,7 +74,7 @@ mask_from_posix(unsigned short perm, unsigned int flags)
> > > >  		mask |= NFS4_READ_MODE;
> > > >  	if (perm & ACL_WRITE)
> > > >  		mask |= NFS4_WRITE_MODE;
> > > > -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> > > > +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
> > > >  		mask |= NFS4_ACE_DELETE_CHILD;
> > > >  	if (perm & ACL_EXECUTE)
> > > >  		mask |= NFS4_EXECUTE_MODE;
> > > > @@ -89,7 +90,7 @@ deny_mask_from_posix(unsigned short perm, u32 flags)
> > > >  		mask |= NFS4_READ_MODE;
> > > >  	if (perm & ACL_WRITE)
> > > >  		mask |= NFS4_WRITE_MODE;
> > > > -	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
> > > > +	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
> > > >  		mask |= NFS4_ACE_DELETE_CHILD;
> > > >  	if (perm & ACL_EXECUTE)
> > > >  		mask |= NFS4_EXECUTE_MODE;
> > > > @@ -110,7 +111,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
> > > >  {
> > > >  	u32 write_mode = NFS4_WRITE_MODE;
> > > >  
> > > > -	if (flags & NFS4_ACL_DIR)
> > > > +	if ((flags & NFS4_ACL_DIR) && !(flags | NFS4_ACL_DIR_STICKY))
> > > >  		write_mode |= NFS4_ACE_DELETE_CHILD;
> > > >  	*mode = 0;
> > > >  	if ((perm & NFS4_READ_MODE) == NFS4_READ_MODE)
> > > > @@ -123,7 +124,7 @@ low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
> > > >  
> > > >  static short ace2type(struct nfs4_ace *);
> > > >  static void _posix_to_nfsv4_one(struct posix_acl *, struct nfs4_acl *,
> > > > -				unsigned int);
> > > > +				unsigned int, kuid_t);
> > > >  
> > > >  int
> > > >  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > > > @@ -142,8 +143,11 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > > >  	if (IS_ERR(pacl))
> > > >  		return PTR_ERR(pacl);
> > > >  
> > > > -	/* allocate for worst case: one (deny, allow) pair each: */
> > > > -	size += 2 * pacl->a_count;
> > > > +	/*
> > > > +	 * allocate for worst case: one (deny, allow) pair for each
> > > > +	 * plus for restricted deletion flag (sticky bit): 4 + one for each
> > > > +	 */
> > > > +	size += 2 * pacl->a_count + (4 + pacl->a_count);
> > > >  
> > > >  	if (S_ISDIR(inode->i_mode)) {
> > > >  		flags = NFS4_ACL_DIR;
> > > > @@ -155,6 +159,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > > >  
> > > >  		if (dpacl)
> > > >  			size += 2 * dpacl->a_count;
> > > > +
> > > > +		/* propagate restricted deletion flag (sticky bit) */
> > > > +		if (inode->i_mode & S_ISVTX)
> > > > +			flags |= NFS4_ACL_DIR_STICKY;
> > > >  	}
> > > >  
> > > >  	*acl = kmalloc(nfs4_acl_bytes(size), GFP_KERNEL);
> > > > @@ -164,10 +172,10 @@ nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> > > >  	}
> > > >  	(*acl)->naces = 0;
> > > >  
> > > > -	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT);
> > > > +	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT, inode->i_uid);
> > > >  
> > > >  	if (dpacl)
> > > > -		_posix_to_nfsv4_one(dpacl, *acl, flags | NFS4_ACL_TYPE_DEFAULT);
> > > > +		_posix_to_nfsv4_one(dpacl, *acl, (flags | NFS4_ACL_TYPE_DEFAULT) & ~NFS4_ACL_DIR_STICKY, inode->i_uid);
> > > >  
> > > >  out:
> > > >  	posix_acl_release(dpacl);
> > > > @@ -231,7 +239,7 @@ summarize_posix_acl(struct posix_acl *acl, struct posix_acl_summary *pas)
> > > >  /* We assume the acl has been verified with posix_acl_valid. */
> > > >  static void
> > > >  _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> > > > -						unsigned int flags)
> > > > +		    unsigned int flags, kuid_t owner_uid)
> > > >  {
> > > >  	struct posix_acl_entry *pa, *group_owner_entry;
> > > >  	struct nfs4_ace *ace;
> > > > @@ -243,9 +251,149 @@ _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
> > > >  	BUG_ON(pacl->a_count < 3);
> > > >  	summarize_posix_acl(pacl, &pas);
> > > >  
> > > > -	pa = pacl->a_entries;
> > > >  	ace = acl->aces + acl->naces;
> > > >  
> > > > +	/*
> > > > +	 * Translate restricted deletion flag (sticky bit, SVTX) set on the
> > > > +	 * directory to following NFS4 ACEs prepended before any other ACEs:
> > > > +	 *
> > > > +	 *   ALLOW OWNER@ DELETE_CHILD                                 (1)
> > > > +	 *   DENY EVERYONE@ DELETE_CHILD
> > > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
> > > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
> > > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
> > > > +	 *   ...
> > > > +	 *   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
> > > > +	 *   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
> > > > +	 *
> > > > +	 *   (1) - only if user-owner has write access
> > > > +	 *   (3) - only if user-owner does not have write access
> > > > +	 *   (4) - only if non-user-owner does not have write access
> > > > +	 *   (ACL user1) - only if user1 does not have write access
> > > > +	 *   (ACL userN) - only if userN does not have write access
> > > > +	 *
> > > > +	 * These ACEs describe behavior of set restricted deletion flag (sticky
> > > > +	 * bit) on directory as they allow only owner of individual child entries
> > > > +	 * and owner of the directory to delete individual child entries.
> > > > +	 * Everyone else is denied to remove child entries in this directory.
> > > > +	 *
> > > > +	 * For deleting entry in NFS4 ACL model it is needed either DELETE_CHILD
> > > > +	 * permission (access mask) from the parent directory or DELETE permission
> > > > +	 * (access mask) on the child. Just one of these two permissions is enough.
> > > > +	 * So if there is explicit DENY DELETE_CHILD on the parent together with
> > > > +	 * explicit ALLOW DELETE on the child, it means that deleting is allowed
> > > > +	 * (evaluation of permissions is independent in NFS4 ACL model). OWNER@
> > > > +	 * for inherited ACEs means owner of the child entry and not the owner
> > > > +	 * of the parent from which was ACE inherited.
> > > > +	 *
> > > > +	 * This translation is imperfect just for a case when some group
> > > > +	 * (including group-owner and others-group) does not have write access
> > > > +	 * to directory. Handled is only the edge case (via rule 4) when
> > > > +	 * everyone else except owner has no write access to the directory.
> > > > +	 * This information is not present in NFS4 ACL because it looks like that
> > > > +	 * this is not possible to express in this ACL model. So for ACL consumer
> > > > +	 * it could look like that owner of the file can delete its own file even
> > > > +	 * when group or other mode bits of the directory do not allow it.
> > > > +	 */
> > > > +	if (flags & NFS4_ACL_DIR_STICKY) {
> > > > +		/*
> > > > +		 * Explicitly allow directory owner to delete child entries
> > > > +		 * if directory owner has write access
> > > > +		 */
> > > > +		if (pas.owner & ACL_WRITE) {
> > > > +			ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> > > > +			ace->flag = 0;
> > > > +			ace->access_mask = NFS4_ACE_DELETE_CHILD;
> > > > +			ace->whotype = NFS4_ACL_WHO_OWNER;
> > > > +			ace++;
> > > > +			acl->naces++;
> > > > +		}
> > > > +
> > > > +		/*
> > > > +		 * And then deny deleting child entries for all other users
> > > > +		 * which do not have explicit DELETE permission granted by
> > > > +		 * last rule in this section.
> > > > +		 */
> > > > +		ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > > +		ace->flag = 0;
> > > > +		ace->access_mask = NFS4_ACE_DELETE_CHILD;
> > > > +		ace->whotype = NFS4_ACL_WHO_EVERYONE;
> > > > +		ace++;
> > > > +		acl->naces++;
> > > > +
> > > > +		/*
> > > > +		 * Do not grant directory owner to delete child entries (by the
> > > > +		 * last rule in this section) if it does not have write access.
> > > > +		 */
> > > > +		if (!(pas.owner & ACL_WRITE)) {
> > > > +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > > +			ace->flag = NFS4_INHERITANCE_FLAGS |
> > > > +				    NFS4_ACE_INHERIT_ONLY_ACE |
> > > > +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > > +			ace->access_mask = NFS4_ACE_DELETE;
> > > > +			ace->whotype = NFS4_ACL_WHO_NAMED;
> > > > +			ace->who_uid = owner_uid;
> > > > +			ace++;
> > > > +			acl->naces++;
> > > > +		}
> > > > +
> > > > +		if (!(pas.users & ACL_WRITE) && !(pas.group & ACL_WRITE) &&
> > > > +		    !(pas.groups & ACL_WRITE) && !(pas.other & ACL_WRITE)) {
> > > > +			/*
> > > > +			 * Do not grant child owner who is not directory owner
> > > > +			 * (handled by the first rule in this section) to
> > > > +			 * delete own child entries if there is no possible
> > > > +			 * directory write permission (checked for named users,
> > > > +			 * group-owner, named groups and others-groups).
> > > > +			 * This handles special edge case when only directory
> > > > +			 * owner has write access to directory.
> > > > +			 */
> > > > +			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > > +			ace->flag = NFS4_INHERITANCE_FLAGS |
> > > > +				    NFS4_ACE_INHERIT_ONLY_ACE |
> > > > +				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > > +			ace->access_mask = NFS4_ACE_DELETE;
> > > > +			ace->whotype = NFS4_ACL_WHO_OWNER;
> > > > +			ace++;
> > > > +			acl->naces++;
> > > > +		} else {
> > > > +			/*
> > > > +			 * Do not grant individual named users to delete child
> > > > +			 * entries (by the last rule in this section) if user
> > > > +			 * does not have write access to directory.
> > > > +			 */
> > > > +			for (pa = pacl->a_entries + 1; pa->e_tag == ACL_USER; pa++) {
> > > > +				if (pa->e_perm & pas.mask & ACL_WRITE)
> > > > +					continue;
> > > > +				ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
> > > > +				ace->flag = NFS4_INHERITANCE_FLAGS |
> > > > +					    NFS4_ACE_INHERIT_ONLY_ACE |
> > > > +					    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > > +				ace->access_mask = NFS4_ACE_DELETE;
> > > > +				ace->whotype = NFS4_ACL_WHO_NAMED;
> > > > +				ace->who_uid = pa->e_uid;
> > > > +				ace++;
> > > > +				acl->naces++;
> > > > +			}
> > > > +		}
> > > > +
> > > > +		/*
> > > > +		 * Above rules filtered out users which do not have write access
> > > > +		 * to the directory. Now allow child-owner to delete its own
> > > > +		 * directory entries.
> > > > +		 */
> > > > +		ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
> > > > +		ace->flag = NFS4_INHERITANCE_FLAGS |
> > > > +			    NFS4_ACE_INHERIT_ONLY_ACE |
> > > > +			    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
> > > > +		ace->access_mask = NFS4_ACE_DELETE;
> > > > +		ace->whotype = NFS4_ACL_WHO_OWNER;
> > > > +		ace++;
> > > > +		acl->naces++;
> > > > +	}
> > > > +
> > > > +	pa = pacl->a_entries;
> > > > +
> > > >  	/* We could deny everything not granted by the owner: */
> > > >  	deny = ~pas.owner;
> > > >  	/*
> > > > @@ -517,7 +665,8 @@ posix_state_to_acl(struct posix_acl_state *state, unsigned int flags)
> > > >  
> > > >  	pace = pacl->a_entries;
> > > >  	pace->e_tag = ACL_USER_OBJ;
> > > > -	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags);
> > > > +	/* owner is never affected by restricted deletion flag, so clear NFS4_ACL_DIR_STICKY */
> > > > +	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags & ~NFS4_ACL_DIR_STICKY);
> > > >  
> > > >  	for (i=0; i < state->users->n; i++) {
> > > >  		pace++;
> > > > @@ -691,9 +840,14 @@ static void process_one_v4_ace(struct posix_acl_state *state,
> > > >  
> > > >  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > > >  		struct posix_acl **pacl, struct posix_acl **dpacl,
> > > > +		int *dir_sticky,
> > > >  		unsigned int flags)
> > > >  {
> > > >  	struct posix_acl_state effective_acl_state, default_acl_state;
> > > > +	bool dir_allow_nonowner_delete_child = false;
> > > > +	bool dir_deny_everyone_delete_child = false;
> > > > +	bool dir_allow_child_owner_delete = false;
> > > > +	unsigned int eflags = 0;
> > > >  	struct nfs4_ace *ace;
> > > >  	int ret;
> > > >  
> > > > @@ -705,6 +859,28 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > > >  		goto out_estate;
> > > >  	ret = -EINVAL;
> > > >  	for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
> > > > +		/*
> > > > +		 * Process and parse ACE entry INHERIT_ONLY NO_PROPAGATE DELETE
> > > > +		 * for detecting restricted deletion flag (sticky bit). Allow
> > > > +		 * SYNCHRONIZE access mask to be present as NFS4 clients can
> > > > +		 * include this access mask together with any other one.
> > > > +		 *
> > > > +		 * It needs to be done before validation as NFS4_SUPPORTED_FLAGS
> > > > +		 * does not contain NFS4_ACE_NO_PROPAGATE_INHERIT_ACE and this
> > > > +		 * ACE must not throw error.
> > > > +		 */
> > > > +		if ((flags & NFS4_ACL_DIR) &&
> > > > +		    !(ace->flag & ~(NFS4_SUPPORTED_FLAGS|NFS4_ACE_NO_PROPAGATE_INHERIT_ACE)) &&
> > > > +		    (ace->flag & NFS4_INHERITANCE_FLAGS) &&
> > > > +		    (ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> > > > +		    (ace->flag & NFS4_ACE_NO_PROPAGATE_INHERIT_ACE) &&
> > > > +		    (ace->access_mask & ~NFS4_ACE_SYNCHRONIZE) == NFS4_ACE_DELETE) {
> > > > +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > > > +			    ace->whotype == NFS4_ACL_WHO_OWNER)
> > > > +				dir_allow_child_owner_delete = true;
> > > > +			continue;
> > > > +		}
> > > > +
> > > >  		if (ace->type != NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > > >  		    ace->type != NFS4_ACE_ACCESS_DENIED_ACE_TYPE)
> > > >  			goto out_dstate;
> > > > @@ -725,6 +901,38 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > > >  
> > > >  		if (!(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE))
> > > >  			process_one_v4_ace(&effective_acl_state, ace);
> > > > +
> > > > +		/*
> > > > +		 * Process and parse ACE entry with DELETE_CHILD access mask
> > > > +		 * for detecting restricted deletion flag (sticky bit).
> > > > +		 */
> > > > +		if ((flags & NFS4_ACL_DIR) &&
> > > > +		    !(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
> > > > +		    (ace->access_mask & NFS4_ACE_DELETE_CHILD)) {
> > > > +			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
> > > > +			    !dir_deny_everyone_delete_child &&
> > > > +			    ace->whotype != NFS4_ACL_WHO_OWNER)
> > > > +				dir_allow_nonowner_delete_child = true;
> > > > +			else if (ace->type == NFS4_ACE_ACCESS_DENIED_ACE_TYPE &&
> > > > +				 ace->whotype == NFS4_ACL_WHO_EVERYONE)
> > > > +				dir_deny_everyone_delete_child = true;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * Recognize restricted deletion flag (sticky bit) from directory ACL
> > > > +	 * if ACEs on directory allow only owner of directory child entry to
> > > > +	 * delete entry itself.
> > > > +	 *
> > > > +	 * This is relaxed check for rules generated by _posix_to_nfsv4_one().
> > > > +	 * Relaxed check of restricted deletion flag is for security reasons
> > > > +	 * and means that permissions would be more stricter, to prevent
> > > > +	 * granting more access than what was specified in NFS4 ACL packet.
> > > > +	 */
> > > > +	if (flags & NFS4_ACL_DIR) {
> > > > +		*dir_sticky = !dir_allow_nonowner_delete_child && dir_allow_child_owner_delete;
> > > > +		if (*dir_sticky)
> > > > +			eflags |= NFS4_ACL_DIR_STICKY;
> > > >  	}
> > > >  
> > > >  	/*
> > > > @@ -750,7 +958,7 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > > >  			default_acl_state.other = effective_acl_state.other;
> > > >  	}
> > > >  
> > > > -	*pacl = posix_state_to_acl(&effective_acl_state, flags);
> > > > +	*pacl = posix_state_to_acl(&effective_acl_state, flags | eflags);
> > > >  	if (IS_ERR(*pacl)) {
> > > >  		ret = PTR_ERR(*pacl);
> > > >  		*pacl = NULL;
> > > > @@ -776,19 +984,23 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
> > > >  }
> > > >  
> > > >  __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
> > > > -			 struct nfsd_attrs *attr)
> > > > +			 struct nfsd_attrs *attr, int *dir_sticky)
> > > >  {
> > > >  	int host_error;
> > > >  	unsigned int flags = 0;
> > > >  
> > > > -	if (!acl)
> > > > +	if (!acl) {
> > > > +		if (type == NF4DIR)
> > > > +			*dir_sticky = -1;
> > > >  		return nfs_ok;
> > > > +	}
> > > >  
> > > >  	if (type == NF4DIR)
> > > >  		flags = NFS4_ACL_DIR;
> > > >  
> > > >  	host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl,
> > > > -					     &attr->na_dpacl, flags);
> > > > +					     &attr->na_dpacl, dir_sticky,
> > > > +					     flags);
> > > >  	if (host_error == -EINVAL)
> > > >  		return nfserr_attrnotsupp;
> > > >  	else
> > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > index 0f67f4a7b8b2..56aeb745d108 100644
> > > > --- a/fs/nfsd/nfs4proc.c
> > > > +++ b/fs/nfsd/nfs4proc.c
> > > > @@ -259,7 +259,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >  		return nfserrno(host_err);
> > > >  
> > > >  	if (is_create_with_attrs(open))
> > > > -		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
> > > > +		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs, NULL);
> > > >  
> > > >  	inode_lock_nested(inode, I_MUTEX_PARENT);
> > > >  
> > > > @@ -791,6 +791,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  	};
> > > >  	struct svc_fh resfh;
> > > >  	__be32 status;
> > > > +	int dir_sticky;
> > > >  	dev_t rdev;
> > > >  
> > > >  	fh_init(&resfh, NFS4_FHSIZE);
> > > > @@ -804,7 +805,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  	if (status)
> > > >  		return status;
> > > >  
> > > > -	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs);
> > > > +	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs, &dir_sticky);
> > > >  	current->fs->umask = create->cr_umask;
> > > >  	switch (create->cr_type) {
> > > >  	case NF4LNK:
> > > > @@ -848,6 +849,11 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  		break;
> > > >  
> > > >  	case NF4DIR:
> > > > +		if (dir_sticky == 1) {
> > > > +			/* Set directory sticky bit deduced from the ACL attr. */
> > > > +			create->cr_iattr.ia_valid |= ATTR_MODE;
> > > > +			create->cr_iattr.ia_mode |= S_ISVTX;
> > > > +		}
> > > >  		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
> > > >  		status = nfsd_create(rqstp, &cstate->current_fh,
> > > >  				     create->cr_name, create->cr_namelen,
> > > > @@ -1144,6 +1150,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  	struct inode *inode;
> > > >  	__be32 status = nfs_ok;
> > > >  	bool save_no_wcc;
> > > > +	int dir_sticky;
> > > >  	int err;
> > > >  
> > > >  	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
> > > > @@ -1165,10 +1172,26 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  
> > > >  	inode = cstate->current_fh.fh_dentry->d_inode;
> > > >  	status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG,
> > > > -				   setattr->sa_acl, &attrs);
> > > > -
> > > > +				   setattr->sa_acl, &attrs, &dir_sticky);
> > > >  	if (status)
> > > >  		goto out;
> > > > +
> > > > +	if (S_ISDIR(inode->i_mode) && dir_sticky != -1 && !!(inode->i_mode & S_ISVTX) != dir_sticky) {
> > > > +		/*
> > > > +		 * Set directory sticky bit deduced from the ACL attr.
> > > > +		 * Do not clear sticky bit if it was explicitly set in MODE attr
> > > > +		 * but was not deduced from ACL attr because clients can send
> > > > +		 * both MODE and ACL attrs where sticky bit is only in MODE attr.
> > > > +		 */
> > > > +		if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> > > > +			attrs.na_iattr->ia_mode = inode->i_mode;
> > > > +		if (dir_sticky)
> > > > +			attrs.na_iattr->ia_mode |= S_ISVTX;
> > > > +		else if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
> > > > +			attrs.na_iattr->ia_mode &= ~S_ISVTX;
> > > > +		attrs.na_iattr->ia_valid |= ATTR_MODE;
> > > > +	}
> > > > +
> > > >  	save_no_wcc = cstate->current_fh.fh_no_wcc;
> > > >  	cstate->current_fh.fh_no_wcc = true;
> > > >  	status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, NULL);
> > > > -- 
> > > > 2.20.1
> > > > 
> > > 
> > > Hi Pali -
> > > 
> > > Apologies for the delayed response.
> > > 
> > > Being somewhat un-expert in things ACL, I'm not sure if this is the
> > > correct approach, or if it's right for the POSIX ACL-only
> > > implementation we have in Linux. I'm going to research this a bit
> > > and get back to you.
> > > 
> > > -- 
> > > Chuck Lever
> > 
> > Hello Chuck, thank you reply.
> > 
> > Just to note that this does not affect POSIX ACL-only storage on Linux
> > server. Everything is same as before (it is POSIX ACL-only and
> > everything is evaluated via POSIX ACLs).
> > 
> > This change is just what Linux NFS4 server provides to NFS4 clients,
> > i.e. not for POSIX clients. As Linux NFS4 server already maps POSIX-ACL
> > into NFS4-ACL format, I think that it makes sense to also map
> > POSIX-sticky bit into NFS4-ACL format. It allows better interop for NFS4
> > clients which use NFS4 ACLs.
> > 
> > What is this change trying to achieve is to allow Linux server to serve
> > POSIX things (like sticky bit) for NFS4-ACL clients which are not POSIX
> > aware, and serve this sticky bit in NFS4-ACL language. And same for
> > opposite direction, to translate NFS4 ACL rules which describe sticky
> > bit into the POSIX sticky bit in mode.
> 
> The fundamental claim from your patch description is that:
> 
> > > > the NFS4 ACL client is not
> > > > aware of the fact that it cannot delete some files if the
> > > > sticky bit is set on the server on parent directory.
> 
> POSIX-based clients are in fact aware of this additional constraint
> because they can see the set of mode bits returned by GETATTR.
> 
> So can non-POSIX clients for that matter; although they might not
> natively understand what that bit means, their NFS client can impart
> that meaning.

Of course POSIX client is. But NFS4 ACL client is not (NFS4 ACL are not
POSIX/compatible).

> I can find no spec mandate or guidance that requires this mapping,
> nor can I find any other NFS server implementations that add it.
> If this is indeed a valuable innovation, a standard that recommends
> or requires implementation of this feature would be the place to
> begin.
> 
> What RFC 8881 does say is on point:
> 
> > 6.3.1.1. Server Considerations
> 
> > The server uses the algorithm described in Section 6.2.1 to
> > determine whether an ACL allows access to an object. However, the
> > ACL might not be the sole determiner of access.
> 
> A list of examples follows. The spirit of this text seems to be that
> a file object's ACL need not reflect every possible security policy
> that a server might use to determine whether an operation may
> proceed.
> 
> -- 
> Chuck Lever

Yes, it is not mentioned neither as mandatory or recommended. And as you
wrote server does not have to reflect.

But it helps NFS4 ACL based clients (i.e. non-POSIX), for example for
Windows NFS4 client. Also I mentioned that it helps to preserve
permissions then copying directory to some other filesystem, e.g. NTFS.
Pali Rohár Oct. 5, 2024, 3:08 p.m. UTC | #5
Hello Chuck, have you done more research on this as mentioned?

I think that this is really useful for non-POSIX clients as NFS4 ACLs
are not-POSIX; knfsd is already translating POSIX ACLs to non-POSIX
NFS4 ACLs, and this is just an improvement to covert also the
POSIX-sticky-bit in non-POSIX NFS4 ACL.

Also another improvement is that this change allows to modify all parts
of POSIX access mode (sticky bit, base mode permissions r/w/x and POSIX
ACL) via NFS4 ACL structure. So non-POSIX NFS4 client would be able to
add or remove directory sticky bit via NFS4 ACL editor.

Of course, nothing from this is required by RFC8881 specification, but
specification also does not disallow this for NFS4 servers. It is
improvement for non-POSIX clients. POSIX clients would of course not use
it.
Chuck Lever Oct. 5, 2024, 3:40 p.m. UTC | #6
> On Oct 5, 2024, at 11:08 AM, Pali Rohár <pali@kernel.org> wrote:
> 
> Hello Chuck, have you done more research on this as mentioned?

My position has not changed from the last email I sent:

> The fundamental claim from your patch description is that:
> 
>>>> the NFS4 ACL client is not
>>>> aware of the fact that it cannot delete some files if the
>>>> sticky bit is set on the server on parent directory.
> 
> POSIX-based clients are in fact aware of this additional constraint
> because they can see the set of mode bits returned by GETATTR.
> 
> So can non-POSIX clients for that matter; although they might not
> natively understand what that bit means, their NFS client can impart
> that meaning.
> 
> I can find no spec mandate or guidance that requires this mapping,
> nor can I find any other NFS server implementations that add it.
> If this is indeed a valuable innovation, a standard that recommends
> or requires implementation of this feature would be the place to
> begin.
> 
> What RFC 8881 does say is on point:
> 
>> 6.3.1.1. Server Considerations
> 
>> The server uses the algorithm described in Section 6.2.1 to
>> determine whether an ACL allows access to an object. However, the
>> ACL might not be the sole determiner of access.
> 
> A list of examples follows. The spirit of this text seems to be that
> a file object's ACL need not reflect every possible security policy
> that a server might use to determine whether an operation may
> proceed.

Which is to say that I need you to approach the nfsv4 WG with
this proposal first before it can be considered for NFSD.

After all, if this is a good thing for NFS servers to do, why
wouldn't you want to get other NFS server vendors to implement
this as well?

Meanwhile you are free to carry this patch in your own fork
so that others can experiment.


> I think that this is really useful for non-POSIX clients as NFS4 ACLs
> are not-POSIX; knfsd is already translating POSIX ACLs to non-POSIX
> NFS4 ACLs, and this is just an improvement to covert also the
> POSIX-sticky-bit in non-POSIX NFS4 ACL.
> 
> Also another improvement is that this change allows to modify all parts
> of POSIX access mode (sticky bit, base mode permissions r/w/x and POSIX
> ACL) via NFS4 ACL structure. So non-POSIX NFS4 client would be able to
> add or remove directory sticky bit via NFS4 ACL editor.

A real-world use case would be helpful for making the case
that this is something we want NFS servers to do. Currently
this is a "wouldn't it be nice if..." and I don't hear any
users saying "I can use this feature today if it existed".

But as I said above: non-POSIX clients can retrieve file
attributes and file ACLs in the same COMPOUND. If the client
sees the sticky bit, it can manufacture the extra ACEs itself
before presenting the ACL to local applications. There is
really no technical need for NFS servers to do this that I
can see.

TL;DR:

1. The ACEs can be added by clients themselves (and really
that is the preferred approach since the vast majority of
NFS client implementations don't need this behavior).

2. There has been no architectural review of the proposal.

3. There is no user demand for it that I am aware of.


--
Chuck Lever
Cedric Blancher Oct. 6, 2024, 6:19 a.m. UTC | #7
On Sat, 5 Oct 2024 at 17:08, Pali Rohár <pali@kernel.org> wrote:
>
> Hello Chuck, have you done more research on this as mentioned?
>
> I think that this is really useful for non-POSIX clients as NFS4 ACLs
> are not-POSIX; knfsd is already translating POSIX ACLs to non-POSIX
> NFS4 ACLs, and this is just an improvement to covert also the
> POSIX-sticky-bit in non-POSIX NFS4 ACL.
>
> Also another improvement is that this change allows to modify all parts
> of POSIX access mode (sticky bit, base mode permissions r/w/x and POSIX
> ACL) via NFS4 ACL structure. So non-POSIX NFS4 client would be able to
> add or remove directory sticky bit via NFS4 ACL editor.
>
> Of course, nothing from this is required by RFC8881 specification, but
> specification also does not disallow this for NFS4 servers. It is
> improvement for non-POSIX clients. POSIX clients would of course not use
> it.

Have you tested this change against the Windows ms-nfs41-client
(https://cygwin.com/pipermail/cygwin/2024-September/256473.html) and
OpenText NFSv4 clients? They do use NFSv4 ACLs extensively, and might
break if you abuse NFSv4 ACLs

Ced

--
Cedric Blancher <cedric.blancher@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur
diff mbox series

Patch

diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
index 4b7324458a94..e7e7909bf03a 100644
--- a/fs/nfsd/acl.h
+++ b/fs/nfsd/acl.h
@@ -48,6 +48,6 @@  __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
 int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
 		struct nfs4_acl **acl);
 __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
-			 struct nfsd_attrs *attr);
+			 struct nfsd_attrs *attr, int *dir_sticky);
 
 #endif /* LINUX_NFS4_ACL_H */
diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
index 96e786b5e544..6a53772c2a13 100644
--- a/fs/nfsd/nfs4acl.c
+++ b/fs/nfsd/nfs4acl.c
@@ -46,6 +46,7 @@ 
 #define NFS4_ACL_TYPE_DEFAULT	0x01
 #define NFS4_ACL_DIR		0x02
 #define NFS4_ACL_OWNER		0x04
+#define NFS4_ACL_DIR_STICKY	0x08
 
 /* mode bit translations: */
 #define NFS4_READ_MODE (NFS4_ACE_READ_DATA)
@@ -73,7 +74,7 @@  mask_from_posix(unsigned short perm, unsigned int flags)
 		mask |= NFS4_READ_MODE;
 	if (perm & ACL_WRITE)
 		mask |= NFS4_WRITE_MODE;
-	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
+	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
 		mask |= NFS4_ACE_DELETE_CHILD;
 	if (perm & ACL_EXECUTE)
 		mask |= NFS4_EXECUTE_MODE;
@@ -89,7 +90,7 @@  deny_mask_from_posix(unsigned short perm, u32 flags)
 		mask |= NFS4_READ_MODE;
 	if (perm & ACL_WRITE)
 		mask |= NFS4_WRITE_MODE;
-	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR))
+	if ((perm & ACL_WRITE) && (flags & NFS4_ACL_DIR) && !(flags & NFS4_ACL_DIR_STICKY))
 		mask |= NFS4_ACE_DELETE_CHILD;
 	if (perm & ACL_EXECUTE)
 		mask |= NFS4_EXECUTE_MODE;
@@ -110,7 +111,7 @@  low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
 {
 	u32 write_mode = NFS4_WRITE_MODE;
 
-	if (flags & NFS4_ACL_DIR)
+	if ((flags & NFS4_ACL_DIR) && !(flags | NFS4_ACL_DIR_STICKY))
 		write_mode |= NFS4_ACE_DELETE_CHILD;
 	*mode = 0;
 	if ((perm & NFS4_READ_MODE) == NFS4_READ_MODE)
@@ -123,7 +124,7 @@  low_mode_from_nfs4(u32 perm, unsigned short *mode, unsigned int flags)
 
 static short ace2type(struct nfs4_ace *);
 static void _posix_to_nfsv4_one(struct posix_acl *, struct nfs4_acl *,
-				unsigned int);
+				unsigned int, kuid_t);
 
 int
 nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
@@ -142,8 +143,11 @@  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
 	if (IS_ERR(pacl))
 		return PTR_ERR(pacl);
 
-	/* allocate for worst case: one (deny, allow) pair each: */
-	size += 2 * pacl->a_count;
+	/*
+	 * allocate for worst case: one (deny, allow) pair for each
+	 * plus for restricted deletion flag (sticky bit): 4 + one for each
+	 */
+	size += 2 * pacl->a_count + (4 + pacl->a_count);
 
 	if (S_ISDIR(inode->i_mode)) {
 		flags = NFS4_ACL_DIR;
@@ -155,6 +159,10 @@  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
 
 		if (dpacl)
 			size += 2 * dpacl->a_count;
+
+		/* propagate restricted deletion flag (sticky bit) */
+		if (inode->i_mode & S_ISVTX)
+			flags |= NFS4_ACL_DIR_STICKY;
 	}
 
 	*acl = kmalloc(nfs4_acl_bytes(size), GFP_KERNEL);
@@ -164,10 +172,10 @@  nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
 	}
 	(*acl)->naces = 0;
 
-	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT);
+	_posix_to_nfsv4_one(pacl, *acl, flags & ~NFS4_ACL_TYPE_DEFAULT, inode->i_uid);
 
 	if (dpacl)
-		_posix_to_nfsv4_one(dpacl, *acl, flags | NFS4_ACL_TYPE_DEFAULT);
+		_posix_to_nfsv4_one(dpacl, *acl, (flags | NFS4_ACL_TYPE_DEFAULT) & ~NFS4_ACL_DIR_STICKY, inode->i_uid);
 
 out:
 	posix_acl_release(dpacl);
@@ -231,7 +239,7 @@  summarize_posix_acl(struct posix_acl *acl, struct posix_acl_summary *pas)
 /* We assume the acl has been verified with posix_acl_valid. */
 static void
 _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
-						unsigned int flags)
+		    unsigned int flags, kuid_t owner_uid)
 {
 	struct posix_acl_entry *pa, *group_owner_entry;
 	struct nfs4_ace *ace;
@@ -243,9 +251,149 @@  _posix_to_nfsv4_one(struct posix_acl *pacl, struct nfs4_acl *acl,
 	BUG_ON(pacl->a_count < 3);
 	summarize_posix_acl(pacl, &pas);
 
-	pa = pacl->a_entries;
 	ace = acl->aces + acl->naces;
 
+	/*
+	 * Translate restricted deletion flag (sticky bit, SVTX) set on the
+	 * directory to following NFS4 ACEs prepended before any other ACEs:
+	 *
+	 *   ALLOW OWNER@ DELETE_CHILD                                 (1)
+	 *   DENY EVERYONE@ DELETE_CHILD
+	 *   INHERIT_ONLY NO_PROPAGATE DENY user_owner_of_dir DELETE   (3)
+	 *   INHERIT_ONLY NO_PROPAGATE DENY OWNER@ DELETE              (4)
+	 *   INHERIT_ONLY NO_PROPAGATE DENY user1 DELETE               (ACL user1)
+	 *   ...
+	 *   INHERIT_ONLY NO_PROPAGATE DENY userN DELETE               (ACL userN)
+	 *   INHERIT_ONLY NO_PROPAGATE ALLOW OWNER@ DELETE
+	 *
+	 *   (1) - only if user-owner has write access
+	 *   (3) - only if user-owner does not have write access
+	 *   (4) - only if non-user-owner does not have write access
+	 *   (ACL user1) - only if user1 does not have write access
+	 *   (ACL userN) - only if userN does not have write access
+	 *
+	 * These ACEs describe behavior of set restricted deletion flag (sticky
+	 * bit) on directory as they allow only owner of individual child entries
+	 * and owner of the directory to delete individual child entries.
+	 * Everyone else is denied to remove child entries in this directory.
+	 *
+	 * For deleting entry in NFS4 ACL model it is needed either DELETE_CHILD
+	 * permission (access mask) from the parent directory or DELETE permission
+	 * (access mask) on the child. Just one of these two permissions is enough.
+	 * So if there is explicit DENY DELETE_CHILD on the parent together with
+	 * explicit ALLOW DELETE on the child, it means that deleting is allowed
+	 * (evaluation of permissions is independent in NFS4 ACL model). OWNER@
+	 * for inherited ACEs means owner of the child entry and not the owner
+	 * of the parent from which was ACE inherited.
+	 *
+	 * This translation is imperfect just for a case when some group
+	 * (including group-owner and others-group) does not have write access
+	 * to directory. Handled is only the edge case (via rule 4) when
+	 * everyone else except owner has no write access to the directory.
+	 * This information is not present in NFS4 ACL because it looks like that
+	 * this is not possible to express in this ACL model. So for ACL consumer
+	 * it could look like that owner of the file can delete its own file even
+	 * when group or other mode bits of the directory do not allow it.
+	 */
+	if (flags & NFS4_ACL_DIR_STICKY) {
+		/*
+		 * Explicitly allow directory owner to delete child entries
+		 * if directory owner has write access
+		 */
+		if (pas.owner & ACL_WRITE) {
+			ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
+			ace->flag = 0;
+			ace->access_mask = NFS4_ACE_DELETE_CHILD;
+			ace->whotype = NFS4_ACL_WHO_OWNER;
+			ace++;
+			acl->naces++;
+		}
+
+		/*
+		 * And then deny deleting child entries for all other users
+		 * which do not have explicit DELETE permission granted by
+		 * last rule in this section.
+		 */
+		ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
+		ace->flag = 0;
+		ace->access_mask = NFS4_ACE_DELETE_CHILD;
+		ace->whotype = NFS4_ACL_WHO_EVERYONE;
+		ace++;
+		acl->naces++;
+
+		/*
+		 * Do not grant directory owner to delete child entries (by the
+		 * last rule in this section) if it does not have write access.
+		 */
+		if (!(pas.owner & ACL_WRITE)) {
+			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
+			ace->flag = NFS4_INHERITANCE_FLAGS |
+				    NFS4_ACE_INHERIT_ONLY_ACE |
+				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
+			ace->access_mask = NFS4_ACE_DELETE;
+			ace->whotype = NFS4_ACL_WHO_NAMED;
+			ace->who_uid = owner_uid;
+			ace++;
+			acl->naces++;
+		}
+
+		if (!(pas.users & ACL_WRITE) && !(pas.group & ACL_WRITE) &&
+		    !(pas.groups & ACL_WRITE) && !(pas.other & ACL_WRITE)) {
+			/*
+			 * Do not grant child owner who is not directory owner
+			 * (handled by the first rule in this section) to
+			 * delete own child entries if there is no possible
+			 * directory write permission (checked for named users,
+			 * group-owner, named groups and others-groups).
+			 * This handles special edge case when only directory
+			 * owner has write access to directory.
+			 */
+			ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
+			ace->flag = NFS4_INHERITANCE_FLAGS |
+				    NFS4_ACE_INHERIT_ONLY_ACE |
+				    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
+			ace->access_mask = NFS4_ACE_DELETE;
+			ace->whotype = NFS4_ACL_WHO_OWNER;
+			ace++;
+			acl->naces++;
+		} else {
+			/*
+			 * Do not grant individual named users to delete child
+			 * entries (by the last rule in this section) if user
+			 * does not have write access to directory.
+			 */
+			for (pa = pacl->a_entries + 1; pa->e_tag == ACL_USER; pa++) {
+				if (pa->e_perm & pas.mask & ACL_WRITE)
+					continue;
+				ace->type = NFS4_ACE_ACCESS_DENIED_ACE_TYPE;
+				ace->flag = NFS4_INHERITANCE_FLAGS |
+					    NFS4_ACE_INHERIT_ONLY_ACE |
+					    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
+				ace->access_mask = NFS4_ACE_DELETE;
+				ace->whotype = NFS4_ACL_WHO_NAMED;
+				ace->who_uid = pa->e_uid;
+				ace++;
+				acl->naces++;
+			}
+		}
+
+		/*
+		 * Above rules filtered out users which do not have write access
+		 * to the directory. Now allow child-owner to delete its own
+		 * directory entries.
+		 */
+		ace->type = NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE;
+		ace->flag = NFS4_INHERITANCE_FLAGS |
+			    NFS4_ACE_INHERIT_ONLY_ACE |
+			    NFS4_ACE_NO_PROPAGATE_INHERIT_ACE;
+		ace->access_mask = NFS4_ACE_DELETE;
+		ace->whotype = NFS4_ACL_WHO_OWNER;
+		ace++;
+		acl->naces++;
+	}
+
+	pa = pacl->a_entries;
+
 	/* We could deny everything not granted by the owner: */
 	deny = ~pas.owner;
 	/*
@@ -517,7 +665,8 @@  posix_state_to_acl(struct posix_acl_state *state, unsigned int flags)
 
 	pace = pacl->a_entries;
 	pace->e_tag = ACL_USER_OBJ;
-	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags);
+	/* owner is never affected by restricted deletion flag, so clear NFS4_ACL_DIR_STICKY */
+	low_mode_from_nfs4(state->owner.allow, &pace->e_perm, flags & ~NFS4_ACL_DIR_STICKY);
 
 	for (i=0; i < state->users->n; i++) {
 		pace++;
@@ -691,9 +840,14 @@  static void process_one_v4_ace(struct posix_acl_state *state,
 
 static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
 		struct posix_acl **pacl, struct posix_acl **dpacl,
+		int *dir_sticky,
 		unsigned int flags)
 {
 	struct posix_acl_state effective_acl_state, default_acl_state;
+	bool dir_allow_nonowner_delete_child = false;
+	bool dir_deny_everyone_delete_child = false;
+	bool dir_allow_child_owner_delete = false;
+	unsigned int eflags = 0;
 	struct nfs4_ace *ace;
 	int ret;
 
@@ -705,6 +859,28 @@  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
 		goto out_estate;
 	ret = -EINVAL;
 	for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
+		/*
+		 * Process and parse ACE entry INHERIT_ONLY NO_PROPAGATE DELETE
+		 * for detecting restricted deletion flag (sticky bit). Allow
+		 * SYNCHRONIZE access mask to be present as NFS4 clients can
+		 * include this access mask together with any other one.
+		 *
+		 * It needs to be done before validation as NFS4_SUPPORTED_FLAGS
+		 * does not contain NFS4_ACE_NO_PROPAGATE_INHERIT_ACE and this
+		 * ACE must not throw error.
+		 */
+		if ((flags & NFS4_ACL_DIR) &&
+		    !(ace->flag & ~(NFS4_SUPPORTED_FLAGS|NFS4_ACE_NO_PROPAGATE_INHERIT_ACE)) &&
+		    (ace->flag & NFS4_INHERITANCE_FLAGS) &&
+		    (ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
+		    (ace->flag & NFS4_ACE_NO_PROPAGATE_INHERIT_ACE) &&
+		    (ace->access_mask & ~NFS4_ACE_SYNCHRONIZE) == NFS4_ACE_DELETE) {
+			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
+			    ace->whotype == NFS4_ACL_WHO_OWNER)
+				dir_allow_child_owner_delete = true;
+			continue;
+		}
+
 		if (ace->type != NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
 		    ace->type != NFS4_ACE_ACCESS_DENIED_ACE_TYPE)
 			goto out_dstate;
@@ -725,6 +901,38 @@  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
 
 		if (!(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE))
 			process_one_v4_ace(&effective_acl_state, ace);
+
+		/*
+		 * Process and parse ACE entry with DELETE_CHILD access mask
+		 * for detecting restricted deletion flag (sticky bit).
+		 */
+		if ((flags & NFS4_ACL_DIR) &&
+		    !(ace->flag & NFS4_ACE_INHERIT_ONLY_ACE) &&
+		    (ace->access_mask & NFS4_ACE_DELETE_CHILD)) {
+			if (ace->type == NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE &&
+			    !dir_deny_everyone_delete_child &&
+			    ace->whotype != NFS4_ACL_WHO_OWNER)
+				dir_allow_nonowner_delete_child = true;
+			else if (ace->type == NFS4_ACE_ACCESS_DENIED_ACE_TYPE &&
+				 ace->whotype == NFS4_ACL_WHO_EVERYONE)
+				dir_deny_everyone_delete_child = true;
+		}
+	}
+
+	/*
+	 * Recognize restricted deletion flag (sticky bit) from directory ACL
+	 * if ACEs on directory allow only owner of directory child entry to
+	 * delete entry itself.
+	 *
+	 * This is relaxed check for rules generated by _posix_to_nfsv4_one().
+	 * Relaxed check of restricted deletion flag is for security reasons
+	 * and means that permissions would be more stricter, to prevent
+	 * granting more access than what was specified in NFS4 ACL packet.
+	 */
+	if (flags & NFS4_ACL_DIR) {
+		*dir_sticky = !dir_allow_nonowner_delete_child && dir_allow_child_owner_delete;
+		if (*dir_sticky)
+			eflags |= NFS4_ACL_DIR_STICKY;
 	}
 
 	/*
@@ -750,7 +958,7 @@  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
 			default_acl_state.other = effective_acl_state.other;
 	}
 
-	*pacl = posix_state_to_acl(&effective_acl_state, flags);
+	*pacl = posix_state_to_acl(&effective_acl_state, flags | eflags);
 	if (IS_ERR(*pacl)) {
 		ret = PTR_ERR(*pacl);
 		*pacl = NULL;
@@ -776,19 +984,23 @@  static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl,
 }
 
 __be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl,
-			 struct nfsd_attrs *attr)
+			 struct nfsd_attrs *attr, int *dir_sticky)
 {
 	int host_error;
 	unsigned int flags = 0;
 
-	if (!acl)
+	if (!acl) {
+		if (type == NF4DIR)
+			*dir_sticky = -1;
 		return nfs_ok;
+	}
 
 	if (type == NF4DIR)
 		flags = NFS4_ACL_DIR;
 
 	host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl,
-					     &attr->na_dpacl, flags);
+					     &attr->na_dpacl, dir_sticky,
+					     flags);
 	if (host_error == -EINVAL)
 		return nfserr_attrnotsupp;
 	else
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0f67f4a7b8b2..56aeb745d108 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -259,7 +259,7 @@  nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		return nfserrno(host_err);
 
 	if (is_create_with_attrs(open))
-		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
+		nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs, NULL);
 
 	inode_lock_nested(inode, I_MUTEX_PARENT);
 
@@ -791,6 +791,7 @@  nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	};
 	struct svc_fh resfh;
 	__be32 status;
+	int dir_sticky;
 	dev_t rdev;
 
 	fh_init(&resfh, NFS4_FHSIZE);
@@ -804,7 +805,7 @@  nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (status)
 		return status;
 
-	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs);
+	status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs, &dir_sticky);
 	current->fs->umask = create->cr_umask;
 	switch (create->cr_type) {
 	case NF4LNK:
@@ -848,6 +849,11 @@  nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		break;
 
 	case NF4DIR:
+		if (dir_sticky == 1) {
+			/* Set directory sticky bit deduced from the ACL attr. */
+			create->cr_iattr.ia_valid |= ATTR_MODE;
+			create->cr_iattr.ia_mode |= S_ISVTX;
+		}
 		create->cr_iattr.ia_valid &= ~ATTR_SIZE;
 		status = nfsd_create(rqstp, &cstate->current_fh,
 				     create->cr_name, create->cr_namelen,
@@ -1144,6 +1150,7 @@  nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	struct inode *inode;
 	__be32 status = nfs_ok;
 	bool save_no_wcc;
+	int dir_sticky;
 	int err;
 
 	if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
@@ -1165,10 +1172,26 @@  nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	inode = cstate->current_fh.fh_dentry->d_inode;
 	status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG,
-				   setattr->sa_acl, &attrs);
-
+				   setattr->sa_acl, &attrs, &dir_sticky);
 	if (status)
 		goto out;
+
+	if (S_ISDIR(inode->i_mode) && dir_sticky != -1 && !!(inode->i_mode & S_ISVTX) != dir_sticky) {
+		/*
+		 * Set directory sticky bit deduced from the ACL attr.
+		 * Do not clear sticky bit if it was explicitly set in MODE attr
+		 * but was not deduced from ACL attr because clients can send
+		 * both MODE and ACL attrs where sticky bit is only in MODE attr.
+		 */
+		if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
+			attrs.na_iattr->ia_mode = inode->i_mode;
+		if (dir_sticky)
+			attrs.na_iattr->ia_mode |= S_ISVTX;
+		else if (!(attrs.na_iattr->ia_valid & ATTR_MODE))
+			attrs.na_iattr->ia_mode &= ~S_ISVTX;
+		attrs.na_iattr->ia_valid |= ATTR_MODE;
+	}
+
 	save_no_wcc = cstate->current_fh.fh_no_wcc;
 	cstate->current_fh.fh_no_wcc = true;
 	status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, NULL);