diff mbox series

[v1,06/11] landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER

Message ID 20220221212522.320243-7-mic@digikod.net (mailing list archive)
State New, archived
Headers show
Series Landlock: file linking and renaming support | expand

Commit Message

Mickaël Salaün Feb. 21, 2022, 9:25 p.m. UTC
From: Mickaël Salaün <mic@linux.microsoft.com>

Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
to allow sandboxed processes to link and rename files from and to a
specific set of file hierarchies.  This access right should be composed
with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename.  This
lift a Landlock limitation that always denied changing the parent of an
inode.

Renaming or linking to the same directory is still always allowed,
whatever LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
considered a threat to user data.

However, creating multiple links or renaming to a different parent
directory may lead to privilege escalations if not handled properly.
Indeed, we must be sure that the source doesn't gain more privileges by
being accessible from the destination.  This is handled by making sure
that the source hierarchy (including the referenced file or directory
itself) restricts at least as much the destination hierarchy.  If it is
not the case, an EXDEV error is returned, making it potentially possible
for user space to copy the file hierarchy instead of moving or linking
it.

Instead of creating different access rights for the source and the
destination, we choose to make it simple and consistent for users.
Indeed, considering the previous constraint, it would be weird to
require such destination access right to be also granted to the source
(to make it a superset).

See the provided documentation for additional details.

New tests are provided with a following commit.

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220221212522.320243-7-mic@digikod.net
---
 include/uapi/linux/landlock.h                |  27 +-
 security/landlock/fs.c                       | 550 ++++++++++++++++---
 security/landlock/limits.h                   |   2 +-
 security/landlock/syscalls.c                 |   2 +-
 tools/testing/selftests/landlock/base_test.c |   2 +-
 tools/testing/selftests/landlock/fs_test.c   |   3 +-
 6 files changed, 516 insertions(+), 70 deletions(-)

Comments

kernel test robot Feb. 22, 2022, 3:16 a.m. UTC | #1
Hi "Mickaël,

I love your patch! Yet something to improve:

[auto build test ERROR on cfb92440ee71adcc2105b0890bb01ac3cddb8507]

url:    https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-file-linking-and-renaming-support/20220222-051842
base:   cfb92440ee71adcc2105b0890bb01ac3cddb8507
config: hexagon-randconfig-r002-20220221 (https://download.01.org/0day-ci/archive/20220222/202202221149.qLO9DEqo-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/c68b879f54d6262963d435a18cedbc238b7faeaf
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Micka-l-Sala-n/Landlock-file-linking-and-renaming-support/20220222-051842
        git checkout c68b879f54d6262963d435a18cedbc238b7faeaf
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> security/landlock/fs.c:463:2: error: call to __compiletime_assert_228 declared with 'error' attribute: BUILD_BUG_ON failed: !layer_masks_dst_parent
           BUILD_BUG_ON(!layer_masks_dst_parent);
           ^
   include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
           BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
           ^
   include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
   #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                       ^
   include/linux/compiler_types.h:346:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:334:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:327:4: note: expanded from macro '__compiletime_assert'
                           prefix ## suffix();                             \
                           ^
   <scratch space>:170:1: note: expanded from here
   __compiletime_assert_228
   ^
>> security/landlock/fs.c:670:2: error: call to __compiletime_assert_229 declared with 'error' attribute: BUILD_BUG_ON failed: !layer_masks_dom
           BUILD_BUG_ON(!layer_masks_dom);
           ^
   include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
           BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
           ^
   include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
   #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                       ^
   include/linux/compiler_types.h:346:2: note: expanded from macro 'compiletime_assert'
           _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
           ^
   include/linux/compiler_types.h:334:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:327:4: note: expanded from macro '__compiletime_assert'
                           prefix ## suffix();                             \
                           ^
   <scratch space>:174:1: note: expanded from here
   __compiletime_assert_229
   ^
   2 errors generated.


vim +/error +463 security/landlock/fs.c

   401	
   402	/**
   403	 * check_access_path_dual - Check a source and a destination accesses
   404	 *
   405	 * @domain: Domain to check against.
   406	 * @path: File hierarchy to walk through.
   407	 * @child_is_directory: Must be set to true if the (original) leaf is a
   408	 *     directory, false otherwise.
   409	 * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
   410	 *     is equal to @layer_masks_src_parent (if any).
   411	 * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
   412	 *     masks, identifying the layers that forbid a specific access.  Bits from
   413	 *     this matrix can be unset according to the @path walk.  An empty matrix
   414	 *     means that @domain allows all possible Landlock accesses (i.e. not only
   415	 *     those identified by @access_request_dst_parent).  This matrix can
   416	 *     initially refer to domain layer masks and, when the accesses for the
   417	 *     destination and source are the same, to request layer masks.
   418	 * @access_request_src_parent: Similar to @access_request_dst_parent but for an
   419	 *     initial source path request.  Only taken into account if
   420	 *     @layer_masks_src_parent is not NULL.
   421	 * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
   422	 *     initial source path walk.  This can be NULL if only dealing with a
   423	 *     destination access request (i.e. not a rename nor a link action).
   424	 * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
   425	 *     linked or renamed inode (without hierarchy).  This is only used if
   426	 *     @layer_masks_src_parent is not NULL.
   427	 *
   428	 * This helper first checks that the destination has a superset of restrictions
   429	 * compared to the source (if any) for a common path.  It then checks that the
   430	 * collected accesses and the remaining ones are enough to allow the request.
   431	 *
   432	 * Returns:
   433	 * - 0 if the access request is granted;
   434	 * - -EACCES if it is denied because of access right other than
   435	 *   LANDLOCK_ACCESS_FS_REFER;
   436	 * - -EXDEV if the renaming or linking would be a privileged escalation
   437	 *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
   438	 *   not allowed by the source or the destination.
   439	 */
   440	static int check_access_path_dual(const struct landlock_ruleset *const domain,
   441			const struct path *const path,
   442			bool child_is_directory,
   443			const access_mask_t access_request_dst_parent,
   444			layer_mask_t (*const
   445				layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
   446			const access_mask_t access_request_src_parent,
   447			layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
   448			layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
   449	{
   450		bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
   451		struct path walker_path;
   452		access_mask_t access_masked_dst_parent, access_masked_src_parent;
   453	
   454		if (!access_request_dst_parent && !access_request_src_parent)
   455			return 0;
   456		if (WARN_ON_ONCE(!domain || !path))
   457			return 0;
   458		if (is_nouser_or_private(path->dentry))
   459			return 0;
   460		if (WARN_ON_ONCE(domain->num_layers < 1))
   461			return -EACCES;
   462	
 > 463		BUILD_BUG_ON(!layer_masks_dst_parent);
   464		if (layer_masks_src_parent) {
   465			if (WARN_ON_ONCE(!layer_masks_child))
   466				return -EACCES;
   467			access_masked_dst_parent = access_masked_src_parent =
   468				get_handled_accesses(domain);
   469			is_dom_check = true;
   470		} else {
   471			if (WARN_ON_ONCE(layer_masks_child))
   472				return -EACCES;
   473			access_masked_dst_parent = access_request_dst_parent;
   474			access_masked_src_parent = access_request_src_parent;
   475			is_dom_check = false;
   476		}
   477	
   478		walker_path = *path;
   479		path_get(&walker_path);
   480		/*
   481		 * We need to walk through all the hierarchy to not miss any relevant
   482		 * restriction.
   483		 */
   484		while (true) {
   485			struct dentry *parent_dentry;
   486			const struct landlock_rule *rule;
   487	
   488			/*
   489			 * If at least all accesses allowed on the destination are
   490			 * already allowed on the source, respectively if there is at
   491			 * least as much as restrictions on the destination than on the
   492			 * source, then we can safely refer files from the source to
   493			 * the destination without risking a privilege escalation.
   494			 * This is crucial for standalone multilayered security
   495			 * policies.  Furthermore, this helps avoid policy writers to
   496			 * shoot themselves in the foot.
   497			 */
   498			if (is_dom_check && is_superset(child_is_directory,
   499						layer_masks_dst_parent,
   500						layer_masks_src_parent,
   501						layer_masks_child)) {
   502				allowed_dst_parent =
   503					scope_to_request(access_request_dst_parent,
   504							layer_masks_dst_parent);
   505				allowed_src_parent =
   506					scope_to_request(access_request_src_parent,
   507							layer_masks_src_parent);
   508	
   509				/* Stops when all accesses are granted. */
   510				if (allowed_dst_parent && allowed_src_parent)
   511					break;
   512	
   513				/*
   514				 * Downgrades checks from domain handled accesses to
   515				 * requested accesses.
   516				 */
   517				is_dom_check = false;
   518				access_masked_dst_parent = access_request_dst_parent;
   519				access_masked_src_parent = access_request_src_parent;
   520			}
   521	
   522			rule = find_rule(domain, walker_path.dentry);
   523			allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
   524					layer_masks_dst_parent);
   525			allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
   526					layer_masks_src_parent);
   527	
   528			/* Stops when a rule from each layer grants access. */
   529			if (allowed_dst_parent && allowed_src_parent)
   530				break;
   531	
   532	jump_up:
   533			if (walker_path.dentry == walker_path.mnt->mnt_root) {
   534				if (follow_up(&walker_path)) {
   535					/* Ignores hidden mount points. */
   536					goto jump_up;
   537				} else {
   538					/*
   539					 * Stops at the real root.  Denies access
   540					 * because not all layers have granted access.
   541					 */
   542					allowed_dst_parent = false;
   543					break;
   544				}
   545			}
   546			if (unlikely(IS_ROOT(walker_path.dentry))) {
   547				/*
   548				 * Stops at disconnected root directories.  Only allows
   549				 * access to internal filesystems (e.g. nsfs, which is
   550				 * reachable through /proc/<pid>/ns/<namespace>).
   551				 */
   552				allowed_dst_parent = !!(walker_path.mnt->mnt_flags &
   553						MNT_INTERNAL);
   554				break;
   555			}
   556			parent_dentry = dget_parent(walker_path.dentry);
   557			dput(walker_path.dentry);
   558			walker_path.dentry = parent_dentry;
   559		}
   560		path_put(&walker_path);
   561	
   562		if (allowed_dst_parent && allowed_src_parent)
   563			return 0;
   564	
   565		/*
   566		 * Unfortunately, we cannot prioritize EACCES over EXDEV for all
   567		 * RENAME_EXCHANGE cases because it depends on the source and
   568		 * destination order.  This could be changed with a new
   569		 * security_path_rename hook implementation.
   570		 */
   571		if (likely(is_eacces(layer_masks_dst_parent, access_request_dst_parent)
   572					|| is_eacces(layer_masks_src_parent,
   573						access_request_src_parent)))
   574			return -EACCES;
   575	
   576		/*
   577		 * Gracefully forbids reparenting if the destination directory
   578		 * hierarchy is not a superset of restrictions of the source directory
   579		 * hierarchy, or if LANDLOCK_ACCESS_FS_REFER is not allowed by the
   580		 * source or the destination.
   581		 */
   582		return -EXDEV;
   583	}
   584	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Mickaël Salaün Feb. 22, 2022, 10:18 a.m. UTC | #2
This error is because clang does not behave like GCC: 
check_access_path_dual() should be marked as __always_inline, or I 
should change from BUILD_BUG_ON() to WARN_ON_ONCE() if needed. I'll fix 
that in the next series.

On 22/02/2022 04:16, kernel test robot wrote:
> Hi "Mickaël,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on cfb92440ee71adcc2105b0890bb01ac3cddb8507]
> 
> url:    https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-file-linking-and-renaming-support/20220222-051842
> base:   cfb92440ee71adcc2105b0890bb01ac3cddb8507
> config: hexagon-randconfig-r002-20220221 (https://download.01.org/0day-ci/archive/20220222/202202221149.qLO9DEqo-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/0day-ci/linux/commit/c68b879f54d6262963d435a18cedbc238b7faeaf
>          git remote add linux-review https://github.com/0day-ci/linux
>          git fetch --no-tags linux-review Micka-l-Sala-n/Landlock-file-linking-and-renaming-support/20220222-051842
>          git checkout c68b879f54d6262963d435a18cedbc238b7faeaf
>          # save the config file to linux build tree
>          mkdir build_dir
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>>> security/landlock/fs.c:463:2: error: call to __compiletime_assert_228 declared with 'error' attribute: BUILD_BUG_ON failed: !layer_masks_dst_parent
>             BUILD_BUG_ON(!layer_masks_dst_parent);
>             ^
>     include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
>             BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>             ^
>     include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
>     #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>                                         ^
>     include/linux/compiler_types.h:346:2: note: expanded from macro 'compiletime_assert'
>             _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>             ^
>     include/linux/compiler_types.h:334:2: note: expanded from macro '_compiletime_assert'
>             __compiletime_assert(condition, msg, prefix, suffix)
>             ^
>     include/linux/compiler_types.h:327:4: note: expanded from macro '__compiletime_assert'
>                             prefix ## suffix();                             \
>                             ^
>     <scratch space>:170:1: note: expanded from here
>     __compiletime_assert_228
>     ^
>>> security/landlock/fs.c:670:2: error: call to __compiletime_assert_229 declared with 'error' attribute: BUILD_BUG_ON failed: !layer_masks_dom
>             BUILD_BUG_ON(!layer_masks_dom);
>             ^
>     include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
>             BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>             ^
>     include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
>     #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>                                         ^
>     include/linux/compiler_types.h:346:2: note: expanded from macro 'compiletime_assert'
>             _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>             ^
>     include/linux/compiler_types.h:334:2: note: expanded from macro '_compiletime_assert'
>             __compiletime_assert(condition, msg, prefix, suffix)
>             ^
>     include/linux/compiler_types.h:327:4: note: expanded from macro '__compiletime_assert'
>                             prefix ## suffix();                             \
>                             ^
>     <scratch space>:174:1: note: expanded from here
>     __compiletime_assert_229
>     ^
>     2 errors generated.
> 
> 
> vim +/error +463 security/landlock/fs.c
> 
>     401	
>     402	/**
>     403	 * check_access_path_dual - Check a source and a destination accesses
>     404	 *
>     405	 * @domain: Domain to check against.
>     406	 * @path: File hierarchy to walk through.
>     407	 * @child_is_directory: Must be set to true if the (original) leaf is a
>     408	 *     directory, false otherwise.
>     409	 * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
>     410	 *     is equal to @layer_masks_src_parent (if any).
>     411	 * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
>     412	 *     masks, identifying the layers that forbid a specific access.  Bits from
>     413	 *     this matrix can be unset according to the @path walk.  An empty matrix
>     414	 *     means that @domain allows all possible Landlock accesses (i.e. not only
>     415	 *     those identified by @access_request_dst_parent).  This matrix can
>     416	 *     initially refer to domain layer masks and, when the accesses for the
>     417	 *     destination and source are the same, to request layer masks.
>     418	 * @access_request_src_parent: Similar to @access_request_dst_parent but for an
>     419	 *     initial source path request.  Only taken into account if
>     420	 *     @layer_masks_src_parent is not NULL.
>     421	 * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
>     422	 *     initial source path walk.  This can be NULL if only dealing with a
>     423	 *     destination access request (i.e. not a rename nor a link action).
>     424	 * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
>     425	 *     linked or renamed inode (without hierarchy).  This is only used if
>     426	 *     @layer_masks_src_parent is not NULL.
>     427	 *
>     428	 * This helper first checks that the destination has a superset of restrictions
>     429	 * compared to the source (if any) for a common path.  It then checks that the
>     430	 * collected accesses and the remaining ones are enough to allow the request.
>     431	 *
>     432	 * Returns:
>     433	 * - 0 if the access request is granted;
>     434	 * - -EACCES if it is denied because of access right other than
>     435	 *   LANDLOCK_ACCESS_FS_REFER;
>     436	 * - -EXDEV if the renaming or linking would be a privileged escalation
>     437	 *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
>     438	 *   not allowed by the source or the destination.
>     439	 */
>     440	static int check_access_path_dual(const struct landlock_ruleset *const domain,
>     441			const struct path *const path,
>     442			bool child_is_directory,
>     443			const access_mask_t access_request_dst_parent,
>     444			layer_mask_t (*const
>     445				layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
>     446			const access_mask_t access_request_src_parent,
>     447			layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
>     448			layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
>     449	{
>     450		bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
>     451		struct path walker_path;
>     452		access_mask_t access_masked_dst_parent, access_masked_src_parent;
>     453	
>     454		if (!access_request_dst_parent && !access_request_src_parent)
>     455			return 0;
>     456		if (WARN_ON_ONCE(!domain || !path))
>     457			return 0;
>     458		if (is_nouser_or_private(path->dentry))
>     459			return 0;
>     460		if (WARN_ON_ONCE(domain->num_layers < 1))
>     461			return -EACCES;
>     462	
>   > 463		BUILD_BUG_ON(!layer_masks_dst_parent);
>     464		if (layer_masks_src_parent) {
>     465			if (WARN_ON_ONCE(!layer_masks_child))
>     466				return -EACCES;
>     467			access_masked_dst_parent = access_masked_src_parent =
>     468				get_handled_accesses(domain);
>     469			is_dom_check = true;
>     470		} else {
>     471			if (WARN_ON_ONCE(layer_masks_child))
>     472				return -EACCES;
>     473			access_masked_dst_parent = access_request_dst_parent;
>     474			access_masked_src_parent = access_request_src_parent;
>     475			is_dom_check = false;
>     476		}
>     477	
>     478		walker_path = *path;
>     479		path_get(&walker_path);
>     480		/*
>     481		 * We need to walk through all the hierarchy to not miss any relevant
>     482		 * restriction.
>     483		 */
>     484		while (true) {
>     485			struct dentry *parent_dentry;
>     486			const struct landlock_rule *rule;
>     487	
>     488			/*
>     489			 * If at least all accesses allowed on the destination are
>     490			 * already allowed on the source, respectively if there is at
>     491			 * least as much as restrictions on the destination than on the
>     492			 * source, then we can safely refer files from the source to
>     493			 * the destination without risking a privilege escalation.
>     494			 * This is crucial for standalone multilayered security
>     495			 * policies.  Furthermore, this helps avoid policy writers to
>     496			 * shoot themselves in the foot.
>     497			 */
>     498			if (is_dom_check && is_superset(child_is_directory,
>     499						layer_masks_dst_parent,
>     500						layer_masks_src_parent,
>     501						layer_masks_child)) {
>     502				allowed_dst_parent =
>     503					scope_to_request(access_request_dst_parent,
>     504							layer_masks_dst_parent);
>     505				allowed_src_parent =
>     506					scope_to_request(access_request_src_parent,
>     507							layer_masks_src_parent);
>     508	
>     509				/* Stops when all accesses are granted. */
>     510				if (allowed_dst_parent && allowed_src_parent)
>     511					break;
>     512	
>     513				/*
>     514				 * Downgrades checks from domain handled accesses to
>     515				 * requested accesses.
>     516				 */
>     517				is_dom_check = false;
>     518				access_masked_dst_parent = access_request_dst_parent;
>     519				access_masked_src_parent = access_request_src_parent;
>     520			}
>     521	
>     522			rule = find_rule(domain, walker_path.dentry);
>     523			allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
>     524					layer_masks_dst_parent);
>     525			allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
>     526					layer_masks_src_parent);
>     527	
>     528			/* Stops when a rule from each layer grants access. */
>     529			if (allowed_dst_parent && allowed_src_parent)
>     530				break;
>     531	
>     532	jump_up:
>     533			if (walker_path.dentry == walker_path.mnt->mnt_root) {
>     534				if (follow_up(&walker_path)) {
>     535					/* Ignores hidden mount points. */
>     536					goto jump_up;
>     537				} else {
>     538					/*
>     539					 * Stops at the real root.  Denies access
>     540					 * because not all layers have granted access.
>     541					 */
>     542					allowed_dst_parent = false;
>     543					break;
>     544				}
>     545			}
>     546			if (unlikely(IS_ROOT(walker_path.dentry))) {
>     547				/*
>     548				 * Stops at disconnected root directories.  Only allows
>     549				 * access to internal filesystems (e.g. nsfs, which is
>     550				 * reachable through /proc/<pid>/ns/<namespace>).
>     551				 */
>     552				allowed_dst_parent = !!(walker_path.mnt->mnt_flags &
>     553						MNT_INTERNAL);
>     554				break;
>     555			}
>     556			parent_dentry = dget_parent(walker_path.dentry);
>     557			dput(walker_path.dentry);
>     558			walker_path.dentry = parent_dentry;
>     559		}
>     560		path_put(&walker_path);
>     561	
>     562		if (allowed_dst_parent && allowed_src_parent)
>     563			return 0;
>     564	
>     565		/*
>     566		 * Unfortunately, we cannot prioritize EACCES over EXDEV for all
>     567		 * RENAME_EXCHANGE cases because it depends on the source and
>     568		 * destination order.  This could be changed with a new
>     569		 * security_path_rename hook implementation.
>     570		 */
>     571		if (likely(is_eacces(layer_masks_dst_parent, access_request_dst_parent)
>     572					|| is_eacces(layer_masks_src_parent,
>     573						access_request_src_parent)))
>     574			return -EACCES;
>     575	
>     576		/*
>     577		 * Gracefully forbids reparenting if the destination directory
>     578		 * hierarchy is not a superset of restrictions of the source directory
>     579		 * hierarchy, or if LANDLOCK_ACCESS_FS_REFER is not allowed by the
>     580		 * source or the destination.
>     581		 */
>     582		return -EXDEV;
>     583	}
>     584	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Paul Moore March 17, 2022, 1:26 a.m. UTC | #3
On Mon, Feb 21, 2022 at 4:15 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> From: Mickaël Salaün <mic@linux.microsoft.com>
>
> Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
> to allow sandboxed processes to link and rename files from and to a
> specific set of file hierarchies.  This access right should be composed
> with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
> and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename.  This
> lift a Landlock limitation that always denied changing the parent of an
> inode.
>
> Renaming or linking to the same directory is still always allowed,
> whatever LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
> considered a threat to user data.
>
> However, creating multiple links or renaming to a different parent
> directory may lead to privilege escalations if not handled properly.
> Indeed, we must be sure that the source doesn't gain more privileges by
> being accessible from the destination.  This is handled by making sure
> that the source hierarchy (including the referenced file or directory
> itself) restricts at least as much the destination hierarchy.  If it is
> not the case, an EXDEV error is returned, making it potentially possible
> for user space to copy the file hierarchy instead of moving or linking
> it.
>
> Instead of creating different access rights for the source and the
> destination, we choose to make it simple and consistent for users.
> Indeed, considering the previous constraint, it would be weird to
> require such destination access right to be also granted to the source
> (to make it a superset).
>
> See the provided documentation for additional details.
>
> New tests are provided with a following commit.
>
> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
> Link: https://lore.kernel.org/r/20220221212522.320243-7-mic@digikod.net
> ---
>  include/uapi/linux/landlock.h                |  27 +-
>  security/landlock/fs.c                       | 550 ++++++++++++++++---
>  security/landlock/limits.h                   |   2 +-
>  security/landlock/syscalls.c                 |   2 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   3 +-
>  6 files changed, 516 insertions(+), 70 deletions(-)

...

> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 3886f9ad1a60..c7c7ce4e7cd5 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -4,6 +4,7 @@
>   *
>   * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
>   * Copyright © 2018-2020 ANSSI
> + * Copyright © 2021-2022 Microsoft Corporation
>   */
>
>  #include <linux/atomic.h>
> @@ -269,16 +270,188 @@ static inline bool is_nouser_or_private(const struct dentry *dentry)
>                          unlikely(IS_PRIVATE(d_backing_inode(dentry))));
>  }
>
> -static int check_access_path(const struct landlock_ruleset *const domain,
> -               const struct path *const path,
> +static inline access_mask_t get_handled_accesses(
> +               const struct landlock_ruleset *const domain)
> +{
> +       access_mask_t access_dom = 0;
> +       unsigned long access_bit;

Would it be better to declare @access_bit as an access_mask_t type?
You're not using any macros like for_each_set_bit() in this function
so I believe it should be safe.

> +       for (access_bit = 0; access_bit < LANDLOCK_NUM_ACCESS_FS;
> +                       access_bit++) {
> +               size_t layer_level;

Considering the number of layers has dropped down to 16, it seems like
a normal unsigned int might be big enough for @layer_level :)

> +               for (layer_level = 0; layer_level < domain->num_layers;
> +                               layer_level++) {
> +                       if (domain->fs_access_masks[layer_level] &
> +                                       BIT_ULL(access_bit)) {
> +                               access_dom |= BIT_ULL(access_bit);
> +                               break;
> +                       }
> +               }
> +       }
> +       return access_dom;
> +}
> +
> +static inline access_mask_t init_layer_masks(
> +               const struct landlock_ruleset *const domain,
> +               const access_mask_t access_request,
> +               layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       access_mask_t handled_accesses = 0;
> +       size_t layer_level;
> +
> +       memset(layer_masks, 0, sizeof(*layer_masks));
> +       if (WARN_ON_ONCE(!access_request))
> +               return 0;
> +
> +       /* Saves all handled accesses per layer. */
> +       for (layer_level = 0; layer_level < domain->num_layers;
> +                       layer_level++) {
> +               const unsigned long access_req = access_request;
> +               unsigned long access_bit;
> +
> +               for_each_set_bit(access_bit, &access_req,
> +                               ARRAY_SIZE(*layer_masks)) {
> +                       if (domain->fs_access_masks[layer_level] &
> +                                       BIT_ULL(access_bit)) {
> +                               (*layer_masks)[access_bit] |=
> +                                       BIT_ULL(layer_level);
> +                               handled_accesses |= BIT_ULL(access_bit);
> +                       }
> +               }
> +       }
> +       return handled_accesses;
> +}
> +
> +/*
> + * Check that a destination file hierarchy has more restrictions than a source
> + * file hierarchy.  This is only used for link and rename actions.
> + */
> +static inline bool is_superset(bool child_is_directory,
> +               const layer_mask_t (*const
> +                       layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               const layer_mask_t (*const
> +                       layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               const layer_mask_t (*const
> +                       layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       unsigned long access_bit;
> +
> +       for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_dst_parent);
> +                       access_bit++) {
> +               /* Ignores accesses that only make sense for directories. */
> +               if (!child_is_directory && !(BIT_ULL(access_bit) & ACCESS_FILE))
> +                       continue;
> +
> +               /*
> +                * Checks if the destination restrictions are a superset of the
> +                * source ones (i.e. inherited access rights without child
> +                * exceptions).
> +                */
> +               if ((((*layer_masks_src_parent)[access_bit] & (*layer_masks_child)[access_bit]) |
> +                                       (*layer_masks_dst_parent)[access_bit]) !=
> +                               (*layer_masks_dst_parent)[access_bit])
> +                       return false;
> +       }
> +       return true;
> +}
> +
> +/*
> + * Removes @layer_masks accesses that are not requested.
> + *
> + * Returns true if the request is allowed, false otherwise.
> + */
> +static inline bool scope_to_request(const access_mask_t access_request,
> +               layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       const unsigned long access_req = access_request;
> +       unsigned long access_bit;
> +
> +       if (WARN_ON_ONCE(!layer_masks))
> +               return true;
> +
> +       for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks))
> +               (*layer_masks)[access_bit] = 0;
> +       return !memchr_inv(layer_masks, 0, sizeof(*layer_masks));
> +}
> +
> +/*
> + * Returns true if there is at least one access right different than
> + * LANDLOCK_ACCESS_FS_REFER.
> + */
> +static inline bool is_eacces(
> +               const layer_mask_t (*const
> +                       layer_masks)[LANDLOCK_NUM_ACCESS_FS],
>                 const access_mask_t access_request)
>  {

Granted, I don't have as deep of an understanding of Landlock as you
do, but the function name "is_eacces" seems a little odd given the
nature of the function.  Perhaps "is_fsrefer"?

> -       layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> -       bool allowed = false, has_access = false;
> +       unsigned long access_bit;
> +       /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
> +       const unsigned long access_check = access_request &
> +               ~LANDLOCK_ACCESS_FS_REFER;
> +
> +       if (!layer_masks)
> +               return false;
> +
> +       for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
> +               if ((*layer_masks)[access_bit])
> +                       return true;
> +       }

Is calling for_each_set_bit() overkill here?  @access_check should
only ever have at most one bit set (LANDLOCK_ACCESS_FS_REFER), yes?

> +       return false;
> +}
> +
> +/**
> + * check_access_path_dual - Check a source and a destination accesses
> + *
> + * @domain: Domain to check against.
> + * @path: File hierarchy to walk through.
> + * @child_is_directory: Must be set to true if the (original) leaf is a
> + *     directory, false otherwise.
> + * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
> + *     is equal to @layer_masks_src_parent (if any).
> + * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
> + *     masks, identifying the layers that forbid a specific access.  Bits from
> + *     this matrix can be unset according to the @path walk.  An empty matrix
> + *     means that @domain allows all possible Landlock accesses (i.e. not only
> + *     those identified by @access_request_dst_parent).  This matrix can
> + *     initially refer to domain layer masks and, when the accesses for the
> + *     destination and source are the same, to request layer masks.
> + * @access_request_src_parent: Similar to @access_request_dst_parent but for an
> + *     initial source path request.  Only taken into account if
> + *     @layer_masks_src_parent is not NULL.
> + * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
> + *     initial source path walk.  This can be NULL if only dealing with a
> + *     destination access request (i.e. not a rename nor a link action).
> + * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
> + *     linked or renamed inode (without hierarchy).  This is only used if
> + *     @layer_masks_src_parent is not NULL.
> + *
> + * This helper first checks that the destination has a superset of restrictions
> + * compared to the source (if any) for a common path.  It then checks that the
> + * collected accesses and the remaining ones are enough to allow the request.
> + *
> + * Returns:
> + * - 0 if the access request is granted;
> + * - -EACCES if it is denied because of access right other than
> + *   LANDLOCK_ACCESS_FS_REFER;
> + * - -EXDEV if the renaming or linking would be a privileged escalation
> + *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
> + *   not allowed by the source or the destination.
> + */
> +static int check_access_path_dual(const struct landlock_ruleset *const domain,
> +               const struct path *const path,
> +               bool child_is_directory,
> +               const access_mask_t access_request_dst_parent,
> +               layer_mask_t (*const
> +                       layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               const access_mask_t access_request_src_parent,
> +               layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
>         struct path walker_path;
> -       size_t i;
> +       access_mask_t access_masked_dst_parent, access_masked_src_parent;
>
> -       if (!access_request)
> +       if (!access_request_dst_parent && !access_request_src_parent)
>                 return 0;
>         if (WARN_ON_ONCE(!domain || !path))
>                 return 0;
> @@ -287,22 +460,20 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>         if (WARN_ON_ONCE(domain->num_layers < 1))
>                 return -EACCES;
>
> -       /* Saves all layers handling a subset of requested accesses. */
> -       for (i = 0; i < domain->num_layers; i++) {
> -               const unsigned long access_req = access_request;
> -               unsigned long access_bit;
> -
> -               for_each_set_bit(access_bit, &access_req,
> -                               ARRAY_SIZE(layer_masks)) {
> -                       if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
> -                               layer_masks[access_bit] |= BIT_ULL(i);
> -                               has_access = true;
> -                       }
> -               }
> +       BUILD_BUG_ON(!layer_masks_dst_parent);

I know the kbuild robot already flagged this, but checking function
parameters with BUILD_BUG_ON() does seem a bit ... unusual :)

> +       if (layer_masks_src_parent) {
> +               if (WARN_ON_ONCE(!layer_masks_child))
> +                       return -EACCES;
> +               access_masked_dst_parent = access_masked_src_parent =
> +                       get_handled_accesses(domain);
> +               is_dom_check = true;
> +       } else {
> +               if (WARN_ON_ONCE(layer_masks_child))
> +                       return -EACCES;
> +               access_masked_dst_parent = access_request_dst_parent;
> +               access_masked_src_parent = access_request_src_parent;
> +               is_dom_check = false;
>         }
> -       /* An access request not handled by the domain is allowed. */
> -       if (!has_access)
> -               return 0;
>
>         walker_path = *path;
>         path_get(&walker_path);
> @@ -312,11 +483,50 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>          */
>         while (true) {
>                 struct dentry *parent_dentry;
> +               const struct landlock_rule *rule;
> +
> +               /*
> +                * If at least all accesses allowed on the destination are
> +                * already allowed on the source, respectively if there is at
> +                * least as much as restrictions on the destination than on the
> +                * source, then we can safely refer files from the source to
> +                * the destination without risking a privilege escalation.
> +                * This is crucial for standalone multilayered security
> +                * policies.  Furthermore, this helps avoid policy writers to
> +                * shoot themselves in the foot.
> +                */
> +               if (is_dom_check && is_superset(child_is_directory,
> +                                       layer_masks_dst_parent,
> +                                       layer_masks_src_parent,
> +                                       layer_masks_child)) {
> +                       allowed_dst_parent =
> +                               scope_to_request(access_request_dst_parent,
> +                                               layer_masks_dst_parent);
> +                       allowed_src_parent =
> +                               scope_to_request(access_request_src_parent,
> +                                               layer_masks_src_parent);
> +
> +                       /* Stops when all accesses are granted. */
> +                       if (allowed_dst_parent && allowed_src_parent)
> +                               break;
> +
> +                       /*
> +                        * Downgrades checks from domain handled accesses to
> +                        * requested accesses.
> +                        */
> +                       is_dom_check = false;
> +                       access_masked_dst_parent = access_request_dst_parent;
> +                       access_masked_src_parent = access_request_src_parent;
> +               }
> +
> +               rule = find_rule(domain, walker_path.dentry);
> +               allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
> +                               layer_masks_dst_parent);
> +               allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
> +                               layer_masks_src_parent);
>
> -               allowed = unmask_layers(find_rule(domain, walker_path.dentry),
> -                               access_request, &layer_masks);
> -               if (allowed)
> -                       /* Stops when a rule from each layer grants access. */
> +               /* Stops when a rule from each layer grants access. */
> +               if (allowed_dst_parent && allowed_src_parent)
>                         break;

If "(allowed_dst_parent && allowed_src_parent)" is true, you break out
of the while loop only to do a path_put(), check the two booleans once
more, and then return zero, yes?  Why not just do the path_put() and
return zero here?


--
paul-moore.com
Mickaël Salaün March 17, 2022, 12:04 p.m. UTC | #4
On 17/03/2022 02:26, Paul Moore wrote:
> On Mon, Feb 21, 2022 at 4:15 PM Mickaël Salaün <mic@digikod.net> wrote:
>>
>> From: Mickaël Salaün <mic@linux.microsoft.com>
>>
>> Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
>> to allow sandboxed processes to link and rename files from and to a
>> specific set of file hierarchies.  This access right should be composed
>> with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
>> and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename.  This
>> lift a Landlock limitation that always denied changing the parent of an
>> inode.
>>
>> Renaming or linking to the same directory is still always allowed,
>> whatever LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
>> considered a threat to user data.
>>
>> However, creating multiple links or renaming to a different parent
>> directory may lead to privilege escalations if not handled properly.
>> Indeed, we must be sure that the source doesn't gain more privileges by
>> being accessible from the destination.  This is handled by making sure
>> that the source hierarchy (including the referenced file or directory
>> itself) restricts at least as much the destination hierarchy.  If it is
>> not the case, an EXDEV error is returned, making it potentially possible
>> for user space to copy the file hierarchy instead of moving or linking
>> it.
>>
>> Instead of creating different access rights for the source and the
>> destination, we choose to make it simple and consistent for users.
>> Indeed, considering the previous constraint, it would be weird to
>> require such destination access right to be also granted to the source
>> (to make it a superset).
>>
>> See the provided documentation for additional details.
>>
>> New tests are provided with a following commit.
>>
>> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
>> Link: https://lore.kernel.org/r/20220221212522.320243-7-mic@digikod.net
>> ---
>>   include/uapi/linux/landlock.h                |  27 +-
>>   security/landlock/fs.c                       | 550 ++++++++++++++++---
>>   security/landlock/limits.h                   |   2 +-
>>   security/landlock/syscalls.c                 |   2 +-
>>   tools/testing/selftests/landlock/base_test.c |   2 +-
>>   tools/testing/selftests/landlock/fs_test.c   |   3 +-
>>   6 files changed, 516 insertions(+), 70 deletions(-)
> 
> ...
> 
>> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
>> index 3886f9ad1a60..c7c7ce4e7cd5 100644
>> --- a/security/landlock/fs.c
>> +++ b/security/landlock/fs.c
>> @@ -4,6 +4,7 @@
>>    *
>>    * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
>>    * Copyright © 2018-2020 ANSSI
>> + * Copyright © 2021-2022 Microsoft Corporation
>>    */
>>
>>   #include <linux/atomic.h>
>> @@ -269,16 +270,188 @@ static inline bool is_nouser_or_private(const struct dentry *dentry)
>>                           unlikely(IS_PRIVATE(d_backing_inode(dentry))));
>>   }
>>
>> -static int check_access_path(const struct landlock_ruleset *const domain,
>> -               const struct path *const path,
>> +static inline access_mask_t get_handled_accesses(
>> +               const struct landlock_ruleset *const domain)
>> +{
>> +       access_mask_t access_dom = 0;
>> +       unsigned long access_bit;
> 
> Would it be better to declare @access_bit as an access_mask_t type?
> You're not using any macros like for_each_set_bit() in this function
> so I believe it should be safe.

Right, I'll change that.


> 
>> +       for (access_bit = 0; access_bit < LANDLOCK_NUM_ACCESS_FS;
>> +                       access_bit++) {
>> +               size_t layer_level;
> 
> Considering the number of layers has dropped down to 16, it seems like
> a normal unsigned int might be big enough for @layer_level :)

We could switch to u8, but I prefer to stick to size_t for array indexes 
which enable to reduce the cognitive workload related to the size of 
such array. ;) I guess there is enough info for compilers to optimize 
such code anyway.


> 
>> +               for (layer_level = 0; layer_level < domain->num_layers;
>> +                               layer_level++) {
>> +                       if (domain->fs_access_masks[layer_level] &
>> +                                       BIT_ULL(access_bit)) {
>> +                               access_dom |= BIT_ULL(access_bit);
>> +                               break;
>> +                       }
>> +               }
>> +       }
>> +       return access_dom;
>> +}
>> +
>> +static inline access_mask_t init_layer_masks(
>> +               const struct landlock_ruleset *const domain,
>> +               const access_mask_t access_request,
>> +               layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> +       access_mask_t handled_accesses = 0;
>> +       size_t layer_level;
>> +
>> +       memset(layer_masks, 0, sizeof(*layer_masks));
>> +       if (WARN_ON_ONCE(!access_request))
>> +               return 0;
>> +
>> +       /* Saves all handled accesses per layer. */
>> +       for (layer_level = 0; layer_level < domain->num_layers;
>> +                       layer_level++) {
>> +               const unsigned long access_req = access_request;
>> +               unsigned long access_bit;
>> +
>> +               for_each_set_bit(access_bit, &access_req,
>> +                               ARRAY_SIZE(*layer_masks)) {
>> +                       if (domain->fs_access_masks[layer_level] &
>> +                                       BIT_ULL(access_bit)) {
>> +                               (*layer_masks)[access_bit] |=
>> +                                       BIT_ULL(layer_level);
>> +                               handled_accesses |= BIT_ULL(access_bit);
>> +                       }
>> +               }
>> +       }
>> +       return handled_accesses;
>> +}
>> +
>> +/*
>> + * Check that a destination file hierarchy has more restrictions than a source
>> + * file hierarchy.  This is only used for link and rename actions.
>> + */
>> +static inline bool is_superset(bool child_is_directory,
>> +               const layer_mask_t (*const
>> +                       layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
>> +               const layer_mask_t (*const
>> +                       layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
>> +               const layer_mask_t (*const
>> +                       layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> +       unsigned long access_bit;
>> +
>> +       for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_dst_parent);
>> +                       access_bit++) {
>> +               /* Ignores accesses that only make sense for directories. */
>> +               if (!child_is_directory && !(BIT_ULL(access_bit) & ACCESS_FILE))
>> +                       continue;
>> +
>> +               /*
>> +                * Checks if the destination restrictions are a superset of the
>> +                * source ones (i.e. inherited access rights without child
>> +                * exceptions).
>> +                */
>> +               if ((((*layer_masks_src_parent)[access_bit] & (*layer_masks_child)[access_bit]) |
>> +                                       (*layer_masks_dst_parent)[access_bit]) !=
>> +                               (*layer_masks_dst_parent)[access_bit])
>> +                       return false;
>> +       }
>> +       return true;
>> +}
>> +
>> +/*
>> + * Removes @layer_masks accesses that are not requested.
>> + *
>> + * Returns true if the request is allowed, false otherwise.
>> + */
>> +static inline bool scope_to_request(const access_mask_t access_request,
>> +               layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> +       const unsigned long access_req = access_request;
>> +       unsigned long access_bit;
>> +
>> +       if (WARN_ON_ONCE(!layer_masks))
>> +               return true;
>> +
>> +       for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks))
>> +               (*layer_masks)[access_bit] = 0;
>> +       return !memchr_inv(layer_masks, 0, sizeof(*layer_masks));
>> +}
>> +
>> +/*
>> + * Returns true if there is at least one access right different than
>> + * LANDLOCK_ACCESS_FS_REFER.
>> + */
>> +static inline bool is_eacces(
>> +               const layer_mask_t (*const
>> +                       layer_masks)[LANDLOCK_NUM_ACCESS_FS],
>>                  const access_mask_t access_request)
>>   {
> 
> Granted, I don't have as deep of an understanding of Landlock as you
> do, but the function name "is_eacces" seems a little odd given the
> nature of the function.  Perhaps "is_fsrefer"?

Hmm, this helper does multiple things which are necessary to know if we 
need to return -EACCES or -EXDEV. Renaming it to is_fsrefer() would 
require to inverse the logic and use boolean negations in the callers 
(because of ordering). Renaming to something like without_fs_refer() 
would not be completely correct because we also check if there is no 
layer_masks, which indicated that it doesn't contain an access right 
that should return -EACCES. This helper is named as such because the 
underlying semantic is to check for such error code, which is a tricky. 
I can rename it co contains_eacces() or something, but a longer name 
would require to cut the caller lines to fit 80 columns. :|


> 
>> -       layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
>> -       bool allowed = false, has_access = false;
>> +       unsigned long access_bit;
>> +       /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
>> +       const unsigned long access_check = access_request &
>> +               ~LANDLOCK_ACCESS_FS_REFER;
>> +
>> +       if (!layer_masks)
>> +               return false;
>> +
>> +       for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
>> +               if ((*layer_masks)[access_bit])
>> +                       return true;
>> +       }
> 
> Is calling for_each_set_bit() overkill here?  @access_check should
> only ever have at most one bit set (LANDLOCK_ACCESS_FS_REFER), yes?

No, it is the contrary, the bitmask is inverted and this loop check for 
non-FS_REFER access rights that should then return -EACCES. For 
instance, if a sandbox handles (and then restricts) MAKE_REG and REFER, 
a request to link a regular file would contains both of these bits, and 
the kernel should return -EACCES if MAKE_REG is not granted or -EXDEV if 
the request is only denied because of REFER. The reparent_* tests check 
the consistency of this behavior (with the exception of a 
RENAME_EXCHANGE case, see [1]).

[1] https://lore.kernel.org/r/20220222175332.384545-1-mic@digikod.net


> 
>> +       return false;
>> +}
>> +
>> +/**
>> + * check_access_path_dual - Check a source and a destination accesses
>> + *
>> + * @domain: Domain to check against.
>> + * @path: File hierarchy to walk through.
>> + * @child_is_directory: Must be set to true if the (original) leaf is a
>> + *     directory, false otherwise.
>> + * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
>> + *     is equal to @layer_masks_src_parent (if any).
>> + * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
>> + *     masks, identifying the layers that forbid a specific access.  Bits from
>> + *     this matrix can be unset according to the @path walk.  An empty matrix
>> + *     means that @domain allows all possible Landlock accesses (i.e. not only
>> + *     those identified by @access_request_dst_parent).  This matrix can
>> + *     initially refer to domain layer masks and, when the accesses for the
>> + *     destination and source are the same, to request layer masks.
>> + * @access_request_src_parent: Similar to @access_request_dst_parent but for an
>> + *     initial source path request.  Only taken into account if
>> + *     @layer_masks_src_parent is not NULL.
>> + * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
>> + *     initial source path walk.  This can be NULL if only dealing with a
>> + *     destination access request (i.e. not a rename nor a link action).
>> + * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
>> + *     linked or renamed inode (without hierarchy).  This is only used if
>> + *     @layer_masks_src_parent is not NULL.
>> + *
>> + * This helper first checks that the destination has a superset of restrictions
>> + * compared to the source (if any) for a common path.  It then checks that the
>> + * collected accesses and the remaining ones are enough to allow the request.
>> + *
>> + * Returns:
>> + * - 0 if the access request is granted;
>> + * - -EACCES if it is denied because of access right other than
>> + *   LANDLOCK_ACCESS_FS_REFER;
>> + * - -EXDEV if the renaming or linking would be a privileged escalation
>> + *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
>> + *   not allowed by the source or the destination.
>> + */
>> +static int check_access_path_dual(const struct landlock_ruleset *const domain,
>> +               const struct path *const path,
>> +               bool child_is_directory,
>> +               const access_mask_t access_request_dst_parent,
>> +               layer_mask_t (*const
>> +                       layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
>> +               const access_mask_t access_request_src_parent,
>> +               layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
>> +               layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> +       bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
>>          struct path walker_path;
>> -       size_t i;
>> +       access_mask_t access_masked_dst_parent, access_masked_src_parent;
>>
>> -       if (!access_request)
>> +       if (!access_request_dst_parent && !access_request_src_parent)
>>                  return 0;
>>          if (WARN_ON_ONCE(!domain || !path))
>>                  return 0;
>> @@ -287,22 +460,20 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>>          if (WARN_ON_ONCE(domain->num_layers < 1))
>>                  return -EACCES;
>>
>> -       /* Saves all layers handling a subset of requested accesses. */
>> -       for (i = 0; i < domain->num_layers; i++) {
>> -               const unsigned long access_req = access_request;
>> -               unsigned long access_bit;
>> -
>> -               for_each_set_bit(access_bit, &access_req,
>> -                               ARRAY_SIZE(layer_masks)) {
>> -                       if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
>> -                               layer_masks[access_bit] |= BIT_ULL(i);
>> -                               has_access = true;
>> -                       }
>> -               }
>> +       BUILD_BUG_ON(!layer_masks_dst_parent);
> 
> I know the kbuild robot already flagged this, but checking function
> parameters with BUILD_BUG_ON() does seem a bit ... unusual :)

Yeah, I like such guarantee but it may not work without __always_inline. 
I moved this check in the previous WARN_ON_ONCE().


> 
>> +       if (layer_masks_src_parent) {
>> +               if (WARN_ON_ONCE(!layer_masks_child))
>> +                       return -EACCES;
>> +               access_masked_dst_parent = access_masked_src_parent =
>> +                       get_handled_accesses(domain);
>> +               is_dom_check = true;
>> +       } else {
>> +               if (WARN_ON_ONCE(layer_masks_child))
>> +                       return -EACCES;
>> +               access_masked_dst_parent = access_request_dst_parent;
>> +               access_masked_src_parent = access_request_src_parent;
>> +               is_dom_check = false;
>>          }
>> -       /* An access request not handled by the domain is allowed. */
>> -       if (!has_access)
>> -               return 0;
>>
>>          walker_path = *path;
>>          path_get(&walker_path);
>> @@ -312,11 +483,50 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>>           */
>>          while (true) {
>>                  struct dentry *parent_dentry;
>> +               const struct landlock_rule *rule;
>> +
>> +               /*
>> +                * If at least all accesses allowed on the destination are
>> +                * already allowed on the source, respectively if there is at
>> +                * least as much as restrictions on the destination than on the
>> +                * source, then we can safely refer files from the source to
>> +                * the destination without risking a privilege escalation.
>> +                * This is crucial for standalone multilayered security
>> +                * policies.  Furthermore, this helps avoid policy writers to
>> +                * shoot themselves in the foot.
>> +                */
>> +               if (is_dom_check && is_superset(child_is_directory,
>> +                                       layer_masks_dst_parent,
>> +                                       layer_masks_src_parent,
>> +                                       layer_masks_child)) {
>> +                       allowed_dst_parent =
>> +                               scope_to_request(access_request_dst_parent,
>> +                                               layer_masks_dst_parent);
>> +                       allowed_src_parent =
>> +                               scope_to_request(access_request_src_parent,
>> +                                               layer_masks_src_parent);
>> +
>> +                       /* Stops when all accesses are granted. */
>> +                       if (allowed_dst_parent && allowed_src_parent)
>> +                               break;
>> +
>> +                       /*
>> +                        * Downgrades checks from domain handled accesses to
>> +                        * requested accesses.
>> +                        */
>> +                       is_dom_check = false;
>> +                       access_masked_dst_parent = access_request_dst_parent;
>> +                       access_masked_src_parent = access_request_src_parent;
>> +               }
>> +
>> +               rule = find_rule(domain, walker_path.dentry);
>> +               allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
>> +                               layer_masks_dst_parent);
>> +               allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
>> +                               layer_masks_src_parent);
>>
>> -               allowed = unmask_layers(find_rule(domain, walker_path.dentry),
>> -                               access_request, &layer_masks);
>> -               if (allowed)
>> -                       /* Stops when a rule from each layer grants access. */
>> +               /* Stops when a rule from each layer grants access. */
>> +               if (allowed_dst_parent && allowed_src_parent)
>>                          break;
> 
> If "(allowed_dst_parent && allowed_src_parent)" is true, you break out
> of the while loop only to do a path_put(), check the two booleans once
> more, and then return zero, yes?  Why not just do the path_put() and
> return zero here?

Correct, that would work, but I prefer not to duplicate the logic of 
granting access if it doesn't make the code more complex, which I think 
is not the case here, and I'm reluctant to duplicate path_get/put() 
calls. This loop break is a small optimization to avoid walking the path 
one more step, and writing it this way looks cleaner and less 
error-prone from my point of view.
Paul Moore March 17, 2022, 9:42 p.m. UTC | #5
On Thu, Mar 17, 2022 at 8:03 AM Mickaël Salaün <mic@digikod.net> wrote:
> On 17/03/2022 02:26, Paul Moore wrote:
> > On Mon, Feb 21, 2022 at 4:15 PM Mickaël Salaün <mic@digikod.net> wrote:
> >>
> >> From: Mickaël Salaün <mic@linux.microsoft.com>
> >>
> >> Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
> >> to allow sandboxed processes to link and rename files from and to a
> >> specific set of file hierarchies.  This access right should be composed
> >> with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
> >> and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename.  This
> >> lift a Landlock limitation that always denied changing the parent of an
> >> inode.
> >>
> >> Renaming or linking to the same directory is still always allowed,
> >> whatever LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
> >> considered a threat to user data.
> >>
> >> However, creating multiple links or renaming to a different parent
> >> directory may lead to privilege escalations if not handled properly.
> >> Indeed, we must be sure that the source doesn't gain more privileges by
> >> being accessible from the destination.  This is handled by making sure
> >> that the source hierarchy (including the referenced file or directory
> >> itself) restricts at least as much the destination hierarchy.  If it is
> >> not the case, an EXDEV error is returned, making it potentially possible
> >> for user space to copy the file hierarchy instead of moving or linking
> >> it.
> >>
> >> Instead of creating different access rights for the source and the
> >> destination, we choose to make it simple and consistent for users.
> >> Indeed, considering the previous constraint, it would be weird to
> >> require such destination access right to be also granted to the source
> >> (to make it a superset).
> >>
> >> See the provided documentation for additional details.
> >>
> >> New tests are provided with a following commit.
> >>
> >> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
> >> Link: https://lore.kernel.org/r/20220221212522.320243-7-mic@digikod.net
> >> ---
> >>   include/uapi/linux/landlock.h                |  27 +-
> >>   security/landlock/fs.c                       | 550 ++++++++++++++++---
> >>   security/landlock/limits.h                   |   2 +-
> >>   security/landlock/syscalls.c                 |   2 +-
> >>   tools/testing/selftests/landlock/base_test.c |   2 +-
> >>   tools/testing/selftests/landlock/fs_test.c   |   3 +-
> >>   6 files changed, 516 insertions(+), 70 deletions(-)

...

> >> +/*
> >> + * Returns true if there is at least one access right different than
> >> + * LANDLOCK_ACCESS_FS_REFER.
> >> + */
> >> +static inline bool is_eacces(
> >> +               const layer_mask_t (*const
> >> +                       layer_masks)[LANDLOCK_NUM_ACCESS_FS],
> >>                  const access_mask_t access_request)
> >>   {
> >
> > Granted, I don't have as deep of an understanding of Landlock as you
> > do, but the function name "is_eacces" seems a little odd given the
> > nature of the function.  Perhaps "is_fsrefer"?
>
> Hmm, this helper does multiple things which are necessary to know if we
> need to return -EACCES or -EXDEV. Renaming it to is_fsrefer() would
> require to inverse the logic and use boolean negations in the callers
> (because of ordering). Renaming to something like without_fs_refer()
> would not be completely correct because we also check if there is no
> layer_masks, which indicated that it doesn't contain an access right
> that should return -EACCES. This helper is named as such because the
> underlying semantic is to check for such error code, which is a tricky.
> I can rename it co contains_eacces() or something, but a longer name
> would require to cut the caller lines to fit 80 columns. :|

You know the Landlock code better than I do, if you like
"is_eacces()", then leave it as it is.

> >> -       layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> >> -       bool allowed = false, has_access = false;
> >> +       unsigned long access_bit;
> >> +       /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
> >> +       const unsigned long access_check = access_request &
> >> +               ~LANDLOCK_ACCESS_FS_REFER;
> >> +
> >> +       if (!layer_masks)
> >> +               return false;
> >> +
> >> +       for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
> >> +               if ((*layer_masks)[access_bit])
> >> +                       return true;
> >> +       }
> >
> > Is calling for_each_set_bit() overkill here?  @access_check should
> > only ever have at most one bit set (LANDLOCK_ACCESS_FS_REFER), yes?
>
> No, it is the contrary ...

Gotcha.  Thanks for the clarification, I must have missed that when I
was looking at it last night.

> >> @@ -287,22 +460,20 @@ static int check_access_path(const struct landlock_ruleset *const domain,
> >>          if (WARN_ON_ONCE(domain->num_layers < 1))
> >>                  return -EACCES;
> >>
> >> -       /* Saves all layers handling a subset of requested accesses. */
> >> -       for (i = 0; i < domain->num_layers; i++) {
> >> -               const unsigned long access_req = access_request;
> >> -               unsigned long access_bit;
> >> -
> >> -               for_each_set_bit(access_bit, &access_req,
> >> -                               ARRAY_SIZE(layer_masks)) {
> >> -                       if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
> >> -                               layer_masks[access_bit] |= BIT_ULL(i);
> >> -                               has_access = true;
> >> -                       }
> >> -               }
> >> +       BUILD_BUG_ON(!layer_masks_dst_parent);
> >
> > I know the kbuild robot already flagged this, but checking function
> > parameters with BUILD_BUG_ON() does seem a bit ... unusual :)
>
> Yeah, I like such guarantee but it may not work without __always_inline.
> I moved this check in the previous WARN_ON_ONCE().

That sounds good to me.

> >> @@ -312,11 +483,50 @@ static int check_access_path(const struct landlock_ruleset *const domain,
> >>           */
> >>          while (true) {
> >>                  struct dentry *parent_dentry;
> >> +               const struct landlock_rule *rule;
> >> +
> >> +               /*
> >> +                * If at least all accesses allowed on the destination are
> >> +                * already allowed on the source, respectively if there is at
> >> +                * least as much as restrictions on the destination than on the
> >> +                * source, then we can safely refer files from the source to
> >> +                * the destination without risking a privilege escalation.
> >> +                * This is crucial for standalone multilayered security
> >> +                * policies.  Furthermore, this helps avoid policy writers to
> >> +                * shoot themselves in the foot.
> >> +                */
> >> +               if (is_dom_check && is_superset(child_is_directory,
> >> +                                       layer_masks_dst_parent,
> >> +                                       layer_masks_src_parent,
> >> +                                       layer_masks_child)) {
> >> +                       allowed_dst_parent =
> >> +                               scope_to_request(access_request_dst_parent,
> >> +                                               layer_masks_dst_parent);
> >> +                       allowed_src_parent =
> >> +                               scope_to_request(access_request_src_parent,
> >> +                                               layer_masks_src_parent);
> >> +
> >> +                       /* Stops when all accesses are granted. */
> >> +                       if (allowed_dst_parent && allowed_src_parent)
> >> +                               break;
> >> +
> >> +                       /*
> >> +                        * Downgrades checks from domain handled accesses to
> >> +                        * requested accesses.
> >> +                        */
> >> +                       is_dom_check = false;
> >> +                       access_masked_dst_parent = access_request_dst_parent;
> >> +                       access_masked_src_parent = access_request_src_parent;
> >> +               }
> >> +
> >> +               rule = find_rule(domain, walker_path.dentry);
> >> +               allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
> >> +                               layer_masks_dst_parent);
> >> +               allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
> >> +                               layer_masks_src_parent);
> >>
> >> -               allowed = unmask_layers(find_rule(domain, walker_path.dentry),
> >> -                               access_request, &layer_masks);
> >> -               if (allowed)
> >> -                       /* Stops when a rule from each layer grants access. */
> >> +               /* Stops when a rule from each layer grants access. */
> >> +               if (allowed_dst_parent && allowed_src_parent)
> >>                          break;
> >
> > If "(allowed_dst_parent && allowed_src_parent)" is true, you break out
> > of the while loop only to do a path_put(), check the two booleans once
> > more, and then return zero, yes?  Why not just do the path_put() and
> > return zero here?
>
> Correct, that would work, but I prefer not to duplicate the logic of
> granting access if it doesn't make the code more complex, which I think
> is not the case here, and I'm reluctant to duplicate path_get/put()
> calls. This loop break is a small optimization to avoid walking the path
> one more step, and writing it this way looks cleaner and less
> error-prone from my point of view.

I'm a big fan of maintainable code, and since you are the maintainer,
if you prefer this approach I say stick with what you have :)
Mickaël Salaün March 24, 2022, 10:31 a.m. UTC | #6
On 17/03/2022 13:04, Mickaël Salaün wrote:
> 
> On 17/03/2022 02:26, Paul Moore wrote:

[...]

>>> @@ -269,16 +270,188 @@ static inline bool is_nouser_or_private(const 
>>> struct dentry *dentry)
>>>                           
>>> unlikely(IS_PRIVATE(d_backing_inode(dentry))));
>>>   }
>>>
>>> -static int check_access_path(const struct landlock_ruleset *const 
>>> domain,
>>> -               const struct path *const path,
>>> +static inline access_mask_t get_handled_accesses(
>>> +               const struct landlock_ruleset *const domain)
>>> +{
>>> +       access_mask_t access_dom = 0;
>>> +       unsigned long access_bit;
>>
>> Would it be better to declare @access_bit as an access_mask_t type?
>> You're not using any macros like for_each_set_bit() in this function
>> so I believe it should be safe.
> 
> Right, I'll change that.

Well, thinking about it again, access_bit is not an access mask but an 
index in such mask. access_mask_t gives enough space for such index but 
it is definitely not the right semantic. The best type should be size_t, 
but I prefer to stick to unsigned long (used for size_t anyway) for 
consistency with the other access_bit variable types. There is no need 
to use for_each_set_bit() here now but that could change, and I prefer 
to do my best to prevent future issues. ;)
Anyway, I guess the compiler can optimize such code.
diff mbox series

Patch

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index b3d952067f59..f433d58a58f2 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -21,8 +21,14 @@  struct landlock_ruleset_attr {
 	/**
 	 * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_)
 	 * that is handled by this ruleset and should then be forbidden if no
-	 * rule explicitly allow them.  This is needed for backward
-	 * compatibility reasons.
+	 * rule explicitly allow them: it is a deny-by-default list that should
+	 * contain as much Landlock access rights as possible. Indeed, all
+	 * Landlock filesystem access rights that are not part of
+	 * handled_access_fs are allowed.  This is needed for backward
+	 * compatibility reasons.  One exception is the
+	 * LANDLOCK_ACCESS_FS_REFER access right, which is always implicitly
+	 * handled, but must still be explicitly handled to add new rules with
+	 * this access right.
 	 */
 	__u64 handled_access_fs;
 };
@@ -109,6 +115,22 @@  struct landlock_path_beneath_attr {
  * - %LANDLOCK_ACCESS_FS_MAKE_FIFO: Create (or rename or link) a named pipe.
  * - %LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create (or rename or link) a block device.
  * - %LANDLOCK_ACCESS_FS_MAKE_SYM: Create (or rename or link) a symbolic link.
+ * - %LANDLOCK_ACCESS_FS_REFER: Link or rename a file from or to a different
+ *   directory (i.e. reparent a file hierarchy).  This access right is
+ *   available since the second version of the Landlock ABI.  This is also the
+ *   only access right which is always considered handled by any ruleset in
+ *   such a way that reparenting a file hierarchy is always denied by default.
+ *   To avoid privilege escalation, it is not enough to add a rule with this
+ *   access right.  When linking or renaming a file, the destination directory
+ *   hierarchy must also always have the same or a superset of restrictions of
+ *   the source hierarchy.  If it is not the case, or if the domain doesn't
+ *   handle this access right, such actions are denied by default with errno
+ *   set to EXDEV.  Linking also requires a LANDLOCK_ACCESS_FS_MAKE_* access
+ *   right on the destination directory, and renaming also requires a
+ *   LANDLOCK_ACCESS_FS_REMOVE_* access right on the source's (file or
+ *   directory) parent.  Otherwise, such actions are denied with errno set to
+ *   EACCES.  The EACCES errno prevails over EXDEV to let user space
+ *   efficiently deal with an unrecoverable error.
  *
  * .. warning::
  *
@@ -133,5 +155,6 @@  struct landlock_path_beneath_attr {
 #define LANDLOCK_ACCESS_FS_MAKE_FIFO			(1ULL << 10)
 #define LANDLOCK_ACCESS_FS_MAKE_BLOCK			(1ULL << 11)
 #define LANDLOCK_ACCESS_FS_MAKE_SYM			(1ULL << 12)
+#define LANDLOCK_ACCESS_FS_REFER			(1ULL << 13)
 
 #endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 3886f9ad1a60..c7c7ce4e7cd5 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -4,6 +4,7 @@ 
  *
  * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
  * Copyright © 2018-2020 ANSSI
+ * Copyright © 2021-2022 Microsoft Corporation
  */
 
 #include <linux/atomic.h>
@@ -269,16 +270,188 @@  static inline bool is_nouser_or_private(const struct dentry *dentry)
 			 unlikely(IS_PRIVATE(d_backing_inode(dentry))));
 }
 
-static int check_access_path(const struct landlock_ruleset *const domain,
-		const struct path *const path,
+static inline access_mask_t get_handled_accesses(
+		const struct landlock_ruleset *const domain)
+{
+	access_mask_t access_dom = 0;
+	unsigned long access_bit;
+
+	for (access_bit = 0; access_bit < LANDLOCK_NUM_ACCESS_FS;
+			access_bit++) {
+		size_t layer_level;
+
+		for (layer_level = 0; layer_level < domain->num_layers;
+				layer_level++) {
+			if (domain->fs_access_masks[layer_level] &
+					BIT_ULL(access_bit)) {
+				access_dom |= BIT_ULL(access_bit);
+				break;
+			}
+		}
+	}
+	return access_dom;
+}
+
+static inline access_mask_t init_layer_masks(
+		const struct landlock_ruleset *const domain,
+		const access_mask_t access_request,
+		layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
+{
+	access_mask_t handled_accesses = 0;
+	size_t layer_level;
+
+	memset(layer_masks, 0, sizeof(*layer_masks));
+	if (WARN_ON_ONCE(!access_request))
+		return 0;
+
+	/* Saves all handled accesses per layer. */
+	for (layer_level = 0; layer_level < domain->num_layers;
+			layer_level++) {
+		const unsigned long access_req = access_request;
+		unsigned long access_bit;
+
+		for_each_set_bit(access_bit, &access_req,
+				ARRAY_SIZE(*layer_masks)) {
+			if (domain->fs_access_masks[layer_level] &
+					BIT_ULL(access_bit)) {
+				(*layer_masks)[access_bit] |=
+					BIT_ULL(layer_level);
+				handled_accesses |= BIT_ULL(access_bit);
+			}
+		}
+	}
+	return handled_accesses;
+}
+
+/*
+ * Check that a destination file hierarchy has more restrictions than a source
+ * file hierarchy.  This is only used for link and rename actions.
+ */
+static inline bool is_superset(bool child_is_directory,
+		const layer_mask_t (*const
+			layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
+		const layer_mask_t (*const
+			layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
+		const layer_mask_t (*const
+			layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
+{
+	unsigned long access_bit;
+
+	for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_dst_parent);
+			access_bit++) {
+		/* Ignores accesses that only make sense for directories. */
+		if (!child_is_directory && !(BIT_ULL(access_bit) & ACCESS_FILE))
+			continue;
+
+		/*
+		 * Checks if the destination restrictions are a superset of the
+		 * source ones (i.e. inherited access rights without child
+		 * exceptions).
+		 */
+		if ((((*layer_masks_src_parent)[access_bit] & (*layer_masks_child)[access_bit]) |
+					(*layer_masks_dst_parent)[access_bit]) !=
+				(*layer_masks_dst_parent)[access_bit])
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Removes @layer_masks accesses that are not requested.
+ *
+ * Returns true if the request is allowed, false otherwise.
+ */
+static inline bool scope_to_request(const access_mask_t access_request,
+		layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
+{
+	const unsigned long access_req = access_request;
+	unsigned long access_bit;
+
+	if (WARN_ON_ONCE(!layer_masks))
+		return true;
+
+	for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks))
+		(*layer_masks)[access_bit] = 0;
+	return !memchr_inv(layer_masks, 0, sizeof(*layer_masks));
+}
+
+/*
+ * Returns true if there is at least one access right different than
+ * LANDLOCK_ACCESS_FS_REFER.
+ */
+static inline bool is_eacces(
+		const layer_mask_t (*const
+			layer_masks)[LANDLOCK_NUM_ACCESS_FS],
 		const access_mask_t access_request)
 {
-	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
-	bool allowed = false, has_access = false;
+	unsigned long access_bit;
+	/* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
+	const unsigned long access_check = access_request &
+		~LANDLOCK_ACCESS_FS_REFER;
+
+	if (!layer_masks)
+		return false;
+
+	for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
+		if ((*layer_masks)[access_bit])
+			return true;
+	}
+	return false;
+}
+
+/**
+ * check_access_path_dual - Check a source and a destination accesses
+ *
+ * @domain: Domain to check against.
+ * @path: File hierarchy to walk through.
+ * @child_is_directory: Must be set to true if the (original) leaf is a
+ *     directory, false otherwise.
+ * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
+ *     is equal to @layer_masks_src_parent (if any).
+ * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
+ *     masks, identifying the layers that forbid a specific access.  Bits from
+ *     this matrix can be unset according to the @path walk.  An empty matrix
+ *     means that @domain allows all possible Landlock accesses (i.e. not only
+ *     those identified by @access_request_dst_parent).  This matrix can
+ *     initially refer to domain layer masks and, when the accesses for the
+ *     destination and source are the same, to request layer masks.
+ * @access_request_src_parent: Similar to @access_request_dst_parent but for an
+ *     initial source path request.  Only taken into account if
+ *     @layer_masks_src_parent is not NULL.
+ * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
+ *     initial source path walk.  This can be NULL if only dealing with a
+ *     destination access request (i.e. not a rename nor a link action).
+ * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
+ *     linked or renamed inode (without hierarchy).  This is only used if
+ *     @layer_masks_src_parent is not NULL.
+ *
+ * This helper first checks that the destination has a superset of restrictions
+ * compared to the source (if any) for a common path.  It then checks that the
+ * collected accesses and the remaining ones are enough to allow the request.
+ *
+ * Returns:
+ * - 0 if the access request is granted;
+ * - -EACCES if it is denied because of access right other than
+ *   LANDLOCK_ACCESS_FS_REFER;
+ * - -EXDEV if the renaming or linking would be a privileged escalation
+ *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
+ *   not allowed by the source or the destination.
+ */
+static int check_access_path_dual(const struct landlock_ruleset *const domain,
+		const struct path *const path,
+		bool child_is_directory,
+		const access_mask_t access_request_dst_parent,
+		layer_mask_t (*const
+			layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
+		const access_mask_t access_request_src_parent,
+		layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
+		layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
+{
+	bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
 	struct path walker_path;
-	size_t i;
+	access_mask_t access_masked_dst_parent, access_masked_src_parent;
 
-	if (!access_request)
+	if (!access_request_dst_parent && !access_request_src_parent)
 		return 0;
 	if (WARN_ON_ONCE(!domain || !path))
 		return 0;
@@ -287,22 +460,20 @@  static int check_access_path(const struct landlock_ruleset *const domain,
 	if (WARN_ON_ONCE(domain->num_layers < 1))
 		return -EACCES;
 
-	/* Saves all layers handling a subset of requested accesses. */
-	for (i = 0; i < domain->num_layers; i++) {
-		const unsigned long access_req = access_request;
-		unsigned long access_bit;
-
-		for_each_set_bit(access_bit, &access_req,
-				ARRAY_SIZE(layer_masks)) {
-			if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
-				layer_masks[access_bit] |= BIT_ULL(i);
-				has_access = true;
-			}
-		}
+	BUILD_BUG_ON(!layer_masks_dst_parent);
+	if (layer_masks_src_parent) {
+		if (WARN_ON_ONCE(!layer_masks_child))
+			return -EACCES;
+		access_masked_dst_parent = access_masked_src_parent =
+			get_handled_accesses(domain);
+		is_dom_check = true;
+	} else {
+		if (WARN_ON_ONCE(layer_masks_child))
+			return -EACCES;
+		access_masked_dst_parent = access_request_dst_parent;
+		access_masked_src_parent = access_request_src_parent;
+		is_dom_check = false;
 	}
-	/* An access request not handled by the domain is allowed. */
-	if (!has_access)
-		return 0;
 
 	walker_path = *path;
 	path_get(&walker_path);
@@ -312,11 +483,50 @@  static int check_access_path(const struct landlock_ruleset *const domain,
 	 */
 	while (true) {
 		struct dentry *parent_dentry;
+		const struct landlock_rule *rule;
+
+		/*
+		 * If at least all accesses allowed on the destination are
+		 * already allowed on the source, respectively if there is at
+		 * least as much as restrictions on the destination than on the
+		 * source, then we can safely refer files from the source to
+		 * the destination without risking a privilege escalation.
+		 * This is crucial for standalone multilayered security
+		 * policies.  Furthermore, this helps avoid policy writers to
+		 * shoot themselves in the foot.
+		 */
+		if (is_dom_check && is_superset(child_is_directory,
+					layer_masks_dst_parent,
+					layer_masks_src_parent,
+					layer_masks_child)) {
+			allowed_dst_parent =
+				scope_to_request(access_request_dst_parent,
+						layer_masks_dst_parent);
+			allowed_src_parent =
+				scope_to_request(access_request_src_parent,
+						layer_masks_src_parent);
+
+			/* Stops when all accesses are granted. */
+			if (allowed_dst_parent && allowed_src_parent)
+				break;
+
+			/*
+			 * Downgrades checks from domain handled accesses to
+			 * requested accesses.
+			 */
+			is_dom_check = false;
+			access_masked_dst_parent = access_request_dst_parent;
+			access_masked_src_parent = access_request_src_parent;
+		}
+
+		rule = find_rule(domain, walker_path.dentry);
+		allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
+				layer_masks_dst_parent);
+		allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
+				layer_masks_src_parent);
 
-		allowed = unmask_layers(find_rule(domain, walker_path.dentry),
-				access_request, &layer_masks);
-		if (allowed)
-			/* Stops when a rule from each layer grants access. */
+		/* Stops when a rule from each layer grants access. */
+		if (allowed_dst_parent && allowed_src_parent)
 			break;
 
 jump_up:
@@ -329,7 +539,7 @@  static int check_access_path(const struct landlock_ruleset *const domain,
 				 * Stops at the real root.  Denies access
 				 * because not all layers have granted access.
 				 */
-				allowed = false;
+				allowed_dst_parent = false;
 				break;
 			}
 		}
@@ -339,7 +549,8 @@  static int check_access_path(const struct landlock_ruleset *const domain,
 			 * access to internal filesystems (e.g. nsfs, which is
 			 * reachable through /proc/<pid>/ns/<namespace>).
 			 */
-			allowed = !!(walker_path.mnt->mnt_flags & MNT_INTERNAL);
+			allowed_dst_parent = !!(walker_path.mnt->mnt_flags &
+					MNT_INTERNAL);
 			break;
 		}
 		parent_dentry = dget_parent(walker_path.dentry);
@@ -347,7 +558,40 @@  static int check_access_path(const struct landlock_ruleset *const domain,
 		walker_path.dentry = parent_dentry;
 	}
 	path_put(&walker_path);
-	return allowed ? 0 : -EACCES;
+
+	if (allowed_dst_parent && allowed_src_parent)
+		return 0;
+
+	/*
+	 * Unfortunately, we cannot prioritize EACCES over EXDEV for all
+	 * RENAME_EXCHANGE cases because it depends on the source and
+	 * destination order.  This could be changed with a new
+	 * security_path_rename hook implementation.
+	 */
+	if (likely(is_eacces(layer_masks_dst_parent, access_request_dst_parent)
+				|| is_eacces(layer_masks_src_parent,
+					access_request_src_parent)))
+		return -EACCES;
+
+	/*
+	 * Gracefully forbids reparenting if the destination directory
+	 * hierarchy is not a superset of restrictions of the source directory
+	 * hierarchy, or if LANDLOCK_ACCESS_FS_REFER is not allowed by the
+	 * source or the destination.
+	 */
+	return -EXDEV;
+}
+
+static inline int check_access_path(const struct landlock_ruleset *const domain,
+		const struct path *const path,
+		access_mask_t access_request)
+{
+	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
+
+	access_request = init_layer_masks(domain, access_request,
+			&layer_masks);
+	return check_access_path_dual(domain, path, d_is_dir(path->dentry),
+			access_request, &layer_masks, 0, NULL, NULL);
 }
 
 static inline int current_check_access_path(const struct path *const path,
@@ -394,6 +638,217 @@  static inline access_mask_t maybe_remove(const struct dentry *const dentry)
 		LANDLOCK_ACCESS_FS_REMOVE_FILE;
 }
 
+/**
+ * collect_domain_accesses - Walk through a file path and collect accesses
+ *
+ * @domain: Domain to check against.
+ * @mnt_root: Last directory to check.
+ * @dir: Directory to start the walk from.
+ * @layer_masks_dom: Where to store the collected accesses.
+ *
+ * This helper is useful to begin a path walk from the @dir directory to a
+ * @mnt_root directory used as a mount point.  This mount point is the common
+ * ancestor between the source and the destination of a renamed and linked
+ * file.  While walking from @dir to @mnt_root, we record all the domain's
+ * allowed accesses in @layer_masks_dom.
+ *
+ * This is similar to check_access_path_dual() but much simpler because it only
+ * handles walking on the same mount point and only check one set of accesses.
+ *
+ * Returns:
+ * - true if all the domain access rights are allowed for @dir;
+ * - false if the walk reached @mnt_root.
+ */
+static bool collect_domain_accesses(
+		const struct landlock_ruleset *const domain,
+		const struct dentry *const mnt_root, struct dentry *dir,
+		layer_mask_t (*const layer_masks_dom)[LANDLOCK_NUM_ACCESS_FS])
+{
+	unsigned long access_dom;
+	bool ret = false;
+
+	BUILD_BUG_ON(!layer_masks_dom);
+	if (WARN_ON_ONCE(!domain || !mnt_root || !dir))
+		return true;
+	if (is_nouser_or_private(dir))
+		return true;
+
+	access_dom = init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
+			layer_masks_dom);
+
+	dget(dir);
+	while (true) {
+		struct dentry *parent_dentry;
+
+		/* Gets all layers allowing all domain accesses. */
+		if (unmask_layers(find_rule(domain, dir), access_dom,
+					layer_masks_dom)) {
+			/*
+			 * Stops when all handled accesses are allowed by at
+			 * least one rule in each layer.
+			 */
+			ret = true;
+			break;
+		}
+
+		/* We should not reach a root other than @mnt_root. */
+		if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir))) {
+			ret = false;
+			break;
+		}
+
+		parent_dentry = dget_parent(dir);
+		dput(dir);
+		dir = parent_dentry;
+	}
+	dput(dir);
+	return ret;
+}
+
+/**
+ * current_check_refer_path - Check if a rename or link action is allowed
+ *
+ * @old_dentry: File or directory requested to be moved or linked.
+ * @new_dir: Destination parent directory.
+ * @new_dentry: Destination file or directory.
+ * @removable: Sets to true if it is a rename operation.
+ *
+ * Because of its unprivileged constraints, Landlock relies on file hierarchies
+ * (and not only inodes) to tie access rights to files.  Being able to link or
+ * rename a file hierarchy brings some challenges.  Indeed, moving or linking a
+ * file (i.e. creating a new reference to an inode) can have an impact on the
+ * actions allowed for a set of files if it would change its parent directory
+ * (i.e. reparenting).
+ *
+ * To avoid trivial access right bypasses, Landlock first checks if the file or
+ * directory requested to be moved would gain new access rights inherited from
+ * its new hierarchy.  Before returning any error, Landlock then checks that
+ * the parent source hierarchy and the destination hierarchy would allow the
+ * link or rename action.  If it is not the case, an error with EACCES is
+ * returned to inform user space that there is no way to remove or create the
+ * requested source file type.  If it should be allowed but the new inherited
+ * access rights would be greater than the source access rights, then the
+ * kernel returns an error with EXDEV.  Prioritizing EACCES over EXDEV enables
+ * user space to abort the whole operation if there is no way to do it, or to
+ * manually copy the source to the destination if this remains allowed, e.g.
+ * because file creation is allowed on the destination directory but not direct
+ * linking.
+ *
+ * To achieve this goal, the kernel needs to compare two file hierarchies: the
+ * one identifying the source file or directory (including itself), and the
+ * destination one.  This can be seen as a multilayer partial ordering problem.
+ * The kernel walks through these paths and collect in a matrix the access
+ * rights that are denied per layer.  These matrices are then compared to see
+ * if the destination one has more (or the same) restrictions as the source
+ * one.  If this is the case, the requested action will not return EXDEV, which
+ * doesn't mean the action is allowed.  The parent hierarchy of the source
+ * (i.e. parent directory), and the destination hierarchy must also be checked
+ * to verify that they explicitly allow such action (i.e.  referencing,
+ * creation and potentially removal rights).  The kernel implementation is then
+ * required to rely on three matrices of access rights: one for the source file
+ * or directory (i.e. the child), one for the source parent hierarchy and one
+ * for the destination hierarchy.  These ephemeral matrices take some space on
+ * the stack, which limits the number of layers to a deemed reasonable number:
+ * 16.
+ *
+ * Returns:
+ * - 0 if access is allowed;
+ * - -EXDEV if @old_dentry would inherit new access rights from @new_dir;
+ * - -EACCES if file removal or creation is denied.
+ */
+static int current_check_refer_path(struct dentry *const old_dentry,
+		const struct path *const new_dir,
+		struct dentry *const new_dentry,
+		bool removable)
+{
+	const struct landlock_ruleset *const dom =
+		landlock_get_current_domain();
+	bool allow_dst_parent, allow_src_parent;
+	access_mask_t access_request_dst_parent, access_request_src_parent,
+		      access_child;
+	struct path mnt_dir;
+	layer_mask_t layer_masks_dst_parent[LANDLOCK_NUM_ACCESS_FS],
+	    layer_masks_src_parent[LANDLOCK_NUM_ACCESS_FS],
+	    layer_masks_child[LANDLOCK_NUM_ACCESS_FS];
+
+	if (!dom)
+		return 0;
+	if (WARN_ON_ONCE(dom->num_layers < 1))
+		return -EACCES;
+	if (unlikely(d_is_negative(old_dentry)))
+		return -ENOENT;
+
+	access_request_dst_parent =
+		get_mode_access(d_backing_inode(old_dentry)->i_mode);
+	access_request_src_parent = 0;
+	if (removable) {
+		access_request_dst_parent |= maybe_remove(new_dentry);
+		access_request_src_parent |= maybe_remove(old_dentry);
+	}
+
+	/* The mount points are the same for old and new paths, cf. EXDEV. */
+	if (old_dentry->d_parent == new_dir->dentry) {
+		/*
+		 * The LANDLOCK_ACCESS_FS_REFER access right is not required
+		 * for same-directory referer (i.e. no reparenting).
+		 */
+		access_request_dst_parent = init_layer_masks(dom,
+				access_request_dst_parent | access_request_src_parent,
+				&layer_masks_dst_parent);
+		return check_access_path_dual(dom, new_dir, d_is_dir(old_dentry),
+				access_request_dst_parent, &layer_masks_dst_parent,
+				0, NULL, NULL);
+	}
+
+	/* Backward compatibility: no reparenting support. */
+	if (!(get_handled_accesses(dom) & LANDLOCK_ACCESS_FS_REFER))
+		return -EXDEV;
+
+	access_request_src_parent |= LANDLOCK_ACCESS_FS_REFER;
+	access_request_dst_parent |= LANDLOCK_ACCESS_FS_REFER;
+
+	/* Saves the common mount point. */
+	mnt_dir.mnt = new_dir->mnt;
+	mnt_dir.dentry = new_dir->mnt->mnt_root;
+
+	/* new_dir->dentry is equal to new_dentry->d_parent */
+	allow_dst_parent = collect_domain_accesses(dom, mnt_dir.dentry,
+			new_dir->dentry, &layer_masks_dst_parent);
+	allow_src_parent = collect_domain_accesses(dom, mnt_dir.dentry,
+			old_dentry->d_parent, &layer_masks_src_parent);
+
+	if (allow_src_parent) {
+		/* No need to go further if everything is allowed. */
+		if (allow_dst_parent)
+			return 0;
+
+		/* @new_dentry can only gain more restrictions. */
+		if (scope_to_request(access_request_dst_parent,
+					&layer_masks_dst_parent))
+			return 0;
+
+		return check_access_path_dual(dom, &mnt_dir, d_is_dir(old_dentry),
+				access_request_dst_parent, &layer_masks_dst_parent,
+				0, NULL, NULL);
+	}
+
+	/*
+	 * To be able to compare source and destination domain access rights,
+	 * take into account the @old_dentry access rights aggregated with its
+	 * parent access rights.  This will be useful to compare with the
+	 * destination parent access rights.
+	 */
+	access_child = init_layer_masks(dom, LANDLOCK_MASK_ACCESS_FS,
+			&layer_masks_child);
+	unmask_layers(find_rule(dom, old_dentry), access_child,
+			&layer_masks_child);
+
+	return check_access_path_dual(dom, &mnt_dir, d_is_dir(old_dentry),
+			access_request_dst_parent, &layer_masks_dst_parent,
+			access_request_src_parent, &layer_masks_src_parent,
+			&layer_masks_child);
+}
+
 /* Inode hooks */
 
 static void hook_inode_free_security(struct inode *const inode)
@@ -587,31 +1042,11 @@  static int hook_sb_pivotroot(const struct path *const old_path,
 
 /* Path hooks */
 
-/*
- * Creating multiple links or renaming may lead to privilege escalations if not
- * handled properly.  Indeed, we must be sure that the source doesn't gain more
- * privileges by being accessible from the destination.  This is getting more
- * complex when dealing with multiple layers.  The whole picture can be seen as
- * a multilayer partial ordering problem.  A future version of Landlock will
- * deal with that.
- */
 static int hook_path_link(struct dentry *const old_dentry,
 		const struct path *const new_dir,
 		struct dentry *const new_dentry)
 {
-	const struct landlock_ruleset *const dom =
-		landlock_get_current_domain();
-
-	if (!dom)
-		return 0;
-	/* The mount points are the same for old and new paths, cf. EXDEV. */
-	if (old_dentry->d_parent != new_dir->dentry)
-		/* Gracefully forbids reparenting. */
-		return -EXDEV;
-	if (unlikely(d_is_negative(old_dentry)))
-		return -ENOENT;
-	return check_access_path(dom, new_dir,
-			get_mode_access(d_backing_inode(old_dentry)->i_mode));
+	return current_check_refer_path(old_dentry, new_dir, new_dentry, false);
 }
 
 static int hook_path_rename(const struct path *const old_dir,
@@ -619,21 +1054,8 @@  static int hook_path_rename(const struct path *const old_dir,
 		const struct path *const new_dir,
 		struct dentry *const new_dentry)
 {
-	const struct landlock_ruleset *const dom =
-		landlock_get_current_domain();
-
-	if (!dom)
-		return 0;
-	/* The mount points are the same for old and new paths, cf. EXDEV. */
-	if (old_dir->dentry != new_dir->dentry)
-		/* Gracefully forbids reparenting. */
-		return -EXDEV;
-	if (unlikely(d_is_negative(old_dentry)))
-		return -ENOENT;
-	/* RENAME_EXCHANGE is handled because directories are the same. */
-	return check_access_path(dom, old_dir, maybe_remove(old_dentry) |
-			maybe_remove(new_dentry) |
-			get_mode_access(d_backing_inode(old_dentry)->i_mode));
+	/* old_dir refers to old_dentry->d_parent and new_dir->mnt */
+	return current_check_refer_path(old_dentry, new_dir, new_dentry, true);
 }
 
 static int hook_path_mkdir(const struct path *const dir,
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index 126d1ec04d34..26c8166d0265 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -16,7 +16,7 @@ 
 #define LANDLOCK_MAX_NUM_LAYERS		16
 #define LANDLOCK_MAX_NUM_RULES		U32_MAX
 
-#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_MAKE_SYM
+#define LANDLOCK_LAST_ACCESS_FS		LANDLOCK_ACCESS_FS_REFER
 #define LANDLOCK_MASK_ACCESS_FS		((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
 #define LANDLOCK_NUM_ACCESS_FS		__const_hweight64(LANDLOCK_MASK_ACCESS_FS)
 
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 32396962f04d..fa14f09b6bf4 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -128,7 +128,7 @@  static const struct file_operations ruleset_fops = {
 	.write = fop_dummy_write,
 };
 
-#define LANDLOCK_ABI_VERSION	1
+#define LANDLOCK_ABI_VERSION	2
 
 /**
  * sys_landlock_create_ruleset - Create a new ruleset
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index ca40abe9daa8..99aab93d50e1 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -67,7 +67,7 @@  TEST(abi_version) {
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	ASSERT_EQ(1, landlock_create_ruleset(NULL, 0,
+	ASSERT_EQ(2, landlock_create_ruleset(NULL, 0,
 				LANDLOCK_CREATE_RULESET_VERSION));
 
 	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 1ac41bfa7382..0568d1193492 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -381,7 +381,7 @@  TEST_F_FORK(layout1, inval)
 	LANDLOCK_ACCESS_FS_WRITE_FILE | \
 	LANDLOCK_ACCESS_FS_READ_FILE)
 
-#define ACCESS_LAST LANDLOCK_ACCESS_FS_MAKE_SYM
+#define ACCESS_LAST LANDLOCK_ACCESS_FS_REFER
 
 #define ACCESS_ALL ( \
 	ACCESS_FILE | \
@@ -394,6 +394,7 @@  TEST_F_FORK(layout1, inval)
 	LANDLOCK_ACCESS_FS_MAKE_SOCK | \
 	LANDLOCK_ACCESS_FS_MAKE_FIFO | \
 	LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_SYM | \
 	ACCESS_LAST)
 
 TEST_F_FORK(layout1, file_access_rights)