Message ID | 20150323023738.8161.97062.stgit@notabene.brown (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Mar 23, 2015 at 01:37:38PM +1100, NeilBrown wrote: > lustre's ->follow_link() uses a lot of stack space and so > need to limit symlink recursion based on stack size. > > It currently tests current->link_count, but that will soon > become private to fs/namei.c. > So instead base on actual available stack space. > This patch aborts recursive symlinks in less than 2K of space > is available. This seems consistent with current code, but > hasn't been tested. BTW, in the best case that logics is fishy. We have "up to 5 levels with 4Kb stack and up to 7 with 8Kb one". Could somebody manage to dig out the reasons for such limits? Preferably along with the kernel version where the overflows had been observed, both for 4K and 8K cases. I'm very tempted to rip that thing out in the "kill link_path_walk() recursion completely" series... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Apr 17, 2015, at 9:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Mon, Mar 23, 2015 at 01:37:38PM +1100, NeilBrown wrote: >> lustre's ->follow_link() uses a lot of stack space and so >> need to limit symlink recursion based on stack size. >> >> It currently tests current->link_count, but that will soon >> become private to fs/namei.c. >> So instead base on actual available stack space. >> This patch aborts recursive symlinks in less than 2K of space >> is available. This seems consistent with current code, but >> hasn't been tested. > > BTW, in the best case that logics is fishy. We have "up to 5 levels with > 4Kb stack and up to 7 with 8Kb one". Could somebody manage to dig out > the reasons for such limits? Preferably along with the kernel version > where the overflows had been observed, both for 4K and 8K cases. Hi Al, I checked in our bug history, and the 8KB stack limit was hit with older clients running racer or our recursive-symlink regression test: 2.6.18: https://bugzilla.lustre.org/show_bug.cgi?id=18533#c0 2.6.16: https://bugzilla.lustre.org/show_bug.cgi?id=19380#c11 The 4KB stack limit for clients has existed a lot longer than that, but CONFIG_4KSTACKS was not the default on all kernels for a while. The following bug showed a stack overflow with 2.6.22 kernels: https://bugzilla.lustre.org/show_bug.cgi?id=17379#c0 Prior to 2.6.16 when we needed client-side kernel patches and a custom kernel build, we always forced the CONFIG_4KSTACKS off in the config. In general, Lustre is a heavy stack user because it is a network filesystem, and doubly so if the Lustre client is re-exporting the filesystem to NFS clients. I'd be happy if symlink recursion was removed completely, but so far the added symlink recursion limit hasn't been a problem for Lustre users. Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Apr 19, 2015 at 02:57:07PM -0600, Andreas Dilger wrote: > I'd be happy if symlink recursion was removed completely, but so far the > added symlink recursion limit hasn't been a problem for Lustre users. Well, it's gone in my tree; I've just pushed the current queue to vfs.git#link_path_walk. Right now I'm looking at the unholy mess gcc does to stack footprint with inlining - the last commit in there is a result of exactly that. Inlines in there really need tuning ;-/ -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Apr 19, 2015 at 10:33:48PM +0100, Al Viro wrote: > On Sun, Apr 19, 2015 at 02:57:07PM -0600, Andreas Dilger wrote: > > > I'd be happy if symlink recursion was removed completely, but so far the > > added symlink recursion limit hasn't been a problem for Lustre users. > > Well, it's gone in my tree; I've just pushed the current queue to > vfs.git#link_path_walk. Right now I'm looking at the unholy mess > gcc does to stack footprint with inlining - the last commit in there > is a result of exactly that. Inlines in there really need tuning ;-/ FWIW, right now in my tree the maximal stack footprint of call chains through fs/namei.c (amd64, my test config, including aushit) is 1408 bytes. Goes via rename() -> renameat2() -> user_path_parent() -> filename_lookup() -> path_lookupa() -> path_init() or follow_link() -> link_path_walk() -> walk_component() -> lookup_fast() -> follow_managed(). And that does *not* depend upon the depth of symlink nesting. The maximal depth when calling any methods present in lustre is 1328; similar path, except that its tail goes like walk_component() -> __lookup_hash() -> lookup_dcache() -> ->d_revalidate(). Again, independent from the symlink nesting depth. ->lookup() calls are at 1296 maximum, similar call chain, for ->permission() it's 1152, for ->follow_link() - 1088. For mainline it's _much_ worse. Maximal depth on the same config is 2986 bytes (with 8 levels of nesting) and each level costs 208 bytes. ->d_revalidate() is at 2880; for lustre it would be reduced a bit (again, 208 per level), but if you have any symlinks at all, you will end up deeper than in non-recursive variant. And frankly, the most scary thing in there isn't lustre-related - it's NFS4 (and AFS, etc.), where ->d_automount() might get called on _that_ depth. With quite a bit of stack footprint of its own - we are doing NFS referral handling. With almost 3Kb of stack already eaten up. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c index 686b6a574cc5..ba37eb6b29dc 100644 --- a/drivers/staging/lustre/lustre/llite/symlink.c +++ b/drivers/staging/lustre/lustre/llite/symlink.c @@ -120,20 +120,27 @@ failed: static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd) { + unsigned long avail_space; struct inode *inode = dentry->d_inode; struct ptlrpc_request *request = NULL; int rc; char *symname = NULL; CDEBUG(D_VFSTRACE, "VFS Op\n"); - /* Limit the recursive symlink depth to 5 instead of default - * 8 links when kernel has 4k stack to prevent stack overflow. - * For 8k stacks we need to limit it to 7 for local servers. */ - if (THREAD_SIZE < 8192 && current->link_count >= 6) { - rc = -ELOOP; - } else if (THREAD_SIZE == 8192 && current->link_count >= 8) { + /* Limit the recursive symlink depth. + * Previously limited to 5 instead of default 8 links when + * kernel has 4k stack to prevent stack overflow. + * For 8k stacks, was limited to 7 for local servers. + * Now limited to ensure 2K of stack is available for lustre. + */ +#ifdef CONFIG_STACK_GROWSUP + avail_space = end_of_stack(current) - &avail_space; +#else + avail_space = &avail_space - end_of_stack(current); +#endif + if (avail_space < 2048) rc = -ELOOP; - } else { + else { ll_inode_size_lock(inode); rc = ll_readlink_internal(inode, &request, &symname); ll_inode_size_unlock(inode);
lustre's ->follow_link() uses a lot of stack space and so need to limit symlink recursion based on stack size. It currently tests current->link_count, but that will soon become private to fs/namei.c. So instead base on actual available stack space. This patch aborts recursive symlinks in less than 2K of space is available. This seems consistent with current code, but hasn't been tested. Signed-off-by: NeilBrown <neilb@suse.de> --- drivers/staging/lustre/lustre/llite/symlink.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html