From patchwork Mon May 16 21:06:55 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Mason X-Patchwork-Id: 9106001 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 852889F30C for ; Mon, 16 May 2016 21:07:31 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 5213820172 for ; Mon, 16 May 2016 21:07:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 265F120165 for ; Mon, 16 May 2016 21:07:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751433AbcEPVHY (ORCPT ); Mon, 16 May 2016 17:07:24 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:65179 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751014AbcEPVHW (ORCPT ); Mon, 16 May 2016 17:07:22 -0400 Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.16.0.11/8.16.0.11) with SMTP id u4GL41gi023700; Mon, 16 May 2016 14:07:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fb.com; h=date : from : to : cc : subject : message-id : mime-version : content-type; s=facebook; bh=6oPWYS94TC6BL2cOBn5TBzeXi9znFXOxLPW3tz4le5A=; b=OyqXd+O3Y2sDhgAnNnU5NKCnaVFiYgokqU/hdau7ix/tMcauhfLQi6sVeEGkP6d1mN5R edZ6fRoNXNFeDXxuSdOoMAt4IBD9xjpT3izCGvLgKIkQAS2z/aP7HmZJAs7fU9efruDA HAbcqW2Enpp5mayzIhUM/khM4Wy1Sk9YHyI= Received: from mail.thefacebook.com ([199.201.64.23]) by m0001303.ppops.net with ESMTP id 22wypavuqy-2 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT); Mon, 16 May 2016 14:07:04 -0700 Received: from localhost (192.168.52.123) by mail.thefacebook.com (192.168.16.12) with Microsoft SMTP Server (TLS) id 14.3.294.0; Mon, 16 May 2016 14:06:57 -0700 Date: Mon, 16 May 2016 17:06:55 -0400 From: Chris Mason To: , CC: Jakob =?iso-8859-1?Q?Sch=FCrz?= , "Christian Borntraeger" , Adam Borowski , Markus Trippelsdorf Subject: [PATCH] Btrfs: fix handling of faults from btrfs_copy_from_user Message-ID: <20160516210655.sfsjtj5oleeksaxq@floor.thefacebook.com> Mail-Followup-To: Chris Mason , linux-btrfs@vger.kernel.org, dsj@fb.com, Jakob =?iso-8859-1?Q?Sch=FCrz?= , Christian Borntraeger , Adam Borowski , Markus Trippelsdorf MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.23.1 (2014-03-12) X-Originating-IP: [192.168.52.123] X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-05-16_09:, , signatures=0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-8.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi everyone, I've tried to cc most of the recent reports of this warning. It was pretty hard to track down, but I've got a test program for it now. My goal is to change xfs_io to add the madvise loop under --do-horrible-things, instead of adding yet another special doio.c function to xfstests. I've run this through both my test case, and Dave's trinity code. And now for the patch: When btrfs_copy_from_user isn't able to copy all of the pages, we need to adjust our accounting to reflect the work that was actually done. Commit 2e78c927d79 changed around the decisions a little and we ended up skipping the accounting adjustments some of the time. This commit makes sure that when we don't copy anything at all, we still hop into the adjustments, and switches to release_bytes instead of write_bytes, since write_bytes isn't aligned. The accounting errors led to warnings during btrfs_destroy_inode: [ 70.847532] WARNING: CPU: 10 PID: 514 at fs/btrfs/inode.c:9350 btrfs_destroy_inode+0x2b3/0x2c0 [ 70.847536] Modules linked in: i2c_piix4 virtio_net i2c_core input_leds button led_class serio_raw acpi_cpufreq sch_fq_codel autofs4 virtio_blk [ 70.847538] CPU: 10 PID: 514 Comm: umount Tainted: G W 4.6.0-rc6_00062_g2997da1-dirty #23 [ 70.847539] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014 [ 70.847542] 0000000000000000 ffff880ff5cafab8 ffffffff8149d5e9 0000000000000202 [ 70.847543] 0000000000000000 0000000000000000 0000000000000000 ffff880ff5cafb08 [ 70.847547] ffffffff8107bdfd ffff880ff5cafaf8 000024868120013d ffff880ff5cafb28 [ 70.847547] Call Trace: [ 70.847550] [] dump_stack+0x51/0x78 [ 70.847551] [] __warn+0xfd/0x120 [ 70.847553] [] warn_slowpath_null+0x1d/0x20 [ 70.847555] [] btrfs_destroy_inode+0x2b3/0x2c0 [ 70.847556] [] ? __destroy_inode+0x71/0x140 [ 70.847558] [] destroy_inode+0x43/0x70 [ 70.847559] [] ? wake_up_bit+0x2f/0x40 [ 70.847560] [] evict+0x148/0x1d0 [ 70.847562] [] ? start_transaction+0x3de/0x460 [ 70.847564] [] dispose_list+0x59/0x80 [ 70.847565] [] evict_inodes+0x180/0x190 [ 70.847566] [] ? __sync_filesystem+0x3f/0x50 [ 70.847568] [] generic_shutdown_super+0x48/0x100 [ 70.847569] [] ? woken_wake_function+0x20/0x20 [ 70.847571] [] kill_anon_super+0x16/0x30 [ 70.847573] [] btrfs_kill_super+0x1e/0x130 [ 70.847574] [] deactivate_locked_super+0x4e/0x90 [ 70.847576] [] deactivate_super+0x51/0x70 [ 70.847577] [] cleanup_mnt+0x3f/0x80 [ 70.847579] [] __cleanup_mnt+0x12/0x20 [ 70.847581] [] task_work_run+0x68/0xa0 [ 70.847582] [] exit_to_usermode_loop+0xd6/0xe0 [ 70.847583] [] do_syscall_64+0xbd/0x170 [ 70.847586] [] entry_SYSCALL64_slow_path+0x25/0x25 This is the test program I used to force short returns from btrfs_copy_from_user void *dontneed(void *arg) { char *p = arg; int ret; while(1) { ret = madvise(p, BUFSIZE/4, MADV_DONTNEED); if (ret) { perror("madvise"); exit(1); } } } int main(int ac, char **av) { int ret; int fd; char *filename; unsigned long offset; char *buf; int i; pthread_t tid; if (ac != 2) { fprintf(stderr, "usage: dammitdave filename\n"); exit(1); } buf = mmap(NULL, BUFSIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); if (buf == MAP_FAILED) { perror("mmap"); exit(1); } memset(buf, 'a', BUFSIZE); filename = av[1]; ret = pthread_create(&tid, NULL, dontneed, buf); if (ret) { fprintf(stderr, "error %d from pthread_create\n", ret); exit(1); } ret = pthread_detach(tid); if (ret) { fprintf(stderr, "pthread detach failed %d\n", ret); exit(1); } while (1) { fd = open(filename, O_RDWR | O_CREAT, 0600); if (fd < 0) { perror("open"); exit(1); } for (i = 0; i < ROUNDS; i++) { int this_write = BUFSIZE; offset = rand() % MAXSIZE; ret = pwrite(fd, buf, this_write, offset); if (ret < 0) { perror("pwrite"); exit(1); } else if (ret != this_write) { fprintf(stderr, "short write to %s offset %lu ret %d\n", filename, offset, ret); exit(1); } if (i == ROUNDS - 1) { ret = sync_file_range(fd, offset, 4096, SYNC_FILE_RANGE_WRITE); if (ret < 0) { perror("sync_file_range"); exit(1); } } } ret = ftruncate(fd, 0); if (ret < 0) { perror("ftruncate"); exit(1); } ret = close(fd); if (ret) { perror("close"); exit(1); } ret = unlink(filename); if (ret) { perror("unlink"); exit(1); } } return 0; } Signed-off-by: Chris Mason Reported-by: Dave Jones Fixes: 2e78c927d79333f299a8ac81c2fd2952caeef335 cc: stable@vger.kernel.org # v4.6 Tested-by: Adam Borowski --- fs/btrfs/file.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 3ed795e..ab7c70e 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1597,6 +1597,13 @@ again: copied = btrfs_copy_from_user(pos, write_bytes, pages, i); + num_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info, + reserve_bytes); + dirty_sectors = round_up(copied + sector_offset, + root->sectorsize); + dirty_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info, + dirty_sectors); + /* * if we have trouble faulting in the pages, fall * back to one page at a time @@ -1606,6 +1613,7 @@ again: if (copied == 0) { force_page_uptodate = true; + dirty_sectors = 0; dirty_pages = 0; } else { force_page_uptodate = false; @@ -1616,20 +1624,19 @@ again: /* * If we had a short copy we need to release the excess delaloc * bytes we reserved. We need to increment outstanding_extents - * because btrfs_delalloc_release_space will decrement it, but + * because btrfs_delalloc_release_space and + * btrfs_delalloc_release_metadata will decrement it, but * we still have an outstanding extent for the chunk we actually * managed to copy. */ - num_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info, - reserve_bytes); - dirty_sectors = round_up(copied + sector_offset, - root->sectorsize); - dirty_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info, - dirty_sectors); - if (num_sectors > dirty_sectors) { - release_bytes = (write_bytes - copied) - & ~((u64)root->sectorsize - 1); + /* + * we round down because we don't want to count + * any partial blocks actually send through the + * IO machines + */ + release_bytes = round_down(release_bytes - copied, + root->sectorsize); if (copied > 0) { spin_lock(&BTRFS_I(inode)->lock); BTRFS_I(inode)->outstanding_extents++;