From patchwork Mon Mar 22 02:32:39 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: jim owens
X-Patchwork-Id: 87326
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by demeter.kernel.org (8.14.3/8.14.3) with ESMTP id o2M2WjG2006855
	for ; Mon, 22 Mar 2010 02:32:45 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753652Ab0CVCco (ORCPT );
	Sun, 21 Mar 2010 22:32:44 -0400
Received: from mail-iw0-f182.google.com ([209.85.223.182]:54249 "EHLO
	mail-iw0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753558Ab0CVCcn (ORCPT );
	Sun, 21 Mar 2010 22:32:43 -0400
Received: by iwn12 with SMTP id 12so3954773iwn.21
	for ; Sun, 21 Mar 2010 19:32:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from
	:user-agent:mime-version:to:subject:content-type
	:content-transfer-encoding;
	bh=HauHC1ejW0kiEXpIvjX31pAPtUj9UpXUvcgEZmhtYVA=;
	b=Xcvv+SWuLK3udC2vs8fouDdxc+EBbUDJ8JKjad2s45rjVf+YXgmUGv398i0WhCC0TY
	F3psLTl7AW57jbPRwZt9l9Ik9TtVWF5w/QVvU8QX+iKD0Cg7Ygwsz2t9LTxMOoZo8wiq
	SH5mbNXGhXaNFQLJrwUD71IlFZR2ioGcazX20=
DomainKey-Signature: a=rsa-sha1; c=nofws;
	d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:subject
	:content-type:content-transfer-encoding;
	b=wJe/xA88klbBbw7Qoma5dzkfLVQXu9Qql2RN+f7cfZt/wErEF0gefcerMt0tRISjER
	b95dnT+F9IvS5jcCXIOxoCUo80NnzjgUHCds/SwtiU1Ovyx2CFjWJTAFkJiU5MEz2Bui
	qkm2qlsahMYMgz4+fQsUF1qyneEGYNJJ14BoY=
Received: by 10.231.152.75 with SMTP id f11mr1470328ibw.50.1269225161447;
	Sun, 21 Mar 2010 19:32:41 -0700 (PDT)
Received: from [192.168.0.97] (c-24-147-40-65.hsd1.nh.comcast.net [24.147.40.65])
	by mx.google.com with ESMTPS id a1sm1372645ibs.12.2010.03.21.19.32.40
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sun, 21 Mar 2010 19:32:41 -0700 (PDT)
Message-ID: <4BA6D6C7.3030708@gmail.com>
Date: Sun, 21 Mar 2010 22:32:39 -0400
From: jim owens
User-Agent: Thunderbird 2.0.0.24 (X11/20100317)
MIME-Version: 1.0
To: linux-btrfs
Subject: [PATCH] Btrfs: change direct I/O read to not use i_mutex.
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.3 (demeter.kernel.org [140.211.167.41]);
	Mon, 22 Mar 2010 02:32:45 +0000 (UTC)

diff --git a/fs/btrfs/dio.c b/fs/btrfs/dio.c
index b6934be..c930ff5 100644
--- a/fs/btrfs/dio.c
+++ b/fs/btrfs/dio.c
@@ -435,14 +435,81 @@ static void btrfs_dio_write(struct btrfs_diocb *diocb)
 {
 }
 
+/* verify that we have locked everything we need to do the read and
+ * have pushed the ordered data into the btree so the extent is valid
+ */
+static void btrfs_dio_safe_to_read(struct btrfs_diocb *diocb,
+				struct extent_map *em, u64 *lockend,
+				u64 *data_len, int *safe_to_read)
+{
+	struct extent_io_tree *io_tree = &BTRFS_I(diocb->inode)->io_tree;
+	struct btrfs_ordered_extent *ordered;
+	u64 stop;
+
+	/* must ensure the whole compressed extent is valid on each loop
+	 * as we don't know the final extent size until we look it up
+	 */
+	if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) &&
+	    (diocb->lockstart > em->start || *lockend <= em->start + em->len)) {
+		unlock_extent(io_tree, diocb->lockstart, *lockend, GFP_NOFS);
+		diocb->lockstart = em->start;
+		*lockend = min(*lockend, em->start + em->len - 1);
+		*safe_to_read = 0;
+		return;
+	}
+
+	/* one test on first loop covers all extents if no concurrent writes */
+	if (*safe_to_read)
+		return;
+
+	ordered = btrfs_lookup_first_ordered_extent(diocb->inode,
+				diocb->lockstart, *lockend + 1 - diocb->lockstart);
+	if (!ordered) {
+		*safe_to_read = 1;
+		return;
+	}
+
+	/* we checked everything to lockend which might cover multiple extents
+	 * in the hope that we could do the whole read with one locking. that
+	 * won't happen now, but we can read the first extent (or part of it
+	 * for uncompressed data) if what we need is before this ordered data.
+	 * we must have the whole extent valid to read any compressed data,
+	 * while we can read a single block of valid uncompressed data.
+	 */
+	if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags))
+		stop = em->start + em->len;
+	else
+		stop = diocb->lockstart +
+			BTRFS_I(diocb->inode)->root->sectorsize;
+
+	if (ordered->file_offset < stop) {
+		unlock_extent(io_tree, diocb->lockstart, *lockend, GFP_NOFS);
+		btrfs_start_ordered_extent(diocb->inode, ordered, 1);
+		btrfs_put_ordered_extent(ordered);
+		*safe_to_read = 0;
+		return;
+	}
+
+	/* do the part of the data that is valid to read now with the
+	 * remainder unlocked so that ordered data can flush in parallel
+	 */
+	unlock_extent(io_tree, ordered->file_offset, *lockend, GFP_NOFS);
+	*lockend = ordered->file_offset - 1;
+	*data_len = ordered->file_offset - diocb->start;
+	btrfs_put_ordered_extent(ordered);
+
+	*safe_to_read = 1;
+	return;
+}
+
 static void btrfs_dio_read(struct btrfs_diocb *diocb)
 {
 	struct extent_io_tree *io_tree = &BTRFS_I(diocb->inode)->io_tree;
 	u64 end = diocb->terminate; /* copy because reaper changes it */
 	u64 lockend;
 	u64 data_len;
+	int safe_to_read;
 	int err = 0;
-	int loop = 0;
 	u32 blocksize = BTRFS_I(diocb->inode)->root->sectorsize;
 
 	/* expand lock region to include what we read to validate checksum */
@@ -450,42 +517,25 @@ static void btrfs_dio_read(struct btrfs_diocb *diocb)
 	lockend = ALIGN(end, blocksize) - 1;
 
 getlock:
-	mutex_lock(&diocb->inode->i_mutex);
+	/* writeout everything we read for checksum or compressed extents */
+	filemap_write_and_wait_range(diocb->inode->i_mapping,
+				     diocb->lockstart, lockend);
+	lock_extent(io_tree, diocb->lockstart, lockend, GFP_NOFS);
 
-	/* ensure writeout and btree update on everything
-	 * we might read for checksum or compressed extents
-	 */
-	data_len = lockend + 1 - diocb->lockstart;
-	err = btrfs_wait_ordered_range(diocb->inode,
-				       diocb->lockstart, data_len);
-	if (err) {
-		diocb->error = err;
-		mutex_unlock(&diocb->inode->i_mutex);
-		return;
-	}
-	data_len = i_size_read(diocb->inode);
-	if (data_len < end)
-		end = data_len;
-	if (end <= diocb->start) {
-		mutex_unlock(&diocb->inode->i_mutex);
-		return; /* 0 is returned past EOF */
-	}
-	if (!loop) {
-		loop++;
-		diocb->terminate = end;
-		lockend = ALIGN(end, blocksize) - 1;
+	data_len = min_t(u64, end, i_size_read(diocb->inode));
+	if (data_len <= diocb->start) {
+		/* whatever we finished (or 0) is returned past EOF */
+		goto fail;
 	}
+	data_len -= diocb->start;
 
-	lock_extent(io_tree, diocb->lockstart, lockend, GFP_NOFS);
-	mutex_unlock(&diocb->inode->i_mutex);
-
-	data_len = end - diocb->start;
+	safe_to_read = 0;
 	while (data_len && !diocb->error) { /* error in reaper stops submit */
 		struct extent_map *em;
-		u64 len = data_len;
+		u64 len;
 
 		em = btrfs_get_extent(diocb->inode, NULL, 0,
-				diocb->start, len, 0);
+				diocb->start, data_len, 0);
 		if (IS_ERR(em)) {
 			err = PTR_ERR(em);
 			printk(KERN_ERR
@@ -496,6 +546,18 @@ getlock:
 			goto fail;
 		}
 
+		/* verify extent was locked and ordered data was flushed,
+		 * may change data_len and lockend whether true or false.
+		 */
+		btrfs_dio_safe_to_read(diocb, em, &lockend, &data_len,
+					&safe_to_read);
+		if (!safe_to_read) {
+			free_extent_map(em);
+			goto getlock;
+		}
+
+		len = data_len;
+
 		/* problem flushing ordered data with btree not updated */
 		if (test_bit(EXTENT_FLAG_VACANCY, &em->flags)) {
 			printk(KERN_ERR
@@ -520,25 +582,12 @@ getlock:
 		} else {
 			len = min(len, em->len - (diocb->start - em->start));
 			if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) ||
-			    em->block_start == EXTENT_MAP_HOLE) {
+			    em->block_start == EXTENT_MAP_HOLE)
 				err = btrfs_dio_hole_read(diocb, len);
-			} else if (test_bit(EXTENT_FLAG_COMPRESSED,
-					    &em->flags)) {
-				if (diocb->lockstart > em->start ||
-				    lockend < em->start + em->len - 1) {
-					/* lock everything we read to inflate */
-					unlock_extent(io_tree, diocb->lockstart,
-						      lockend, GFP_NOFS);
-					diocb->lockstart = em->start;
-					lockend = max(lockend,
-						      em->start + em->len - 1);
-					free_extent_map(em);
-					goto getlock;
-				}
+			else if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags))
 				err = btrfs_dio_compressed_read(diocb, em, len);
-			} else {
+			else
 				err = btrfs_dio_extent_read(diocb, em, len);
-			}
 		}
 
 		free_extent_map(em);
@@ -547,6 +596,15 @@ getlock:
 			goto fail;
 		cond_resched();
 	}
+
+	/* we might have shortened data_len because of uncommitted
+	 * ordered data, we want to try again to read the remainder
+	 */
+	if (diocb->start < end && !err && !diocb->error) {
+		lockend = ALIGN(end, blocksize) - 1;
+		goto getlock;
+	}
+
 fail:
 	if (err)
 		diocb->error = err;