From patchwork Sun Mar 26 23:17:03 2017
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 9645243
From: NeilBrown
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, linux-block,
    "open list: SOFTWARE RAID (Multiple Disks) SUPPORT",
    "open list: DEVICE-MAPPER (LVM)", Alasdair Kergon, Mike Snitzer,
    Shaohua Li, Linux Kernel Mailing List, "Martin K. Petersen"
Date: Mon, 27 Mar 2017 10:17:03 +1100
Subject: Re: [PATCH v3] block: trace completion of all bios.
Message-ID: <87inmv8w7k.fsf@notabene.neil.brown.name>

On Fri, Mar 24 2017, Ming Lei wrote:

> On Fri, Mar 24, 2017 at 8:07 AM, NeilBrown wrote:
...
>> @@ -102,6 +102,8 @@ struct bio {
>>  #define BIO_REFFED	8	/* bio has elevated ->bi_cnt */
>>  #define BIO_THROTTLED	9	/* This bio has already been subjected to
>>  				 * throttling rules.  Don't do it again. */
>> +#define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final
>> +				  * completion of this bio. */
>
> This may not be a good idea, since the flag space is quite small(12).

which means there are 2 unused bits and I only want to use 1.  It should
fit.

I'm a bit perplexed why 4 bits are reserved for the BVEC_POOL number,
which ranges from 0 to 7....

But if these bits really are a scarce resource, we should stop wasting
them.  The patch below makes bit 7 (BIO_CHAIN) available.  We could
probably make bit 8 (BIO_REFFED) available using a similar technique.

BIO_QUIET is only relevant if bi_error is non-zero, and bi_error has a
limited range, so a bit in there could be used instead.  In fact,
MAX_ERRNO is 4096, so bi_error could be a 'short'.  That could save
2 whole bytes ... or make 16 more flag bits available.

So I don't think your concern about a shortage of spare flag bits is
valid.
Thanks,
NeilBrown

diff --git a/block/bio.c b/block/bio.c
index c1272986133e..1703786a206a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -274,7 +274,7 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 	      unsigned short max_vecs)
 {
 	memset(bio, 0, sizeof(*bio));
-	atomic_set(&bio->__bi_remaining, 1);
+	atomic_set(&bio->__bi_remaining, 0);
 	atomic_set(&bio->__bi_cnt, 1);
 
 	bio->bi_io_vec = table;
@@ -300,7 +300,7 @@ void bio_reset(struct bio *bio)
 
 	memset(bio, 0, BIO_RESET_BYTES);
 	bio->bi_flags = flags;
-	atomic_set(&bio->__bi_remaining, 1);
+	atomic_set(&bio->__bi_remaining, 0);
 }
 EXPORT_SYMBOL(bio_reset);
 
@@ -1794,20 +1794,15 @@ EXPORT_SYMBOL(bio_flush_dcache_pages);
 static inline bool bio_remaining_done(struct bio *bio)
 {
 	/*
-	 * If we're not chaining, then ->__bi_remaining is always 1 and
+	 * If we're not chaining, then ->__bi_remaining is always 0 and
 	 * we always end io on the first invocation.
 	 */
-	if (!bio_flagged(bio, BIO_CHAIN))
+	if (atomic_read(&bio->__bi_remaining) == 0)
 		return true;
 
 	BUG_ON(atomic_read(&bio->__bi_remaining) <= 0);
 
-	if (atomic_dec_and_test(&bio->__bi_remaining)) {
-		bio_clear_flag(bio, BIO_CHAIN);
-		return true;
-	}
-
-	return false;
+	return atomic_dec_and_test(&bio->__bi_remaining);
 }
 
 /**
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e521194f6fc..d15245544111 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -664,13 +664,19 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
 }
 
 /*
- * Increment chain count for the bio. Make sure the CHAIN flag update
- * is visible before the raised count.
+ * Increment chain count for the bio.
  */
 static inline void bio_inc_remaining(struct bio *bio)
 {
-	bio_set_flag(bio, BIO_CHAIN);
-	smp_mb__before_atomic();
+	/*
+	 * Calls to bio_inc_remaining() cannot race
+	 * with the final call to bio_end_io(), and
+	 * the first call cannot race with other calls,
+	 * so if __bi_remaining appears to be zero, there
+	 * can be no race which might change it.
+	 */
+	if (atomic_read(&bio->__bi_remaining) == 0)
+		atomic_set(&bio->__bi_remaining, 1);
 	atomic_inc(&bio->__bi_remaining);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db7a57ee0e58..2deea4501d14 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -46,6 +46,15 @@ struct bio {
 	unsigned int		bi_seg_front_size;
 	unsigned int		bi_seg_back_size;
 
+	/* __bi_remaining records the number of completion events
+	 * (i.e. calls to bi_end_io()) that need to happen before this
+	 * bio is truly complete.
+	 * A value of '0' means that there will only ever be one completion
+	 * event, so there will be no racing and no need for an atomic
+	 * operation to detect the last event.
+	 * Any other value represents a simple count subject to atomic_inc()
+	 * and atomic_dec_and_test().
+	 */
 	atomic_t		__bi_remaining;
 
 	bio_end_io_t		*bi_end_io;
@@ -98,7 +107,7 @@ struct bio {
 #define BIO_USER_MAPPED 4	/* contains user pages */
 #define BIO_NULL_MAPPED 5	/* contains invalid user pages */
 #define BIO_QUIET	6	/* Make BIO Quiet */
-#define BIO_CHAIN	7	/* chained bio, ->bi_remaining in effect */
+/* 7 unused */
 #define BIO_REFFED	8	/* bio has elevated ->bi_cnt */
 #define BIO_THROTTLED	9	/* This bio has already been subjected to
 				 * throttling rules.  Don't do it again. */