From: NeilBrown
To: Mikulas Patocka, Mike Snitzer
Cc: Jens Axboe, "linux-kernel@vger.kernel.org", linux-block@vger.kernel.org, device-mapper development, Zdenek Kabelac
Date: Wed, 22 Nov 2017 15:00:51 +1100
Subject: Re: [dm-devel] new patchset to eliminate DM's use of BIOSET_NEED_RESCUER
X-Patchwork-Id: 10069493
Message-ID: <87bmjv0xos.fsf@notabene.neil.brown.name>
X-Mailing-List: linux-block@vger.kernel.org

On Tue, Nov 21 2017, Mikulas Patocka wrote:

> On Tue, 21 Nov 2017, Mike Snitzer wrote:
>
>> On Tue, Nov 21 2017 at 4:23pm -0500,
>> Mikulas Patocka wrote:
>>
>> > This is not correct:
>> >
>> > static void dm_wq_work(struct work_struct *work)
>> > {
>> > 	struct mapped_device *md = container_of(work, struct mapped_device, work);
>> > 	struct bio *bio;
>> > 	int srcu_idx;
>> > 	struct dm_table *map;
>> >
>> > 	if (!bio_list_empty(&md->rescued)) {
>> > 		struct bio_list list;
>> > 		spin_lock_irq(&md->deferred_lock);
>> > 		list = md->rescued;
>> > 		bio_list_init(&md->rescued);
>> > 		spin_unlock_irq(&md->deferred_lock);
>> > 		while ((bio = bio_list_pop(&list)))
>> > 			generic_make_request(bio);
>> > 	}
>> >
>> > 	map = dm_get_live_table(md, &srcu_idx);
>> >
>> > 	while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
>> > 		spin_lock_irq(&md->deferred_lock);
>> > 		bio = bio_list_pop(&md->deferred);
>> > 		spin_unlock_irq(&md->deferred_lock);
>> >
>> > 		if (!bio)
>> > 			break;
>> >
>> > 		if (dm_request_based(md))
>> > 			generic_make_request(bio);
>> > 		else
>> > 			__split_and_process_bio(md, map, bio);
>> > 	}
>> >
>> > 	dm_put_live_table(md, srcu_idx);
>> > }
>> >
>> > You can see that if we are in dm_wq_work in __split_and_process_bio, we
>> > will not process md->rescued list.
>>
>> Can you elaborate further?  We cannot be "in dm_wq_work in
>> __split_and_process_bio" simultaneously.  Do you mean as a side-effect
>> of scheduling away from __split_and_process_bio?
>>
>> The more detail you can share the better.
>
> Suppose this scenario:
>
> * dm_wq_work calls __split_and_process_bio
> * __split_and_process_bio eventually reaches the function snapshot_map
> * snapshot_map attempts to take the snapshot lock
>
> * the snapshot lock could be released only if some bios submitted by the
> snapshot driver to the underlying device complete
> * the bios submitted to the underlying device were already offloaded by
> some other task and they are waiting on the list md->rescued
> * the bios waiting on md->rescued are not processed, because dm_wq_work is
> blocked in snapshot_map (called from __split_and_process_bio)

Yes, I think you are right.

I think the solution is to get rid of the dm_offload() infrastructure
and make it not necessary.
i.e. discard my patches
    dm: prepare to discontinue use of BIOSET_NEED_RESCUER
and
    dm: revise 'rescue' strategy for bio-based bioset allocations

And build on "dm: ensure bio submission follows a depth-first tree walk"
which was written after those and already makes dm_offload() less
important.

Since that "depth-first" patch, every request to the dm device, after
the initial splitting, allocates just one dm_target_io structure, and
makes just one __map_bio() call, and so will behave exactly the way
generic_make_request() expects and copes with - thus avoiding awkward
dependencies and deadlocks.  Except....

a/ If any target defines ->num_write_bios() to return >1,
   __clone_and_map_data_bio() will make multiple calls to alloc_tio()
   and __map_bio(), which might need rescuing.
   But no target defines num_write_bios, and none have since it was
   removed from dm-cache 4.5 years ago.
   Can we discard num_write_bios??
b/ If any target sets any of num_{flush,discard,write_same,write_zeroes}_bios
   to a value > 1, then __send_duplicate_bios() will also make multiple
   calls to alloc_tio() and __map_bio().
   Some do.
     dm-cache-target: flush=2
     dm-snap: flush=2
     dm-stripe: discard, write_same, write_zeroes all set to 'stripes'.

These will only be a problem if the second (or subsequent) alloc_tio()
blocks waiting for an earlier allocation to complete.  That can only
happen if multiple threads are each trying to allocate multiple
dm_target_io from the same bioset at the same time.
This is rare and should be easier to address than the current
dm_offload() approach.
One possibility would be to copy the approach taken by
crypt_alloc_buffer(), which needs to allocate multiple entries from a
mempool.
It first tries with GFP_NOWAIT.  If that fails, it takes a mutex and
tries with GFP_NOIO.  This means only one thread at a time can block
while allocating multiple bios, so there can be no deadlock.

Below are two RFC patches.  The first removes num_write_bios.
The second is incomplete and makes a stab at allocating multiple bios
at once safely.
A third would be needed to remove dm_offload() etc... but I cannot
quite fit that in today - must be off.

Thanks,
NeilBrown

From: NeilBrown
Date: Wed, 22 Nov 2017 14:25:18 +1100
Subject: [PATCH] DM: remove num_write_bios target interface.

No target provides num_write_bios and none has done since 2013.
Having the possibility of num_write_bios > 1 complicates bio
allocation.
So remove the interface and assume there is only one bio needed.
If a target ever needs more, it must provide a suitable bioset and
allocate itself based on its particular needs.
Signed-off-by: NeilBrown
---
 drivers/md/dm.c               | 22 ++++------------------
 include/linux/device-mapper.h | 15 ---------------
 2 files changed, 4 insertions(+), 33 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b20febd6cbc7..8c1a05609eea 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1323,27 +1323,13 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 {
 	struct bio *bio = ci->bio;
 	struct dm_target_io *tio;
-	unsigned target_bio_nr;
-	unsigned num_target_bios = 1;
 	int r = 0;
 
-	/*
-	 * Does the target want to receive duplicate copies of the bio?
-	 */
-	if (bio_data_dir(bio) == WRITE && ti->num_write_bios)
-		num_target_bios = ti->num_write_bios(ti, bio);
-
-	for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
-		tio = alloc_tio(ci, ti, target_bio_nr);
-		tio->len_ptr = len;
-		r = clone_bio(tio, bio, sector, *len);
-		if (r < 0) {
-			free_tio(tio);
-			break;
-		}
+	tio = alloc_tio(ci, ti, 0);
+	tio->len_ptr = len;
+	r = clone_bio(tio, bio, sector, *len);
+	if (r >= 0)
 		__map_bio(tio);
-	}
-
 	return r;
 }
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a5538433c927..5a68b366e664 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -220,14 +220,6 @@ struct target_type {
 #define DM_TARGET_WILDCARD		0x00000008
 #define dm_target_is_wildcard(type)	((type)->features & DM_TARGET_WILDCARD)
 
-/*
- * Some targets need to be sent the same WRITE bio severals times so
- * that they can send copies of it to different devices.  This function
- * examines any supplied bio and returns the number of copies of it the
- * target requires.
- */
-typedef unsigned (*dm_num_write_bios_fn) (struct dm_target *ti, struct bio *bio);
-
 /*
  * A target implements own bio data integrity.
  */
@@ -291,13 +283,6 @@ struct dm_target {
 	 */
 	unsigned per_io_data_size;
 
-	/*
-	 * If defined, this function is called to find out how many
-	 * duplicate bios should be sent to the target when writing
-	 * data.
-	 */
-	dm_num_write_bios_fn num_write_bios;
-
 	/* target specific data */
 	void *private;
 
-- 
2.14.0.rc0.dirty

-----------------------------------

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8c1a05609eea..8762661df2ef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1265,8 +1265,7 @@ static int clone_bio(struct dm_target_io *tio, struct bio *bio,
 }
 
 static struct dm_target_io *alloc_tio(struct clone_info *ci,
-				      struct dm_target *ti,
-				      unsigned target_bio_nr)
+				      struct dm_target *ti)
 {
 	struct dm_target_io *tio;
 	struct bio *clone;
@@ -1276,34 +1275,66 @@ static struct dm_target_io *alloc_tio(struct clone_info *ci,
 
 	tio->io = ci->io;
 	tio->ti = ti;
-	tio->target_bio_nr = target_bio_nr;
+	tio->target_bio_nr = 0;
 
 	return tio;
 }
 
-static void __clone_and_map_simple_bio(struct clone_info *ci,
-				       struct dm_target *ti,
-				       unsigned target_bio_nr, unsigned *len)
+static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
+				struct dm_target *ti, unsigned num_bios)
 {
-	struct dm_target_io *tio = alloc_tio(ci, ti, target_bio_nr);
-	struct bio *clone = &tio->clone;
+	int try;
 
-	tio->len_ptr = len;
+	for (try = 0; try < 2; try++) {
+		int bio_nr;
+		struct bio *bio;
+
+		if (try)
+			mutex_lock(&ci->md->table_devices_lock);
+		for (bio_nr = 0; bio_nr < num_bios; bio_nr++) {
+			bio = bio_alloc_bioset(try ? GFP_NOIO : GFP_NOWAIT,
+					       0, ci->md->bs);
+			if (bio) {
+				struct dm_target_io *tio;
+				bio_list_add(blist, bio);
+				tio = container_of(bio, struct dm_target_io, clone);
 
-	__bio_clone_fast(clone, ci->bio);
-	if (len)
-		bio_setup_sector(clone, ci->sector, *len);
+				tio->io = ci->io;
+				tio->ti = ti;
+				tio->target_bio_nr = bio_nr;
+			} else
+				break;
+		}
+		if (try)
+			mutex_unlock(&ci->md->table_devices_lock);
+		if (bio_nr == num_bios)
+			return;
 
-	__map_bio(tio);
+		while ((bio = bio_list_pop(blist)) != NULL)
+			bio_put(bio);
+	}
 }
 
 static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 				  unsigned num_bios, unsigned *len)
 {
-	unsigned target_bio_nr;
+	struct bio_list blist = BIO_EMPTY_LIST;
+	struct bio *bio;
 
-	for (target_bio_nr = 0; target_bio_nr < num_bios; target_bio_nr++)
-		__clone_and_map_simple_bio(ci, ti, target_bio_nr, len);
+	if (num_bios == 1)
+		bio_list_add(&blist, &alloc_tio(ci, ti)->clone);
+	else
+		alloc_multiple_bios(&blist, ci, ti, num_bios);
+
+	while ((bio = bio_list_pop(&blist)) != NULL) {
+		struct dm_target_io *tio = container_of(
+			bio, struct dm_target_io, clone);
+		tio->len_ptr = len;
+		__bio_clone_fast(bio, ci->bio);
+		if (len)
+			bio_setup_sector(bio, ci->sector, *len);
+		__map_bio(tio);
+	}
 }
 
 static int __send_empty_flush(struct clone_info *ci)
@@ -1325,7 +1356,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 	struct dm_target_io *tio;
 	int r = 0;
 
-	tio = alloc_tio(ci, ti, 0);
+	tio = alloc_tio(ci, ti);
 	tio->len_ptr = len;
 	r = clone_bio(tio, bio, sector, *len);
 	if (r >= 0)
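The two-pass, all-or-nothing allocation that alloc_multiple_bios() borrows from crypt_alloc_buffer() can be modeled outside the kernel. What follows is a minimal userspace sketch with pthreads, not kernel code, and every name in it (struct pool, pool_try_get(), pool_get(), pool_put(), alloc_multiple(), run_stress()) is hypothetical. It assumes a counter of free elements stands in for the bioset, a non-sleeping get stands in for GFP_NOWAIT, and a sleeping get stands in for GFP_NOIO (a sleeping mempool allocation waits for an element to be returned rather than failing). It uses a dedicated mutex for the sleeping pass. The point it illustrates is that serializing only the sleeping pass is enough, because no thread ever sleeps while holding part of a set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bioset/mempool: a counted pool of elements. */
struct pool {
	pthread_mutex_t lock;
	pthread_cond_t freed;	/* broadcast whenever elements are returned */
	int available;
};

static void pool_init(struct pool *p, int capacity)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->freed, NULL);
	p->available = capacity;
}

/* Never sleeps: plays the role of a GFP_NOWAIT allocation. */
static bool pool_try_get(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	bool ok = p->available > 0;
	if (ok)
		p->available--;
	pthread_mutex_unlock(&p->lock);
	return ok;
}

/* Sleeps until an element is free: plays the role of GFP_NOIO. */
static void pool_get(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->available == 0)
		pthread_cond_wait(&p->freed, &p->lock);
	p->available--;
	pthread_mutex_unlock(&p->lock);
}

static void pool_put(struct pool *p, int n)
{
	pthread_mutex_lock(&p->lock);
	p->available += n;
	pthread_cond_broadcast(&p->freed);
	pthread_mutex_unlock(&p->lock);
}

/* Serializes the sleeping pass, like the mutex in alloc_multiple_bios(). */
static pthread_mutex_t multi_alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* All-or-nothing allocation of n elements; n must not exceed capacity. */
static void alloc_multiple(struct pool *p, int n)
{
	int got = 0;
	/* First pass: never sleep; hand back a partial set on failure. */
	while (got < n && pool_try_get(p))
		got++;
	if (got == n)
		return;
	pool_put(p, got);

	/* Second pass: sleep per element, one thread at a time.  No other
	 * thread sleeps holding a partial set, so what we wait for returns. */
	pthread_mutex_lock(&multi_alloc_lock);
	for (got = 0; got < n; got++)
		pool_get(p);
	pthread_mutex_unlock(&multi_alloc_lock);
}

struct run {
	struct pool pool;
	int iterations;
	int per_request;
};

static void *worker(void *opaque)
{
	struct run *r = opaque;
	for (int i = 0; i < r->iterations; i++) {
		alloc_multiple(&r->pool, r->per_request);
		pool_put(&r->pool, r->per_request);	/* "I/O completed" */
	}
	return NULL;
}

/* Runs `threads` workers at once; returns the free count afterwards. */
static int run_stress(int threads, int iterations, int capacity, int n)
{
	struct run r = { .iterations = iterations, .per_request = n };
	pthread_t tid[16];
	assert(threads <= 16 && n <= capacity);
	pool_init(&r.pool, capacity);
	for (int t = 0; t < threads; t++)
		pthread_create(&tid[t], NULL, worker, &r);
	for (int t = 0; t < threads; t++)
		pthread_join(tid[t], NULL);
	return r.pool.available;
}
```

Under these assumptions, run_stress(4, 2000, 4, 3) should finish and return 4, with every element back in the pool. If the multi_alloc_lock is taken out of the second pass, two workers can each end up holding two of the four elements while sleeping for a third, which is the kind of deadlock the mail is trying to rule out.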