From patchwork Fri Jul 14 22:11:48 2017
X-Patchwork-Submitter: "Verma, Vishal L" <vishal.l.verma@intel.com>
X-Patchwork-Id: 9841751
From: Vishal Verma <vishal.l.verma@intel.com>
To: linux-nvdimm@lists.01.org
Cc: "Rafael J. Wysocki", linux-acpi@vger.kernel.org
Subject: [PATCH 6/6] libnvdimm, btt: rework error clearing
Date: Fri, 14 Jul 2017 16:11:48 -0600
Message-Id: <20170714221148.11232-7-vishal.l.verma@intel.com>
In-Reply-To: <20170714221148.11232-1-vishal.l.verma@intel.com>
References: <20170714221148.11232-1-vishal.l.verma@intel.com>

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Because of this, error clearing
for BTT IOs has hitherto been disabled.

This patch moves error clearing out of the atomic section, and thus
re-enables error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist but also set a flag noting that error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared.
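In outline, the two-phase flow is (condensed from the btt_flog_write()
and arena_clear_freelist_error() hunks below, not the literal diff):

	/* 1/ while logging the free block - lane held, atomic context: */
	if (ent_e_flag(ent->old_map))
		arena->freelist[lane].has_err = 1;
	arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));

	/* 2/ after nd_region_release_lane() - process context: */
	arena_clear_freelist_error(arena, lane);	/* zeroes the bad block */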
The error flag in the free list is protected by the nd 'lane', and is
set only by a thread while it holds that lane. When the error is
cleared, the flag is cleared too, but while holding a mutex for that
freelist index.

When writing, we check for two things:

1/ Whether the freelist mutex is held, or the error flag is set. If so,
   this is an error block that is being (or about to be) cleared.

2/ Whether the block is a known badblock, based on nsio->bb.

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that does not have the map error flag set, but is still in the
region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.
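In code, these two checks reduce to the following (condensed from the
btt_write_pg() hunk below):

	/* 1/ the free block is being, or is about to be, cleared */
	if (mutex_is_locked(&arena->freelist[lane].err_lock)
			|| arena->freelist[lane].has_err) {
		nd_region_release_lane(btt->nd_region, lane);
		goto retry;	/* acquire a different lane/free block */
	}

	/* 2/ the free block is in the region's badblock list (nsio->bb) */
	if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) {
		arena->freelist[lane].has_err = 1;
		nd_region_release_lane(btt->nd_region, lane);
		arena_clear_freelist_error(arena, lane);	/* may sleep */
		goto retry;
	}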
Cc: Jeff Moyer
Cc: Toshi Kani
Cc: Dan Williams
Signed-off-by: Vishal Verma
---
 drivers/nvdimm/btt.c   | 89 +++++++++++++++++++++++++++++++++++++++++++++-----
 drivers/nvdimm/btt.h   |  5 +++
 drivers/nvdimm/claim.c |  8 -----
 3 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 6b84eae..48382ca9 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -381,7 +381,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
 	arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
 	if (++(arena->freelist[lane].seq) == 4)
 		arena->freelist[lane].seq = 1;
-	arena->freelist[lane].block = le32_to_cpu(ent->old_map);
+	if (ent_e_flag(ent->old_map))
+		arena->freelist[lane].has_err = 1;
+	arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
 
 	return ret;
 }
@@ -480,6 +482,34 @@ static int btt_log_init(struct arena_info *arena)
 	return ret;
 }
 
+static u64 to_namespace_offset(struct arena_info *arena, u64 lba)
+{
+	return arena->dataoff + ((u64)lba * arena->internal_lbasize);
+}
+
+static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
+{
+	int ret = 0;
+
+	if (arena->freelist[lane].has_err) {
+		u32 lba = arena->freelist[lane].block;
+		u64 nsoff = to_namespace_offset(arena, lba);
+		void *zerobuf;
+
+		zerobuf = kzalloc(arena->sector_size, GFP_KERNEL);
+		if (!zerobuf)
+			return -ENOMEM;
+
+		mutex_lock(&arena->freelist[lane].err_lock);
+		ret = arena_write_bytes(arena, nsoff, zerobuf,
+				arena->sector_size, 0);
+		arena->freelist[lane].has_err = 0;
+		mutex_unlock(&arena->freelist[lane].err_lock);
+		kfree(zerobuf);
+	}
+	return ret;
+}
+
 static int btt_freelist_init(struct arena_info *arena)
 {
 	int old, new, ret;
@@ -505,6 +535,9 @@ static int btt_freelist_init(struct arena_info *arena)
 		arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq));
 		arena->freelist[i].block = le32_to_cpu(log_new.old_map);
 
+		if (ent_e_flag(log_new.old_map))
+			arena_clear_freelist_error(arena, i);
+
 		/* This implies a newly created or untouched flog entry */
 		if (log_new.old_map == log_new.new_map)
 			continue;
@@ -525,7 +558,7 @@ static int btt_freelist_init(struct arena_info *arena)
 			if (ret)
 				return ret;
 		}
-
+		mutex_init(&arena->freelist[i].err_lock);
 	}
 
 	return 0;
@@ -905,11 +938,6 @@ static void unlock_map(struct arena_info *arena, u32 premap)
 	spin_unlock(&arena->map_locks[idx].lock);
 }
 
-static u64 to_namespace_offset(struct arena_info *arena, u64 lba)
-{
-	return arena->dataoff + ((u64)lba * arena->internal_lbasize);
-}
-
 static int btt_data_read(struct arena_info *arena, struct page *page,
 			unsigned int off, u32 lba, u32 len)
 {
@@ -1066,8 +1094,14 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		}
 
 		ret = btt_data_read(arena, page, off, postmap, cur_len);
-		if (ret)
+		if (ret) {
+			int rc;
+
+			/* Media error - set the e_flag */
+			rc = btt_map_write(arena, premap, postmap, 0, 1,
+					NVDIMM_IO_ATOMIC);
 			goto out_rtt;
+		}
 
 		if (bip) {
 			ret = btt_rw_integrity(btt, bip, arena, postmap, READ);
@@ -1092,6 +1126,15 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip,
 	return ret;
 }
 
+static bool btt_is_badblock(struct btt *btt, struct arena_info *arena,
+		u32 postmap)
+{
+	u64 nsoff = to_namespace_offset(arena, postmap);
+	sector_t phys_sector = nsoff >> 9;
+
+	return is_bad_pmem(btt->phys_bb, phys_sector, arena->internal_lbasize);
+}
+
 static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			sector_t sector, struct page *page, unsigned int off,
 			unsigned int len)
@@ -1104,7 +1147,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 
 	while (len) {
 		u32 cur_len;
+		int e_flag;
 
+ retry:
 		lane = nd_region_acquire_lane(btt->nd_region);
 
 		ret = lba_to_arena(btt, sector, &premap, &arena);
@@ -1117,6 +1162,24 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			goto out_lane;
 		}
 
+		/* The block had a media error, and is being cleared */
+		if (mutex_is_locked(&arena->freelist[lane].err_lock)
+				|| arena->freelist[lane].has_err) {
+			nd_region_release_lane(btt->nd_region, lane);
+			/* OK to acquire a different lane/free block */
+			goto retry;
+		}
+
+		/* The block had a media error, and needs to be cleared */
+		if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) {
+			arena->freelist[lane].has_err = 1;
+			nd_region_release_lane(btt->nd_region, lane);
+
+			arena_clear_freelist_error(arena, lane);
+			/* OK to acquire a different lane/free block */
+			goto retry;
+		}
+
 		new_postmap = arena->freelist[lane].block;
 
 		/* Wait if the new block is being read from */
@@ -1142,7 +1205,7 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		}
 
 		lock_map(arena, premap);
-		ret = btt_map_read(arena, premap, &old_postmap, NULL, NULL,
+		ret = btt_map_read(arena, premap, &old_postmap, NULL, &e_flag,
 				NVDIMM_IO_ATOMIC);
 		if (ret)
 			goto out_map;
@@ -1150,6 +1213,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 			ret = -EIO;
 			goto out_map;
 		}
+		if (e_flag)
+			set_e_flag(old_postmap);
 
 		log.lba = cpu_to_le32(premap);
 		log.old_map = cpu_to_le32(old_postmap);
@@ -1168,6 +1233,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip,
 		unlock_map(arena, premap);
 		nd_region_release_lane(btt->nd_region, lane);
 
+		if (e_flag)
+			arena_clear_freelist_error(arena, lane);
+
 		len -= cur_len;
 		off += cur_len;
 		sector += btt->sector_size >> SECTOR_SHIFT;
@@ -1358,6 +1426,7 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize,
 {
 	int ret;
 	struct btt *btt;
+	struct nd_namespace_io *nsio;
 	struct device *dev = &nd_btt->dev;
 
 	btt = devm_kzalloc(dev, sizeof(struct btt), GFP_KERNEL);
@@ -1371,6 +1440,8 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize,
 	INIT_LIST_HEAD(&btt->arena_list);
 	mutex_init(&btt->init_lock);
 	btt->nd_region = nd_region;
+	nsio = to_nd_namespace_io(&nd_btt->ndns->dev);
+	btt->phys_bb = &nsio->bb;
 
 	ret = discover_arenas(btt);
 	if (ret) {
diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h
index 2bc0d10b..69c1f90 100644
--- a/drivers/nvdimm/btt.h
+++ b/drivers/nvdimm/btt.h
@@ -15,6 +15,7 @@
 #ifndef _LINUX_BTT_H
 #define _LINUX_BTT_H
 
+#include <linux/badblocks.h>
 #include <linux/types.h>
 
 #define BTT_SIG_LEN 16
@@ -41,6 +42,7 @@
 #define ent_lba(ent) (ent & MAP_LBA_MASK)
 #define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK))
 #define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK))
+#define set_e_flag(ent) (ent |= MAP_ERR_MASK)
 
 enum btt_init_state {
 	INIT_UNCHECKED = 0,
@@ -82,6 +84,8 @@ struct free_entry {
 	u32 block;
 	u8 sub;
 	u8 seq;
+	struct mutex err_lock;
+	u8 has_err;
 };
 
 struct aligned_lock {
@@ -187,6 +191,7 @@ struct btt {
 	struct mutex init_lock;
 	int init_state;
 	int num_arenas;
+	struct badblocks *phys_bb;
 };
 
 bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super);
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 8f29937..727f11b 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -280,14 +280,6 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
-		/*
-		 * FIXME: nsio_rw_bytes() may be called from atomic
-		 * context in the btt case and the ACPI DSM path for
-		 * clearing the error takes sleeping locks and allocates
-		 * memory. An explicit error clearing path, and support
-		 * for tracking badblocks in BTT metadata is needed to
-		 * work around this collision.
-		 */
 		if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512)
 				&& !(flags & NVDIMM_IO_ATOMIC)) {
 			long cleared;