From patchwork Thu Jul 26 19:12:05 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Elder X-Patchwork-Id: 1244301 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork2.kernel.org (Postfix) with ESMTP id 8D145DFFCE for ; Thu, 26 Jul 2012 19:12:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752436Ab2GZTMI (ORCPT ); Thu, 26 Jul 2012 15:12:08 -0400 Received: from mail-yw0-f46.google.com ([209.85.213.46]:44396 "EHLO mail-yw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752359Ab2GZTMH (ORCPT ); Thu, 26 Jul 2012 15:12:07 -0400 Received: by yhmm54 with SMTP id m54so2356156yhm.19 for ; Thu, 26 Jul 2012 12:12:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding:x-gm-message-state; bh=FwZlbmbXXGATjmUNbUBY5ELwEGmLKUQOCFNEUPUJwLc=; b=L0YItoJkGLfHYKqMVg/7i9XBLedE+tGw2xjppgfgDPEpRdsueSqD21VgxMNARBUK/A 01eMzuS9AZhsqAdN+ul3Y4RxLaRB3HbjlTC6AvU0Ss2cFXtiW9LXv2u//QCsAZLZZja4 nr/gY2n760b3XSfnOVzBsPq1asYFMMtsUoqu5qvCf2k7jivDlpPHGf3vZwbnpafH5662 q15YzX+jpgo9fBhWECAEdigkSvn2zcFigqfpPCAHRnsoSn2bKhYTIUNTO9RdcUKRzpLn PrQ0tcHOwgU7wnaS8U0g/L/OQ/SbnfK6oQAXDOrQJs1FsH5TCCjSh22mHJbDHqKj7NHb zSxw== Received: by 10.236.125.133 with SMTP id z5mr19711649yhh.121.1343329926674; Thu, 26 Jul 2012 12:12:06 -0700 (PDT) Received: from [172.22.22.4] (c-71-195-31-37.hsd1.mn.comcast.net. [71.195.31.37]) by mx.google.com with ESMTPS id g15sm38606anl.19.2012.07.26.12.12.05 (version=SSLv3 cipher=OTHER); Thu, 26 Jul 2012 12:12:06 -0700 (PDT) Message-ID: <50119685.4010106@inktank.com> Date: Thu, 26 Jul 2012 14:12:05 -0500 From: Alex Elder User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: ceph-devel@vger.kernel.org Subject: [PATCH] rbd: __rbd_init_snaps_header() bug X-Gm-Message-State: ALoCoQkIFrJ14vRdfUfCN4wWvWK9By5lXPyfm93Bjj4CC/pmGTWwO/0B7t8DO++okSEYN4yA2y7/ Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org There is a bug in the way __rbd_init_snaps_header() handles newly- discovered snapshots, in the unusual case where the new snapshot id is less than an existing rbd snapshot id. When a new snapshot for an rbd image is created it is assigned a new snapshot id. The client requests the id from a monitor, and later updates the osds with information about the snapshot. The sequence of snapshot ids is monotonically increasing, which guarantees each is unique. And typically all new snapshots will have higher ids than any others a client has encountered. Because of the delay between getting assigned a snapshot id and recording the snapshot, there is a possible race between two clients requesting new snapshots on the same rbd image. It's possible in this case for the client granted a higher id to be the first to update the set of snapshots for the rbd image. This leaves a sort of gap in the set of snapshots until the other client's update gets recorded. This is an unlikely scenario, but it appears to be possible. I'm not sure whether we even support multiple concurrent clients for the same rbd image. However there is code in place to handle this case and it's wrong, so it needs to be fixed. The job of __rbd_init_snaps_header() is to compare an rbd's existing list of snapshots to a new set, and remove any existing snapshots that are not present in the new set and create any new ones not present in the existing set. If any new snapshots exist after the entire list of existing ones has been examined, all the new ones that remain need to be created. The logic for handling these last snapshots correct. However, if a new snapshot is found whose id is less than an existing one, the logic for handling this is wrong. The basic problem is that the name and current id used here are taken from a place one position beyond where they should be. In the event this block of code is executed the first pass through the outer loop, for example, these values will refer to invalid memory. The fix is to make this block of code more closely match the block that handles adding new snapshots at the end. There are three differences: - we use a do..while loop because we know we initially have an entry to process. - we insert the new entry at a position within the list instead of at the beginning - because we use it in the loop condition, we need to update our notion of "current id" each pass through. Signed-off-by: Alex Elder --- drivers/block/rbd.c | 55 +++++++++++++++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 24 deletions(-) old_snap = list_entry(p, struct rbd_snap, node); @@ -2104,12 +2105,16 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev) if (i) cur_id = rbd_dev->header.snapc->snaps[i - 1]; + /* + * If old id is not present in the new context, or + * if there are no more snapshots in the new + * context, this snapshot should be removed. + */ if (!i || old_snap->id < cur_id) { /* - * old_snap->id was skipped, thus was - * removed. If this rbd_dev is mapped to - * the removed snapshot, record that it no - * longer exists, to prevent further I/O. + * If this rbd_dev is mapped to the removed + * snapshot, record that it no longer exists, + * to prevent further I/O. */ if (rbd_dev->snap_id == old_snap->id) rbd_dev->snap_exists = false; @@ -2122,34 +2127,36 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev) name = rbd_prev_snap_name(name, first_name); continue; } - for (; i > 0; - i--, name = rbd_prev_snap_name(name, first_name)) { - if (!name) { - WARN_ON(1); + + /* + * A new snapshot (or possibly more than one) + * appeared in the middle of our set of snapshots. + * This could happen of two hosts raced while + * creating snapshots, and the one that was granted + * a higher snapshot id managed to get its snapshot + * saved before the other. + */ + do { + i--; + name = rbd_prev_snap_name(name, first_name); + if (WARN_ON(!name)) return -EINVAL; - } - cur_id = rbd_dev->header.snapc->snaps[i]; - /* snapshot removal? handle it above */ - if (cur_id >= old_snap->id) - break; - /* a new snapshot */ - snap = __rbd_add_snap_dev(rbd_dev, i - 1, name); + snap = __rbd_add_snap_dev(rbd_dev, i, name); if (IS_ERR(snap)) return PTR_ERR(snap); - /* note that we add it backward so using n and not p */ list_add(&snap->node, n); - p = &snap->node; - } + + cur_id = rbd_dev->header.snapc->snaps[i]; + } while (i && cur_id < old_snap->id); } /* we're done going over the old snap list, just add what's left */ - for (; i > 0; i--) { + while (i) { + i--; name = rbd_prev_snap_name(name, first_name); - if (!name) { - WARN_ON(1); + if (WARN_ON(!name)) return -EINVAL; - } - snap = __rbd_add_snap_dev(rbd_dev, i - 1, name); + snap = __rbd_add_snap_dev(rbd_dev, i, name); if (IS_ERR(snap)) return PTR_ERR(snap); list_add(&snap->node, &rbd_dev->snaps); diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 71e3f3b..da09c0d 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -2090,13 +2090,14 @@ static int __rbd_init_snaps_header(struct rbd_device *rbd_dev) { const char *name, *first_name; int i = rbd_dev->header.total_snaps; - struct rbd_snap *snap, *old_snap = NULL; + struct rbd_snap *snap; struct list_head *p, *n; first_name = rbd_dev->header.snap_names; name = first_name + rbd_dev->header.snap_names_len; list_for_each_prev_safe(p, n, &rbd_dev->snaps) { + struct rbd_snap *old_snap; u64 cur_id;