From patchwork Sat Jan 21 21:39:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Andryuk X-Patchwork-Id: 13111365 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 02870C27C76 for ; Sat, 21 Jan 2023 21:40:24 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.482383.747860 (Exim 4.92) (envelope-from ) id 1pJLaa-0004Zn-U5; Sat, 21 Jan 2023 21:39:52 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 482383.747860; Sat, 21 Jan 2023 21:39:52 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1pJLaa-0004Zg-Qc; Sat, 21 Jan 2023 21:39:52 +0000 Received: by outflank-mailman (input) for mailman id 482383; Sat, 21 Jan 2023 21:39:50 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1pJLaY-0004Ke-QT for xen-devel@lists.xenproject.org; Sat, 21 Jan 2023 21:39:50 +0000 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [2607:f8b0:4864:20::832]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 20f4fb03-99d4-11ed-b8d1-410ff93cb8f0; Sat, 21 Jan 2023 22:39:49 +0100 (CET) Received: by mail-qt1-x832.google.com with SMTP id g16so4885622qtu.2 for ; Sat, 21 Jan 2023 13:39:48 -0800 (PST) Received: from shine.lan ([2001:470:8:67e:4282:e612:9c15:499]) by smtp.gmail.com with ESMTPSA id w6-20020a05620a424600b00705be892191sm24202402qko.56.2023.01.21.13.39.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 21 Jan 2023 13:39:46 -0800 (PST) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 20f4fb03-99d4-11ed-b8d1-410ff93cb8f0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VlprdPagtmcclyYEQlFBNVYwZD5+y5cWhZ95XyI3Leo=; b=TgMzWx9ifX/noFmbJ2gaApzq0RK4TGGBMBSNDRAsGREzeFsuh0gzj8i863CeBSIcZS 2ME+DLqYW2BMfbEVe+WovwJ9sLFLT0rvbs/8VUbdttQ6p8c/GZ2l0A4vAqCNQApafay1 c5X/T8WS6OsVRfOO8xdH6+bhlUNhHDi+jKhW0PVV4+nNxTnif01QX1prVKr78i8rG6j7 A1CTJSCrvcyem2AgwVbFpz/U9nu5ZPkaSiLYJkyhj9WiKw6KJ7te/8RkrZRdSVZ4nrto iSIVx5mEwmloMF0VwaL8hN8wGmE8i+f7TZW7HXj4KTZqQgGXcK+bNzbZhItptg6uStTK aocg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VlprdPagtmcclyYEQlFBNVYwZD5+y5cWhZ95XyI3Leo=; b=T2KkMNSMGGI7B6nLEbY/971pQZyGIjdgjPNFVGtdlKHG0fG9FZJOiGkyaj4NO8JOAW OJdUclvh4Ueuh5SXBeCWVydgTGMvQeLrZ0y9vyPukRvybPkbUIQFL6LxW386fdQ280Ha 9hthboLRSqB9kMsI6NOWezNzvILQLu4ycri0LY5cwYPCiE2BV4fq9OyorJW81J2lVJn4 NXch4maxIhCu2S5vvYIt10VrxckZRFXfNnGJCtpGB3PvazHd34I/U2IFyW9Kn7pUKI62 i8wQyMeJvapl6J+nPPlp1O2rwcKPVm2yCZHXmjoZb8UTznx+cCUr/3SeDS3SQf+sQbP7 rNTw== X-Gm-Message-State: AFqh2kroY4hz4Ko5/c7FCL+GoPzKx5Qgy6RvR9PCZp99/HOYb29EdJZi zbg+WteCQxE1oSLUtV4IjweNlNOLaRs= X-Google-Smtp-Source: AMrXdXuv2Ijx/AyZiagmueTAO6PlPJBQtrFWQiM+a7Cq1+SAIrbfpmcChWa9Yfug9Q+/CUKjGK6ymA== X-Received: by 2002:a05:622a:4208:b0:3b6:2cdb:e240 with SMTP id cp8-20020a05622a420800b003b62cdbe240mr27853568qtb.18.1674337187386; Sat, 21 Jan 2023 13:39:47 -0800 (PST) From: Jason Andryuk To: xen-devel@lists.xenproject.org Cc: Jason Andryuk , Wei Liu , Anthony PERARD , Juergen Gross , Dongli Zhang Subject: [PATCH 1/2] libxl: Fix guest kexec - skip cpuid policy Date: Sat, 21 Jan 2023 16:39:07 -0500 Message-Id: <20230121213908.6504-2-jandryuk@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230121213908.6504-1-jandryuk@gmail.com> References: <20230121213908.6504-1-jandryuk@gmail.com> MIME-Version: 1.0 When a domain performs a kexec (soft reset), libxl__build_pre() is called with the existing domid. Calling libxl__cpuid_legacy() on the existing domain fails since the cpuid policy has already been set, and the guest isn't rebuilt and doesn't kexec. xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3 libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the issue. Before the fixes commit, the libxl__cpuid_legacy() failure would have been ignored, so kexec would continue. Fixes: 34990446ca91 "libxl: don't ignore the return value from xc_cpuid_apply_policy" Signed-off-by: Jason Andryuk --- Probably a backport candidate since this has been broken for a while. --- tools/libs/light/libxl_create.c | 4 ++-- tools/libs/light/libxl_dom.c | 5 +++-- tools/libs/light/libxl_internal.h | 2 +- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c index 5cddc3df79..587a515dff 100644 --- a/tools/libs/light/libxl_create.c +++ b/tools/libs/light/libxl_create.c @@ -510,7 +510,7 @@ int libxl__domain_build(libxl__gc *gc, struct timeval start_time; int i, ret; - ret = libxl__build_pre(gc, domid, d_config, state); + ret = libxl__build_pre(gc, domid, d_config, state, false); if (ret) goto out; @@ -1440,7 +1440,7 @@ static void domcreate_bootloader_done(libxl__egc *egc, goto out; } - rc = libxl__build_pre(gc, domid, d_config, state); + rc = libxl__build_pre(gc, domid, d_config, state, dcs->soft_reset); if (rc) goto out; diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c index b454f988fb..7cebf5047f 100644 --- a/tools/libs/light/libxl_dom.c +++ b/tools/libs/light/libxl_dom.c @@ -241,7 +241,8 @@ static int numa_place_domain(libxl__gc *gc, uint32_t domid, } int libxl__build_pre(libxl__gc *gc, uint32_t domid, - libxl_domain_config *d_config, libxl__domain_build_state *state) + libxl_domain_config *d_config, libxl__domain_build_state *state, + bool soft_reset) { libxl_domain_build_info *const info = &d_config->b_info; libxl_ctx *ctx = libxl__gc_owner(gc); @@ -382,7 +383,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid, /* Construct a CPUID policy, but only for brand new domains. Domains * being migrated-in/restored have CPUID handled during the * static_data_done() callback. */ - if (!state->restore) + if (!state->restore && !soft_reset) rc = libxl__cpuid_legacy(ctx, domid, false, info); out: diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h index 0dc8b8f210..f0af44b523 100644 --- a/tools/libs/light/libxl_internal.h +++ b/tools/libs/light/libxl_internal.h @@ -1418,7 +1418,7 @@ _hidden void libxl__domain_build_state_dispose(libxl__domain_build_state *s); _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid, libxl_domain_config * const d_config, - libxl__domain_build_state *state); + libxl__domain_build_state *state, bool soft_reset); _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *info, libxl__domain_build_state *state, char **vms_ents, char **local_ents); From patchwork Sat Jan 21 21:39:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Andryuk X-Patchwork-Id: 13111366 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C1188C38142 for ; Sat, 21 Jan 2023 21:40:25 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.482384.747870 (Exim 4.92) (envelope-from ) id 1pJLac-0004pE-5j; Sat, 21 Jan 2023 21:39:54 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 482384.747870; Sat, 21 Jan 2023 21:39:54 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1pJLac-0004p6-2s; Sat, 21 Jan 2023 21:39:54 +0000 Received: by outflank-mailman (input) for mailman id 482384; Sat, 21 Jan 2023 21:39:53 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1pJLab-0004Ke-Ap for xen-devel@lists.xenproject.org; Sat, 21 Jan 2023 21:39:53 +0000 Received: from mail-qv1-xf2e.google.com (mail-qv1-xf2e.google.com [2607:f8b0:4864:20::f2e]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 226ad58a-99d4-11ed-b8d1-410ff93cb8f0; Sat, 21 Jan 2023 22:39:51 +0100 (CET) Received: by mail-qv1-xf2e.google.com with SMTP id g10so6368251qvo.6 for ; Sat, 21 Jan 2023 13:39:51 -0800 (PST) Received: from shine.lan ([2001:470:8:67e:4282:e612:9c15:499]) by smtp.gmail.com with ESMTPSA id w6-20020a05620a424600b00705be892191sm24202402qko.56.2023.01.21.13.39.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 21 Jan 2023 13:39:48 -0800 (PST) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 226ad58a-99d4-11ed-b8d1-410ff93cb8f0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5M2ErVFM3ffELnEbXKvTrmkvSoDVxImjjfW2QVBelko=; b=f/lW/Oy/8GtMQRgRZmpr7uaJfCjHJ40UWyp5DbiReRm6FlvGHdZUgu9sv/+kkegcTO rgFLK3DEAYSK4sJ6Rv26b1W/RiYbNFnoIfHhBbgsnIXn8ERrerCa2/9B0xnvcBb1wq1Q Afl9yNr0x8EPKpCa2DLuxHe4LfxozqJW01Vqb01Rv6yQOSygne3cs/Kk8qJcUHl7XzLd eiRCpCYeOjBJILEUZt4tnWzVGQuJJ+1H0eiZPkevmoW6oyUW9zPpOEyR2HcYlq8hwYYr nGFN0TyMAxobXOClVOx0Iic1yPux8CmbFVBjMTvdjJwRfKDZ9itWbRhNmFSVQ+m/Sq5h izlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5M2ErVFM3ffELnEbXKvTrmkvSoDVxImjjfW2QVBelko=; b=13tywHzQkTd4EYfymQQsO3a+j42XCGLWE4+h21LzOi64Abkd6XLijQd3o6VNqVg46N nEHw76QMHIYbJg29RO+sQAi+rbBhVv6donYql9ZshJmn+aMDtqfhY/C5bppGNsNvOtcg yL0da0n41gRae6yuJ2cE9rbY8+FCutYL5utwyEX1xfR1GqXJAI7U4TWQzOB+qnlkhEkc rMBU1oJswqCIKNN6NMi0eMFH9JlDK1xDGiHtdz6h5lmDitxCscqXXrQwhMRfe+RdfBg9 4KpJ2qqej7zrRUU/Bkrj1gG+LMEhCN84DgQsrctPKHlV87W0VYDImCj5QtIJFwoifp09 W3Kw== X-Gm-Message-State: AFqh2kp5lAaj3Qir4NrWPrLHokv9CLqqYmbzyin22Xncj1PGtQS25jXO InFC/IqM7Ibj3UrF9G/ZbWR68MVlXvs= X-Google-Smtp-Source: AMrXdXvgUq6EjRFocmsqWWNYsaDfvnRpwxtDOvRgwbUsxotT8lr9AYnAZhcBp5dpibOm+JJmCxh8IQ== X-Received: by 2002:a0c:b2d4:0:b0:534:8f10:e1a with SMTP id d20-20020a0cb2d4000000b005348f100e1amr29409884qvf.0.1674337189583; Sat, 21 Jan 2023 13:39:49 -0800 (PST) From: Jason Andryuk To: xen-devel@lists.xenproject.org Cc: Jason Andryuk , Wei Liu , Juergen Gross , Julien Grall , Anthony PERARD Subject: [PATCH 2/2] Revert "tools/xenstore: simplify loop handling connection I/O" Date: Sat, 21 Jan 2023 16:39:08 -0500 Message-Id: <20230121213908.6504-3-jandryuk@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230121213908.6504-1-jandryuk@gmail.com> References: <20230121213908.6504-1-jandryuk@gmail.com> MIME-Version: 1.0 I'm observing guest kexec trigger xenstored to abort on a double free. gdb output: Program received signal SIGABRT, Aborted. __pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt at ./nptl/pthread_kill.c:44 at ./nptl/pthread_kill.c:78 at ./nptl/pthread_kill.c:89 at ../sysdeps/posix/raise.c:26 at talloc.c:119 ptr=ptr@entry=0x559fae724290) at talloc.c:232 at xenstored_core.c:2945 (gdb) frame 5 at talloc.c:119 119 TALLOC_ABORT("Bad talloc magic value - double free"); (gdb) frame 7 at xenstored_core.c:2945 2945 talloc_increase_ref_count(conn); (gdb) p conn $1 = (struct connection *) 0x559fae724290 Looking at a xenstore trace, we have: IN 0x559fae71f250 20230120 17:40:53 READ (/local/domain/3/image/device-model-dom id ) wrl: dom 0 1 msec 10000 credit 1000000 reserve 100 disc ard wrl: dom 3 1 msec 10000 credit 1000000 reserve 100 disc ard wrl: dom 0 0 msec 10000 credit 1000000 reserve 0 disc ard wrl: dom 3 0 msec 10000 credit 1000000 reserve 0 disc ard OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT ) wrl: dom 0 1 msec 10000 credit 1000000 reserve 100 disc ard wrl: dom 3 1 msec 10000 credit 1000000 reserve 100 disc ard IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 ) DESTROY watch 0x559fae73f630 DESTROY watch 0x559fae75ddf0 DESTROY watch 0x559fae75ec30 DESTROY watch 0x559fae75ea60 DESTROY watch 0x559fae732c00 DESTROY watch 0x559fae72cea0 DESTROY watch 0x559fae728fc0 DESTROY watch 0x559fae729570 DESTROY connection 0x559fae724290 orphaned node /local/domain/3/device/suspend/event-channel deleted orphaned node /local/domain/3/device/vbd/51712 deleted orphaned node /local/domain/3/device/vkbd/0 deleted orphaned node /local/domain/3/device/vif/0 deleted orphaned node /local/domain/3/control/shutdown deleted orphaned node /local/domain/3/control/feature-poweroff deleted orphaned node /local/domain/3/control/feature-reboot deleted orphaned node /local/domain/3/control/feature-suspend deleted orphaned node /local/domain/3/control/feature-s3 deleted orphaned node /local/domain/3/control/feature-s4 deleted orphaned node /local/domain/3/control/sysrq deleted orphaned node /local/domain/3/data deleted orphaned node /local/domain/3/drivers deleted orphaned node /local/domain/3/feature deleted orphaned node /local/domain/3/attr deleted orphaned node /local/domain/3/error deleted orphaned node /local/domain/3/console/backend-id deleted and no further output. The trace shows that DESTROY was called for connection 0x559fae724290, but that is the same pointer (conn) main() was looping through from connections. So it wasn't actually removed from the connections list? Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling connection I/O" fixes the abort/double free. I think the use of list_for_each_entry_safe is incorrect. list_for_each_entry_safe makes traversal safe for deleting the current iterator, but RELEASE/do_release will delete some other entry in the connections list. I think the observed abort is because list_for_each_entry has next pointing to the deleted connection, and it is used in the subsequent iteration. Add a comment explaining the unsuitability of list_for_each_entry_safe. Also notice that the old code takes a reference on next which would prevents a use-after-free. This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d. Signed-off-by: Jason Andryuk Reviewed-by: Juergen Gross --- I didn't verify the stale pointers, which is why there are a lot of "I think" qualifiers. But reverting the commit has xenstored still running whereas it was aborting consistently beforehand. --- tools/xenstore/xenstored_core.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c index 78a3edaa4e..029e3852fc 100644 --- a/tools/xenstore/xenstored_core.c +++ b/tools/xenstore/xenstored_core.c @@ -2941,8 +2941,23 @@ int main(int argc, char *argv[]) } } - list_for_each_entry_safe(conn, next, &connections, list) { - talloc_increase_ref_count(conn); + /* + * list_for_each_entry_safe is not suitable here because + * handle_input may delete entries besides the current one, but + * those may be in the temporary next which would trigger a + * use-after-free. list_for_each_entry_safe is only safe for + * deleting the current entry. + */ + next = list_entry(connections.next, typeof(*conn), list); + if (&next->list != &connections) + talloc_increase_ref_count(next); + while (&next->list != &connections) { + conn = next; + + next = list_entry(conn->list.next, + typeof(*conn), list); + if (&next->list != &connections) + talloc_increase_ref_count(next); if (conn_can_read(conn)) handle_input(conn);