From patchwork Mon Feb 22 02:52:32 2016
X-Patchwork-Submitter: Wen Congyang
X-Patchwork-Id: 8370661
From: Wen Congyang
To: xen devel, Konrad Rzeszutek Wilk, Andrew Cooper, Ian Campbell,
 Ian Jackson, Wei Liu
Cc: Lars Kurth, Changlong Xie, Wen Congyang, Gui Jianfeng, Jiang Yunhong,
 Dong Eddie, Shriram Rajagopalan, Yang Hongyang
Date: Mon, 22 Feb 2016 10:52:32 +0800
Message-ID: <1456109555-28299-29-git-send-email-wency@cn.fujitsu.com>
In-Reply-To: <1456109555-28299-1-git-send-email-wency@cn.fujitsu.com>
References: <1456109555-28299-1-git-send-email-wency@cn.fujitsu.com>
Subject: [Xen-devel] [PATCH v10 28/31] COLO nic: implement COLO nic subkind

implement COLO nic subkind.

Signed-off-by: Yang Hongyang
Signed-off-by: Wen Congyang
---
 tools/hotplug/Linux/Makefile         |   1 +
 tools/hotplug/Linux/colo-proxy-setup | 135 +++++++++++++++
 tools/libxl/Makefile                 |   1 +
 tools/libxl/libxl_colo_nic.c         | 321 +++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_internal.h         |   5 +
 tools/libxl/libxl_types.idl          |   1 +
 6 files changed, 464 insertions(+)
 create mode 100755 tools/hotplug/Linux/colo-proxy-setup
 create mode 100644 tools/libxl/libxl_colo_nic.c

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 6e10118..9bb852b 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -28,6 +28,7 @@ XEN_SCRIPTS += block-iscsi
 XEN_SCRIPTS += block-tap
 XEN_SCRIPTS += block-drbd-probe
 XEN_SCRIPTS += $(XEN_SCRIPTS-y)
+XEN_SCRIPTS += colo-proxy-setup
 
 SUBDIRS-$(CONFIG_SYSTEMD) += systemd
 
diff --git a/tools/hotplug/Linux/colo-proxy-setup b/tools/hotplug/Linux/colo-proxy-setup
new file mode 100755
index 0000000..94e2034
--- /dev/null
+++ b/tools/hotplug/Linux/colo-proxy-setup
@@ -0,0 +1,135 @@
+#! /bin/bash
+
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+. "$dir/hotplugpath.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a "$command" != "teardown" ]
+then
+    echo "Invalid command: $command"
+    log err "Invalid command: $command"
+    exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${forwarddev:?}
+: ${mode:?}
+: ${index:?}
+: ${bridge:?}
+
+forwardbr="colobr0"
+
+if [ "$mode" != "primary" -a "$mode" != "secondary" ]
+then
+    echo "Invalid mode: $mode"
+    log err "Invalid mode: $mode"
+    exit 1
+fi
+
+if [ $index -lt 0 ] || [ $index -gt 100 ]; then
+    echo "index overflow"
+    exit 1
+fi
+
+function setup_primary()
+{
+    do_without_error tc qdisc add dev $vifname root handle 1: prio
+    do_without_error tc filter add dev $vifname parent 1: protocol ip prio 10 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol arp prio 11 \
+        u32 match u32 0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter add dev $vifname parent 1: protocol ipv6 prio \
+        12 u32 match u32 0 0 flowid 1:2 action mirred egress mirror \
+        dev $forwarddev
+
+    do_without_error modprobe nf_conntrack_ipv4
+    do_without_error modprobe xt_PMYCOLO sec_dev=$forwarddev
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -I INPUT -i $forwarddev -j MARK --set-mark $index
+}
+
+function teardown_primary()
+{
+    do_without_error tc filter del dev $vifname parent 1: protocol ip prio 10 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol arp prio 11 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc filter del dev $vifname parent 1: protocol ipv6 prio 12 u32 match u32 \
+        0 0 flowid 1:2 action mirred egress mirror dev $forwarddev
+    do_without_error tc qdisc del dev $vifname root handle 1: prio
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j PMYCOLO --index $index
+    do_without_error arptables -F
+    do_without_error rmmod xt_PMYCOLO
+}
+
+function setup_secondary()
+{
+    do_without_error brctl delif $bridge $vifname
+    do_without_error brctl addbr $forwardbr
+    do_without_error brctl addif $forwardbr $vifname
+    do_without_error brctl addif $forwardbr $forwarddev
+    do_without_error ip link set dev $forwardbr up
+    do_without_error modprobe xt_SECCOLO
+
+    iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    ip6tables -t mangle -I PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+}
+
+function teardown_secondary()
+{
+    do_without_error brctl delif $forwardbr $forwarddev
+    do_without_error brctl delif $forwardbr $vifname
+    do_without_error brctl delbr $forwardbr
+    do_without_error brctl addif $bridge $vifname
+
+    do_without_error iptables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error ip6tables -t mangle -D PREROUTING -m physdev --physdev-in \
+        $vifname -j SECCOLO --index $index
+    do_without_error rmmod xt_SECCOLO
+}
+
+case "$command" in
+    setup)
+        if [ "$mode" = "primary" ]
+        then
+            setup_primary
+        else
+            setup_secondary
+        fi
+
+        success
+        ;;
+    teardown)
+        if [ "$mode" = "primary" ]
+        then
+            teardown_primary
+        else
+            teardown_secondary
+        fi
+        ;;
+esac
+
+if [ "$mode" = "primary" ]
+then
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev."
+else
+    log debug "Successful colo-proxy-setup $command for $vifname." \
+              " vifname: $vifname, index: $index, forwarddev: $forwarddev,"\
+              " forwardbr: $forwardbr."
+fi
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 6fea9e0..aefc378 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -68,6 +68,7 @@ LIBXL_OBJS-y += libxl_remus.o libxl_checkpoint_device.o libxl_remus_disk_drbd.o
 LIBXL_OBJS-y += libxl_colo_restore.o libxl_colo_save.o
 LIBXL_OBJS-y += libxl_colo_qdisk.o
 LIBXL_OBJS-y += libxl_colo_proxy.o
+LIBXL_OBJS-y += libxl_colo_nic.o
 
 LIBXL_OBJS-$(CONFIG_X86) += libxl_cpuid.o libxl_x86.o libxl_psr.o
 LIBXL_OBJS-$(CONFIG_ARM) += libxl_nocpuid.o libxl_arm.o libxl_libfdt_compat.o
diff --git a/tools/libxl/libxl_colo_nic.c b/tools/libxl/libxl_colo_nic.c
new file mode 100644
index 0000000..998e09c
--- /dev/null
+++ b/tools/libxl/libxl_colo_nic.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright (C) 2014 FUJITSU LIMITED
+ * Author: Wen Congyang
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_osdeps.h" /* must come before any other headers */
+
+#include "libxl_internal.h"
+
+typedef struct libxl__colo_device_nic {
+    int devid;
+    const char *vif;
+} libxl__colo_device_nic;
+
+enum {
+    primary,
+    secondary,
+};
+
+
+/* ========== init() and cleanup() ========== */
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+    return 0;
+}
+
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds)
+{
+}
+
+/* ========== helper functions ========== */
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                     libxl__async_exec_state *aes,
+                                     int rc, int status);
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status);
+
+/*
+ * If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ * it must ONLY be used for remus because if driver domains
+ * were in use it would constitute a security vulnerability.
+ */
+static const char *get_vifname(libxl__checkpoint_device *dev,
+                               const libxl_device_nic *nic)
+{
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = dev->cds->domid;
+
+    path = GCSPRINTF("%s/backend/vif/%d/%d/vifname",
+                     libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+/*
+ * the script needs the following env & args
+ * $vifname
+ * $forwarddev
+ * $mode(primary/secondary)
+ * $index
+ * $bridge
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__checkpoint_device *dev, char *op,
+                             libxl__colo_proxy_state *cps, int side,
+                             char *colo_proxy_script)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    libxl__async_exec_state *aes = &dev->aodev.aes;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(cds->ao);
+
+    /* Convenience aliases */
+    const char *const vif = colo_nic->vif;
+
+    arraysize = 11;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "forwarddev";
+    env[nr++] = libxl__strdup(gc, nic->forwarddev);
+    env[nr++] = "mode";
+    if (side == primary)
+        env[nr++] = "primary";
+    else
+        env[nr++] = "secondary";
+    env[nr++] = "index";
+    env[nr++] = GCSPRINTF("%d", cps->index);
+    env[nr++] = "bridge";
+    env[nr++] = libxl__strdup(gc, nic->bridge);
+    env[nr++] = NULL;
+    assert(nr == arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = colo_proxy_script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = dev->cds->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = colo_save_teardown_script_cb;
+    else
+        aes->callback = colo_save_setup_script_cb;
+}
+
+/* ========== setup() and teardown() ========== */
+static void colo_nic_setup(libxl__egc *egc, libxl__checkpoint_device *dev,
+                           libxl__colo_proxy_state *cps, int side,
+                           char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(dev->cds->ao);
+
+    /*
+     * there's no subkind of nic devices, so nic ops is always matched
+     * with nic devices, we begin to setup the nic device
+     */
+    dev->matched = 1;
+
+    if (!nic->forwarddev) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    GCNEW(colo_nic);
+    dev->concrete_data = colo_nic;
+    colo_nic->devid = nic->devid;
+    colo_nic->vif = get_vifname(dev, nic);
+    if (!colo_nic->vif) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    setup_async_exec(dev, "setup", cps, side, colo_proxy_script);
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_setup_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+    libxl__checkpoint_device *dev = CONTAINER_OF(aodev, *dev, aodev);
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+    libxl__checkpoint_devices_state *cds = dev->cds;
+    const char *out_path_base, *hotplug_error = NULL;
+
+    EGC_GC;
+
+    /* Convenience aliases */
+    const uint32_t domid = cds->domid;
+    const int devid = colo_nic->devid;
+    const char *const vif = colo_nic->vif;
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    if (rc)
+        goto out;
+
+    out_path_base = GCSPRINTF("%s/colo_proxy/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc)
+        goto out;
+
+    if (hotplug_error) {
+        LOG(ERROR, "colo_proxy script %s setup failed for vif %s: %s",
+            aes->args[0], vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+static void colo_nic_teardown(libxl__egc *egc, libxl__checkpoint_device *dev,
+                              libxl__colo_proxy_state *cps, int side,
+                              char *colo_proxy_script)
+{
+    int rc;
+    libxl__colo_device_nic *colo_nic = dev->concrete_data;
+
+    if (!colo_nic || !colo_nic->vif) {
+        /* colo nic has not yet been set up, just return */
+        rc = 0;
+        goto out;
+    }
+
+    setup_async_exec(dev, "teardown", cps, side, colo_proxy_script);
+
+    rc = libxl__async_exec_start(&dev->aodev.aes);
+    if (rc)
+        goto out;
+
+    return;
+
+out:
+    dev->aodev.rc = rc;
+    dev->aodev.callback(egc, &dev->aodev);
+}
+
+static void colo_save_teardown_script_cb(libxl__egc *egc,
+                                         libxl__async_exec_state *aes,
+                                         int rc, int status)
+{
+    libxl__ao_device *aodev = CONTAINER_OF(aes, *aodev, aes);
+
+    if (status && !rc)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    aodev->rc = rc;
+    aodev->callback(egc, aodev);
+}
+
+/* ======== primary ======== */
+static void colo_nic_save_setup(libxl__egc *egc, libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+static void colo_nic_save_teardown(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_save_state *css = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &css->cps, primary, css->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_save_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_save_setup,
+    .teardown = colo_nic_save_teardown,
+};
+
+/* ======== secondary ======== */
+static void colo_nic_restore_setup(libxl__egc *egc,
+                                   libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_setup(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+static void colo_nic_restore_teardown(libxl__egc *egc,
+                                      libxl__checkpoint_device *dev)
+{
+    libxl__colo_restore_state *crs = dev->cds->concrete_data;
+
+    colo_nic_teardown(egc, dev, &crs->cps, secondary, crs->colo_proxy_script);
+}
+
+const libxl__checkpoint_device_instance_ops colo_restore_device_nic = {
+    .kind = LIBXL__DEVICE_KIND_VIF,
+    .setup = colo_nic_restore_setup,
+    .teardown = colo_nic_restore_teardown,
+};
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3b44b09..3a8c268 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2904,6 +2904,8 @@ int init_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_drbd_disk(libxl__checkpoint_devices_state *cds);
 int init_subkind_qdisk(libxl__checkpoint_devices_state *cds);
 void cleanup_subkind_qdisk(libxl__checkpoint_devices_state *cds);
+int init_subkind_colo_nic(libxl__checkpoint_devices_state *cds);
+void cleanup_subkind_colo_nic(libxl__checkpoint_devices_state *cds);
 
 typedef void libxl__checkpoint_callback(libxl__egc *,
                                         libxl__checkpoint_devices_state *,
@@ -3125,6 +3127,7 @@ typedef struct libxl__colo_save_state libxl__colo_save_state;
 struct libxl__colo_save_state {
     int send_fd;
     int recv_fd;
+    char *colo_proxy_script;
 
     /* private */
     libxl__stream_read_state srs;
@@ -3537,6 +3540,7 @@ struct libxl__colo_restore_state {
     int recv_fd;
     int hvm;
     libxl__colo_callback *callback;
+    char *colo_proxy_script;
 
     /* private, colo restore checkpoint state */
     libxl__domain_create_cb *saved_cb;
@@ -3566,6 +3570,7 @@ struct libxl__domain_create_state {
     libxl_asyncprogress_how aop_console_how;
     /* private to domain_create */
     int guest_domid;
+    const char *colo_proxy_script;
     libxl__domain_build_state build_state;
     libxl__colo_restore_state crs;
     libxl__checkpoint_devices_state cds;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a2078d1..6b57aba 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -599,6 +599,7 @@ libxl_device_nic = Struct("device_nic", [
     ("rate_bytes_per_interval", uint64),
     ("rate_interval_usecs", uint32),
     ("gatewaydev", string),
+    ("forwarddev", string)
     ])
 
 libxl_device_pci = Struct("device_pci", [
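
Note for reviewers: the argument checks at the top of colo-proxy-setup (command, mode, and the 0..100 index range) can be exercised without touching tc, brctl or iptables. The following is a minimal stand-alone sketch under stated assumptions, not part of the patch. `check_colo_args` is a hypothetical helper name, it takes its values as positional parameters instead of the environment that evalVariables populates, and it additionally rejects non-numeric index values, which the script itself does not check explicitly.

```bash
#!/bin/bash
# Hypothetical helper (not in the patch): mirrors the command, mode and
# index validation performed at the top of colo-proxy-setup.
check_colo_args() {
    local command=$1 mode=$2 index=$3

    # The script accepts only "setup" or "teardown" as its command.
    case "$command" in
        setup|teardown) ;;
        *) echo "Invalid command: $command"; return 1 ;;
    esac

    # $mode selects the primary or the secondary side.
    case "$mode" in
        primary|secondary) ;;
        *) echo "Invalid mode: $mode"; return 1 ;;
    esac

    # Assumption of this sketch: anything that is not a plain non-negative
    # integer (empty, negative, or non-numeric) is reported the same way
    # as an out-of-range index.
    case "$index" in
        ''|*[!0-9]*) echo "index overflow"; return 1 ;;
    esac

    # The script allows index values from 0 to 100 inclusive.
    if [ "$index" -gt 100 ]; then
        echo "index overflow"
        return 1
    fi

    echo "ok"
}
```

For example, `check_colo_args setup primary 1` prints `ok`, while `check_colo_args setup primary 101` prints `index overflow` and returns a non-zero status.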