From patchwork Mon Nov 21 23:16:19 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Pandruvada, Srinivas"
X-Patchwork-Id: 9440289
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 60636606DB
 for ; Mon, 21 Nov 2016 23:16:24 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 54B2D289ED
 for ; Mon, 21 Nov 2016 23:16:24 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 4859E28AE3; Mon, 21 Nov 2016 23:16:24 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
 autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCB45289ED
 for ; Mon, 21 Nov 2016 23:16:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1754500AbcKUXQW (ORCPT ); Mon, 21 Nov 2016 18:16:22 -0500
Received: from mga04.intel.com ([192.55.52.120]:24905 "EHLO mga04.intel.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754345AbcKUXQV (ORCPT ); Mon, 21 Nov 2016 18:16:21 -0500
Received: from fmsmga005.fm.intel.com ([10.253.24.32])
 by fmsmga104.fm.intel.com with ESMTP; 21 Nov 2016 15:16:20 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.31,677,1473145200"; d="scan'208";a="34168650"
Received: from orsmsx110.amr.corp.intel.com ([10.22.240.8])
 by fmsmga005.fm.intel.com with ESMTP; 21 Nov 2016 15:16:20 -0800
Received: from orsmsx154.amr.corp.intel.com (10.22.226.12) by
 ORSMSX110.amr.corp.intel.com (10.22.240.8) with Microsoft SMTP Server (TLS)
 id 14.3.248.2; Mon, 21 Nov 2016 15:16:20 -0800
Received: from orsmsx109.amr.corp.intel.com ([169.254.11.207]) by
 ORSMSX154.amr.corp.intel.com ([10.22.226.12]) with mapi id 14.03.0248.002;
 Mon, 21 Nov 2016 15:16:19 -0800
From: "Pandruvada, Srinivas"
To: "tglx@linutronix.de"
CC: "linux-kernel@vger.kernel.org", "peterz@infradead.org", "Zhang, Rui",
 "rt@linutronix.de", "edubezval@gmail.com", "linux-pm@vger.kernel.org",
 "x86@kernel.org", "bp@alien8.de"
Subject: Re: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug
 and locking trainwreck
Thread-Topic: [patch 00/12] thermal/x86_pkg_temp: Sanitize yet another hotplug
 and locking trainwreck
Thread-Index: AQHSRDImnrG09OBjwkOd+VyhmKijTqDke9YAgAAcZ4A=
Date: Mon, 21 Nov 2016 23:16:19 +0000
Message-ID: <1479770177.6544.195.camel@intel.com>
References: <20161117231435.891545908@linutronix.de>
 <1479758532.6544.169.camel@intel.com>
In-Reply-To:
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.54.75.13]
Content-ID: <19104B0AEB65BB40B49E3696E6987407@intel.com>
MIME-Version: 1.0
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Mon, 2016-11-21 at 22:34 +0100, Thomas Gleixner wrote:
> On Mon, 21 Nov 2016, Pandruvada, Srinivas wrote:

[...]

> Stupid me. I tested putting a socket offline, which works, but did not
> check what happens on module removal. Delta fix below. That needs to be
> folded into the series as the wreckage already happens before the last
> patch.

Your change below fixes the crash issue.

I then tested the case where the last CPU of a package is offlined: the
thermal zone was removed, and it was added back once any CPU from the
package came online. So this is working.

I want to run some workload on those CPUs to bump up the temperature and
check the interrupts, but I am hitting an issue that may be unrelated to
this change. I onlined three CPUs from package 1:

[189443.567728] smpboot: Booting Node 1 Processor 15 APIC 0x2e
[189656.625947] smpboot: Booting Node 1 Processor 8 APIC 0x20
[189829.545851] smpboot: Booting Node 1 Processor 24 APIC 0x21

But I can't schedule anything on those CPUs. For example, I can't run
turbostat now; it complains:

  turbostat: re-initialized with num_cpus 19
  Could not migrate to CPU 8

Same with taskset:

  # taskset 0x100 stress -c 1
  taskset: failed to set pid 0's affinity: Invalid argument

I am on the latest linux-pm/linux-next tree on this server. I will switch
to the latest mainline and try again.

Thanks,
Srinivas

8<--------------------

--- a/drivers/thermal/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/x86_pkg_temp_thermal.c
@@ -63,6 +63,7 @@ struct pkg_device {
        u32                             msr_pkg_therm_high;
        struct delayed_work             work;
        struct thermal_zone_device      *tzone;
+       struct cpumask                  cpumask;
 };
 
 static struct thermal_zone_params pkg_temp_tz_params = {
@@ -391,6 +392,7 @@ static int pkg_temp_thermal_device_add(u
        rdmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT, pkgdev->msr_pkg_therm_low,
              pkgdev->msr_pkg_therm_high);
 
+       cpumask_set_cpu(cpu, &pkgdev->cpumask);
        spin_lock_irq(&pkg_temp_lock);
        packages[pkgid] = pkgdev;
        spin_unlock_irq(&pkg_temp_lock);
@@ -399,13 +401,15 @@ static int pkg_temp_thermal_device_add(u
 
 static int pkg_thermal_cpu_offline(unsigned int cpu)
 {
-       int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
        struct pkg_device *pkgdev = pkg_temp_thermal_get_dev(cpu);
        bool lastcpu, was_target;
+       int target;
 
        if (!pkgdev)
                return 0;
 
+       target = cpumask_any_but(&pkgdev->cpumask, cpu);
+       cpumask_clear_cpu(cpu, &pkgdev->cpumask);
        lastcpu = target >= nr_cpu_ids;
 
        /*
@@ -492,8 +496,10 @@ static int pkg_thermal_cpu_online(unsign
                return -ENODEV;
 
        /* If the package exists, nothing to do */
-       if (pkgdev)
+       if (pkgdev) {
+               cpumask_set_cpu(cpu, &pkgdev->cpumask);
                return 0;
+       }
        return pkg_temp_thermal_device_add(cpu);
 }
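
For readers following the delta above, here is a rough userspace sketch of the bookkeeping it appears to introduce: each package keeps its own mask of online CPUs, the offline path picks some other CPU from that mask as the new target, and the package is treated as empty once no other CPU is left. This is only a model under stated assumptions, not the driver code: `struct pkg_model`, `mask_any_but()`, `pkg_model_online()`, `pkg_model_offline()` and `MAX_CPUS` are hypothetical names invented for illustration, a plain 64-bit word stands in for `struct cpumask`, and `MAX_CPUS` plays the role of `nr_cpu_ids`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 64u	/* hypothetical limit; stands in for nr_cpu_ids */

/* Hypothetical stand-in for struct pkg_device: only the new mask field. */
struct pkg_model {
	uint64_t cpumask;	/* bit n set: CPU n of this package is online */
};

/*
 * Models cpumask_any_but(): returns a set CPU other than 'cpu' (here the
 * lowest-numbered one), or MAX_CPUS when no other CPU is set.
 */
static unsigned int mask_any_but(uint64_t mask, unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	mask &= ~(UINT64_C(1) << cpu);
	for (unsigned int i = 0; i < MAX_CPUS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return MAX_CPUS;
}

/* Online path: record the CPU in the package mask. */
static void pkg_model_online(struct pkg_model *pkg, unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	pkg->cpumask |= UINT64_C(1) << cpu;
}

/*
 * Offline path: pick a replacement target first, then drop 'cpu' from the
 * mask. Returns true when 'cpu' was the last online CPU of the package,
 * mirroring the "lastcpu = target >= nr_cpu_ids" test in the patch.
 */
static bool pkg_model_offline(struct pkg_model *pkg, unsigned int cpu,
			      unsigned int *target)
{
	*target = mask_any_but(pkg->cpumask, cpu);
	pkg->cpumask &= ~(UINT64_C(1) << cpu);
	return *target >= MAX_CPUS;
}
```

With CPUs 8, 15 and 24 marked online in one model package, offlining 8 and then 15 each yields a valid replacement target, and only offlining 24 reports the last-CPU case, which is the point where the real driver would tear the thermal zone down.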