From patchwork Wed Jun 10 13:18:45 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Prarit Bhargava X-Patchwork-Id: 6579851 Return-Path: X-Original-To: patchwork-linux-pm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 8C6E49F3D1 for ; Wed, 10 Jun 2015 13:18:56 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 6F92C20604 for ; Wed, 10 Jun 2015 13:18:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E4992205F9 for ; Wed, 10 Jun 2015 13:18:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754116AbbFJNSv (ORCPT ); Wed, 10 Jun 2015 09:18:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55893 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932809AbbFJNSu (ORCPT ); Wed, 10 Jun 2015 09:18:50 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (Postfix) with ESMTPS id 5455C2D44F2; Wed, 10 Jun 2015 13:18:49 +0000 (UTC) Received: from praritdesktop.bos.redhat.com (prarit-guest.khw.lab.eng.bos.redhat.com [10.16.186.145]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t5ADImws030819; Wed, 10 Jun 2015 09:18:48 -0400 From: Prarit Bhargava To: linux-kernel@vger.kernel.org Cc: Prarit Bhargava , Kristen Carlson Accardi , "Rafael J. Wysocki" , Viresh Kumar , linux-pm@vger.kernel.org Subject: [PATCH] cpufreq, Fix overflow in busy_scaled due to long delay Date: Wed, 10 Jun 2015 09:18:45 -0400 Message-Id: <1433942325-6610-1-git-send-email-prarit@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP I looked into switching to div64_s64() instead of the 32-bit version in div_fp(), however, this would result in sample_ratio and core_busy returning 0 which is something we don't want. P. ---8<--- The kernel may delay interrupts for a long time which can result in timers being delayed. If this occurs the intel_pstate driver will crash with a divide by zero error: divide error: 0000 [#1] SMP Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1 Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014 task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000 RIP: 0010:[] [] intel_pstate_timer_func+0x179/0x3d0 RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206 RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001 R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5 R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071 ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100 Call Trace: [] ? run_posix_cpu_timers+0x51/0x840 [] ? trigger_load_balance+0x5d/0x200 [] ? pid_param_set+0x130/0x130 [] call_timer_fn+0x36/0x110 [] ? pid_param_set+0x130/0x130 [] run_timer_softirq+0x21f/0x320 [] __do_softirq+0xef/0x280 [] call_softirq+0x1c/0x30 [] do_softirq+0x65/0xa0 [] irq_exit+0x115/0x120 [] smp_apic_timer_interrupt+0x45/0x60 [] apic_timer_interrupt+0x6d/0x80 [] ? cpuidle_enter_state+0x52/0xc0 [] ? cpuidle_enter_state+0x48/0xc0 [] cpuidle_idle_call+0xc5/0x200 [] arch_cpu_idle+0xe/0x30 [] cpu_startup_entry+0xf1/0x290 [] start_secondary+0x1ba/0x230 Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29 RIP [] intel_pstate_timer_func+0x179/0x3d0 RSP The kernel values for cpudata for CPU 113 were: struct cpudata { cpu = 113, timer = { entry = { next = 0x0, prev = 0xdead000000200200 }, expires = 8357799745, base = 0xffff883fe84ec001, function = 0xffffffff814a9100 , data = 18446612406765768960, i_gain = 0, d_gain = 0, deadband = 0, last_err = 22489 }, last_sample_time = { tv64 = 4063132438017305 }, prev_aperf = 287326796397463, prev_mperf = 251427432090198, sample = { core_pct_busy = 23081, aperf = 2937407, mperf = 3257884, freq = 2524484, time = { tv64 = 4063149215234118 } } } which results in the time between samples = last_sample_time - sample.time = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds. The duration between reads of the APERF and MPERF registers overflowed a s32 sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0. While the kernel shouldn't be delaying for a long time, it can and does happen, and the intel_pstate driver should not panic in this situation. This patch checks for an overflow and ignores the current calculation cycle by returning -EINVAL. Since intel_pstate_sample() has been called, subsequent timer function calls will then again pick up the correct calculations and the system will continue functioning properly. Cc: Kristen Carlson Accardi Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: linux-pm@vger.kernel.org Signed-off-by: Prarit Bhargava --- drivers/cpufreq/intel_pstate.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 6414661..6a145e5 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -823,6 +823,9 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) sample_time = pid_params.sample_rate_ms * USEC_PER_MSEC; duration_us = (u32) ktime_us_delta(cpu->sample.time, cpu->last_sample_time); + if (duration_us > S32_MAX) + return -EINVAL; + if (duration_us > sample_time * 3) { sample_ratio = div_fp(int_tofp(sample_time), int_tofp(duration_us)); @@ -832,7 +835,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu) return core_busy; } -static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu) +static inline int32_t intel_pstate_adjust_busy_pstate(struct cpudata *cpu) { int32_t busy_scaled; struct _pid *pid; @@ -840,11 +843,15 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu) pid = &cpu->pid; busy_scaled = intel_pstate_get_scaled_busy(cpu); + if (busy_scaled < 0) + return S32_MAX; ctl = pid_calc(pid, busy_scaled); /* Negative values of ctl increase the pstate and vice versa */ intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl); + + return busy_scaled; } static void intel_hwp_timer_func(unsigned long __data) @@ -859,15 +866,16 @@ static void intel_pstate_timer_func(unsigned long __data) { struct cpudata *cpu = (struct cpudata *) __data; struct sample *sample; + int32_t busy_scaled; intel_pstate_sample(cpu); sample = &cpu->sample; - intel_pstate_adjust_busy_pstate(cpu); + busy_scaled = intel_pstate_adjust_busy_pstate(cpu); trace_pstate_sample(fp_toint(sample->core_pct_busy), - fp_toint(intel_pstate_get_scaled_busy(cpu)), + fp_toint(busy_scaled), cpu->pstate.current_pstate, sample->mperf, sample->aperf,