From patchwork Tue Jul 5 01:07:03 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Huang, Ying" X-Patchwork-Id: 943182 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p651798t013600 for ; Tue, 5 Jul 2011 01:07:10 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751426Ab1GEBHI (ORCPT ); Mon, 4 Jul 2011 21:07:08 -0400 Received: from mga11.intel.com ([192.55.52.93]:35313 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751624Ab1GEBHH (ORCPT ); Mon, 4 Jul 2011 21:07:07 -0400 Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP; 04 Jul 2011 18:07:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.65,476,1304319600"; d="scan'208";a="26244680" Received: from yhuang-dev.sh.intel.com ([10.239.13.105]) by fmsmga001.fm.intel.com with ESMTP; 04 Jul 2011 18:07:05 -0700 From: Huang Ying To: Len Brown Cc: linux-kernel@vger.kernel.org, Andi Kleen , Tony Luck , ying.huang@intel.com, linux-acpi@vger.kernel.org Subject: [RFC] ACPI, APEI, GHES, Error records content based throttle Date: Tue, 5 Jul 2011 09:07:03 +0800 Message-Id: <1309828023-21842-1-git-send-email-ying.huang@intel.com> X-Mailer: git-send-email 1.7.5.4 Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Tue, 05 Jul 2011 01:07:10 +0000 (UTC) This patch depends on APEI recoverable error printk support patch. -------------------------------------------------------> printk is used by GHES to report hardware errors. Ratelimit is enforced on the printk to avoid too many hardware error reports in kernel log. Because there may be thousands or even millions of corrected hardware errors during system running. Currently, a simple scheme is used. That is, the total number of hardware error reporting is ratelimited. This may cause some issues in practice. For example, there are two kinds of hardware errors occurred in system. One is corrected memory error, because the fault memory address is accessed frequently, there may be hundreds error report per-second. The other is corrected PCIe AER error, it will be reported once per-second. Because they share one ratelimit control structure, it is highly possible that only memory error is reported. To avoid the above issue, an error record content based throttle algorithm is implemented in the patch. Where after the first successful reporting, all error records that are same are throttled for some time, to let other kinds of error records have the opportunity to be reported. In above example, the memory errors will be throttled for some time, after being printked. Then the PCIe AER error will be printked successfully. Signed-off-by: Huang Ying --- drivers/acpi/apei/ghes.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 169 insertions(+), 8 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -60,6 +60,18 @@ #define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3 +#define GHES_ESTATUS_CACHES_SIZE 4 + +#define GHES_ESTATUS_IN_CACHE_MAX_NSEC (10 * NSEC_PER_SEC) +/* Prevent too many caches are allocated because of RCU */ +#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2) + +#define GHES_ESTATUS_CACHE_LEN(estatus_len) \ + (sizeof(struct ghes_estatus_cache) + (estatus_len)) +#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \ + ((struct acpi_hest_generic_status *) \ + ((struct ghes_estatus_cache *)(estatus_cache) + 1)) + #define GHES_ESTATUS_NODE_LEN(estatus_len) \ (sizeof(struct ghes_estatus_node) + (estatus_len)) #define GHES_ESTATUS_FROM_NODE(estatus_node) \ @@ -94,6 +106,14 @@ struct ghes_estatus_node { struct acpi_hest_generic *generic; }; +struct ghes_estatus_cache { + u32 estatus_len; + atomic_t count; + struct acpi_hest_generic *generic; + unsigned long long time_in; + struct rcu_head rcu; +}; + int ghes_disable; module_param_named(disable, ghes_disable, bool, 0); @@ -154,6 +174,9 @@ static unsigned long ghes_estatus_pool_s static struct llist_head ghes_estatus_llist; static struct irq_work ghes_proc_irq_work; +struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE]; +static atomic_t ghes_estatus_cache_alloced; + static int ghes_ioremap_init(void) { ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, @@ -458,9 +481,9 @@ static void __ghes_print_estatus(const c apei_estatus_print(pfx, estatus); } -static void ghes_print_estatus(const char *pfx, - const struct acpi_hest_generic *generic, - const struct acpi_hest_generic_status *estatus) +static int ghes_print_estatus(const char *pfx, + const struct acpi_hest_generic *generic, + const struct acpi_hest_generic_status *estatus) { /* Not more than 2 messages every 5 seconds */ static DEFINE_RATELIMIT_STATE(ratelimit_corrected, 5*HZ, 2); @@ -471,8 +494,137 @@ static void ghes_print_estatus(const cha ratelimit = &ratelimit_corrected; else ratelimit = &ratelimit_uncorrected; - if (__ratelimit(ratelimit)) + if (__ratelimit(ratelimit)) { __ghes_print_estatus(pfx, generic, estatus); + return 1; + } + return 0; +} + +/* + * GHES error status reporting throttle, to report more kinds of + * errors, instead of just most frequently occurred errors. + */ +static int ghes_estatus_cached(struct acpi_hest_generic_status *estatus) +{ + u32 len; + int i, cached = 0; + unsigned long long now; + struct ghes_estatus_cache *cache; + struct acpi_hest_generic_status *cache_estatus; + + len = apei_estatus_len(estatus); + rcu_read_lock(); + for (i = 0; i < GHES_ESTATUS_CACHES_SIZE; i++) { + cache = rcu_dereference(ghes_estatus_caches[i]); + if (cache == NULL) + continue; + if (len != cache->estatus_len) + continue; + cache_estatus = GHES_ESTATUS_FROM_CACHE(cache); + if (memcmp(estatus, cache_estatus, len)) + continue; + atomic_inc(&cache->count); + now = sched_clock(); + if (now - cache->time_in < GHES_ESTATUS_IN_CACHE_MAX_NSEC) + cached = 1; + break; + } + rcu_read_unlock(); + return cached; +} + +static struct ghes_estatus_cache *ghes_estatus_cache_alloc( + struct acpi_hest_generic *generic, + struct acpi_hest_generic_status *estatus) +{ + int alloced; + u32 len, cache_len; + struct ghes_estatus_cache *cache; + struct acpi_hest_generic_status *cache_estatus; + + alloced = atomic_add_return(1, &ghes_estatus_cache_alloced); + if (alloced > GHES_ESTATUS_CACHE_ALLOCED_MAX) { + atomic_dec(&ghes_estatus_cache_alloced); + return NULL; + } + len = apei_estatus_len(estatus); + cache_len = GHES_ESTATUS_CACHE_LEN(len); + cache = (void *)gen_pool_alloc(ghes_estatus_pool, cache_len); + if (!cache) { + atomic_dec(&ghes_estatus_cache_alloced); + return NULL; + } + cache_estatus = GHES_ESTATUS_FROM_CACHE(cache); + memcpy(cache_estatus, estatus, len); + cache->estatus_len = len; + atomic_set(&cache->count, 0); + cache->generic = generic; + cache->time_in = sched_clock(); + return cache; +} + +static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache) +{ + u32 len; + + len = apei_estatus_len(GHES_ESTATUS_FROM_CACHE(cache)); + len = GHES_ESTATUS_CACHE_LEN(len); + gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len); + atomic_dec(&ghes_estatus_cache_alloced); +} + +static void ghes_estatus_cache_rcu_free(struct rcu_head *head) +{ + struct ghes_estatus_cache *cache; + + cache = container_of(head, struct ghes_estatus_cache, rcu); + ghes_estatus_cache_free(cache); +} + +static void ghes_estatus_cache_add( + struct acpi_hest_generic *generic, + struct acpi_hest_generic_status *estatus) +{ + int i, slot = -1, count; + unsigned long long now, duration, period, max_period = 0; + struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache; + + new_cache = ghes_estatus_cache_alloc(generic, estatus); + if (new_cache == NULL) + return; + rcu_read_lock(); + now = sched_clock(); + for (i = 0; i < GHES_ESTATUS_CACHES_SIZE; i++) { + cache = rcu_dereference(ghes_estatus_caches[i]); + if (cache == NULL) { + slot = i; + slot_cache = NULL; + break; + } + duration = now - cache->time_in; + if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) { + slot = i; + slot_cache = cache; + break; + } + count = atomic_read(&cache->count); + period = duration / (count + 1); + if (period > max_period) { + max_period = period; + slot = i; + slot_cache = cache; + } + } + /* new_cache must be put into array after its contents are written */ + smp_wmb(); + if (slot != -1 && cmpxchg(ghes_estatus_caches + slot, + slot_cache, new_cache) == slot_cache) { + if (slot_cache) + call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free); + } else + ghes_estatus_cache_free(new_cache); + rcu_read_unlock(); } static int ghes_proc(struct ghes *ghes) @@ -482,9 +634,11 @@ static int ghes_proc(struct ghes *ghes) rc = ghes_read_estatus(ghes, 0); if (rc) goto out; - ghes_print_estatus(NULL, ghes->generic, ghes->estatus); + if (!ghes_estatus_cached(ghes->estatus)) { + if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus)) + ghes_estatus_cache_add(ghes->generic, ghes->estatus); + } ghes_do_proc(ghes->estatus); - out: ghes_clear_estatus(ghes); return 0; @@ -546,6 +700,7 @@ static void ghes_proc_in_irq(struct irq_ { struct llist_node *llnode, *next, *tail = NULL; struct ghes_estatus_node *estatus_node; + struct acpi_hest_generic *generic; struct acpi_hest_generic_status *estatus; u32 len, node_len; @@ -569,7 +724,11 @@ static void ghes_proc_in_irq(struct irq_ len = apei_estatus_len(estatus); node_len = GHES_ESTATUS_NODE_LEN(len); ghes_do_proc(estatus); - ghes_print_estatus(NULL, estatus_node->generic, estatus); + if (!ghes_estatus_cached(estatus)) { + generic = estatus_node->generic; + if (ghes_print_estatus(NULL, generic, estatus)) + ghes_estatus_cache_add(generic, estatus); + } gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len); llnode = next; @@ -621,7 +780,10 @@ static int ghes_notify_nmi(struct notifi #endif if (!(ghes->flags & GHES_TO_CLEAR)) continue; + ghes_clear_estatus(ghes); #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG + if (ghes_estatus_cached(ghes->estatus)) + continue; /* Save estatus for further processing in IRQ context */ len = apei_estatus_len(ghes->estatus); node_len = GHES_ESTATUS_NODE_LEN(len); @@ -634,7 +796,6 @@ static int ghes_notify_nmi(struct notifi llist_add(&estatus_node->llnode, &ghes_estatus_llist); } #endif - ghes_clear_estatus(ghes); } #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG irq_work_queue(&ghes_proc_irq_work);