From patchwork Thu Nov 15 14:27:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Devin Hussey X-Patchwork-Id: 10684443 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B29613B5 for ; Thu, 15 Nov 2018 14:27:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EAF472C836 for ; Thu, 15 Nov 2018 14:27:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC10E2C844; Thu, 15 Nov 2018 14:27:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 607DE2C836 for ; Thu, 15 Nov 2018 14:27:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388319AbeKPAf6 (ORCPT ); Thu, 15 Nov 2018 19:35:58 -0500 Received: from mail-vk1-f195.google.com ([209.85.221.195]:45426 "EHLO mail-vk1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729034AbeKPAf6 (ORCPT ); Thu, 15 Nov 2018 19:35:58 -0500 Received: by mail-vk1-f195.google.com with SMTP id n126so4494604vke.12 for ; Thu, 15 Nov 2018 06:27:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=PREy5cVa67Vb868KKRfkTJYc9Tkct8sefdcGD2ogvGM=; b=FAOiZZhWG+UmpCch2tApsmQi/kVNE4RUoYhfkdX7YwxsY85zhS60F92XL9ydj3dvTZ 7cwi08YyLFqFnZfY4EkGSjZ8zc108rCWPYJkLGyCdgCbP0aeFHvedMx/AZSYJHoEeqTm Ze2vmClkizUYGpnEuG5E4vYMuUyclYjatOCxyROdxwGQY9AAt3iEP+xBZkLUB7Vtyt0C zO5NvQzlevwhJFprf47Q+6/53rSx+vYy15KM8HbcwIVllg+EHPoP2cBZivuV1mvc6eUZ NJiAkrtqoMsohCsjUkt7VxFuiuflcWHOsHtS5wQlA6EtihlzLGIAHy6E85bjH+OPzKoO NUBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=PREy5cVa67Vb868KKRfkTJYc9Tkct8sefdcGD2ogvGM=; b=TkY/dsHk5NYvo+HKAYM1lV/pvCf6jqTy87ZQ4ScPZE24MqrMCISA7QrweJE2ZGhTCw WKNIG2h/cLxxnNo0H1ThRqlA+62Wbh4w7RbU7g6SZmWFqAIKDn8ByClRgOWyJ5BwIQDm IPzh3fuCFRkXI70Gm9LYD4Sc8pluL9OH4NhgDU6KTz9btQGGdq0Ys9tPT5HzUcPh/R83 G1giJrmRLfma3PRios3EtOADKYB/hbJfsMKVfelG2v48twNtyZ3l6et/OsT0rptN7gdx M8tHVdja3VI8QdbWSWLlV9ThNQnCnppj7x0tTaFnWa6aqGVFUC6HveUYjwi/qZr8cuf0 alQg== X-Gm-Message-State: AGRZ1gJZLIt/ituYMjafAZeSw5M5y12r2BmtGCzGFm66E8+J5veATJr/ p4U+h8YCowBbTwunYds0oZRzjKcQb5wdS+FIbQT2RndV X-Google-Smtp-Source: AJdET5cPAsQMKMfZKHKBr0rqGABhCWne6cIh088xweYqcfPq99WGn/JIXQY+hqSViqWC+egGx16kW68228nFH21LS3M= X-Received: by 2002:a1f:4301:: with SMTP id q1mr2805858vka.70.1542292074167; Thu, 15 Nov 2018 06:27:54 -0800 (PST) MIME-Version: 1.0 From: Devin Hussey Date: Thu, 15 Nov 2018 09:27:42 -0500 Message-ID: Subject: [PATCH] Use a real non-cryptographic hash algorithm To: dash@vger.kernel.org Sender: dash-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dash@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From 502cbd61e386eb014035df66b032e90a9de926b7 Mon Sep 17 00:00:00 2001 From: Devin Hussey Date: Thu, 15 Nov 2018 09:12:24 -0500 Subject: [PATCH] Use a real non-cryptographic hash algorithm dash previously used a modified version of the LoseLose algorithm. LoseLose is a TERRIBLE algorithm. It has terrible distribution, leaving many hash entries unused. However, more importantly, it is incredibly easy to make a collision; even something as simple as rearranging the letters. For example. these all produce the hash 1652: Hello Holle Hlleo Hoell Haxed\n HASH\tBAD Hater Collsions in hash tables are very bad because it requires looking through a linked list, and the benefits of O(1) time in a hash table are completely lost. The FNV-1a algorithm has comparable performance and code size while having a much better collision rate. While there are some other faster and more resistant hashes out there, namely xxHash, Murmur2, and CityHash, the benefits are minimal on short strings and they expect a length argument, unlike FNV-1a which simply uses null-termination, and they are not in the public domain like FNV-1a. Signed-off-by: Devin Hussey --- src/alias.c | 14 ++++---------- src/exec.c | 11 +++++------ src/shell.h | 3 +++ src/var.c | 5 ++--- 4 files changed, 14 insertions(+), 19 deletions(-) diff --git a/src/alias.c b/src/alias.c index daeacbb..f625199 100644 --- a/src/alias.c +++ b/src/alias.c @@ -202,19 +202,13 @@ printalias(const struct alias *ap) { STATIC struct alias ** __lookupalias(const char *name) { - unsigned int hashval; struct alias **app; - const char *p; - unsigned int ch; + unsigned int hashval = FNV_OFFSET_BASIS; + const char *p = name; - p = name; + while (*p) + hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME; - ch = (unsigned char)*p; - hashval = ch << 4; - while (ch) { - hashval += ch; - ch = (unsigned char)*++p; - } app = &atab[hashval % ATABSIZE]; for (; *app; app = &(*app)->next) { diff --git a/src/exec.c b/src/exec.c index 9d0215a..c6b515d 100644 --- a/src/exec.c +++ b/src/exec.c @@ -638,16 +638,15 @@ struct tblentry **lastcmdentry; STATIC struct tblentry * cmdlookup(const char *name, int add) { - unsigned int hashval; - const char *p; struct tblentry *cmdp; struct tblentry **pp; - p = name; - hashval = (unsigned char)*p << 4; + unsigned int hashval = FNV_OFFSET_BASIS; + const char *p = name; + while (*p) - hashval += (unsigned char)*p++; - hashval &= 0x7FFF; + hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME; + pp = &cmdtable[hashval % CMDTABLESIZE]; for (cmdp = *pp ; cmdp ; cmdp = cmdp->next) { if (equal(cmdp->cmdname, name)) diff --git a/src/shell.h b/src/shell.h index 98edc8b..d08d0d5 100644 --- a/src/shell.h +++ b/src/shell.h @@ -82,6 +82,9 @@ extern char nullstr[1]; /* null string */ #define TRACEV(param) #endif +#define FNV_PRIME 16777619u +#define FNV_OFFSET_BASIS 2166136261u + #if defined(__GNUC__) && __GNUC__ < 3 #define va_copy __va_copy #endif diff --git a/src/var.c b/src/var.c index 0d7e1db..be34731 100644 --- a/src/var.c +++ b/src/var.c @@ -637,11 +637,10 @@ void unsetvar(const char *s) STATIC struct var ** hashvar(const char *p) { - unsigned int hashval; + unsigned int hashval = FNV_OFFSET_BASIS; - hashval = ((unsigned char) *p) << 4; while (*p && *p != '=') - hashval += (unsigned char) *p++; + hashval = (hashval ^ (unsigned char) *p++) + FNV_PRIME; return &vartab[hashval % VTABSIZE]; }