From patchwork Mon Oct 23 18:36:45 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Timofey Titovets X-Patchwork-Id: 10022943 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 21473603D7 for ; Mon, 23 Oct 2017 18:37:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0933E28956 for ; Mon, 23 Oct 2017 18:37:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F1DF328946; Mon, 23 Oct 2017 18:37:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID, T_TVD_MIME_EPI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2A64228956 for ; Mon, 23 Oct 2017 18:37:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751609AbdJWSh2 (ORCPT ); Mon, 23 Oct 2017 14:37:28 -0400 Received: from mail-it0-f52.google.com ([209.85.214.52]:48779 "EHLO mail-it0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751566AbdJWSh1 (ORCPT ); Mon, 23 Oct 2017 14:37:27 -0400 Received: by mail-it0-f52.google.com with SMTP id c3so7070666itc.3 for ; Mon, 23 Oct 2017 11:37:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=C8BjkDR7/wvXeg9bj/Q5uKWUqDOGoa7C/k0o3KbgfNc=; b=FAaJFFpeYnDLyA5IVHBoCOrhiahIeNt2/hoAiPsvHasOx8LyH/utlvjF0UlB1OsTlc E712xVvLuWd61do7R7T4oe6ZFkHBvB0Vdd7R5wdoTCsEe6Tb8EJoiWgVQexEvrPelCpX RHoE0mDv6/0W9MxzmsRiiDGSneuSwBB1/4US8EErU/AQsGDGkfUh9fbKyBKmfzK9h0uF VSUj5til0DOYb2Yw1oQ7NUy0bKEtR0Pq5GOjAmiUHIgLpdsP7HBy7UbQmCnmpkijaunk E3e3ndVHX5migi6w9DBUh8tnFlys8eWpL3mchbvf48D3ne9m35lJAq+XhfE7CxhsVZ+h bzsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=C8BjkDR7/wvXeg9bj/Q5uKWUqDOGoa7C/k0o3KbgfNc=; b=r/ndMRsP2KBVswYhOzDiERmmcodHta1ifkGZLz0OBwnwEBabM4SPmh+ZE6ZtJabDr0 ASp1pcV6/LuF9ebaybZiL56hbCYP3wQGnto/RSBLz5JuOt0bwwpZCgWcQ6z7kJWro39v 6zwujhXD+aby5ifhumMTqnCl1pjUxyQurZYEduX/LCo7Qo7UbNSGhMTaRg8djQxohjil Iqk7IbwnHhyFsXz+hW7NlHm0n1niCssZAVqZvolrtAGHLhK/DFbtr8AhbRyQ9ojX1jsm LefeosUJWI7fV/iCEdAsPZQq1AwefputeGvFe8PuztweGcWsi772MPMUausV8x0ro2z/ tX7Q== X-Gm-Message-State: AMCzsaWDM2PWUVPBzfghSX7K2WMXmXlHM3xg/Tx9P3nZdBtv/kTNsGBz V8llN1KgOFwqBL96veObK/LDLBDonNIcjXw1EgA= X-Google-Smtp-Source: ABhQp+T2/ArjOMCLT9e3/h3roMvsidqHG1BAjWmA0Ti9T7mKqJTE7yi6REY3OyZ5ISFEKOOf5dreoLYM0mW/x22APoA= X-Received: by 10.36.28.200 with SMTP id c191mr9356094itc.93.1508783846206; Mon, 23 Oct 2017 11:37:26 -0700 (PDT) MIME-Version: 1.0 Received: by 10.2.165.141 with HTTP; Mon, 23 Oct 2017 11:36:45 -0700 (PDT) In-Reply-To: References: <20170928143341.24491-1-nefelim4ag@gmail.com> <20170929162200.GP31640@twin.jikos.cz> <20171019153940.GL3521@twin.jikos.cz> <20171020134511.GQ3521@suse.cz> From: Timofey Titovets Date: Mon, 23 Oct 2017 21:36:45 +0300 Message-ID: Subject: Re: [PATCH v8 0/6] Btrfs: populate heuristic with code To: David Sterba , Timofey Titovets , linux-btrfs Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP 2017-10-22 16:44 GMT+03:00 Timofey Titovets : > 2017-10-20 16:45 GMT+03:00 David Sterba : >> On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote: >>> 2017-10-19 18:39 GMT+03:00 David Sterba : >>> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote: >>> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote: >>> >> > Compile tested, hand tested on live system >>> >> > >>> >> > Change v7 -> v8 >>> >> > - All code moved to compression.c (again) >>> >> > - Heuristic workspaces inmplemented another way >>> >> > i.e. only share logic with compression workspaces >>> >> > - Some style fixes suggested by Devid >>> >> > - Move sampling function from heuristic code >>> >> > (I'm afraid of big functions) >>> >> > - Much more comments and explanations >>> >> >>> >> Thanks for the update, I went through the patches and they looked good >>> >> enough to be put into for-next. I may have more comments about a few >>> >> things, but nothing serious that would hinder testing. >>> > >>> > I did a final pass through the patches and edited comments wehre I was >>> > not able to undrerstand them. Please check the updated patches in [1] if >>> > I did not accidentally change the meaning. >>> >>> I don't see a link [1] in mail, may be you missed it? >> >> Yeah, sorry: >> https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic > > I did re-read updated comments, looks ok to me > (i only found one typo, leave a comment). > > > Thanks > -- > Have a nice day, > Timofey. Can you please try that patch? (in attach) I think some time about performance hit of heuristic and how to avoid using sorting, That patch will try prefind min/max values (before sorting) in array, and (max - min), used to filter edge data cases where byte core size < 64 or bigger > 200 It's a bit hacky workaround =\, That show a ~same speedup on my data set as show using of radix sort. (i.e. x2 speed up) Thanks. From fb2a329828e64ad0e224a8cb97dbc17147149629 Mon Sep 17 00:00:00 2001 From: Timofey Titovets Date: Mon, 23 Oct 2017 21:24:29 +0300 Subject: [PATCH] Btrfs: heuristic try avoid bucket sorting on edge data cases Heap sort used in kernel are too slow and costly, So let's make some statistic assume about egde input data cases Based on observation of difference between min/max values in bucket. Signed-off-by: Timofey Titovets --- fs/btrfs/compression.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 0ca16909894e..56b67ec4fb5b 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -1310,8 +1310,46 @@ static int byte_core_set_size(struct heuristic_ws *ws) u32 i; u32 coreset_sum = 0; const u32 core_set_threshold = ws->sample_size * 90 / 100; + struct bucket_item *max, *min; + struct bucket_item tmp; struct bucket_item *bucket = ws->bucket; + + /* Presort for find min/max value */ + max = &bucket[0]; + min = &bucket[BUCKET_SIZE - 1]; + for (i = 1; i < BUCKET_SIZE - 1; i++) { + if (bucket[i].count > max->count) { + tmp = *max; + *max = bucket[i]; + bucket[i] = tmp; + } + if (bucket[i].count < min->count) { + tmp = *min; + *min = bucket[i]; + bucket[i] = tmp; + } + } + + /* + * Hacks for avoid sorting on Edge data cases (sorting too constly) + * i.e. that will fast filter easy compressible + * and bad compressible data + * Based on observation of number distribution on different data sets + * + * Assume 1: For bad compressible data distribution between min/max + * will be less then 0.6% of sample size + * + * Assume 2: For good compressible data distribution between min/max + * will be far bigger then 4% of sample size + */ + + if (max->count - min->count < ws->sample_size * 6 / 1000) + return BYTE_CORE_SET_HIGH + 1; + + if (max->count - min->count > ws->sample_size * 4 / 100) + return BYTE_CORE_SET_LOW - 1; + /* Sort in reverse order */ sort(bucket, BUCKET_SIZE, sizeof(*bucket), &bucket_comp_rev, NULL); -- 2.14.2