From patchwork Wed Oct  5 02:01:53 2016
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 9362439
Date: Wed, 5 Oct 2016 11:01:53 +0900
From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky
Cc: Jens Axboe, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-block@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [PATCH 2/3] zram: support page-based parallel write
Message-ID: <20161005020153.GA2988@bbox>
References: <1474526565-6676-1-git-send-email-minchan@kernel.org>
 <1474526565-6676-2-git-send-email-minchan@kernel.org>
 <20160929031831.GA1175@swordfish>
 <20160930055221.GA16293@bbox>
 <20161004044314.GA835@swordfish>
In-Reply-To: <20161004044314.GA835@swordfish>

Hi Sergey,

On Tue, Oct 04, 2016 at 01:43:14PM +0900, Sergey Senozhatsky wrote:

< snip >

> TEST
> ****
>
> New test results; same tests, same conditions, same .config.
> 4-way test:
>   - BASE zram, fio direct=1
>   - BASE zram, fio fsync_on_close=1
>   - NEW zram, fio direct=1
>   - NEW zram, fio fsync_on_close=1
>
> And what I see is that:
>   - new zram is 3x slower when we do a lot of direct=1 IO, and
>   - 10% faster when we use buffered IO (fsync_on_close=1); but not
>     always: for instance, test execution time is longer (a reproducible
>     behavior) when the number of jobs equals the number of CPUs (4).
>
> If flushing is the problem for new zram during the direct=1 test, then
> I would assume that writing a huge number of small files (creat/write
> 4k/close) would show the same fsync_on_close=1 performance as direct=1.
>
> ENV
> ===
>
> x86_64 SMP (4 CPUs), "bare zram" 3g, lzo, static compression buffer.
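(A side note for anyone reproducing this: as far as I can tell from the
script, the "bare zram" setup above is just the raw device with no
filesystem on top, i.e. roughly the create_zram path from the diff at
the end of this mail. A minimal sketch; the modprobe line is my
assumption, the sysfs writes are from the script:

    modprobe zram
    echo lzo > /sys/block/zram0/comp_algorithm    # ZRAM_COMP_ALG
    echo 3G > /sys/block/zram0/disksize           # ZRAM_SIZE

)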
> TEST COMMAND
> ============
>
> ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX={NEW, OLD} FIO_LOOPS=2 ./zram-fio-test.sh
>
> EXECUTED TESTS
> ==============
>
> - [seq-read]
> - [rand-read]
> - [seq-write]
> - [rand-write]
> - [mixed-seq]
> - [mixed-rand]
>
> fio-perf-o-meter.sh test-fio-zram-OLD test-fio-zram-OLD-flush test-fio-zram-NEW test-fio-zram-NEW-flush
> Processing test-fio-zram-OLD
> Processing test-fio-zram-OLD-flush
> Processing test-fio-zram-NEW
> Processing test-fio-zram-NEW-flush
>
>              BASE            BASE              NEW             NEW
>              direct=1        fsync_on_close=1  direct=1        fsync_on_close=1
>
> #jobs1
> READ:        2345.1MB/s      2177.2MB/s        2373.2MB/s      2185.8MB/s
> READ:        1948.2MB/s      1417.7MB/s        1987.7MB/s      1447.4MB/s
> WRITE:       1292.7MB/s      1406.1MB/s        275277KB/s      1521.1MB/s
> WRITE:       1047.5MB/s      1143.8MB/s        257140KB/s      1202.4MB/s
> READ:        429530KB/s      779523KB/s        175450KB/s      782237KB/s
> WRITE:       429840KB/s      780084KB/s        175576KB/s      782800KB/s
> READ:        414074KB/s      408214KB/s        164091KB/s      383426KB/s
> WRITE:       414402KB/s      408539KB/s        164221KB/s      383730KB/s

I tested your benchmark with 1 job on my 4-CPU machine, using the diff
at the end of this mail. The only changes are:

1. Reordered the test execution (writes before reads), hoping to reduce
   testing time: blocks are populated before the first read, instead of
   reading just zero pages.
2. Used fsync_on_close=1 instead of direct IO.
3. Dropped perf, to avoid its noise.
4. Added "echo 0 > /sys/block/zram0/use_aio" so the old synchronous IO
   behavior can be tested.

And I got the following results:

1. ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=async FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh
2. Modify the script to disable aio via /sys/block/zram0/use_aio, then:
   ZRAM_SIZE=3G ZRAM_COMP_ALG=lzo LOG_SUFFIX=sync FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh

                 sync (old)   async (new)
seq-write            380930        474325   124.52%
rand-write           286183        357469   124.91%
seq-read             266813        265731    99.59%
rand-read            211747        210670    99.49%
mixed-seq(R)         145750        171232   117.48%
mixed-seq(W)         145736        171215   117.48%
mixed-rand(R)        115355        125239   108.57%
mixed-rand(W)        115371        125256   108.57%

LZO compression is fast, and with one CPU queueing while three CPUs
compress, it cannot saturate the full CPU bandwidth. Nonetheless, it
shows a 24% enhancement. It could be more on slow CPUs, such as
embedded systems.

I also tested with deflate. The result is a 300% enhancement:

                 sync (old)   async (new)
seq-write             33598        109882   327.05%
rand-write            32815        102293   311.73%
seq-read             154323        153765    99.64%
rand-read            129978        129241    99.43%
mixed-seq(R)          15887         44995   283.22%
mixed-seq(W)          15885         44990   283.22%
mixed-rand(R)         25074         55491   221.31%
mixed-rand(W)         25078         55499   221.31%

So I am curious about your test: is my test setup in sync with yours?
If you cannot see the enhancement with 1 job, could you test with
deflate? It seems your CPU is really fast.
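In case it helps to reproduce, my deflate run was just the same test
command with the algorithm swapped; the LOG_SUFFIX value here is only
an example:

    ZRAM_SIZE=3G ZRAM_COMP_ALG=deflate LOG_SUFFIX=deflate FIO_LOOPS=2 MAX_ITER=1 ./zram-fio-test.sh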
---
diff --git a/conf/fio-template-static-buffer b/conf/fio-template-static-buffer
index 1a9a473..22ddee8 100644
--- a/conf/fio-template-static-buffer
+++ b/conf/fio-template-static-buffer
@@ -1,7 +1,7 @@
 [global]
 bs=${BLOCK_SIZE}k
 ioengine=sync
-direct=1
+fsync_on_close=1
 nrfiles=${NRFILES}
 size=${SIZE}
 numjobs=${NUMJOBS}
@@ -14,18 +14,18 @@ new_group
 group_reporting
 threads=1
 
-[seq-read]
-rw=read
-
-[rand-read]
-rw=randread
-
 [seq-write]
 rw=write
 
 [rand-write]
 rw=randwrite
 
+[seq-read]
+rw=read
+
+[rand-read]
+rw=randread
+
 [mixed-seq]
 rw=rw
 
diff --git a/zram-fio-test.sh b/zram-fio-test.sh
index 39c11b3..ca2d065 100755
--- a/zram-fio-test.sh
+++ b/zram-fio-test.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 
 # Sergey Senozhatsky. sergey.senozhatsky@gmail.com
 
@@ -37,6 +37,7 @@ function create_zram
 	echo $ZRAM_COMP_ALG > /sys/block/zram0/comp_algorithm
 	cat /sys/block/zram0/comp_algorithm
 
+	echo 0 > /sys/block/zram0/use_aio
 	echo $ZRAM_SIZE > /sys/block/zram0/disksize
 	if [ $? != 0 ]; then
 		return -1
@@ -137,7 +138,7 @@ function main
 		echo "#jobs$i fio" >> $LOG
 
 		BLOCK_SIZE=4 SIZE=100% NUMJOBS=$i NRFILES=$i FIO_LOOPS=$FIO_LOOPS \
-			$PERF stat -o $LOG-perf-stat $FIO ./$FIO_TEMPLATE >> $LOG
+			$FIO ./$FIO_TEMPLATE > $LOG
 
 		echo -n "perfstat jobs$i" >> $LOG
 		cat $LOG-perf-stat >> $LOG
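(FWIW, on a kernel with this series the knob can also be flipped by
hand instead of editing the script, before disksize is set as in the
create_zram hunk above. That "1" selects the new path is my assumption,
as a complement to the "echo 0" used in the diff:

    echo 0 > /sys/block/zram0/use_aio    # old synchronous IO path
    echo 1 > /sys/block/zram0/use_aio    # new page-based parallel write path

)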