
[RFC,0/4] Add support for LZ4-compressed kernels

Message ID 510F16C9.2060901@oberhumer.com (mailing list archive)
State New, archived

Commit Message

Markus F.X.J. Oberhumer Feb. 4, 2013, 2:02 a.m. UTC
On 2013-01-30 11:23, Johannes Stezenbach wrote:
> On Mon, Jan 28, 2013 at 11:29:14PM -0500, Nicolas Pitre wrote:
>> On Mon, 28 Jan 2013, Andrew Morton wrote:
>>
>>> On Sat, 26 Jan 2013 14:50:43 +0900
>>> Kyungsik Lee <kyungsik.lee@lge.com> wrote:
>>>
>>>> This patchset is for supporting LZ4 compressed kernel and initial ramdisk on
>>>> the x86 and ARM architectures.
>>>>
>>>> According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
>>>> compression algorithm and also features an extremely fast decoder.
>>>>
>>>> The kernel decompression APIs are based on the implementation by Yann Collet
>>>> (http://code.google.com/p/lz4/source/checkout).
>>>> Compression and decompression tools are also available from the site above.
>>>>
>>>> The initial test results on an ARM(v7) based board show that the LZ4-compressed
>>>> kernel is 8% bigger than the LZO-compressed one, but decompression is faster
>>>> (especially with unaligned memory access enabled).
>>>>
>>>> Test: 3.4 based kernel built with many modules
>>>> Uncompressed kernel size: 13MB
>>>> lzo: 6.3MB, 301ms
>>>> lz4: 6.8MB, 251ms (167ms with unaligned memory access enabled)
>>>
>>> What's this "with enabled unaligned memory access" thing?  You mean "if
>>> the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"?  If so,
>>> that's only x86, which isn't really in the target market for this
>>> patch, yes?
>>
>> I'm guessing this is referring to commit 5010192d5a.
>>
>>> It's a lot of code for a 50ms boot-time improvement.  Does anyone have
>>> any opinions on whether or not the benefits are worth the cost?
>>
>> Well, we used to have only one compressed format.  Now we have nearly 
>> half a dozen, with the same worthiness issue between themselves.  
>> Either we keep it very simple, or we make it very flexible.  The former 
>> would argue in favor of removing some of the existing formats, the latter
>> would let this new format in.
> 
> This reminded me to check the status of the lzo update and it
> seems it got lost?
> http://lkml.org/lkml/2012/10/3/144
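As a quick check on the ARMv7 figures quoted above (3.4-based kernel, 13MB uncompressed), the "8% bigger" size penalty and the roughly 50ms saving Andrew refers to follow directly from the reported numbers. A minimal C sketch of that arithmetic; the struct and function names are illustrative only and are not part of any kernel API:

```c
#include <assert.h>

/* One codec's result from the quoted ARMv7 test (illustrative type). */
struct codec_result {
    const char *name;
    double size_mb;   /* compressed kernel size, MB */
    double time_ms;   /* decompression time, ms */
};

/* How much larger b's output is than a's, in percent. */
double size_overhead_pct(const struct codec_result *a,
                         const struct codec_result *b)
{
    assert(a->size_mb > 0.0);
    return (b->size_mb / a->size_mb - 1.0) * 100.0;
}

/* Decompression time saved by using b instead of a, in ms. */
double time_saved_ms(const struct codec_result *a,
                     const struct codec_result *b)
{
    return a->time_ms - b->time_ms;
}
```

With lzo = {6.3MB, 301ms} and lz4 = {6.8MB, 251ms}, size_overhead_pct() gives about 7.9% and time_saved_ms() gives 50ms; with the 167ms unaligned-access figure the saving grows to 134ms.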

The proposed LZO update currently lives in the linux-next tree.

I have tried several times during the last 12 months to provide an update
of the kernel LZO version, but community interest seemed low and I
got basically no feedback about the performance improvements, which made
me wonder whether people actually care.

At least akpm did approve the LZO update for inclusion into 3.7, but the code
still has not been merged into the main tree.
  > On 2012-10-09 21:26, Andrew Morton wrote:
  > [...]
  > The changes look OK to me.  Please ask Stephen to include the tree in
  > linux-next, for a 3.7 merge.

Well, this probably means I have done rather poor marketing. Anyway, as
people seem to love *synthetic* benchmarks, I'm finally posting some timings
(including a brand-new ARM unaligned-access version; this is just a quick hack
which can probably be optimized further).

Hopefully publishing these numbers will help arouse more interest. :-)

Cheers,
Markus


x86_64 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed   decompression speed

  LZO-2005    :         150 MB/sec          468 MB/sec
  LZO-2012    :         434 MB/sec         1210 MB/sec

i386 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed   decompression speed

  LZO-2005    :         143 MB/sec          409 MB/sec
  LZO-2012    :         372 MB/sec         1121 MB/sec

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

                   compression speed   decompression speed

  LZO-2005    :          27 MB/sec           84 MB/sec
  LZO-2012    :          44 MB/sec          117 MB/sec
  LZO-2013-UA :          47 MB/sec          167 MB/sec

Legend:

  LZO-2005    : LZO version in the current 3.8-rc6 kernel (which is based on
                   the LZO 2.02 release from 2005)
  LZO-2012    : updated LZO version available in linux-next
  LZO-2013-UA : updated LZO version available in linux-next plus
                   ARM Unaligned Access patch (attached below)
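To make the tables easier to compare, the relative speedups can be computed directly from the MB/sec figures above. A small C sketch, assuming only the numbers in the tables; the type, array, and helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the Silesia results above; throughput in MB/sec. */
struct lzo_bench {
    const char *arch;
    const char *version;
    double comp_mbps;
    double decomp_mbps;
};

/* Transcribed from the three tables above (index in the comment). */
static const struct lzo_bench lzo_results[] = {
    { "x86_64", "LZO-2005",    150.0,  468.0 }, /* 0 */
    { "x86_64", "LZO-2012",    434.0, 1210.0 }, /* 1 */
    { "i386",   "LZO-2005",    143.0,  409.0 }, /* 2 */
    { "i386",   "LZO-2012",    372.0, 1121.0 }, /* 3 */
    { "armv7",  "LZO-2005",     27.0,   84.0 }, /* 4 */
    { "armv7",  "LZO-2012",     44.0,  117.0 }, /* 5 */
    { "armv7",  "LZO-2013-UA",  47.0,  167.0 }, /* 6 */
};

#define N_RESULTS (sizeof lzo_results / sizeof lzo_results[0])

/* Throughput ratio new/old; above 1.0 means the newer row is faster. */
double speedup(double old_mbps, double new_mbps)
{
    assert(old_mbps > 0.0);
    return new_mbps / old_mbps;
}

/* Decompression speedup of row i over baseline row base (same arch only). */
double decomp_speedup(size_t base, size_t i)
{
    assert(base < N_RESULTS && i < N_RESULTS);
    assert(strcmp(lzo_results[base].arch, lzo_results[i].arch) == 0);
    return speedup(lzo_results[base].decomp_mbps, lzo_results[i].decomp_mbps);
}

/* Compression speedup of row i over baseline row base (same arch only). */
double comp_speedup(size_t base, size_t i)
{
    assert(base < N_RESULTS && i < N_RESULTS);
    assert(strcmp(lzo_results[base].arch, lzo_results[i].arch) == 0);
    return speedup(lzo_results[base].comp_mbps, lzo_results[i].comp_mbps);
}
```

For example, decomp_speedup(4, 6) is about 1.99: on the Cortex-A9, the unaligned-access variant decompresses roughly twice as fast as the LZO-2005 code, while decomp_speedup(0, 1) shows about a 2.6x gain on x86_64.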


> (Cc: added, I hope Markus still cares and someone could
> eventually take his patch once he resends it.)
> 
> Johannes
>

Comments

Russell King - ARM Linux Feb. 4, 2013, 10:50 a.m. UTC | #1
On Mon, Feb 04, 2013 at 03:02:49AM +0100, Markus F.X.J. Oberhumer wrote:
> At least akpm did approve the LZO update for inclusion into 3.7, but the code
> still has not been merged into the main tree.
>   > On 2012-10-09 21:26, Andrew Morton wrote:
>   > [...]
>   > The changes look OK to me.  Please ask Stephen to include the tree in
>   > linux-next, for a 3.7 merge.
> 
> Well, this probably means I have done rather poor marketing.

I assume this code is sitting in *your* tree?  How do you think it gets
into mainline?

There is no automatic way that code from linux-next gets merged into
mainline.  That is up to the tree owner to make happen, either by getting
their tree into a parent maintainer's tree, or if there is none, asking
Linus to pull your tree at the appropriate time.

Johannes Stezenbach Feb. 5, 2013, 11:39 a.m. UTC | #2
On Mon, Feb 04, 2013 at 10:50:52AM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 04, 2013 at 03:02:49AM +0100, Markus F.X.J. Oberhumer wrote:
> > At least akpm did approve the LZO update for inclusion into 3.7, but the code
> > still has not been merged into the main tree.
> >   > On 2012-10-09 21:26, Andrew Morton wrote:
> >   > [...]
> >   > The changes look OK to me.  Please ask Stephen to include the tree in
> >   > linux-next, for a 3.7 merge.
> > 
> > Well, this probably means I have done rather poor marketing.
> 
> I assume this code is sitting in *your* tree?  How do you think it gets
> into mainline?
> 
> There is no automatic way that code from linux-next gets merged into
> mainline.  That is up to the tree owner to make happen, either by getting
> their tree into a parent maintainer's tree, or if there is none, asking
> Linus to pull your tree at the appropriate time.

My feeling is that in this case it is unnecessarily hard
for an outside contributor to get a patch accepted, all
because get_maintainer.pl doesn't put someone in charge.

Apparently it doesn't work to put all the usual maintainer
responsibilities onto the shoulders of a Linux development novice.
Thus it would be nice if some maintainer would come
forward and offer to handle the patches for Markus.


Thanks,
Johannes

Patch

commit 8745b927fcfcd6953ada9bd1220a73083db5948a
Author: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Date:   Mon Feb 4 02:26:14 2013 +0100

    lib/lzo: huge LZO decompression speedup on ARM by using unaligned access
    
    Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>

diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index 569985d..e3edc5f 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -72,9 +72,11 @@  copy_literal_run:
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+#  if !defined(__arm__)
 						COPY8(op, ip);
 						op += 8;
 						ip += 8;
+#  endif
 					} while (ip < ie);
 					ip = ie;
 					op = oe;
@@ -159,9 +161,11 @@  copy_literal_run:
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+#  if !defined(__arm__)
 					COPY8(op, m_pos);
 					op += 8;
 					m_pos += 8;
+#  endif
 				} while (op < oe);
 				op = oe;
 				if (HAVE_IP(6)) {
diff --git a/lib/lzo/lzodefs.h b/lib/lzo/lzodefs.h
index 5a4beb2..b230601 100644
--- a/lib/lzo/lzodefs.h
+++ b/lib/lzo/lzodefs.h
@@ -12,8 +12,14 @@ 
  */
 
 
+#if 1 && defined(__arm__) && ((__LINUX_ARM_ARCH__ >= 6) || defined(__ARM_FEATURE_UNALIGNED))
+#define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 1
+#define COPY4(dst, src)	\
+		* (u32 *) (void *) (dst) = * (const u32 *) (const void *) (src)
+#else
 #define COPY4(dst, src)	\
 		put_unaligned(get_unaligned((const u32 *)(src)), (u32 *)(dst))
+#endif
 #if defined(__x86_64__)
 #define COPY8(dst, src)	\
 		put_unaligned(get_unaligned((const u64 *)(src)), (u64 *)(dst))
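The patch hinges on how 4- and 8-byte unaligned copies are expressed and on how far the copy loop is unrolled. The following is a userspace sketch, not kernel code: the kernel's get_unaligned()/put_unaligned() are replaced by fixed-size memcpy(), and copy4, copy8, copy_run and the UNROLL_TWICE switch are hypothetical names chosen to mirror COPY4, COPY8 and the `#if !defined(__arm__)` blocks in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the patch: copy 16 bytes per iteration except on ARM. */
#ifndef UNROLL_TWICE
#  if defined(__arm__)
#    define UNROLL_TWICE 0
#  else
#    define UNROLL_TWICE 1
#  endif
#endif

/*
 * Userspace stand-ins for COPY4/COPY8. A fixed-size memcpy() is the
 * portable way to express an unaligned load/store in C; compilers
 * typically lower it to a single instruction where unaligned access
 * is cheap.
 */
static inline void copy4(unsigned char *dst, const unsigned char *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    memcpy(dst, &v, sizeof v);
}

static inline void copy8(unsigned char *dst, const unsigned char *src)
{
    uint64_t v;
    memcpy(&v, src, sizeof v);
    memcpy(dst, &v, sizeof v);
}

/*
 * Copy a run of n bytes in 8-byte chunks, in the style of the
 * literal-run loop in lzo1x_decompress_safe.c. It may read and write
 * up to 7 bytes (15 when UNROLL_TWICE) past the end of the run, so
 * both buffers need that much slack.
 */
void copy_run(unsigned char *dst, const unsigned char *src, size_t n)
{
    const unsigned char *end = src + n;

    assert(n > 0);
    do {
        copy8(dst, src);
        dst += 8;
        src += 8;
#if UNROLL_TWICE
        copy8(dst, src);
        dst += 8;
        src += 8;
#endif
    } while (src < end);
}
```

The patch itself takes a different route on ARMv6+: COPY4 becomes a plain `u32` pointer dereference, which relies on the hardware handling unaligned word accesses and on the kernel being built with `-fno-strict-aliasing`. The memcpy() form avoids both assumptions in userspace.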