[ANNOUNCE] LLVM backend for Sparse

Message ID	CANeU7Q=PUDGGm9G6PnXvW3QeBnnMP39mfEE=HvszvBeAvrA4yQ@mail.gmail.com (mailing list archive)

Christopher Li Aug. 28, 2011, 8:51 a.m. UTC

On Sat, Aug 27, 2011 at 11:08 PM, Pekka Enberg <penberg@kernel.org> wrote:
> Hi everyone,
>
> Jeff Garzik and I have been hacking on an LLVM backend for Sparse. The
> sources are available on GitHub:
>
>  git clone git://github.com/penberg/sparse-llvm.git
>

Very impressive. That is sparse 0.5 material.
I will start merging it as soon as I release 0.4.4.

I played around with it a little bit. It seems to choke on the hello
world program. That shouldn't be hard to fix, though.

I have attached a patch that limits g++ usage to the LLVM-related programs.
Currently g++ is also used to link the other sparse programs.

Chris

Pekka Enberg Aug. 28, 2011, 1:51 p.m. UTC | #1

On Sun, 28 Aug 2011, Christopher Li wrote:
> I attach a patch to limit g++ usage only to llvm related programs.
> Currently it use g++ to link other sparse programs.

Applied, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Pekka Enberg Aug. 28, 2011, 2:01 p.m. UTC | #2

On Sun, 28 Aug 2011, Christopher Li wrote:
> Very impressive. That is some sparse 0.5 material.
> I will start merging it as soon as I release 0.4.4
>
> I play around with it a little bit, It seems choke on the hello
> world program. Shouldn't be hard to fix though.

It's the lack of PSEUDO_SYM support. I haven't quite figured out what kind of
code to generate for that because, frankly, I'm not completely sure what
it's all about. ;-)

 			Pekka

Linus Torvalds Aug. 28, 2011, 4:54 p.m. UTC | #3

On Sun, Aug 28, 2011 at 7:01 AM, Pekka Enberg <penberg@kernel.org> wrote:
>
> It's lack of PSEUDO_SYM support. I haven't quite figured out what kind of
> code to generate for that because, frankly, I'm not completely sure what
> it's all about. ;-)

So "PSEUDO_SYM" is just a "link-time constant".

IOW, it's nothing but a pointer to an in-memory variable, and from a
code generation standpoint generating the symbol should be just about
the same as generating a constant, except rather than an actual value,
it's now a linker reference.

Of course, what complicates them is that you also need to generate the
symbol definition itself (somewhere else). In particular, the
sym->initializer points to the initializer of a symbol, and they can
be *complicated*. Generating the output for some symbols can be a
major pain in the *ss, just think about a complex structure array
initializer.

The PSEUDO_SYM you hit for the "hello world" program is trivial,
though. It's an unnamed symbol of type "char []", and it has a trivial
initializer (the string itself). So symbols can be hard to generate,
but there are simple cases.

So for a PSEUDO_SYM, you need to look up "struct symbol" for it
(pseudo->sym), which then has:

 - name of the symbol (sym->ident). Of course, some symbols don't have
names, like the constant string example.

 - type/size/alignment information (sym->ctype)

 - initializer information (sym->initializer - which can be a really
complex expression). Of course, for external symbols - or symbols with
no initializer - this is just empty.

There's way more to symbol types in sparse than that, but apart from
the initializer expression, most of the symbol complexity has been
handled by earlier stages (ie a *lot* of the front-end of sparse is
about symbols and their types).

                          Linus
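
As a rough illustration of the description above, here is a minimal sketch in
C. Every toy_* type and describe_pseudo() are hypothetical, heavily simplified
stand-ins invented for this example; they are not sparse's real definitions,
and the initializer is assumed to be a plain string rather than a full
expression tree. The sketch only shows the shape of the lookup: a symbol
pseudo leads to a symbol, which carries an optional name, type information,
and an optional initializer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for sparse's structures (assumption: the real
 * 'struct symbol' and 'struct pseudo' carry far more state than this). */
struct toy_ident { const char *name; };

struct toy_ctype {
	unsigned long bit_size;   /* size of the object in bits */
	unsigned long alignment;  /* required alignment in bytes */
};

struct toy_symbol {
	struct toy_ident *ident;  /* NULL for unnamed symbols, e.g. string literals */
	struct toy_ctype ctype;   /* type/size/alignment information */
	const char *initializer;  /* toy: string initializers only; NULL if none */
};

enum toy_pseudo_type { TOY_PSEUDO_VAL, TOY_PSEUDO_SYM };

struct toy_pseudo {
	enum toy_pseudo_type type;
	union {
		long long value;         /* TOY_PSEUDO_VAL: an actual constant */
		struct toy_symbol *sym;  /* TOY_PSEUDO_SYM: a link-time constant */
	} u;
};

/* Describe what a back-end would have to emit for the pseudo.
 * Writes the description into buf and returns buf. */
static const char *describe_pseudo(const struct toy_pseudo *p,
				   char *buf, size_t len)
{
	assert(p && buf && len > 0);

	if (p->type == TOY_PSEUDO_VAL) {
		snprintf(buf, len, "constant %lld", p->u.value);
		return buf;
	}

	const struct toy_symbol *sym = p->u.sym;
	const char *name = sym->ident ? sym->ident->name : "<anon>";

	if (!sym->initializer)
		snprintf(buf, len, "reference to %s (external or uninitialized)",
			 name);
	else
		snprintf(buf, len, "reference to %s, defined with \"%s\"",
			 name, sym->initializer);
	return buf;
}
```

For the "hello world" case, the pseudo would point at an unnamed symbol whose
initializer is the string itself, so the sketch reports a reference to <anon>
together with its definition.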

Pekka Enberg Aug. 29, 2011, 7:41 a.m. UTC | #4

On Sun, Aug 28, 2011 at 7:01 AM, Pekka Enberg <penberg@kernel.org> wrote:
>> It's lack of PSEUDO_SYM support. I haven't quite figured out what kind of
>> code to generate for that because, frankly, I'm not completely sure what
>> it's all about. ;-)

On Sun, Aug 28, 2011 at 7:54 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So "PSEUDO_SYM" is just a "link-time constant".
>
> IOW, it's nothing but a pointer to an in-memory variable, and from a
> code generation standpoint generating the symbol should be just about
> the same as generating a constant, except rather than an actual value,
> it's now a linker reference.
>
> Of course, what complicates them is that you also need to generate the
> symbol definition itself (somewhere else). In particular, the
> sym->initializer points to the initializer of a symbol, and they can
> be *complicated*. Generating the output for some symbols can be a
> major pain in the *ss, just think about a complex structure array
> initializer.
>
> The PSEUDO_SYM you hit for the "hello world" program is trivial,
> though. It's an unnamed symbol of type "char []", and it has a trivial
> initializer (the string itself). So symbols can be hard to generate,
> but there are simple cases.
>
> So for a PSEUDO_SYM, you need to look up "struct symbol" for it
> (pseudo->sym), which then has:
>
>  - name of the symbol (sym->ident). Of course, some symbols don't have
> names, like the constant string example.
>
>  - type/size/alignment information (sym->ctype)
>
>  - initializer information (sym->initializer - which can be a really
> complex expression). Of course, for external symbols - or symbols with
> no initializer - this is just empty.
>
> There's way more to symbol types in sparse than that, but apart from
> the initializer expression, most of the symbol complexity has been
> handled by earlier stages (ie a *lot* of the front-end of sparse is
> about symbols and their types).

Right. Will the "link time constant" 'struct symbol' be part of the
symbol_list? If it is, then we already do LLVMAddGlobal in
output_data() on it and could probably just stash that into a ->priv
member in 'struct symbol' and use that in pseudo_to_value().
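
The caching idea above could look roughly like the following sketch. It is
not the sparse-llvm code: toy_value_ref stands in for LLVMValueRef,
toy_add_global() stands in for LLVMAddGlobal(), and the sketch_* functions
are hypothetical names chosen to avoid confusion with the real output_data()
and pseudo_to_value(). The lazy fallback in sketch_pseudo_to_value() is an
added assumption, covering symbols the data pass never visited.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: 'toy_value_ref' plays the role of LLVMValueRef and
 * toy_add_global() the role of LLVMAddGlobal(); neither is real LLVM API. */
typedef struct toy_value { const char *name; } *toy_value_ref;

struct toy_symbol {
	const char *name;
	void *priv;  /* back-end private data; NULL until a global exists */
};

#define TOY_MAX_GLOBALS 16

static struct toy_value pool[TOY_MAX_GLOBALS];
static unsigned globals_created;

/* Create a new "global" and hand back an opaque handle to it. */
static toy_value_ref toy_add_global(const char *name)
{
	assert(globals_created < TOY_MAX_GLOBALS);
	pool[globals_created].name = name;
	return &pool[globals_created++];
}

/* Data pass: emit one global per symbol and remember the handle. */
static void sketch_output_data(struct toy_symbol *sym)
{
	if (!sym->priv)
		sym->priv = toy_add_global(sym->name);
}

/* Code pass: a symbol pseudo resolves to the cached handle. If the data
 * pass never saw the symbol, the global is created on first use. */
static toy_value_ref sketch_pseudo_to_value(struct toy_symbol *sym)
{
	sketch_output_data(sym);
	return sym->priv;
}
```

Because the handle lives in the symbol itself, every reference to the same
symbol resolves to the same global, and no separate lookup table is needed.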

Christopher Li Aug. 29, 2011, 9:19 a.m. UTC | #5

On 08/29/2011 12:41 AM, Pekka Enberg wrote:
> Right. Will the "link time constant" 'struct symbol' be part of the 
> symbol_list? If it is, then we already do LLVMAddGlobal in 
> output_data() on it and could probably just stash that into a ->priv 
> member in 'struct symbol' and use that in pseudo_to_value(). 

The link-time constant is an LLVMValueRef; it is just the address of the
symbol. Yes, it will belong to the function scope's symbol_list, but not
necessarily to the global one.

Here are some tips that helped me a lot on my own LLVM branch, which was
lost in a hard drive crash. It was able to do hello world back then.

LLVM can output C++ source code that constructs the LLVM byte code.

# this will emit hello.s as the llvm asm code
$ clang -S -emit-llvm hello.c

# this will generate hello.s.cpp which is the c++ source code to construct
# the llvm byte code above

$ llc -march=cpp hello.s

hello.s.cpp is a very good reference for building the LLVM byte code from C.
It is in C++, but converting it to the LLVM C bindings should be easy.
BTW, "-march=c" does not generate code that uses the C bindings; it
translates the IR into equivalent C code instead.

Hope that helps.

Chris


Pekka Enberg Aug. 29, 2011, 11:05 a.m. UTC | #6

On 08/29/2011 12:41 AM, Pekka Enberg wrote:
>> Right. Will the "link time constant" 'struct symbol' be part of the
>> symbol_list? If it is, then we already do LLVMAddGlobal in output_data() on
>> it and could probably just stash that into a ->priv member in 'struct
>> symbol' and use that in pseudo_to_value().

On Mon, Aug 29, 2011 at 12:19 PM, Christopher Li <sparse@chrisli.org> wrote:
> The link time constant is a LLVMValueRef, it is just the address of the
> symbol.
> Yes, it will belong to the function scope's symbol_list but not necessary
> the global one.
> Here is some tips that help me a lot on my own llvm branch, which lost due
> to a hard
> drive crash. It was able to do hello world back then.
>
> LLVM can output c++ source code that construct the llvm byte code.
>
> # this will emit hello.s as the llvm asm code
> $ clang -S -emit-llvm hello.c
>
> # this will generate hello.s.cpp which is the c++ source code to construct
> # the llvm byte code above
>
> $ llc -march=cpp hello.s
>
> The hello.s.cpp is a very good reference to build the llvm byte code in C.
> It is in C++, but convert to llvm C binding should be easy.
> BTW, "-march=c" does not generate C bindings, it generate the
> equivalent IR in C code instead.

Yup, I've been doing that online here:

http://llvm.org/demo/test.cgi

Pekka Enberg Aug. 29, 2011, 2:22 p.m. UTC | #7

On Sat, Aug 27, 2011 at 11:08 PM, Pekka Enberg <penberg@kernel.org> wrote:
>> Jeff Garzik and I have been hacking on an LLVM backend for Sparse. The
>> sources are available on GitHub:
>>
>>  git clone git://github.com/penberg/sparse-llvm.git

On Sun, Aug 28, 2011 at 11:51 AM, Christopher Li <sparse@chrisli.org> wrote:
> Very impressive. That is some sparse 0.5 material.
> I will start merging it as soon as I release 0.4.4
>
> I play around with it a little bit, It seems choke on the hello
> world program. Shouldn't be hard to fix though.

It's alive!

$ cat validation/backend/hello.c
#include <stdio.h>

int main(int argc, char *argv[])
{
	puts("hello, world");

	return 0;
}

/*
 * check-name: 'hello, world' code generation
 * check-command: ./sparsec -c $file -o tmp.o
 */
$ ./sparsec -c validation/backend/hello.c -o tmp.o && gcc tmp.o && ./a.out
hello, world

You can find details in this commit:

https://github.com/penberg/sparse-llvm/commit/deb11bd4ee6ed46bce6a5e35a69ada519d81a0a8

                        Pekka

Derek M Jones Aug. 29, 2011, 3:14 p.m. UTC | #8

Pekka,

>>> Jeff Garzik and I have been hacking on an LLVM backend for Sparse. The
...
> It's alive!

Congratulations.  It is always great to see a compiler generate
executable code for the first time.

It is useful to have a semantic checker generate executable code
because it provides confirmation that lots of internal processing
is working as intended and therefore increases confidence
that the semantic checks are correct.

Can people allay my concern that this work is not on the slippery
slope leading to sparse becoming the recommended compiler for
building the kernel?

Jeff Garzik Aug. 29, 2011, 3:27 p.m. UTC | #9

On 08/29/2011 10:22 AM, Pekka Enberg wrote:
> On Sat, Aug 27, 2011 at 11:08 PM, Pekka Enberg <penberg@kernel.org> wrote:
>>> Jeff Garzik and I have been hacking on an LLVM backend for Sparse. The
>>> sources are available on GitHub:
>>>
>>>   git clone git://github.com/penberg/sparse-llvm.git
>
> On Sun, Aug 28, 2011 at 11:51 AM, Christopher Li <sparse@chrisli.org> wrote:
>> Very impressive. That is some sparse 0.5 material.
>> I will start merging it as soon as I release 0.4.4
>>
>> I play around with it a little bit, It seems choke on the hello
>> world program. Shouldn't be hard to fix though.
>
> It's alive!
>
> $ cat validation/backend/hello.c
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
> 	puts("hello, world");
>
> 	return 0;
> }

You should be more adventurous and try varargs!  :)

This also works:

[WARNING: it only works if I disable optimization, for some reason]

[jgarzik@bd sparse-llvm]$ ./sparse-llvm foo.c | llc > foo.s
[jgarzik@bd sparse-llvm]$ gcc -o foo foo.s
[jgarzik@bd sparse-llvm]$ ./foo
hello, world!
[jgarzik@bd sparse-llvm]$ cat foo.c

#include <stdio.h>

int main (int argc, char *argv[])
{
	printf("%s\n", "hello, world!");

	return 0;
}



Jeff Garzik Aug. 29, 2011, 3:33 p.m. UTC | #10

On 08/29/2011 11:14 AM, Derek M Jones wrote:
> Can people allay my concern that this work is not on the slippery
> slope leading to sparse becoming the recommended compiler for
> building the kernel?

I doubt sparse will ever have the testing, support and deployment 
infrastructure of the gcc ecosystem.

	Jeff



Derek M Jones Aug. 29, 2011, 3:42 p.m. UTC | #11

Jeff,

>> Can people allay my concern that this work is not on the slippery
>> slope leading to sparse becoming the recommended compiler for
>> building the kernel?
>
> I doubt sparse will ever have the testing, support and deployment
> infrastructure of the gcc ecosystem.

Exactly.  Unfortunately people's desire to control their world is often
stronger.

Pekka Enberg Aug. 29, 2011, 3:45 p.m. UTC | #12

On Mon, Aug 29, 2011 at 6:42 PM, Derek M Jones <derek@knosof.co.uk> wrote:
>>> Can people allay my concern that this work is not on the slippery
>>> slope leading to sparse becoming the recommended compiler for
>>> building the kernel?
>>
>> I doubt sparse will ever have the testing, support and deployment
>> infrastructure of the gcc ecosystem.

On Mon, Aug 29, 2011 at 6:42 PM, Derek M Jones <derek@knosof.co.uk> wrote:
> Exactly.  Unfortunately people's desire to control their world is often
> stronger.

I doubt sparse will ever become the 'recommended compiler' for Linux
but I fail to see why that would be a bad thing.

                          Pekka

Linus Torvalds Aug. 29, 2011, 4:58 p.m. UTC | #13

On Mon, Aug 29, 2011 at 8:14 AM, Derek M Jones <derek@knosof.co.uk> wrote:
>
> Can people allay my concern that this work is not on the slippery
> slope leading to sparse becoming the recommended compiler for
> building the kernel?

That was never really a goal.

What *was* a goal for me a long time ago was to make sparse be a
"preprocessor" in front of gcc, because the standard C pre-processor
has always been weak and we've always abused it in nasty ways (look at
all the games we play with "sizeof" and "typeof" to make our
"kernel-C" have a kind of generic programming).

Having a semantically aware preprocessor would allow kernel-specific
type extensions (like the "__user" attribute we already use purely for
checking runs), and allow debug facilities like "trace accesses to
this variable" without having to have ugly and unreadable wrappers
etc.

But even that got put on the back-burner when gcc improved the
pre-processor performance. It used to be a separate phase, and quite
slow, and having a quick semantic analysis in front of cc1 that
replaced the gcc preprocessor wouldn't have been a big slow-down. But
with the new (well, no longer new) built-in pre-processor, a sparse
front-end would slow down kernel compiles noticeably.

The main reason I used to want to have a back-end was that without a
back-end you cannot really verify the front-end. You can read
"test-linearize" output all day long, you'll never really get very
far. Once you generate code, and can use the thing to actually compile
real programs, that gives you *way* more confidence that you're doing
things right.

                          Linus
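
For readers unfamiliar with the "sizeof" and "typeof" games and the "__user"
annotation mentioned above, the following sketch shows the general flavour.
It relies on GNU C extensions (__typeof__ and statement expressions), so it
assumes gcc or a compatible compiler. The macro names min_checked and
ARRAY_LEN and the function bounded_len() are hypothetical, chosen for this
example; the __user definition mirrors the form kernel headers have
historically used under __CHECKER__.

```c
#include <assert.h>

/* When sparse runs (__CHECKER__ defined) the annotation carries meaning;
 * for a normal compiler it expands to nothing. */
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user
#endif

/* Type-checked minimum: comparing the addresses of two temporaries makes
 * the compiler warn when x and y have different types, and each argument
 * is evaluated exactly once. */
#define min_checked(x, y) ({		\
	__typeof__(x) _x = (x);		\
	__typeof__(y) _y = (y);		\
	(void) (&_x == &_y);		\
	_x < _y ? _x : _y; })

/* Element count computed from sizeof; valid for true arrays only. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: 'src' is tagged as a user-space pointer, so a
 * checker can flag any direct dereference of it. */
static unsigned long bounded_len(const char __user *src,
				 unsigned long want, unsigned long limit)
{
	(void) src;  /* never dereferenced directly */
	return min_checked(want, limit);
}
```

The annotation costs nothing in a normal build, while a checking run can
treat the tagged pointer as belonging to a different address space.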
