[PATCHv2,0/3] catch non-sign-extended '~' brainos

Message ID	CANeU7QkVj1gm3wQ6r1=akKX9W1cQvE3Un32CD=1UBecFCaq18Q@mail.gmail.com (mailing list archive)
State	Rejected, archived
Headers	show Return-Path: <linux-sparse-owner@kernel.org> MIME-Version: 1.0 In-Reply-To: <20140627111900.GA19158@phil.dovecot.net> References: <1402315082-14102-1-git-send-email-phil@dovecot.fi> <1402386847-23477-1-git-send-email-phil@dovecot.fi> <20140627111900.GA19158@phil.dovecot.net> Date: Fri, 27 Jun 2014 10:16:23 -0700 Message-ID: <CANeU7QkVj1gm3wQ6r1=akKX9W1cQvE3Un32CD=1UBecFCaq18Q@mail.gmail.com> Subject: Re: [PATCHv2 0/3] catch non-sign-extended '~' brainos From: Christopher Li <sparse@chrisli.org> To: Phil Carmody <phil@dovecot.fi> Cc: Josh Triplett <josh@joshtriplett.org>, Linux-Sparse <linux-sparse@vger.kernel.org> Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-sparse-owner@vger.kernel.org Precedence: bulk

Christopher Li June 27, 2014, 5:16 p.m. UTC

On Fri, Jun 27, 2014 at 4:19 AM, Phil Carmody <phil@dovecot.fi> wrote:
>
> Any comments on these?
>

Sorry I am falling behind the patches, again.

I have been thinking about your patch for a few days.

It is very common to implicit cast up, do some binary operation
then cast back down.  e.g.

static unsigned long long val;
static unsigned int ok10  = val | ~1u;
static unsigned char a;
static unsigned char ok11  = val | ~a;

Your patch will give false positive warning about those.

Do you known how many new warning does it give on the new
kernel build after apply your patch? I suspect there is a lot of
false positive. I want to do that but haven't get around to it yet.
Hint hint...

To reduce the false positive, we can check the zero extended bits
survived to the final expression. e.g. the finial expression did
down cast again and lost those bits. I also want an option to
turn this warning off because the possible false positive.

Also, if you only care about warning against up cast "~" of
unsigned type, you can do the checking at cast_to().
It will be simpler. I have a sort patch like this, given that I
did not check for the "unsigned int" type, nor do I check for
binary op. You can still get the idea.

In this way, you don't need to predict the up cast will happen
or not. You can call when the implicit up cast actually happens.

It does not solve the problem of checking down cast  after up cast
of finial expression though.

Chris

                        old->unop = cast_to(old->unop, type);
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Phil Carmody June 30, 2014, 8:56 a.m. UTC | #1

On Fri, Jun 27, 2014 at 10:16:23AM -0700, Christopher Li wrote:
> On Fri, Jun 27, 2014 at 4:19 AM, Phil Carmody <phil@dovecot.fi> wrote:
> >
> > Any comments on these?
> >
> 
> Sorry I am falling behind the patches, again.
> 
> I have been thinking about your patch for a few days.
> 
> It is very common to implicit cast up, do some binary operation
> then cast back down.  e.g.
> 
> 
> static unsigned long long val;
> static unsigned int ok10  = val | ~1u;
> static unsigned char a;
> static unsigned char ok11  = val | ~a;
> 
> 
> Your patch will give false positive warning about those.

Good call.

> Do you known how many new warning does it give on the new
> kernel build after apply your patch? I suspect there is a lot of
> false positive. I want to do that but haven't get around to it yet.
> Hint hint...

It found some things that I had intended to patch. My memory's hazy
though, and there's no point in making wild guesses. I'll do a couple
of builds again, with my patch and with your patch, and compare the
output. My builds will be powerpc ones, so if anyone wants to do an
x86 build it wouldn't be wasted effort.

> To reduce the false positive, we can check the zero extended bits
> survived to the final expression. e.g. the finial expression did
> down cast again and lost those bits. I also want an option to
> turn this warning off because the possible false positive.

The latter would make sense, yes. It's not clear how to achieve the
former though. Either some context needs to be passed down, or a flag
needs to be passed back up. Either could be a bit invasive.

> Also, if you only care about warning against up cast "~" of
> unsigned type, you can do the checking at cast_to().
> It will be simpler. I have a sort patch like this, given that I
> did not check for the "unsigned int" type, nor do I check for
> binary op. You can still get the idea.

It needs to check for unsigned-ness in order to trap zero-extension, though.

> In this way, you don't need to predict the up cast will happen
> or not. You can call when the implicit up cast actually happens.

That probably makes more sense. I was just mimicking the previous 
dubious check.

Will report back in the next day or so what the kernel builds say.

Cheers,
Phil

> It does not solve the problem of checking down cast  after up cast
> of finial expression though.
> 
> Chris
> 
> diff --git a/evaluate.c b/evaluate.c
> index 7ce8c55..47e660e 100644
> --- a/evaluate.c
> +++ b/evaluate.c
> @@ -294,8 +294,11 @@ static struct expression * cast_to(struct
> expression *old, struct symbol *type)
>          */
>         switch (old->type) {
>         case EXPR_PREOP:
> -               if (old->ctype->bit_size < type->bit_size)
> +               if (old->ctype->bit_size < type->bit_size) {
> +                       if (old->op == '~')
> +                               warning(old->pos, "dubious zero-extended '~'");
>                         break;
> +               }
>                 if (old->op == '~') {
>                         old->ctype = type;
>                         old->unop = cast_to(old->unop, type);
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christopher Li June 30, 2014, 5:27 p.m. UTC | #2

On Mon, Jun 30, 2014 at 9:44 AM, Christopher Li <sparse@chrisli.org> wrote:
> That give me 565 new warning. I attach the file here.
>
> I sample some, all the one I saw are false positive.

The real interesting question is, how many real kernel bug in that
565 warnings? It take some human effort to determine the warning
is false positive or not. If most of the warning are false positive, then
the warning is not usable.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Phil Carmody July 1, 2014, 11:30 a.m. UTC | #3

On Mon, Jun 30, 2014 at 09:44:35AM -0700, Christopher Li wrote:
> On Mon, Jun 30, 2014 at 1:56 AM, Phil Carmody <phil@dovecot.fi> wrote:
> >> Also, if you only care about warning against up cast "~" of
> >> unsigned type, you can do the checking at cast_to().
> >> It will be simpler. I have a sort patch like this, given that I
> >> did not check for the "unsigned int" type, nor do I check for
> >> binary op. You can still get the idea.
> >
> > It needs to check for unsigned-ness in order to trap zero-extension, though.
> 
> That is right, it is not a complete patch. Just to show you some ideas.

With unsigned, it becomes a lot quieter, but still much noiser than my
original. (I have a v3 which excludes an idiom using '/'.)

> > Will report back in the next day or so what the kernel builds say.
> 
> I did some test against your patch.
> BTW, I just submit a patch for the kernel to save sparse warnings.
> Subject line starts with: [PATCH] sparse: Add CLOG ....
> 
> With that patch in kernel. Here is what I did in the testing:
> 
> $ make allmodconfig
> $ make -j8 C=2 CLOG=std
> 
> # apply your patch and make sparse
> 
> $ make -j8 C=2 CLOG=std-exp
> $ find -name ".*.std.sparse" | while read -r file; do diff -du $file
> ${file/std.sparse/std-exp.sparse} ; done > /tmp/signed-diff
> 
> $ grep "dubious zero" /tmp/signed-diff | wc -l
> 
> That give me 565 new warning. I attach the file here.
> 
> I sample some, all the one I saw are false positive.
> e.g.
> 
> static inline int nlmsg_msg_size(int payload)
> {
>     return NLMSG_HDRLEN + payload; <--------warn here
> }
> 
> /**
>  * nlmsg_total_size - length of netlink message including padding
>  * @payload: length of message payload
>  */
> static inline int nlmsg_total_size(int payload)
> {
>     return NLMSG_ALIGN(nlmsg_msg_size(payload));
> }
> 
> #define NLMSG_ALIGNTO    4U

I consider that, pedantically, to be worth fixing. alignments are defined
by the C standard to be of type size_t. It should either be ((size_t)4)
if you care about the precise type, or 4 if you don't, but not 4U. The 
'U' adds nothing apart from a way to create an incorrect mask.

Fixing that line removes ~110 out of the ~120 warnings in my powerpc build.

> #define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )
> #define NLMSG_HDRLEN     ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr)))
> <------------ sizeof cast up the type
> #define NLMSG_LENGTH(len) ((len) + NLMSG_HDRLEN)

However, that's a good example of where the casting back down should shut
the warning up.

Thanks for the time you've put into this, Chris. I'll do a bit more 
head-scratching. I get the feeling that the zero-extend should taint the
expression, and that down-casts should clear that taint. The problem with
that is that you need to trap absolutely everywhere whate taint might
escape.

Phil

--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christopher Li July 1, 2014, 7:42 p.m. UTC | #4

On Tue, Jul 1, 2014 at 4:30 AM, Phil Carmody <phil@dovecot.fi> wrote:
>
> With unsigned, it becomes a lot quieter, but still much noiser than my
> original. (I have a v3 which excludes an idiom using '/'.)
>

Yes, because it cover more than just binary operations.
If you consider zero extend the unsigned type introducing error,
the binary operation just how you use the error bits.

>> #define NLMSG_ALIGNTO    4U
>
> I consider that, pedantically, to be worth fixing. alignments are defined
> by the C standard to be of type size_t. It should either be ((size_t)4)

((size_t)4) is hell ugly.

> if you care about the precise type, or 4 if you don't, but not 4U. The
> 'U' adds nothing apart from a way to create an incorrect mask.

That is debatable. Enforcing 4U vs 4 is annoying and buy you nothing
most of the case.

I don't consider alignment  being 4U as wrong by itself.  The key question
is how you use it.

if we introduce:

#define ALIGNMASK(align)   (size_t)(~((align) -1))

And we always use the macro to create mask from alignment value.
Then there is no need to enforce signedness of the var passed into the
macro. Hey, it might even read better.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Phil Carmody July 2, 2014, 7:43 a.m. UTC | #5

On Tue, Jul 01, 2014 at 12:42:51PM -0700, Christopher Li wrote:
> On Tue, Jul 1, 2014 at 4:30 AM, Phil Carmody <phil@dovecot.fi> wrote:
> > With unsigned, it becomes a lot quieter, but still much noiser than my
> > original. (I have a v3 which excludes an idiom using '/'.)
> 
> Yes, because it cover more than just binary operations.
> If you consider zero extend the unsigned type introducing error,
> the binary operation just how you use the error bits.
> 
> >> #define NLMSG_ALIGNTO    4U
> >
> > I consider that, pedantically, to be worth fixing. alignments are defined
> > by the C standard to be of type size_t. It should either be ((size_t)4)
> 
> ((size_t)4) is hell ugly.

Do you care about the precise type of the value?
If you do, then it may be ugly, but it's pretty much obligatory.
And if you don't care about the precise type...

> > if you care about the precise type, or 4 if you don't, but not 4U. The
> > 'U' adds nothing apart from a way to create an incorrect mask.
> 
> That is debatable. Enforcing 4U vs 4 is annoying and buy you nothing
> most of the case.

What do you mean by "enforcing"? I consider persuading people to write
simpler, less breakable code to be a good thing.

> I don't consider alignment being 4U as wrong by itself. The key question
> is how you use it.

Absolutely. And doing a ~ and then an up-cast is using it dangerously.
Isn't that what sparse is supposed to be checking for?

> if we introduce:
> 
> #define ALIGNMASK(align)   (size_t)(~((align) -1))
> 
> And we always use the macro to create mask from alignment value.
> Then there is no need to enforce signedness of the var passed into the
> macro. Hey, it might even read better.

Put 4U in that, and it still has the flaw of creating a truncated mask.
Put 4 in and it doesn't.

Do you still want to defend 4U?

Phil

--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christopher Li July 2, 2014, 8:51 a.m. UTC | #6

On Wed, Jul 2, 2014 at 12:43 AM, Phil Carmody <phil@dovecot.fi> wrote:
>
> Do you care about the precise type of the value?
> If you do, then it may be ugly, but it's pretty much obligatory.
> And if you don't care about the precise type...

Yes, I make a mistake on my macro. But alignment can be
precise being unsigned. In a 32 bit system, what if I want to make
a 2G align memory? Your signed alignment will force to be a
negative value. Which is not precise.

>> I don't consider alignment being 4U as wrong by itself. The key question
>> is how you use it.
>
> Absolutely. And doing a ~ and then an up-cast is using it dangerously.
> Isn't that what sparse is supposed to be checking for?

Well, you still need to get the false positive rate down. Currently then
signal to noise ratio is too low. I would not recommend turning it on
by default. Did you find some real bug in that 500 error report? Real
bug I mean it produce actually bug on the object code it generated.

>
>> if we introduce:
>>
>> #define ALIGNMASK(align)   (size_t)(~((align) -1))
>>
>> And we always use the macro to create mask from alignment value.
>> Then there is no need to enforce signedness of the var passed into the
>> macro. Hey, it might even read better.
>
> Put 4U in that, and it still has the flaw of creating a truncated mask.
> Put 4 in and it doesn't.

Right, I make a mistake on that macro. Good catch.  How about

#define ALIGNMASK(align)   (~((size_t)((align) -1)))

Will you be happy if I use 4U with that macro?

My point is, with the right macro, you can still use 4U to create
alignment. We don't have to fight over 4U vs 4.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Phil Carmody July 2, 2014, 9:28 a.m. UTC | #7

On Wed, Jul 02, 2014 at 01:51:25AM -0700, Christopher Li wrote:
> On Wed, Jul 2, 2014 at 12:43 AM, Phil Carmody <phil@dovecot.fi> wrote:
> >
> > Do you care about the precise type of the value?
> > If you do, then it may be ugly, but it's pretty much obligatory.
> > And if you don't care about the precise type...
> 
> Yes, I make a mistake on my macro. But alignment can be
> precise being unsigned. In a 32 bit system, what if I want to make
> a 2G align memory? Your signed alignment will force to be a
> negative value. Which is not precise.

You always have the option of an explicit size_t in that case.

> >> I don't consider alignment being 4U as wrong by itself. The key question
> >> is how you use it.
> >
> > Absolutely. And doing a ~ and then an up-cast is using it dangerously.
> > Isn't that what sparse is supposed to be checking for?
> 
> Well, you still need to get the false positive rate down. Currently then
> signal to noise ratio is too low. I would not recommend turning it on
> by default. Did you find some real bug in that 500 error report? Real
> bug I mean it produce actually bug on the object code it generated.

I'd like to say I found a real bug in an different open source package
with that check enabled. However, that's not true; I wrote that check
because I was perturbed that sparse hadn't found the bug which I'd just 
had to debug manually. So yes, it can catch real world issues.

It's much more likely to find issues in userspace programs, where there's
more likelyhood of wanting to allocate blocks of memory larger than 4G,
of course.

> >> if we introduce:
> >>
> >> #define ALIGNMASK(align)   (size_t)(~((align) -1))
> >>
> >> And we always use the macro to create mask from alignment value.
> >> Then there is no need to enforce signedness of the var passed into the
> >> macro. Hey, it might even read better.
> >
> > Put 4U in that, and it still has the flaw of creating a truncated mask.
> > Put 4 in and it doesn't.
> 
> Right, I make a mistake on that macro. Good catch.  How about
> 
> #define ALIGNMASK(align)   (~((size_t)((align) -1)))
> 
> Will you be happy if I use 4U with that macro?

Indeed. However, if the only use of the 4U is when it is expanded
into the expression (size_t)((4U) -1), then what was the 'U' again?
It still seems like a cargo cult artefact.

> My point is, with the right macro, you can still use 4U to create
> alignment. We don't have to fight over 4U vs 4.

Agreed. And how do you know if you've got the right macro?
It's on that question that I'd like sparse to step in and help.

Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv2,0/3] catch non-sign-extended '~' brainos

Commit Message

Comments

Patch