Message ID | CANeU7Qk8VCsj_EDBrj9H1bjR5a-_bLPQJiza_-eC90j=qiw7xw@mail.gmail.com (mailing list archive)
---|---
State | Rejected, archived
On Wed, Jul 12, 2017 at 10:27:25PM -0700, Christopher Li wrote:
> From 7264a822d9d9e545152ac675fbc5b5e970ce254b Mon Sep 17 00:00:00 2001
> From: Christopher Li <sparse@chrisli.org>
> Date: Wed, 12 Jul 2017 14:46:50 -0700
> Subject: [PATCH] Let pseudo->users loop on duplicate version of list
>
> The pseudo->users list will change during find dominator.
> That causes a bug in the ptrlist because the outer loop iterator
> is not aware of the deletion of the entry.
>
> Let the outer loop use a duplicate version of the list
> to avoid this problem for now.

When I see this description, the first thing I think is:
"you have a problem with an object, you duplicate the object,
and now you have (at least) two problems".
In short, how will you keep coherency between the two?

So yes, without any doubt, this fixes the outer loop's iteration not
correctly taking into account the deletions done in the inner loop.
But is the code working correctly now? I don't think so.
In fact, since the outer loop is using a copy of the initial list,
*any* change to the list will now simply be ignored by the outer loop:
- if a user is removed from the users list by the inner loop (for
  example at a position much further along than the outer loop's
  current position), previously the outer loop would of course not
  have processed this user, while now it will process it even though
  it is no longer a user of this pseudo. Is there any explanation of
  why this behaviour would be correct?
- likewise, if the inner loop adds a new user, the outer loop will
  now never process it.

To be clear, the issue I'm raising here is not about 'maybe missing an
optimization' but about applying an optimization while the real
conditions are in fact not met, and thus producing incorrect code.

-- Luc
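Luc's objection can be illustrated with a small, self-contained sketch. This is not sparse code; the toy list type, the values, and remove_user() are all invented for illustration. The outer loop walks a snapshot of the user list, so a removal performed by the "inner" work is invisible to it and the stale entry still gets processed:

    #include <stdio.h>
    #include <string.h>

    /* Invented toy "user list": a fixed array plus a count. */
    struct users { int nr; int user[8]; };

    /* Stand-in for the inner-loop work that may remove a user. */
    static void remove_user(struct users *u, int val)
    {
            for (int i = 0; i < u->nr; i++) {
                    if (u->user[i] == val) {
                            memmove(&u->user[i], &u->user[i + 1],
                                    (u->nr - i - 1) * sizeof(u->user[0]));
                            u->nr--;
                            return;
                    }
            }
    }

    int main(void)
    {
            struct users real = { 4, { 1, 2, 3, 4 } };
            struct users copy = real;               /* the duplicated list */

            for (int i = 0; i < copy.nr; i++) {     /* outer loop on the copy */
                    int user = copy.user[i];
                    if (user == 1)
                            remove_user(&real, 3);  /* "inner" work changes the real list */
                    printf("processing user %d\n", user);   /* still processes 3 */
            }
            return 0;
    }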
On Wed, Jul 19, 2017 at 2:14 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> When I see this description, the first thing I think is:
> "you have a problem with an object, you duplicate the object,
> and now you have (at least) two problems".
> In short, how will you keep coherency between the two?

We don't :-). Both the remove case and the insert case have been
discussed. First of all, the ptrlist-refcount patch checks for the
insert case, and so far there is none. If you believe there is one,
I would like to know.

About looping on the deleted entry: that is the very first thing I
called out in the V1 version of the patch. The thing is, the
pseudo_user has a pointer to the instruction. If the instruction is
deleted, then that instruction will have insn->bb == NULL.

Not a perfect fix, but better than the previous situation, because the
for loop is actually looping in reverse order: any delete causes a
major alignment problem, since __nr counts from the tail.

Anyway, my V1 and the email asking for suggestions on how to fix it
went out about 10 days ago. I haven't heard anything better. If you
have another way to fix it properly, I am all for it. It is not too
late, but time is running out soon.

Chris
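The "__nr counts from the tail" problem Chris mentions can be shown with a simplified model. This is not sparse's real FOR_EACH_PTR_REVERSE implementation, just a toy array-backed list walked by a slot index from the tail; deleting an entry on the head side shifts the remaining slots, so the reverse walker no longer lines up with the list:

    #include <stdio.h>

    /* Toy stand-in for one ptrlist node: an array plus a count. */
    struct list { int nr; int entry[8]; };

    static void delete_at(struct list *l, int idx)
    {
            /* compact the array, the way packing a ptrlist node would */
            for (int i = idx; i < l->nr - 1; i++)
                    l->entry[i] = l->entry[i + 1];
            l->nr--;
    }

    int main(void)
    {
            struct list l = { 5, { 10, 11, 12, 13, 14 } };

            /* reverse walk: the cursor is a slot index, like __nr */
            for (int nr = l.nr - 1; nr >= 0; nr--) {
                    int cur = l.entry[nr];
                    printf("visiting %d\n", cur);
                    if (cur == 14)
                            delete_at(&l, 0);       /* "inner" work removes entry 10 */
                    /* the deletion shifted every later slot down by one, so
                     * the next iteration (nr == 3) lands on 14 again: one
                     * entry is visited twice and the cursor no longer
                     * matches the list */
            }
            return 0;
    }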
On Wed, Jul 19, 2017 at 11:42 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Jul 19, 2017 at 2:14 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> When I see this description, the first thing I think is:
>> "you have a problem with an object, you duplicate the object,
>> and now you have (at least) two problems".
>> In short, how will you keep coherency between the two?
>
> We don't :-). Both the remove case and the insert case have been
> discussed. First of all, the ptrlist-refcount patch checks for the
> insert case, and so far there is none. If you believe there is one,
> I would like to know.

Some add_user() can be called in this loop, so it's just a question of
some input code complex enough to have an add_user() done on the same
pseudo as the one concerned by the outer loop.

> About looping on the deleted entry: that is the very first thing I
> called out in the V1 version of the patch. The thing is, the
> pseudo_user has a pointer to the instruction. If the instruction is
> deleted, then that instruction will have insn->bb == NULL.
>
> Not a perfect fix, but better than the previous situation, because the
> for loop is actually looping in reverse order: any delete causes a
> major alignment problem, since __nr counts from the tail.

Yes, those list deletes inside iteration loops are definitively to be
avoided.

> Anyway, my V1 and the email asking for suggestions on how to fix it
> went out about 10 days ago. I haven't heard anything better. If you
> have another way to fix it properly, I am all for it. It is not too
> late, but time is running out soon.

I have stated several times that the only real solution is to mark the
elements as being deleted (and only effectively delete them in some
safe situations), exactly like it is done for instructions (which are
probably the type most often deleted from lists; trying to effectively
remove them from lists, as is currently done for the other types, would
most probably fail in the most horrible ways).
It's a simple & clean solution, provably correct in *all* situations.
But I don't remember having seen any reply from you on this subject.

-- Luc
On Wed, Jul 19, 2017 at 3:51 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Some add_user() can be called in this loop, so it's just a question of
> some input code complex enough to have an add_user() done on the same
> pseudo as the one concerned by the outer loop.

Can you point me to a call stack that triggers the add_user()?

I have run the full kernel compile and it did not trigger it. The other
two nested-loop deletes were caught within the first 15 files or so.

> I have stated several times that the only real solution is to mark the
> elements as being deleted (and only effectively delete them in some
> safe situations), exactly like it is done for instructions (which are
> probably the type most often deleted from lists; trying to effectively
> remove them from lists, as is currently done for the other types, would
> most probably fail in the most horrible ways).
> It's a simple & clean solution, provably correct in *all* situations.
> But I don't remember having seen any reply from you on this subject.

That is because you were away and you did not read through the email.
I did comment on it. It isn't correct in *all* situations; we need a
bigger hammer.

I located it for you:

http://marc.info/?l=linux-sparse&m=149987902920449&w=3
==========quote================
Even Luc's suggestion for "mark and sweep" to delete the ptrlist
is not going to help with the nested loop split. I will add that check.
I don't expect new offenders but let's make sure about it.
==========end quote=============

http://marc.info/?l=linux-sparse&m=149969288517115&w=3
=============quote===============
> I earlier suggested to instead change the way elements are 'deleted'
> (like instructions are already never removed from their list but
> just marked as deleted).

That looks simple but it has hidden complications as well. The issue is
that we need to find out which lists need this kind of special
treatment, and who the outer loop is. If the same function can be
called both as the outer loop and as the inner loop, then it is
complicated to decide when it should do the finalize. There is also the
price to pay for walking the list twice, which does not exist if the
nested loop can be avoided.
=============end quote==============

Chris
On Thu, Jul 20, 2017 at 4:34 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Jul 19, 2017 at 3:51 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>>
>> Some add_user() can be called in this loop, so it's just a question of
>> some input code complex enough to have an add_user() done on the same
>> pseudo as the one concerned by the outer loop.
>
> Can you point me to a call stack that triggers the add_user()?
>
> I have run the full kernel compile and it did not trigger it. The other
> two nested-loop deletes were caught within the first 15 files or so.
>
>> I have stated several times that the only real solution is to mark the
>> elements as being deleted (and only effectively delete them in some
>> safe situations), exactly like it is done for instructions (which are
>> probably the type most often deleted from lists; trying to effectively
>> remove them from lists, as is currently done for the other types, would
>> most probably fail in the most horrible ways).
>> It's a simple & clean solution, provably correct in *all* situations.
>> But I don't remember having seen any reply from you on this subject.
>
> That is because you were away and you did not read through the email.
> I did comment on it. It isn't correct in *all* situations; we need a
> bigger hammer.
>
> I located it for you:
>
> http://marc.info/?l=linux-sparse&m=149987902920449&w=3
> ==========quote================
> Even Luc's suggestion for "mark and sweep" to delete the ptrlist
> is not going to help with the nested loop split. I will add that check.
> I don't expect new offenders but let's make sure about it.
> ==========end quote=============

I don't think you understood what I meant.

> http://marc.info/?l=linux-sparse&m=149969288517115&w=3
> =============quote===============
>> I earlier suggested to instead change the way elements are 'deleted'
>> (like instructions are already never removed from their list but
>> just marked as deleted).
>
> That looks simple but it has hidden complications as well. The issue is
> that we need to find out which lists need this kind of special
> treatment.

It's very simple: *all* lists need this 'treatment'.

> Who the outer loop is: if the same function can be called both as the
> outer loop and as the inner loop, then it is complicated to decide when
> it should do the finalize. There is also the price to pay for walking
> the list twice, which does not exist if the nested loop can be avoided.

Walking twice?

In general, we can't avoid nested loops.

Look at what is done currently with instructions: when we want to
'delete' an instruction we simply set its ->bb to NULL, which basically
means: "for now, please just ignore this instruction". In other words,
instructions are *never* removed from their lists, they are simply
marked as being deleted. You don't need to know whether you're in a
nested loop or not. You will never have the problem of deletion during
nested list walking, because elements are never removed from the lists.

Why do you think it has been done that way?
Do you think it's bad, and that it would be better to effectively remove
instructions from their lists like the other types?
Do you think it would be good, or even possible, to avoid having nested
loops of instructions?
I don't think so.

-- Luc
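To make the convention Luc refers to concrete, here is a minimal sketch of the "mark as deleted" idea, using a simplified, invented instruction/list model rather than sparse's real types and macros: deletion only clears ->bb, walkers skip such entries, and nothing is ever unlinked while a walk is in progress. A separate pass, run at a known-safe point, could later compact the list to really reclaim the slots.

    struct basic_block;

    struct instruction {
            struct basic_block *bb; /* NULL means "deleted, please ignore me" */
            int opcode;
    };

    static void kill_insn(struct instruction *insn)
    {
            insn->bb = NULL;        /* mark only; never unlink from any list */
    }

    static void walk(struct instruction **list, int nr)
    {
            for (int i = 0; i < nr; i++) {
                    struct instruction *insn = list[i];
                    if (!insn->bb)
                            continue;       /* skip marked-deleted instructions */
                    /* ... process insn; nested walks may call kill_insn()
                     * on any entry without disturbing this loop ... */
            }
    }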
On Wed, Aug 2, 2017 at 7:44 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>> http://marc.info/?l=linux-sparse&m=149987902920449&w=3
>> ==========quote================
>> Even Luc's suggestion for "mark and sweep" to delete the ptrlist
>> is not going to help with the nested loop split. I will add that check.
>> I don't expect new offenders but let's make sure about it.
>> ==========end quote=============
>
> I don't think you understood what I meant.

I don't think so. Can you please explain?

Let's take an example of the insert case. The outer loop is doing:

    FOR_EACH_PTR_REVERSE(bbs, bb) {
        /*
         * The outer loop macro is holding __nr = list->nr and
         * __list = list. It uses __nr as the slot number pointing
         * to the current slot.
         */
        ...
        add_bb(bbs, new_bb);
        /*
         * Now the list->list[] array has been split and list->nr has
         * fewer entries. __nr does not know about that, so it is not
         * going to point to the right element any more.
         */
        ...
    } END_FOR_EACH_PTR_REVERSE(bb);

The key insight here is that we need to keep the list->list[] array
unchanged, and list->nr unchanged as well, for the reverse ptr loop to
work.

For the delete case, setting bb->ep = NULL to mark the bb dead works.
It does not touch list->list[] nor list->nr. As long as you skip any
bb whose bb->ep == NULL, you are fine.

Now look at the insert case, regardless of how you set bb->ep: when
you insert the new bb into the list, the node split will modify
list->list[] and list->nr. There is no way to defer that, and it
*will* mess up the outer loop's __nr slot pointer.

How does your proposed method handle this case? Please explain. I am
listening.

>> http://marc.info/?l=linux-sparse&m=149969288517115&w=3
>> =============quote===============
>>> I earlier suggested to instead change the way elements are 'deleted'
>>> (like instructions are already never removed from their list but
>>> just marked as deleted).
>>
>> That looks simple but it has hidden complications as well. The issue is
>> that we need to find out which lists need this kind of special
>> treatment.
>
> It's very simple: *all* lists need this 'treatment'.

I mean that you need to identify which caller is the outer loop; the
outer-loop caller needs to do the list packing. That is what I mean by
special treatment. Packing inside the inner loop is the problem we are
trying to avoid.

That assumes we still pack lists. If we never pack lists, then it is
not a problem here per se, but we still need to deal with the insert
problem.

> Walking twice?

I am more worried about the incorrect behavior on insert and packing
than about walking twice.

> In general, we can't avoid nested loops.

I am thinking of breaking the recursive calling into a work queue.
Then there is no nested loop. If you know why that will not work,
please point it out. I do assume I can convert the recursive calls to
non-recursive ones using a work queue. Again, I haven't dug very deep
yet.

> Look at what is done currently with instructions: when we want to
> 'delete' an instruction we simply set its ->bb to NULL, which basically
> means: "for now, please just ignore this instruction". In other words,
> instructions are *never* removed from their lists, they are simply
> marked as being deleted. You don't need to know whether you're in a
> nested loop or not. You will never have the problem of deletion during
> nested list walking, because elements are never removed from the lists.

Right, but that does not help the insert case.

> Why do you think it has been done that way?
> Do you think it's bad, and that it would be better to effectively remove
> instructions from their lists like the other types?
> Do you think it would be good, or even possible, to avoid having nested
> loops of instructions?
> I don't think so.

Please, I *do* understand why it is done that way. I am trying to point
out that it does not solve all the problems if an inner function wants
to modify the list, a.k.a. insert. That seems to be the point that
hasn't come across to you.

If you know how to solve the insert case, please explain.

Chris
On Wed, Aug 02, 2017 at 08:50:14PM -0400, Christopher Li wrote:
> On Wed, Aug 2, 2017 at 7:44 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >> http://marc.info/?l=linux-sparse&m=149987902920449&w=3
> >> ==========quote================
> >> Even Luc's suggestion for "mark and sweep" to delete the ptrlist
> >> is not going to help with the nested loop split. I will add that check.
> >> I don't expect new offenders but let's make sure about it.
> >> ==========end quote=============
> >
> > I don't think you understood what I meant.
>
> I don't think so. Can you please explain?
>
> Let's take an example of the insert case.

Yes, yes.
I already said on the 7th of July that reverse list walking and list
splitting will be another beast, but the subject of this patch is the
*deletion* of elements from lists.

> > It's very simple: *all* lists need this 'treatment'.
>
> I mean that you need to identify which caller is the outer loop; the
> outer-loop caller needs to do the list packing. That is what I mean by
> special treatment. Packing inside the inner loop is the problem we are
> trying to avoid.
>
> That assumes we still pack lists. If we never pack lists, then it is
> not a problem here per se.

We don't have to pack all lists.
The ones that we pack can be packed at some specific place/moment.

> > Walking twice?
>
> I am more worried about the incorrect behavior on insert and packing
> than about walking twice.
>
> > In general, we can't avoid nested loops.
>
> I am thinking of breaking the recursive calling into a work queue.
> Then there is no nested loop. If you know why that will not work,
> please point it out. I do assume I can convert the recursive calls to
> non-recursive ones using a work queue. Again, I haven't dug very deep
> yet.

Oh, I'm sure that using a work queue will work in some cases.
I'm even sure that it's a good idea for some cases. But I don't
think it will solve all problems because:
- some nested loops will remain;
- it will be hard to avoid nested loops because we generally don't
  know whether we're nested or not. The main ep - bbs - insns loop
  is obvious, but what about the other loops?
- using a work list is not without problems either. For example,
  we'll have little or no control over when things effectively get
  done. Sometimes it will be fine, sometimes it won't be acceptable.
  If during some operation we must delete *this* element or add a new
  element *here in the list*, just pushing something onto a work list
  and saying "OK, it will be done some time later" won't solve the
  problem and won't keep things coherent, ...

> > Look at what is done currently with instructions: when we want to
> > 'delete' an instruction we simply set its ->bb to NULL, which basically
> > means: "for now, please just ignore this instruction". In other words,
> > instructions are *never* removed from their lists, they are simply
> > marked as being deleted. You don't need to know whether you're in a
> > nested loop or not. You will never have the problem of deletion during
> > nested list walking, because elements are never removed from the lists.
>
> Right, but that does not help the insert case.

Sure, it's another problem.

> > Why do you think it has been done that way?
> > Do you think it's bad, and that it would be better to effectively remove
> > instructions from their lists like the other types?
> > Do you think it would be good, or even possible, to avoid having nested
> > loops of instructions?
> > I don't think so.
>
> Please, I *do* understand why it is done that way. I am trying to point
> out that it does not solve all the problems if an inner function wants
> to modify the list, a.k.a. insert. That seems to be the point that
> hasn't come across to you.
>
> If you know how to solve the insert case, please explain.

Well, it's not as if saying "let's do everything with work queues"
solves anything in itself. It's an idea that would need a lot of
patches and big changes in a lot of code to show whether it's up to its
promise (and, as I explained shortly here above, I have reasons to
believe that it won't be possible to solve all the list problems with
it).

For the reverse walking and insert, I haven't thought much about it
yet. I won't be surprised if it needs some change in the ptrlist
struct & macros. Maybe it's an operation that is not common enough
that trying to ban it wouldn't be a big constraint on existing & future
code. If all the ones we have go away once everything has been
converted to using work queues, then fine, but what if they don't?
What if we can't convert everything to using work queues?

On the other hand, avoiding the problem of list deletion (which seems
to be an operation much more common than reverse & insert) by marking
deleted elements is something that can be done easily, with minimal
code impact, and that *will* solve *all* problems with list deletion
(but yes, only list deletion).

-- Luc
On Thu, Aug 3, 2017 at 6:18 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Wed, Aug 02, 2017 at 08:50:14PM -0400, Christopher Li wrote:
>> I don't think so. Can you please explain?
>>
>> Let's take an example of the insert case.
>
> Yes, yes.
> I already said on the 7th of July that reverse list walking and list
> splitting will be another beast, but the subject of this patch is the
> *deletion* of elements from lists.

The text you quote from me is clearly discussing the node split here:
"nested loop *split*". Even though the title of the email was
originally about delete, the discussion expanded to split because I
hadn't realized earlier that it was a problem. I consider it the same
category as delete: it is basically looping over and modifying the
list at the same time. I want to find a solution that covers both
situations.

> We don't have to pack all lists.
> The ones that we pack can be packed at some specific place/moment.

Right, we need to make sure the packing is *never* called from the body
of a loop over the same list. There is no easy way to verify that this
does not happen.

>> I am thinking of breaking the recursive calling into a work queue.
>> Then there is no nested loop. If you know why that will not work,
>> please point it out. I do assume I can convert the recursive calls to
>> non-recursive ones using a work queue. Again, I haven't dug very deep
>> yet.
>
> Oh, I'm sure that using a work queue will work in some cases.
> I'm even sure that it's a good idea for some cases. But I don't
> think it will solve all problems because:
> - some nested loops will remain

Can you give an example of which nested loops would have to remain?

Converting a recursive call into a work queue without recursion is a
pretty standard algorithm. In theory, as long as you defer the
inner-loop call into the work queue, you can always avoid the nested
loop. In the extreme case you can think of the work queue as holding a
copy of the list and moving the inner-loop body out to the outer work
queue.

In reality, we don't want to move everything into the work queue. I
think the loop should just observe; deferring the modifications into
the work queue is conceptually relatively clean.

> - it will be hard to avoid nested loops because we generally don't
>   know whether we're nested or not. The main ep - bbs - insns loop

We don't mind nested looping as long as we don't modify the lists. We
always know when we are about to modify the bb or the insns; that is a
good place to move things to the work queue.

My mental model is to have a detect phase and a modify phase. The
detect phase does all the looping as needed: look, but don't touch.
The modify requests are pushed onto the work queue and executed after
the loop is done.

>   is obvious, but what about the other loops?
> - using a work list is not without problems either. For example,
>   we'll have little or no control over when things effectively get
>   done. Sometimes it will be fine, sometimes it won't be acceptable.
>   If during some operation we must delete *this* element or add a new
>   element *here in the list*, just pushing something onto a work list
>   and saying "OK, it will be done some time later" won't solve the
>   problem and won't keep things coherent, ...

That usually means you have some other things that need to be deferred
as well. You don't have to do it on the spot inside the loop. Having a
work queue actually makes it easier to reason about and enforce such
ordering.

You can consider our current CSE loop as a poor form of work queue: it
just has one flag bit for each stage and does not point to which insn
needs to be changed.

You can also think of it this way: the delete mark-and-sweep is just
another way to defer the delete. In the extreme case, we defer it to
never deleting.

>> If you know how to solve the insert case, please explain.
>
> Well, it's not as if saying "let's do everything with work queues"
> solves anything in itself. It's an idea that would need a lot of
> patches and big changes in a lot of code to show whether it's up to
> its promise (and, as I explained shortly here above, I have reasons
> to believe that it won't be possible to solve all the list problems
> with it).

Right. It needs a lot of discussion and patch experiments, definitely
after the release.

As I said in the other email, if you have a better patch to solve the
nested-loop delete on the pseudo list for RC5, feel free to replace the
current one in sparse. I am aware that the current solution is not
perfect: it can iterate over deleted ptr entries.

> For the reverse walking and insert, I haven't thought much about it
> yet. I won't be surprised if it needs some change in the ptrlist
> struct & macros.
> Maybe it's an operation that is not common enough that trying to ban
> it wouldn't be a big constraint on existing & future code.

I think that is letting the tail wag the dog. Plus, I can demonstrate
that a node split inside FOR_EACH_PTR() can have incorrect behavior as
well: basically, the node split will change list->list[] and list->nr,
which can impact forward iteration too. The reverse one is just more
obvious.

> If all the ones we have go away once everything has been converted
> to using work queues, then fine, but what if they don't? What if we
> can't convert everything to using work queues?

We will have to use a bigger hammer. In short, it is going to be ugly,
e.g. give the list node a refcount and a singly-linked list of pending
changes.

> On the other hand, avoiding the problem of list deletion (which seems
> to be an operation much more common than reverse & insert) by marking
> deleted elements is something that can be done easily, with minimal
> code impact, and that *will* solve *all* problems with list deletion
> (but yes, only list deletion).

Yes, I consider that a short-term thing until we find a better way to
handle more general modifications, including insert.

Chris
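A minimal sketch of the "detect, then modify" model Chris describes: the walk only records what it wants to change, and the queued requests are applied after the iteration, so no list is modified while it is being walked. The types and helpers here are invented for illustration, not sparse API:

    #include <stdlib.h>

    struct instruction;

    /* One deferred modification request (invented for illustration). */
    struct work_item {
            struct work_item *next;
            void (*apply)(struct instruction *insn);
            struct instruction *insn;
    };

    static struct work_item *work_queue;

    static void defer(void (*apply)(struct instruction *), struct instruction *insn)
    {
            struct work_item *w = malloc(sizeof(*w));
            w->apply = apply;
            w->insn = insn;
            w->next = work_queue;
            work_queue = w;
    }

    /* Detect phase: walk the list, look but don't touch. */
    static void detect(struct instruction **list, int nr,
                       int (*wants_change)(struct instruction *),
                       void (*apply)(struct instruction *))
    {
            for (int i = 0; i < nr; i++) {
                    if (wants_change(list[i]))
                            defer(apply, list[i]);  /* record only, no list change */
            }
    }

    /* Modify phase: run after all walks are done, so no iterator is live. */
    static void run_work_queue(void)
    {
            while (work_queue) {
                    struct work_item *w = work_queue;
                    work_queue = w->next;
                    w->apply(w->insn);      /* now it is safe to modify lists */
                    free(w);
            }
    }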
On Thu, Aug 03, 2017 at 07:48:59PM -0400, Christopher Li wrote:
> On Thu, Aug 3, 2017 at 6:18 AM, Luc Van Oostenryck wrote:
> > We don't have to pack all lists.
> > The ones that we pack can be packed at some specific place/moment.
>
> Right, we need to make sure the packing is *never* called from the body
> of a loop over the same list. There is no easy way to verify that this
> does not happen.
>
> >> I am thinking of breaking the recursive calling into a work queue.
> >> Then there is no nested loop. If you know why that will not work,
> >> please point it out. I do assume I can convert the recursive calls to
> >> non-recursive ones using a work queue. Again, I haven't dug very deep
> >> yet.
> >
> > Oh, I'm sure that using a work queue will work in some cases.
> > I'm even sure that it's a good idea for some cases. But I don't
> > think it will solve all problems because:
> > - some nested loops will remain
>
> Can you give an example of which nested loops would have to remain?

Can you give the assurance that none will remain?
I said "I think that ..." and of course I have reasons to think so.
They were explained here under.

> Converting a recursive call into a work queue without recursion is a
> pretty standard algorithm. In theory, as long as you defer the
> inner-loop call into the work queue, you can always avoid the nested
> loop. In the extreme case you can think of the work queue as holding a
> copy of the list and moving the inner-loop body out to the outer work
> queue.

*standard algorithm*, *in theory*, ...

> In reality, we don't want to move everything into the work queue. I
> think the loop should just observe; deferring the modifications into
> the work queue is conceptually relatively clean.
>
> > - it will be hard to avoid nested loops because we generally don't
> >   know whether we're nested or not. The main ep - bbs - insns loop
>
> We don't mind nested looping as long as we don't modify the lists. We
> always know when we are about to modify the bb or the insns; that is a
> good place to move things to the work queue.
>
> My mental model is to have a detect phase and a modify phase. The
> detect phase does all the looping as needed: look, but don't touch.
> The modify requests are pushed onto the work queue and executed after
> the loop is done.
>
> >   is obvious, but what about the other loops?
> > - using a work list is not without problems either. For example,
> >   we'll have little or no control over when things effectively get
> >   done. Sometimes it will be fine, sometimes it won't be acceptable.
> >   If during some operation we must delete *this* element or add a new
> >   element *here in the list*, just pushing something onto a work list
> >   and saying "OK, it will be done some time later" won't solve the
> >   problem and won't keep things coherent, ...
>
> That usually means you have some other things that need to be deferred
> as well. You don't have to do it on the spot inside the loop. Having a
> work queue actually makes it easier to reason about and enforce such
> ordering.

You get all the advantages of decoupling things, and you get all the
disadvantages of having decoupled things. It's not a black & white
thing.

> > Well, it's not as if saying "let's do everything with work queues"
> > solves anything in itself. It's an idea that would need a lot of
> > patches and big changes in a lot of code to show whether it's up to
> > its promise (and, as I explained shortly here above, I have reasons
> > to believe that it won't be possible to solve all the list problems
> > with it).
>
> Right. It needs a lot of discussion and patch experiments, definitely
> after the release.

I think you very seriously underestimate the amount of work that will
need to be done.

> As I said in the other email, if you have a better patch to solve the
> nested-loop delete on the pseudo list for RC5, feel free to replace the
> current one in sparse. I am aware that the current solution is not
> perfect: it can iterate over deleted ptr entries.

I just sent a patch.
It's not very pretty and is only lightly tested, but IMO it's vastly
superior to the duplication of the list.

> > For the reverse walking and insert, I haven't thought much about it
> > yet. I won't be surprised if it needs some change in the ptrlist
> > struct & macros.
> > Maybe it's an operation that is not common enough that trying to ban
> > it wouldn't be a big constraint on existing & future code.
>
> I think that is letting the tail wag the dog. Plus, I can demonstrate
> that a node split inside FOR_EACH_PTR() can have incorrect behavior as
> well.

Yes, yes, we have known that since the beginning.

> > If all the ones we have go away once everything has been converted
> > to using work queues, then fine, but what if they don't? What if we
> > can't convert everything to using work queues?
>
> We will have to use a bigger hammer. In short, it is going to be ugly,
> e.g. give the list node a refcount and a singly-linked list of pending
> changes.

Other solutions exist :)

> > On the other hand, avoiding the problem of list deletion (which seems
> > to be an operation much more common than reverse & insert) by marking
> > deleted elements is something that can be done easily, with minimal
> > code impact, and that *will* solve *all* problems with list deletion
> > (but yes, only list deletion).
>
> Yes, I consider that a short-term thing until we find a better way to
> handle more general modifications, including insert.

Good to hear that.

-- Luc
On Thu, Aug 3, 2017 at 8:41 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>>
>> Can you give an example of which nested loops would have to remain?
>
> Can you give the assurance that none will remain?

I can have the ptr-refcount patch check whether such a thing happens.
But I am sure that is not what you are asking :-)

If you follow the observe-and-modify model, put the modify stage in the
outer loop, and modify only one thing at a time, then it is easy to
reason about whether a "nested loop modify" exists.

What you are asking in the more general setting is not going to have a
trivial answer anyway; it is equivalent to the Turing halting problem.
On the other hand, if you want to disprove this, you only need one
counterexample.

>> Converting a recursive call into a work queue without recursion is a
>> pretty standard algorithm. In theory, as long as you defer the
>> inner-loop call into the work queue, you can always avoid the nested
>> loop. In the extreme case you can think of the work queue as holding a
>> copy of the list and moving the inner-loop body out to the outer work
>> queue.
>
> *standard algorithm*, *in theory*, ...

I actually don't worry about whether it can be done or not. I am sure
you don't need me to google converting recursion to iteration for you.
The question is whether it can be done cleanly in sparse. It is hard to
say without trying.

> You get all the advantages of decoupling things, and you get all the
> disadvantages of having decoupled things. It's not a black & white
> thing.

It is a white-gray-black thing, if you know what I mean :-)
Black is the termination condition.

>> > Well, it's not as if saying "let's do everything with work queues"
>> > solves anything in itself. It's an idea that would need a lot of
>> > patches and big changes in a lot of code to show whether it's up to
>> > its promise (and, as I explained shortly here above, I have reasons
>> > to believe that it won't be possible to solve all the list problems
>> > with it).
>>
>> Right. It needs a lot of discussion and patch experiments, definitely
>> after the release.
>
> I think you very seriously underestimate the amount of work that will
> need to be done.

I realize there will be a lot of change. But sparse will actually need
an SSA version of dead code removal (for correctness) and an SSA
version of constant propagation anyway (for performance; currently we
scan the whole insn list to find optimization opportunities). A lot of
similar algorithms already use a work queue. Some rewrites are
expected, as far as I can tell.

> I just sent a patch.
> It's not very pretty and is only lightly tested, but IMO it's vastly
> superior to the duplication of the list.

I haven't seen it. I saw there is a constant-expression patch on the
mailing list.

>> I think that is letting the tail wag the dog. Plus, I can demonstrate
>> that a node split inside FOR_EACH_PTR() can have incorrect behavior as
>> well.
>
> Yes, yes, we have known that since the beginning.

Then why ban the reverse ptr loop? It is not going to help avoid the
bad situation.

>> We will have to use a bigger hammer. In short, it is going to be ugly,
>> e.g. give the list node a refcount and a singly-linked list of pending
>> changes.
>
> Other solutions exist :)

Yes, ideas and patches welcome.

Chris
On Thu, Aug 03, 2017 at 10:22:34PM -0400, Christopher Li wrote:
> On Thu, Aug 3, 2017 at 8:41 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >>
> >> Can you give an example of which nested loops would have to remain?
> >
> > Can you give the assurance that none will remain?
>
> I can have the ptr-refcount patch check whether such a thing happens.
> But I am sure that is not what you are asking :-)

It was a rhetorical question in response to your own question, because
asking me to show something while we're talking about design and
potentiality doesn't help at all.

> >> Converting a recursive call into a work queue without recursion is a
> >> pretty standard algorithm. In theory, as long as you defer the
> >> inner-loop call into the work queue, you can always avoid the nested
> >> loop. In the extreme case you can think of the work queue as holding
> >> a copy of the list and moving the inner-loop body out to the outer
> >> work queue.
> >
> > *standard algorithm*, *in theory*, ...
>
> I actually don't worry about whether it can be done or not. I am sure
> you don't need me to google converting recursion to iteration for you.

1) Converting a recursive algorithm to an iterative one is not always
   a good solution; for example, the iterative one can be much more
   complex/less elegant. You may also need more memory (because you
   often need to store additional info).
   A typical example is searching in a tree: it can be done with
   iteration but it's generally not done that way, because the
   recursive version is so much simpler & more elegant (and probably
   efficient too).
2) Converting something to a work queue is not the same as converting
   recursion to iteration anyway.
   Recursion vs. iteration is about how you walk through a data
   structure. Using a work queue is about how/when you do an operation
   on an object.

An example, among others, of the complications you can have when
converting to a work queue is something like this:
imagine you're iterating through a list of instructions, checking/
searching for something. When found, you want to do an operation on
the instruction just found.
Sounds like something for a work queue, right? Just put the object you
found on the work queue and do the operation later.
Now, what if the operation is about inserting another instruction just
before the one just found (like is done when adding death notes)?
- if you do it immediately, you have the context of the instruction
  and inserting the death note is easy/cheap;
- if you do the insertion later, when processing the work queue, you
  won't have the context of the instruction anymore, and to insert the
  death note you will first have to search for it again (to get the
  context back) before you are able to do the insertion, and this
  search will cost you.

It's just one example, but there is a whole class of similar
situations. Delaying things by moving them to a work queue will also
have a negative impact on the cache (but in a way, that's also a
problem of having lost the context).

And again, I'm not saying that using a work queue is never a good
solution; I'm saying that most probably it won't always be a good
solution.

-- Luc
On Fri, Aug 4, 2017 at 6:38 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> It was a rhetorical question in response to your own question, because
> asking me to show something while we're talking about design and
> potentiality doesn't help at all.

That is exactly the feeling I am getting from you. You said something
might not work; I want to find out exactly what you have in mind that
might not work. Otherwise the conversation is based on doubt and is not
very useful. I want details.

This goes both ways: you can also ask me to clarify more details about
certain things in my mental model.

> 1) Converting a recursive algorithm to an iterative one is not always
>    a good solution; for example, the iterative one can be much more
>    complex/less elegant. You may also need more memory (because you
>    often need to store additional info).
>    A typical example is searching in a tree: it can be done with
>    iteration but it's generally not done that way, because the
>    recursive version is so much simpler & more elegant (and probably
>    efficient too).

In my observe-and-modify model, the traversal of a tree can be
recursive, even on the same ptrlist. It is the modification that gets
deferred. I think I already mentioned observe-and-modify in a previous
email, so what you are describing above is not a good example against
it.

> 2) Converting something to a work queue is not the same as converting
>    recursion to iteration anyway.
>    Recursion vs. iteration is about how you walk through a data
>    structure. Using a work queue is about how/when you do an operation
>    on an object.

You walk the graph to do something; that is a given. Nobody walks the
graph for the sake of just walking the graph. The optimizer can be
modeled as an algorithm that walks the graph and makes modifications to
the graph. Is that a fair statement?

> An example, among others, of the complications you can have when
> converting to a work queue is something like this:
> imagine you're iterating through a list of instructions, checking/
> searching for something. When found, you want to do an operation on
> the instruction just found.

Now we are talking.

> Sounds like something for a work queue, right? Just put the object you
> found on the work queue and do the operation later.
> Now, what if the operation is about inserting another instruction just
> before the one just found (like is done when adding death notes)?
> - if you do it immediately, you have the context of the instruction
>   and inserting the death note is easy/cheap;

Here you make the assumption that you can't save the context of the
instruction in the work queue. That is not necessarily true. In my
mental model, the work queue will contain the context of what needs to
be done, e.g. the pointer to the instruction, plus other context if
needed. I want to know more about what context you can't save.

BTW, most of the context can be found on the graph and does not need to
be saved in the work queue.

Take simplify_branch_branch() for example: when you know the
instruction should be simplified, there is very little information that
needs to be saved in the work queue. Basically we can start with the
call arguments to rewrite_branch() and go from there.

> - if you do the insertion later, when processing the work queue, you
>   won't have the context of the instruction anymore, and to insert the
>   death note you will first have to search for it again (to get the
>   context back) before you are able to do the insertion, and this
>   search will cost you.

Again, I don't have a clear picture of what context you are talking
about; please clarify. If it is the instruction before this
instruction, I can't imagine why I would need it. If I really need it,
I can still cache it or search for it in the bb->insns list.

BTW, my impression is that the "constant propagation using SSA"
algorithm is exactly the kind of example you are describing here. In
constant propagation, you first find an instruction that uses a
constant, then you find out whether that instruction can be simplified
to a constant. The next thing is to put the instructions that use the
result of this simplified instruction into the queue, to see if they
can be simplified as well.

It is actually a lot better than our current approach of rescanning the
whole instruction list just to find out that, oh, we now have some new
instructions with constant input (due to the last round of constant
propagation).

> It's just one example, but there is a whole class of similar
> situations. Delaying things by moving them to a work queue will also
> have a negative impact on the cache (but in a way, that's also a
> problem of having lost the context).

Caching, maybe. But I don't think caching is a strong point against it.
I am just a systems guy playing a catch-up game on compiler design.
The more I read, the more I find that using a work queue is a very
common thing among those classic optimization algorithms.

> And again, I'm not saying that using a work queue is never a good
> solution; I'm saying that most probably it won't always be a good
> solution.

My reading of that is: "I have doubts, but I can't articulate precisely
why that is the case." Fill in more detail and we can have more
discussion. Granted, no algorithm is *always* a good solution to all
problems.

As I expressed in my previous email, I have no doubt this can be done;
the question is whether it can be done *cleanly* enough for sparse.
When we introduce "dead code elimination using SSA", a work-queue-like
framework will be needed anyway.

Chris
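A rough sketch of the worklist-style constant propagation Chris describes, using invented, simplified types rather than sparse's real IR; try_fold_to_constant() is a hypothetical helper assumed to exist for this sketch. When an instruction folds to a constant, only the users of its result are queued for re-examination, instead of rescanning the whole instruction list:

    #include <stdlib.h>

    /* Invented, simplified IR: each instruction knows the instructions
     * that use its result (its "users"). */
    struct instruction {
            int opcode;
            int is_constant;                /* already folded to a constant? */
            long value;
            struct instruction **users;     /* users of this result */
            int nr_users;
    };

    /* Hypothetical helper: returns 1 if insn can be folded to a constant
     * given its (now constant) operands. */
    extern int try_fold_to_constant(struct instruction *insn);

    struct worklist { struct instruction **item; int nr, max; };

    static void push(struct worklist *wl, struct instruction *insn)
    {
            if (wl->nr == wl->max) {
                    wl->max = wl->max ? 2 * wl->max : 16;
                    wl->item = realloc(wl->item, wl->max * sizeof(*wl->item));
            }
            wl->item[wl->nr++] = insn;
    }

    static void propagate_constants(struct instruction **all, int nr)
    {
            struct worklist wl = { NULL, 0, 0 };

            /* seed the worklist with the users of every constant result */
            for (int i = 0; i < nr; i++)
                    if (all[i]->is_constant)
                            for (int u = 0; u < all[i]->nr_users; u++)
                                    push(&wl, all[i]->users[u]);

            while (wl.nr) {
                    struct instruction *insn = wl.item[--wl.nr];

                    if (insn->is_constant)
                            continue;               /* already handled */
                    if (!try_fold_to_constant(insn))
                            continue;               /* not foldable (yet) */
                    insn->is_constant = 1;
                    /* this result just became constant: only its users can
                     * possibly be simplified next, so queue just those */
                    for (int u = 0; u < insn->nr_users; u++)
                            push(&wl, insn->users[u]);
            }
            free(wl.item);
    }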
On Fri, Aug 04, 2017 at 10:48:35AM -0400, Christopher Li wrote:
> On Fri, Aug 4, 2017 at 6:38 AM, Luc Van Oostenryck wrote:
> > It was a rhetorical question in response to your own question, because
> > asking me to show something while we're talking about design and
> > potentiality doesn't help at all.
>
> That is exactly the feeling I am getting from you. You said something
> might not work; I want to find out exactly what you have in mind that
> might not work. Otherwise the conversation is based on doubt and is not
> very useful.

Replace 'doubt' by 'warning' and it already gives it another sense.
My whole message has always been the same: "Warning, it's not so easy
or obvious, and I have reasons to think that it won't always be
possible or desirable (so it may be good to also think about
alternatives, or to somehow accept the idea that maybe not all cases
will be solved the same way)".

> I want details. This goes both ways: you can also ask me to clarify
> more details about certain things in my mental model.

You can ask for details about an existing implementation, or about some
idea that is already well elaborated. You can't ask for details about
something that by nature doesn't have details yet, but I have already
tried to give you all sorts of hints that would give you some matter to
think about.

> > 1) Converting a recursive algorithm to an iterative one is not always
> >    a good solution; for example, the iterative one can be much more
> >    complex/less elegant. You may also need more memory (because you
> >    often need to store additional info).
> >    A typical example is searching in a tree: it can be done with
> >    iteration but it's generally not done that way, because the
> >    recursive version is so much simpler & more elegant (and probably
> >    efficient too).
>
> In my observe-and-modify model, the traversal of a tree can be
> recursive, even on the same ptrlist.

When you're searching in a (binary) tree, at some point you will need
to backtrack, to return to some previous branching point. The usual way
to do that is to use recursion, and the branching point is implicitly
stored on the stack. You can do the same without recursion, without
using the call stack, but then you need to store the info about the
branching point yourself. It's an example showing that this sort of
change is not for free.

> > 2) Converting something to a work queue is not the same as converting
> >    recursion to iteration anyway.
> >    Recursion vs. iteration is about how you walk through a data
> >    structure. Using a work queue is about how/when you do an operation
> >    on an object.
>
> You walk the graph to do something; that is a given. Nobody walks the
> graph for the sake of just walking the graph. The optimizer can be
> modeled as an algorithm that walks the graph and makes modifications to
> the graph. Is that a fair statement?

Sorry, I don't understand what you mean.

> > An example, among others, of the complications you can have when
> > converting to a work queue is something like this:
> > imagine you're iterating through a list of instructions, checking/
> > searching for something. When found, you want to do an operation on
> > the instruction just found.
>
> Now we are talking.
>
> > Sounds like something for a work queue, right? Just put the object you
> > found on the work queue and do the operation later.
> > Now, what if the operation is about inserting another instruction just
> > before the one just found (like is done when adding death notes)?
> > - if you do it immediately, you have the context of the instruction
> >   and inserting the death note is easy/cheap;
>
> Here you make the assumption that you can't save the context of the
> instruction in the work queue.

No, I don't make such an assumption; it's an example showing that not
doing things right there may have a cost. Saving the context is such a
cost.

> That is not necessarily true. In my mental model, the work queue will
> contain the context of what needs to be done, e.g. the pointer to the
> instruction, plus other context if needed. I want to know more about
> what context you can't save.

I never said you can't save the context. On the contrary, I've said
that you'll most probably have to store some context to be able to do
things in a deferred way, and that this will have a cost (and I use the
word 'context' here in a very general, abstract meaning).

> BTW, most of the context can be found on the graph and does not need to
> be saved in the work queue.
>
> Take simplify_branch_branch() for example: when you know the
> instruction should be simplified, there is very little information that
> needs to be saved in the work queue.

Absolutely, in this example, yes. In some other examples, it is less
so.

> > - if you do the insertion later, when processing the work queue, you
> >   won't have the context of the instruction anymore, and to insert the
> >   death note you will first have to search for it again (to get the
> >   context back) before you are able to do the insertion, and this
> >   search will cost you.
>
> Again, I don't have a clear picture of what context you are talking
> about.

You have an instruction and you need to insert another instruction in
front of it. If you do that while you were searching for this
instruction, it's very easy because you have the head-list-nr trio
implicitly available via the ptrlist macros. This trio is the context
you need in order to efficiently insert an instruction in front of it.

Now, for some reason, you want to defer this insertion. What will you
put in the work queue?
- the pointer to the instruction? Then to do the insert you will need
  to search the list again until you find the instruction. At this
  point you will have back the context needed to do the insertion, but
  you had to do a search.
- a pointer to where the instruction is stored in the list? Then you
  can easily exchange this instruction with another one, but how will
  this be useful for the insertion? Well sure, with the pointer you can
  get back the original head-list-nr, but is this useful here? If you
  store an instruction pointer-pointer somewhere, in practice that
  means that the list containing it becomes immutable (presumably you
  also hold pointers to the neighbouring instructions). The only thing
  that can work at this point is a normal linked list, not the
  block-based ptrlist. Is that a problem? Maybe, maybe not much.

> Please clarify. If it is the instruction before this instruction, I
> can't imagine why I would need it. If I really need it, I can still
> cache it or search for it in the bb->insns list.
>
> BTW, my impression is that the "constant propagation using SSA"
> algorithm is exactly the kind of example you are describing here. In
> constant propagation, you first find an instruction that uses a
> constant, then you find out whether that instruction can be simplified
> to a constant. The next thing is to put the instructions that use the
> result of this simplified instruction into the queue, to see if they
> can be simplified as well.

No, because in this case each instruction is still independent; you
don't need to know anything about the surrounding instructions. So it's
a case where it should be easy to convert. The example of the deathnote
insertion is not like this.

> > It's just one example, but there is a whole class of similar
> > situations. Delaying things by moving them to a work queue will also
> > have a negative impact on the cache (but in a way, that's also a
> > problem of having lost the context).
>
> Caching, maybe. But I don't think caching is a strong point against it.
> I am just a systems guy playing a catch-up game on compiler design.
> The more I read, the more I find that using a work queue is a very
> common thing among those classic optimization algorithms.
>
> > And again, I'm not saying that using a work queue is never a good
> > solution; I'm saying that most probably it won't always be a good
> > solution.
>
> My reading of that is: "I have doubts, but I can't articulate precisely
> why that is the case." Fill in more detail and we can have more
> discussion.

If by "I have doubts" you understand "I have doubts about using it
systematically to avoid problems with nested lists", you would be quite
close. If you mean instead "I have doubts about using it preferentially
(as a clean and effective solution) to avoid problems with nested
lists", you would be very wrong.

> Granted, no algorithm is *always* a good solution to all problems.
> As I expressed in my previous email, I have no doubt this can be done.

Do you have no doubts that it can be done everywhere in sparse to avoid
problems with nested lists, or no doubts that it can be done at least
in some or most cases?

As I have already said since the beginning, I have doubts about doing
it everywhere, and I have already explained why (or at least tried to).

-- Luc
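To illustrate the trade-off Luc describes, here is a small invented sketch (not sparse code, and using a plain array rather than the block-based ptrlist): a deferred "insert before this instruction" work item records only the instruction pointer, so applying it later has to re-scan the containing block's list to recover the insertion point that was freely available during the original walk.

    /* Invented, simplified model: a basic block owning a plain array of
     * instruction pointers. */
    struct instruction;
    struct basic_block {
            struct instruction **insns;
            int nr;
    };

    /* Deferred request: "insert 'new_insn' just before 'before' in 'bb'". */
    struct insert_request {
            struct basic_block *bb;
            struct instruction *before;
            struct instruction *new_insn;
    };

    /* Applying the request later: the position is no longer known, so we
     * must search for 'before' again -- this re-scan is the cost Luc is
     * pointing at. Returns 0 if 'before' was not found (e.g. deleted in
     * the meantime, another coherency issue with deferral). */
    static int apply_insert(struct insert_request *req)
    {
            struct basic_block *bb = req->bb;

            for (int i = 0; i < bb->nr; i++) {
                    if (bb->insns[i] != req->before)
                            continue;
                    /* shift the tail up by one; this toy model assumes
                     * spare capacity is available */
                    for (int j = bb->nr; j > i; j--)
                            bb->insns[j] = bb->insns[j - 1];
                    bb->insns[i] = req->new_insn;
                    bb->nr++;
                    return 1;
            }
            return 0;
    }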
diff --git a/flow.c b/flow.c
index fce8bde..bfe54f7 100644
--- a/flow.c
+++ b/flow.c
@@ -729,12 +729,21 @@ static void simplify_one_symbol(struct entrypoint *ep, struct symbol *sym)
 multi_def:
 complex_def:
 external_visibility:
-        all = 1;
-        FOR_EACH_PTR_REVERSE(pseudo->users, pu) {
-                struct instruction *insn = pu->insn;
-                if (insn->opcode == OP_LOAD)
-                        all &= find_dominating_stores(pseudo, insn, ++bb_generation, !mod);
-        } END_FOR_EACH_PTR_REVERSE(pu);
+        {
+                /*
+                 * find_dominating_stores() will modify the pseudo->users list.
+                 * Make a duplicate copy before using it.
+                 */
+                struct pseudo_user_list *users = NULL;
+                all = 1;
+                concat_user_list(pseudo->users, &users);
+                FOR_EACH_PTR_REVERSE(users, pu) {
+                        struct instruction *insn = pu->insn;
+                        if (insn->opcode == OP_LOAD)
+                                all &= find_dominating_stores(pseudo, insn, ++bb_generation, !mod);
+                } END_FOR_EACH_PTR_REVERSE(pu);
+                free_ptr_list(&users);
+        }
         /* If we converted all the loads, remove the stores. They are dead */