| Message ID | alpine.LRH.2.02.1806191228110.25656@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On Thu 21-06-18 21:17:24, Mikulas Patocka wrote:
[...]
> > But seriously, isn't the best way around the throttling issue to use
> > PF_LESS_THROTTLE?
>
> Yes - it could be done by setting PF_LESS_THROTTLE. But I think it would
> be better to change it just in one place than to add PF_LESS_THROTTLE to
> every block device driver (because adding it to every block driver results
> in more code).

Why would every block device need this? I thought we were talking about
mempool allocator and the md variant of it. They are explicitly doing
their own back off so PF_LESS_THROTTLE sounds appropriate to me.

> What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the
> request comes from a block device driver or a filesystem), we should not
> sleep.

Why? How are you going to audit all the callers that the behavior makes
sense and moreover how are you going to ensure that future usage will
still make sense. The more subtle side effects gfp flags have the harder
they are to maintain.

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> Index: linux-2.6/mm/vmscan.c
> ===================================================================
> --- linux-2.6.orig/mm/vmscan.c
> +++ linux-2.6/mm/vmscan.c
> @@ -2674,6 +2674,7 @@ static bool shrink_node(pg_data_t *pgdat
>  	 * the LRU too quickly.
>  	 */
>  	if (!sc->hibernation_mode && !current_is_kswapd() &&
> +	    (sc->gfp_mask & (__GFP_NORETRY | __GFP_FS)) != __GFP_NORETRY &&
>  	    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
>  		wait_iff_congested(BLK_RW_ASYNC, HZ/10);
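For illustration, a minimal sketch of the PF_LESS_THROTTLE alternative under
discussion (hypothetical driver code; the save/restore idiom mirrors what the
loop driver does around its worker, and current_restore_flags() is assumed
available as in kernels of this era):

	/* Hypothetical block-driver allocation path: mark the current task
	 * as one that helps clean memory, so that direct reclaim throttles
	 * it less aggressively. */
	static struct page *driver_alloc_page(gfp_t gfp_mask)
	{
		unsigned int pflags = current->flags & PF_LESS_THROTTLE;
		struct page *page;

		current->flags |= PF_LESS_THROTTLE;
		page = alloc_page(gfp_mask);
		current_restore_flags(pflags, PF_LESS_THROTTLE);
		return page;
	}

The point of contention is where this belongs: repeated in each block driver's
allocation path (as above), or handled once inside reclaim itself.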
On Fri 22-06-18 11:01:51, Michal Hocko wrote:
> On Thu 21-06-18 21:17:24, Mikulas Patocka wrote:
> [...]
> > What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the
> > request comes from a block device driver or a filesystem), we should not
> > sleep.
>
> Why? How are you going to audit all the callers that the behavior makes
> sense and moreover how are you going to ensure that future usage will
> still make sense. The more subtle side effects gfp flags have the harder
> they are to maintain.

So just as an exercise, try to explain the above semantic to users. We
currently have the following:

 * __GFP_NORETRY: The VM implementation will try only very lightweight
 * memory direct reclaim to get some memory under memory pressure (thus
 * it can sleep). It will avoid disruptive actions like OOM killer. The
 * caller must handle the failure which is quite likely to happen under
 * heavy memory pressure. The flag is suitable when failure can easily be
 * handled at small cost, such as reduced throughput

 * __GFP_FS can call down to the low-level FS. Clearing the flag avoids the
 * allocator recursing into the filesystem which might already be holding
 * locks.

So how are you going to explain a mask with __GFP_NORETRY set and __GFP_FS
clear? What is its actual semantic, short of explaining the whole reclaim
path or forcing users to look into the code to understand it? What about
GFP_NOIO | __GFP_NORETRY? What does "should not sleep" mean for that
combination? Do all shrinkers have to follow it as well?
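To make the combination concrete, this is how the patch's test comes out for a
few common gfp masks (an illustrative sketch derived from the hunk quoted
above; would_throttle() is a hypothetical helper, not kernel code):

	/* The patch skips wait_iff_congested() only when __GFP_NORETRY is
	 * set and __GFP_FS is clear in the reclaim gfp_mask: */
	static bool would_throttle(gfp_t gfp)
	{
		return (gfp & (__GFP_NORETRY | __GFP_FS)) != __GFP_NORETRY;
	}

	/*
	 * GFP_KERNEL                  -> __GFP_FS set            -> throttles
	 * GFP_KERNEL | __GFP_NORETRY  -> both bits set           -> throttles
	 * GFP_NOIO                    -> neither bit set         -> throttles
	 * GFP_NOIO | __GFP_NORETRY    -> NORETRY set, FS clear   -> skips wait
	 */

So GFP_NOIO | __GFP_NORETRY, the case asked about above, is exactly the one
that would skip the HZ/10 wait under the proposed patch.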
On Fri, 22 Jun 2018, Michal Hocko wrote:

> On Thu 21-06-18 21:17:24, Mikulas Patocka wrote:
> [...]
> > > But seriously, isn't the best way around the throttling issue to use
> > > PF_LESS_THROTTLE?
> >
> > Yes - it could be done by setting PF_LESS_THROTTLE. But I think it would
> > be better to change it just in one place than to add PF_LESS_THROTTLE to
> > every block device driver (because adding it to every block driver results
> > in more code).
>
> Why would every block device need this? I thought we were talking about
> mempool allocator and the md variant of it. They are explicitly doing
> their own back off so PF_LESS_THROTTLE sounds appropriate to me.

Because every block driver is susceptible to this problem. Two years ago,
there was a bug where, when the user was swapping to a dm-crypt device,
memory management would stall the allocations done by dm-crypt by 100ms -
that slowed down the swapping rate and made the machine unusable. Now
people are complaining about the same problem in dm-bufio. Next time, it
will be something else.

(You will answer: "we will not fix bugs unless people report them" :-)

> > What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the
> > request comes from a block device driver or a filesystem), we should not
> > sleep.
>
> Why? How are you going to audit all the callers that the behavior makes
> sense and moreover how are you going to ensure that future usage will
> still make sense. The more subtle side effects gfp flags have the harder
> they are to maintain.

I did audit them - see the previous email in this thread:
https://www.redhat.com/archives/dm-devel/2018-June/thread.html

Mikulas

> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org
> >
> > Index: linux-2.6/mm/vmscan.c
> > ===================================================================
> > --- linux-2.6.orig/mm/vmscan.c
> > +++ linux-2.6/mm/vmscan.c
> > @@ -2674,6 +2674,7 @@ static bool shrink_node(pg_data_t *pgdat
> >  	 * the LRU too quickly.
> >  	 */
> >  	if (!sc->hibernation_mode && !current_is_kswapd() &&
> > +	    (sc->gfp_mask & (__GFP_NORETRY | __GFP_FS)) != __GFP_NORETRY &&
> >  	    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
> >  		wait_iff_congested(BLK_RW_ASYNC, HZ/10);
> >
>
> --
> Michal Hocko
> SUSE Labs
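For context, the fallback pattern the audit covered looks roughly like this (a
hedged sketch with hypothetical names; dm-bufio and dm-crypt follow this
general shape):

	static struct page *alloc_buffer_page(mempool_t *pool)
	{
		struct page *page;

		/* Opportunistic attempt: with __GFP_NORETRY the caller
		 * expects to fail fast under memory pressure, not to stall
		 * for 100ms inside reclaim. */
		page = alloc_page(GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
		if (!page)
			/* Fallback with guaranteed forward progress; this
			 * path may sleep. */
			page = mempool_alloc(pool, GFP_NOIO);
		return page;
	}

Callers written this way already implement their own back-off, which is why
the throttling in shrink_node() only adds latency for them.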
On Fri 22-06-18 08:44:52, Mikulas Patocka wrote:
> On Fri, 22 Jun 2018, Michal Hocko wrote:
[...]
> > Why? How are you going to audit all the callers that the behavior makes
> > sense and moreover how are you going to ensure that future usage will
> > still make sense. The more subtle side effects gfp flags have the harder
> > they are to maintain.
>
> I did audit them - see the previous email in this thread:
> https://www.redhat.com/archives/dm-devel/2018-June/thread.html

I do not see any mention about throttling expectations for those users.
You have focused only on the allocation failure fallback AFAIR
On Fri, 22 Jun 2018, Michal Hocko wrote:

> On Fri 22-06-18 08:44:52, Mikulas Patocka wrote:
> > On Fri, 22 Jun 2018, Michal Hocko wrote:
> > [...]
> > > Why? How are you going to audit all the callers that the behavior makes
> > > sense and moreover how are you going to ensure that future usage will
> > > still make sense. The more subtle side effects gfp flags have the harder
> > > they are to maintain.
> >
> > I did audit them - see the previous email in this thread:
> > https://www.redhat.com/archives/dm-devel/2018-June/thread.html
>
> I do not see any mention about throttling expectations for those users.
> You have focused only on the allocation failure fallback AFAIR

How should the callers be analyzed with respect to throttling?

Mikulas
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -2674,6 +2674,7 @@ static bool shrink_node(pg_data_t *pgdat
 	 * the LRU too quickly.
 	 */
 	if (!sc->hibernation_mode && !current_is_kswapd() &&
+	    (sc->gfp_mask & (__GFP_NORETRY | __GFP_FS)) != __GFP_NORETRY &&
 	    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
 		wait_iff_congested(BLK_RW_ASYNC, HZ/10);