Message ID | 20200324142229.12028-5-bhe@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | improvements about lowmem_reserve and /proc/zoneinfo | expand |
On Tue, 24 Mar 2020, Baoquan He wrote: > This moving makes the layout of /proc/zoneinfo more sensible. And there > are 4 zones at most currently, it doesn't need to scroll down much to get > to the 1st populated zone, even though the 1st populated zone is MOVABLE > zone. > Doesn't this introduce risk that it will break existing parsers of /proc/zoneinfo in subtle ways? In some cases /proc/zoneinfo is a tricky file to correctly parse because you have to rely on the existing order in which it is printed to determine which zone is being described. We need to print zones even with unmanaged pages, for instance, otherwise userspace may be unaware of which zones are supported and what order they are in. That's important to be able to construct the proper string to use when writing vm.lowmem_reserve_ratio. I'd prefer not changing the order of /proc/zoneinfo if it can be avoided just because the risk outweighs the reward that we may break some initscript parsers. > Node 2, per-node stats > nr_inactive_anon 48 > nr_active_anon 15454 > ... > nr_foll_pin_acquired 0 > nr_foll_pin_released 0 > Node 2, zone DMA > pages free 0 > min 0 > low 0 > high 0 > spanned 0 > present 0 > managed 0 > Node 2, zone DMA32 > pages free 0 > min 0 > low 0 > high 0 > spanned 0 > present 0 > managed 0 > Node 2, zone Normal > pages free 0 > min 0 > low 0 > high 0 > spanned 0 > present 0 > managed 0 > Node 2, zone Movable > pages free 196346 > min 3540 > ... > managed 262144 > > Signed-off-by: Baoquan He <bhe@redhat.com> > --- > mm/vmstat.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 6fd1407f4632..4bbf9be786da 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1567,13 +1567,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, > { > int i; > seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name); > - if (is_zone_first_populated(pgdat, zone)) { > - seq_printf(m, "\n per-node stats"); > - for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { > - seq_printf(m, "\n %-12s %lu", node_stat_name(i), > - node_page_state(pgdat, i)); > - } > - } > seq_printf(m, > "\n pages free %lu" > "\n min %lu" > @@ -1648,7 +1641,18 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, > */ > static int zoneinfo_show(struct seq_file *m, void *arg) > { > + int i; > pg_data_t *pgdat = (pg_data_t *)arg; > + > + if (node_state(pgdat->node_id, N_MEMORY)) { > + seq_printf(m, "Node %d, per-node stats", pgdat->node_id); > + for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { > + seq_printf(m, "\n %-12s %lu", node_stat_name(i), > + node_page_state(pgdat, i)); > + } > + seq_putc(m, '\n'); > + } > + > walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print); > return 0; > } > -- > 2.17.2 > > >
On 03/24/20 at 12:25pm, David Rientjes wrote: > On Tue, 24 Mar 2020, Baoquan He wrote: > > > This moving makes the layout of /proc/zoneinfo more sensible. And there > > are 4 zones at most currently, it doesn't need to scroll down much to get > > to the 1st populated zone, even though the 1st populated zone is MOVABLE > > zone. > > > > Doesn't this introduce risk that it will break existing parsers of > /proc/zoneinfo in subtle ways? > > In some cases /proc/zoneinfo is a tricky file to correctly parse because > you have to rely on the existing order in which it is printed to determine > which zone is being described. We need to print zones even with unmanaged > pages, for instance, otherwise userspace may be unaware of which zones are > supported and what order they are in. That's important to be able to > construct the proper string to use when writing vm.lowmem_reserve_ratio. > > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided > just because the risk outweighs the reward that we may break some > initscript parsers. Oh, I may not describe the change and result clearly. This patch doesn't change zone order at all. I only move the per-node stats to the front of each node, the zone order is completely kept the same, still DMA, DMA32, NORMAL, MOVABLE. Before this patch, per-node stats are printed inside the first populated zone of each node. E.g in this node 2 which only has movable zone, the per-node stats is embedded in the last zone. In fact, this per-node stats are made for the whole node, but not for one zone. Node 2, zone DMA pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 0, 1024, 1024) Node 2, zone DMA32 pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 0, 1024, 1024) Node 2, zone Normal pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 protection: (0, 0, 0, 8192, 8192) Node 2, zone Movable per-node stats --------------------------->> start of per-node stats nr_inactive_anon 42 nr_active_anon 11787 nr_inactive_file 32222 nr_active_file 6081 nr_unevictable 0 nr_slab_reclaimable 0 nr_slab_unreclaimable 0 ...... --------- (mid items are omitted) nr_anon_transparent_hugepages 0 nr_unstable 0 nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_dirtied 25523 nr_written 25113 nr_kernel_misc_reclaimable 0 ------------------------->> end of per-node stats pages free 211331 ------------------------->> start printing data of zone Movable min 3524 low 4405 high 5286 spanned 262144 present 262144 managed 262144 protection: (0, 0, 0, 0, 0) nr_free_pages 211331 nr_zone_inactive_anon 42 nr_zone_active_anon 11787 nr_zone_inactive_file 32222 nr_zone_active_file 6081 nr_zone_unevictable 0 nr_zone_write_pending 2 With this patch applied, only the per-node stats part is moved to the front of each node. Node 2, per-node stats --------------------------->> start of per-node stats nr_inactive_anon 42 nr_active_anon 12358 nr_inactive_file 33139 nr_active_file 10088 nr_unevictable 0 nr_slab_reclaimable 0 ...... --------- (mid items are omitted) nr_vmscan_write 0 nr_vmscan_immediate_reclaim 0 nr_dirtied 9082 nr_written 8844 nr_kernel_misc_reclaimable 0 nr_foll_pin_acquired 0 nr_foll_pin_released 0 ------------------------->> end of per-node stats Node 2, zone DMA pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 Node 2, zone DMA32 pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 Node 2, zone Normal pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 Node 2, zone Movable pages free 205601 ------------------------->> start printing data of zone Movable min 3525 low 4406 high 5287 spanned 262144 present 262144 managed 262144 protection: (0, 0, 0, 0) nr_free_pages 205601 nr_zone_inactive_anon 42 nr_zone_active_anon 12358 ........
On Wed 25-03-20 13:53:31, Baoquan He wrote: > On 03/24/20 at 12:25pm, David Rientjes wrote: > > On Tue, 24 Mar 2020, Baoquan He wrote: > > > > > This moving makes the layout of /proc/zoneinfo more sensible. And there > > > are 4 zones at most currently, it doesn't need to scroll down much to get > > > to the 1st populated zone, even though the 1st populated zone is MOVABLE > > > zone. > > > > > > > Doesn't this introduce risk that it will break existing parsers of > > /proc/zoneinfo in subtle ways? > > > > In some cases /proc/zoneinfo is a tricky file to correctly parse because > > you have to rely on the existing order in which it is printed to determine > > which zone is being described. We need to print zones even with unmanaged > > pages, for instance, otherwise userspace may be unaware of which zones are > > supported and what order they are in. That's important to be able to > > construct the proper string to use when writing vm.lowmem_reserve_ratio. > > > > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided > > just because the risk outweighs the reward that we may break some > > initscript parsers. > > Oh, I may not describe the change and result clearly. This patch doesn't > change zone order at all. I only move the per-node stats to the front of > each node, the zone order is completely kept the same, still DMA, DMA32, > NORMAL, MOVABLE. Even this can break existing parsers. Fixing that up is likely not hard and existing parsers would be mostly debugging hacks here and there but I do miss any actual justification except for you considering it more sensible. I do not remember this would be a common pain point for people parsing this file. If anything the overal structure of the file makes it hard to parse and your patches do not really address that. We are likely too late to make the output much more sensible TBH. That being said, I haven't looked more closely on your patches because I do not have spare cycles for that. Your justification for touching the code which seems to be working relatively well is quite weak IMHO, yet it adds a non zero risk for breaking existing parsers.
On 03/25/20 at 09:55am, Michal Hocko wrote: > On Wed 25-03-20 13:53:31, Baoquan He wrote: > > On 03/24/20 at 12:25pm, David Rientjes wrote: > > > On Tue, 24 Mar 2020, Baoquan He wrote: > > > > > > > This moving makes the layout of /proc/zoneinfo more sensible. And there > > > > are 4 zones at most currently, it doesn't need to scroll down much to get > > > > to the 1st populated zone, even though the 1st populated zone is MOVABLE > > > > zone. > > > > > > > > > > Doesn't this introduce risk that it will break existing parsers of > > > /proc/zoneinfo in subtle ways? > > > > > > In some cases /proc/zoneinfo is a tricky file to correctly parse because > > > you have to rely on the existing order in which it is printed to determine > > > which zone is being described. We need to print zones even with unmanaged > > > pages, for instance, otherwise userspace may be unaware of which zones are > > > supported and what order they are in. That's important to be able to > > > construct the proper string to use when writing vm.lowmem_reserve_ratio. > > > > > > I'd prefer not changing the order of /proc/zoneinfo if it can be avoided > > > just because the risk outweighs the reward that we may break some > > > initscript parsers. > > > > Oh, I may not describe the change and result clearly. This patch doesn't > > change zone order at all. I only move the per-node stats to the front of > > each node, the zone order is completely kept the same, still DMA, DMA32, > > NORMAL, MOVABLE. > > Even this can break existing parsers. Fixing that up is likely not hard > and existing parsers would be mostly debugging hacks here and there but > I do miss any actual justification except for you considering it more > sensible. I do not remember this would be a common pain point for people > parsing this file. If anything the overal structure of the file makes it > hard to parse and your patches do not really address that. We are likely > too late to make the output much more sensible TBH. > > That being said, I haven't looked more closely on your patches because I > do not have spare cycles for that. Your justification for touching the > code which seems to be working relatively well is quite weak IMHO, yet > it adds a non zero risk for breaking existing parsers. I would take the saying of non zero risk for breaking existing parsers. When considering this change, I thought about the possible risk. However, found out the per-node stats was added in 2016 which is not so late, and assume nobody will rely on the order of per-node stats embeded into a zone. But I have to admit any concern or worry of risk is worth being considerred carefully since /proc/zoneinfo is a classic interface. So, in view of objections from you and David, I would like to drop this patch and patch 5. It's a small improvement, not worth taking any risk. But if it goes back to this time of 2017, I would like to spend some time to defend it :-) commit e2ecc8a79ed49f7838b4fdf352c4c48cec9424ac Author: Mel Gorman <mgorman@techsingularity.net> Date: Thu Jul 28 15:47:02 2016 -0700 mm, vmstat: print node-based stats in zoneinfo file
On Wed, 25 Mar 2020, Baoquan He wrote: > > Even this can break existing parsers. Fixing that up is likely not hard > > and existing parsers would be mostly debugging hacks here and there but > > I do miss any actual justification except for you considering it more > > sensible. I do not remember this would be a common pain point for people > > parsing this file. If anything the overal structure of the file makes it > > hard to parse and your patches do not really address that. We are likely > > too late to make the output much more sensible TBH. > > > > That being said, I haven't looked more closely on your patches because I > > do not have spare cycles for that. Your justification for touching the > > code which seems to be working relatively well is quite weak IMHO, yet > > it adds a non zero risk for breaking existing parsers. > > I would take the saying of non zero risk for breaking existing parsers. > When considering this change, I thought about the possible risk. However, > found out the per-node stats was added in 2016 which is not so late, and > assume nobody will rely on the order of per-node stats embeded into a > zone. But I have to admit any concern or worry of risk is worth being > considerred carefully since /proc/zoneinfo is a classic interface. > For context, we started parsing /proc/zoneinfo in initscripts to be able to determine the order in which vm.lowmem_reserve_ratio needs to be set and this required my kernel change from 2017: commit b2bd8598195f1b2a72130592125ac6b4218988a2 Author: David Rientjes <rientjes@google.com> Date: Wed May 3 14:52:59 2017 -0700 mm, vmstat: print non-populated zones in zoneinfo Otherwise, we found, it's much more difficult to determine how this array should be structured. So at least we parse this file very carefully, I'm not sure how much others do, but it seems like an unnecessary risk for little reward. I'm happy to see it has been decided to drop this patch and patch 5. > So, in view of objections from you and David, I would like to drop this > patch and patch 5. It's a small improvement, not worth taking any risk. > But if it goes back to this time of 2017, I would like to spend some > time to defend it :-) > > commit e2ecc8a79ed49f7838b4fdf352c4c48cec9424ac > Author: Mel Gorman <mgorman@techsingularity.net> > Date: Thu Jul 28 15:47:02 2016 -0700 > > mm, vmstat: print node-based stats in zoneinfo file > > >
On 03/25/20 at 12:45pm, David Rientjes wrote: > On Wed, 25 Mar 2020, Baoquan He wrote: > > > > Even this can break existing parsers. Fixing that up is likely not hard > > > and existing parsers would be mostly debugging hacks here and there but > > > I do miss any actual justification except for you considering it more > > > sensible. I do not remember this would be a common pain point for people > > > parsing this file. If anything the overal structure of the file makes it > > > hard to parse and your patches do not really address that. We are likely > > > too late to make the output much more sensible TBH. > > > > > > That being said, I haven't looked more closely on your patches because I > > > do not have spare cycles for that. Your justification for touching the > > > code which seems to be working relatively well is quite weak IMHO, yet > > > it adds a non zero risk for breaking existing parsers. > > > > I would take the saying of non zero risk for breaking existing parsers. > > When considering this change, I thought about the possible risk. However, > > found out the per-node stats was added in 2016 which is not so late, and > > assume nobody will rely on the order of per-node stats embeded into a > > zone. But I have to admit any concern or worry of risk is worth being > > considerred carefully since /proc/zoneinfo is a classic interface. > > > > For context, we started parsing /proc/zoneinfo in initscripts to be able > to determine the order in which vm.lowmem_reserve_ratio needs to be set > and this required my kernel change from 2017: > > commit b2bd8598195f1b2a72130592125ac6b4218988a2 > Author: David Rientjes <rientjes@google.com> > Date: Wed May 3 14:52:59 2017 -0700 > > mm, vmstat: print non-populated zones in zoneinfo > > Otherwise, we found, it's much more difficult to determine how this array > should be structured. So at least we parse this file very carefully, I'm > not sure how much others do, but it seems like an unnecessary risk for > little reward. I'm happy to see it has been decided to drop this patch > and patch 5. OK, I see why it is in such a situation, the empty zones were not printed. I could still not get how vm.lowmem_reserve_ratio is set with /proc/zoneinfo in the old initscripts, do you see any risk if not filling and showing the ->lowmem_reserve[] of empty zone in patch 2 and 3? Thanks in advance.
On Thu 26-03-20 12:24:54, Baoquan He wrote: > On 03/25/20 at 12:45pm, David Rientjes wrote: > > On Wed, 25 Mar 2020, Baoquan He wrote: > > > > > > Even this can break existing parsers. Fixing that up is likely not hard > > > > and existing parsers would be mostly debugging hacks here and there but > > > > I do miss any actual justification except for you considering it more > > > > sensible. I do not remember this would be a common pain point for people > > > > parsing this file. If anything the overal structure of the file makes it > > > > hard to parse and your patches do not really address that. We are likely > > > > too late to make the output much more sensible TBH. > > > > > > > > That being said, I haven't looked more closely on your patches because I > > > > do not have spare cycles for that. Your justification for touching the > > > > code which seems to be working relatively well is quite weak IMHO, yet > > > > it adds a non zero risk for breaking existing parsers. > > > > > > I would take the saying of non zero risk for breaking existing parsers. > > > When considering this change, I thought about the possible risk. However, > > > found out the per-node stats was added in 2016 which is not so late, and > > > assume nobody will rely on the order of per-node stats embeded into a > > > zone. But I have to admit any concern or worry of risk is worth being > > > considerred carefully since /proc/zoneinfo is a classic interface. > > > > > > > For context, we started parsing /proc/zoneinfo in initscripts to be able > > to determine the order in which vm.lowmem_reserve_ratio needs to be set > > and this required my kernel change from 2017: > > > > commit b2bd8598195f1b2a72130592125ac6b4218988a2 > > Author: David Rientjes <rientjes@google.com> > > Date: Wed May 3 14:52:59 2017 -0700 > > > > mm, vmstat: print non-populated zones in zoneinfo > > > > Otherwise, we found, it's much more difficult to determine how this array > > should be structured. So at least we parse this file very carefully, I'm > > not sure how much others do, but it seems like an unnecessary risk for > > little reward. I'm happy to see it has been decided to drop this patch > > and patch 5. > > > OK, I see why it is in such a situation, the empty zones were not printed. > > I could still not get how vm.lowmem_reserve_ratio is set with > /proc/zoneinfo in the old initscripts, do you see any risk if not > filling and showing the ->lowmem_reserve[] of empty zone in > patch 2 and 3? Thanks in advance. The point is why should we even care. Displaying that information shouldn't hurt anything, right?
On 03/26/20 at 07:43am, Michal Hocko wrote: > On Thu 26-03-20 12:24:54, Baoquan He wrote: > > On 03/25/20 at 12:45pm, David Rientjes wrote: > > > On Wed, 25 Mar 2020, Baoquan He wrote: > > > > > > > > Even this can break existing parsers. Fixing that up is likely not hard > > > > > and existing parsers would be mostly debugging hacks here and there but > > > > > I do miss any actual justification except for you considering it more > > > > > sensible. I do not remember this would be a common pain point for people > > > > > parsing this file. If anything the overal structure of the file makes it > > > > > hard to parse and your patches do not really address that. We are likely > > > > > too late to make the output much more sensible TBH. > > > > > > > > > > That being said, I haven't looked more closely on your patches because I > > > > > do not have spare cycles for that. Your justification for touching the > > > > > code which seems to be working relatively well is quite weak IMHO, yet > > > > > it adds a non zero risk for breaking existing parsers. > > > > > > > > I would take the saying of non zero risk for breaking existing parsers. > > > > When considering this change, I thought about the possible risk. However, > > > > found out the per-node stats was added in 2016 which is not so late, and > > > > assume nobody will rely on the order of per-node stats embeded into a > > > > zone. But I have to admit any concern or worry of risk is worth being > > > > considerred carefully since /proc/zoneinfo is a classic interface. > > > > > > > > > > For context, we started parsing /proc/zoneinfo in initscripts to be able > > > to determine the order in which vm.lowmem_reserve_ratio needs to be set > > > and this required my kernel change from 2017: > > > > > > commit b2bd8598195f1b2a72130592125ac6b4218988a2 > > > Author: David Rientjes <rientjes@google.com> > > > Date: Wed May 3 14:52:59 2017 -0700 > > > > > > mm, vmstat: print non-populated zones in zoneinfo > > > > > > Otherwise, we found, it's much more difficult to determine how this array > > > should be structured. So at least we parse this file very carefully, I'm > > > not sure how much others do, but it seems like an unnecessary risk for > > > little reward. I'm happy to see it has been decided to drop this patch > > > and patch 5. > > > > > > OK, I see why it is in such a situation, the empty zones were not printed. > > > > I could still not get how vm.lowmem_reserve_ratio is set with > > /proc/zoneinfo in the old initscripts, do you see any risk if not > > filling and showing the ->lowmem_reserve[] of empty zone in > > patch 2 and 3? Thanks in advance. > > The point is why should we even care. Displaying that information > shouldn't hurt anything, right? Well, I would say why not. If saying anything hurted, I often check /proc/zoneinfo to get information about system memory like many people, I was wondering why the protection data is over there, but it's am empty zone, and they protect what. I dare say it's more than once I asked to myself, just sometime I am too lazy to start to make it clear when focusing on another issue. Not sure if that is kind of hurting. I would like to see it's not there to confuse me if anyone else have stood up to fix it, I absolutely will vote for it. Surely, we also need to evaluate if any risk or complexity is involved, While with my understanding, I don't see risk, and the change is quite simple and easy to understand.
diff --git a/mm/vmstat.c b/mm/vmstat.c index 6fd1407f4632..4bbf9be786da 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1567,13 +1567,6 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, { int i; seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name); - if (is_zone_first_populated(pgdat, zone)) { - seq_printf(m, "\n per-node stats"); - for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { - seq_printf(m, "\n %-12s %lu", node_stat_name(i), - node_page_state(pgdat, i)); - } - } seq_printf(m, "\n pages free %lu" "\n min %lu" @@ -1648,7 +1641,18 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, */ static int zoneinfo_show(struct seq_file *m, void *arg) { + int i; pg_data_t *pgdat = (pg_data_t *)arg; + + if (node_state(pgdat->node_id, N_MEMORY)) { + seq_printf(m, "Node %d, per-node stats", pgdat->node_id); + for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { + seq_printf(m, "\n %-12s %lu", node_stat_name(i), + node_page_state(pgdat, i)); + } + seq_putc(m, '\n'); + } + walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print); return 0; }
This moving makes the layout of /proc/zoneinfo more sensible. And there are 4 zones at most currently, it doesn't need to scroll down much to get to the 1st populated zone, even though the 1st populated zone is MOVABLE zone. Node 2, per-node stats nr_inactive_anon 48 nr_active_anon 15454 ... nr_foll_pin_acquired 0 nr_foll_pin_released 0 Node 2, zone DMA pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 Node 2, zone DMA32 pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 Node 2, zone Normal pages free 0 min 0 low 0 high 0 spanned 0 present 0 managed 0 Node 2, zone Movable pages free 196346 min 3540 ... managed 262144 Signed-off-by: Baoquan He <bhe@redhat.com> --- mm/vmstat.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)