Message ID | 20191209194305.20828-3-sjpark@amazon.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xenbus/backend: Add a memory pressure handler callback | expand |
On 09.12.19 20:43, SeongJae Park wrote: > From: SeongJae Park <sjpark@amazon.de> > > Each `blkif` has a free pages pool for the grant mapping. The size of > the pool starts from zero and be increased on demand while processing > the I/O requests. If current I/O requests handling is finished or 100 > milliseconds has passed since last I/O requests handling, it checks and > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > Therefore, `blkfront` running guests can cause a memory pressure in the > `blkback` running guest by attaching a large number of block devices and > inducing I/O. System administrators can avoid such problematic > situations by limiting the maximum number of devices each guest can > attach. However, finding the optimal limit is not so easy. Improper > set of the limit can results in the memory pressure or a resource > underutilization. This commit avoids such problematic situations by > squeezing the pools (returns every free page in the pool to the system) > for a while (users can set this duration via a module parameter) if a > memory pressure is detected. > > Discussions > =========== > > The `blkback`'s original shrinking mechanism returns only pages in the > pool, which are not currently be used by `blkback`, to the system. In > other words, the pages are not mapped with foreign pages. Because this > commit is changing only the shrink limit but uses the mechanism as is, > this commit does not introduce improper mappings related security > issues. > > Once a memory pressure is detected, this commit keeps the squeezing > limit for a user-specified time duration. The duration should be > neither too long nor too short. If it is too long, the squeezing > incurring overhead can reduce the I/O performance. If it is too short, > `blkback` will not free enough pages to reduce the memory pressure. > This commit sets the value as `10 milliseconds` by default because it is > a short time in terms of I/O while it is a long time in terms of memory > operations. Also, as the original shrinking mechanism works for at > least every 100 milliseconds, this could be a somewhat reasonable > choice. I also tested other durations (refer to the below section for > more details) and confirmed that 10 milliseconds is the one that works > best with the test. That said, the proper duration depends on actual > configurations and workloads. That's why this commit is allowing users > to set it as their optimal value via the module parameter. > > Memory Pressure Test > ==================== > > To show how this commit fixes the memory pressure situation well, I > configured a test environment on a xen-running virtualization system. > On the `blkfront` running guest instances, I attach a large number of > network-backed volume devices and induce I/O to those. Meanwhile, I > measure the number of pages that swapped in and out on the `blkback` > running guest. The test ran twice, once for the `blkback` before this > commit and once for that after this commit. As shown below, this commit > has dramatically reduced the memory pressure: > > pswpin pswpout > before 76,672 185,799 > after 212 3,325 > > Optimal Aggressive Shrinking Duration > ------------------------------------- > > To find a best squeezing duration, I repeated the test with three > different durations (1ms, 10ms, and 100ms). The results are as below: > > duration pswpin pswpout > 1 852 6,424 > 10 212 3,325 > 100 203 3,340 > > As expected, the memory pressure has decreased as the duration is > increased, but the reduction stopped from the `10ms`. Based on this > results, I chose the default duration as 10ms. > > Performance Overhead Test > ========================= > > This commit could incur I/O performance degradation under severe memory > pressure because the squeezing will require more page allocations per > I/O. To show the overhead, I artificially made a worst-case squeezing > situation and measured the I/O performance of a `blkfront` running > guest. > > For the artificial squeezing, I set the `blkback.max_buffer_pages` using > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. We set > the value to `1024` and `0`. The `1024` is the default value. Setting > the value as `0` is same to a situation doing the squeezing always > (worst-case). > > For the I/O performance measurement, I use a simple `dd` command. > > Default Performance > ------------------- > > [dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages > [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s > > Worst-case Performance > ---------------------- > > [dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages > [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s > 131072+0 records in > 131072+0 records out > 536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s > > In short, even worst case squeezing makes no visible performance > degradation. I think this is due to the slow speed of the I/O. In > other words, the additional page allocation overhead is hidden under the > much slower I/O latency. > > Nevertheless, pleaset note that this is just a very simple and minimal > test. > > Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> When dropping the domid parameter you can keep the Reviewed-by, of course. Juergen
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..7e3994c20843 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -142,6 +142,22 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) HZ * xen_blkif_pgrant_timeout); } +/* Once a memory pressure is detected, squeeze free page pools for a while. */ +static int xen_blkif_buffer_squeeze_duration_ms = 10; +module_param_named(buffer_squeeze_duration_ms, + xen_blkif_buffer_squeeze_duration_ms, int, 0644); +MODULE_PARM_DESC(buffer_squeeze_duration_ms, +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); + +static unsigned long xen_blk_buffer_squeeze_end; + +unsigned xen_blkbk_reclaim(struct xenbus_device *dev, domid_t domid) +{ + xen_blk_buffer_squeeze_end = jiffies + + msecs_to_jiffies(xen_blkif_buffer_squeeze_duration_ms); + return 0; +} + static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) { unsigned long flags; @@ -656,8 +672,11 @@ int xen_blkif_schedule(void *arg) ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); } - /* Shrink if we have more than xen_blkif_max_buffer_pages */ - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + /* Shrink the free pages pool if it is too large. */ + if (time_before(jiffies, xen_blk_buffer_squeeze_end)) + shrink_free_pagepool(ring, 0); + else + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 1d3002d773f7..de5c6a301f44 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id); int xen_blkif_schedule(void *arg); int xen_blkif_purge_persistent(void *arg); void xen_blkbk_free_caches(struct xen_blkif_ring *ring); +unsigned xen_blkbk_reclaim(struct xenbus_device *dev, domid_t domid); int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, struct backend_info *be, int state); diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index b90dbcd99c03..de49a09e6933 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = { .ids = xen_blkbk_ids, .probe = xen_blkbk_probe, .remove = xen_blkbk_remove, - .otherend_changed = frontend_changed + .otherend_changed = frontend_changed, + .reclaim = xen_blkbk_reclaim }; int xen_blkif_xenbus_init(void)