Message ID | 1536957299-43536-1-git-send-email-yang.shi@linux.alibaba.com (mailing list archive)
---|---
Series | mm: zap pages with read mmap_sem in munmap for large mapping
On Sat, Sep 15, 2018 at 04:34:56AM +0800, Yang Shi wrote:
> Regression and performance data:
> Did the below regression test with setting thresh to 4K manually in the code:
> * Full LTP
> * Trinity (munmap/all vm syscalls)
> * Stress-ng: mmap/mmapfork/mmapfixed/mmapaddr/mmapmany/vm
> * mm-tests: kernbench, phpbench, sysbench-mariadb, will-it-scale
> * vm-scalability
>
> With the patches, exclusive mmap_sem hold time when munmapping an 80GB
> address space on a machine with 32 cores of E5-2680 @ 2.70GHz dropped
> from seconds to the microsecond level.
>
> munmap_test-15002 [008] 594.380138: funcgraph_entry: |  __vm_munmap {
> munmap_test-15002 [008] 594.380146: funcgraph_entry: !2485684 us |  unmap_region();
> munmap_test-15002 [008] 596.865836: funcgraph_exit:  !2485692 us |  }
>
> Here the execution time of unmap_region() is used to evaluate the time of
> holding the read mmap_sem; the remaining time is spent holding the
> exclusive lock.

Something I've been wondering about for a while is whether we should "sort"
the readers together.  ie if the acquirers look like this:

A write
B read
C read
D write
E read
F read
G write

then we should grant the lock to A, BCEF, D, G rather than A, BC, D, EF, G.

A quick way to test this is to do something like the following in
__rwsem_down_read_failed_common:

-	if (list_empty(&sem->wait_list))
+	if (list_empty(&sem->wait_list)) {
 		adjustment += RWSEM_WAITING_BIAS;
+		list_add(&waiter.list, &sem->wait_list);
+	} else {
+		struct rwsem_waiter *first = list_first_entry(&sem->wait_list,
+					struct rwsem_waiter, list);
+		if (first->type == RWSEM_WAITING_FOR_READ)
+			list_add(&waiter.list, &sem->wait_list);
+		else
+			list_add_tail(&waiter.list, &sem->wait_list);
+	}
-	list_add_tail(&waiter.list, &sem->wait_list);

It'd be interesting to know if this makes any difference with your tests.

(this isn't perfect, of course; it'll fail to sort readers together if
there's a writer at the head of the queue; eg:

A write
B write
C read
D write
E read
F write
G read

but it won't do any worse than we have at the moment).
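A minimal user-space sketch of the queueing rule proposed above (this is not
the kernel patch itself; struct waiter, enqueue(), and the names here are
invented for illustration). It assumes writer A already holds the lock while
B..G arrive, and it mirrors the diff's rule: a reader is placed at the head of
the wait list when the current head is also a reader, otherwise the waiter is
appended at the tail as today.

#include <stdio.h>

enum wtype { WAIT_READ, WAIT_WRITE };

struct waiter {
        char name;
        enum wtype type;
        struct waiter *next;
};

/* Insert a waiter using the "group readers at the head" rule: a reader
 * goes to the head of the wait list if the current head is also a reader,
 * otherwise the waiter is appended at the tail (plain FIFO). */
static void enqueue(struct waiter **head, struct waiter *w)
{
        if (*head && w->type == WAIT_READ && (*head)->type == WAIT_READ) {
                w->next = *head;
                *head = w;
        } else {
                struct waiter **p = head;

                while (*p)
                        p = &(*p)->next;
                w->next = NULL;
                *p = w;
        }
}

int main(void)
{
        /* Writer A already holds the lock; B..G arrive in this order. */
        struct waiter w[] = {
                { 'B', WAIT_READ },  { 'C', WAIT_READ },  { 'D', WAIT_WRITE },
                { 'E', WAIT_READ },  { 'F', WAIT_READ },  { 'G', WAIT_WRITE },
        };
        struct waiter *head = NULL;
        struct waiter *p;
        size_t i;

        for (i = 0; i < sizeof(w) / sizeof(w[0]); i++)
                enqueue(&head, &w[i]);

        for (p = head; p; p = p->next)
                printf("%c %s\n", p->name,
                       p->type == WAIT_READ ? "read" : "write");
        return 0;
}

With this arrival order the queue ends up F E C B D G, so the four readers sit
next to each other and can all be woken in one batch; the order within the
reader group does not matter since they are granted the lock together.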
On 9/15/18 3:10 AM, Matthew Wilcox wrote:
> On Sat, Sep 15, 2018 at 04:34:56AM +0800, Yang Shi wrote:
>> Regression and performance data:
>> Did the below regression test with setting thresh to 4K manually in the code:
>> * Full LTP
>> * Trinity (munmap/all vm syscalls)
>> * Stress-ng: mmap/mmapfork/mmapfixed/mmapaddr/mmapmany/vm
>> * mm-tests: kernbench, phpbench, sysbench-mariadb, will-it-scale
>> * vm-scalability
>>
>> With the patches, exclusive mmap_sem hold time when munmapping an 80GB
>> address space on a machine with 32 cores of E5-2680 @ 2.70GHz dropped
>> from seconds to the microsecond level.
>>
>> munmap_test-15002 [008] 594.380138: funcgraph_entry: |  __vm_munmap {
>> munmap_test-15002 [008] 594.380146: funcgraph_entry: !2485684 us |  unmap_region();
>> munmap_test-15002 [008] 596.865836: funcgraph_exit:  !2485692 us |  }
>>
>> Here the execution time of unmap_region() is used to evaluate the time of
>> holding the read mmap_sem; the remaining time is spent holding the
>> exclusive lock.
> Something I've been wondering about for a while is whether we should "sort"
> the readers together. ie if the acquirers look like this:
>
> A write
> B read
> C read
> D write
> E read
> F read
> G write
>
> then we should grant the lock to A, BCEF, D, G rather than A, BC, D, EF, G.

I'm not sure how much this would help real-world workloads.

Typically, multiple threads contend for one mmap_sem because they are
reading/writing the same address space. There may be dependencies or
synchronization among them, so sorting the readers together might break
those dependencies?

Thanks,
Yang

> A quick way to test this is to do something like the following in
> __rwsem_down_read_failed_common:
>
> -	if (list_empty(&sem->wait_list))
> +	if (list_empty(&sem->wait_list)) {
>  		adjustment += RWSEM_WAITING_BIAS;
> +		list_add(&waiter.list, &sem->wait_list);
> +	} else {
> +		struct rwsem_waiter *first = list_first_entry(&sem->wait_list,
> +					struct rwsem_waiter, list);
> +		if (first->type == RWSEM_WAITING_FOR_READ)
> +			list_add(&waiter.list, &sem->wait_list);
> +		else
> +			list_add_tail(&waiter.list, &sem->wait_list);
> +	}
> -	list_add_tail(&waiter.list, &sem->wait_list);
>
> It'd be interesting to know if this makes any difference with your tests.
>
> (this isn't perfect, of course; it'll fail to sort readers together if
> there's a writer at the head of the queue; eg:
>
> A write
> B write
> C read
> D write
> E read
> F write
> G read
>
> but it won't do any worse than we have at the moment).
On Mon, Sep 17, 2018 at 01:00:58PM -0700, Yang Shi wrote:
> On 9/15/18 3:10 AM, Matthew Wilcox wrote:
> > Something I've been wondering about for a while is whether we should "sort"
> > the readers together. ie if the acquirers look like this:
> >
> > A write
> > B read
> > C read
> > D write
> > E read
> > F read
> > G write
> >
> > then we should grant the lock to A, BCEF, D, G rather than A, BC, D, EF, G.
>
> I'm not sure how much this would help real-world workloads.
>
> Typically, multiple threads contend for one mmap_sem because they are
> reading/writing the same address space. There may be dependencies or
> synchronization among them, so sorting the readers together might break
> those dependencies?

I don't think that's true for the mmap_sem.  If one thread is trying to
get the sem for read then it's a page fault.  Another thread trying to
get the sem for write is trying to modify the address space.  If an
application depends on the ordering of an mmap vs a page fault, it has
to have its own synchronisation.
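To illustrate that last point, a small hypothetical user-space sketch (not
from the thread; region, region_lock, region_ready, mapper and toucher are
made-up names). The kernel's mmap_sem only serializes a page fault against
concurrent address-space changes; it does not guarantee that one thread's
mmap() happens before another thread touches the region. If the touching
thread must see the mapping, the application has to publish the pointer under
its own lock, for example:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static char *region;                      /* NULL until the mapping is published */
static pthread_mutex_t region_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t region_ready = PTHREAD_COND_INITIALIZER;

/* Thread A: mmap() takes mmap_sem for write internally, then the mapping
 * is published under the application's own lock (error handling omitted). */
static void *mapper(void *arg)
{
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        (void)arg;

        pthread_mutex_lock(&region_lock);
        region = p;
        pthread_cond_signal(&region_ready);
        pthread_mutex_unlock(&region_lock);
        return NULL;
}

/* Thread B: waits for the mapping to be published before touching it;
 * the write below may take mmap_sem for read in the page-fault path. */
static void *toucher(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&region_lock);
        while (!region)
                pthread_cond_wait(&region_ready, &region_lock);
        pthread_mutex_unlock(&region_lock);

        strcpy(region, "hello");
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, mapper, NULL);
        pthread_create(&b, NULL, toucher, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%s\n", region);
        return 0;
}

Without the condition variable, the second thread could dereference a NULL
pointer no matter how the kernel orders the two threads' mmap_sem
acquisitions, which is why reordering readers in the rwsem wait queue should
not break correctly written applications.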