Message ID | cover.1731038280.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
Series | Support large folios for tmpfs |
On 08.11.24 05:12, Baolin Wang wrote:
> Traditionally, tmpfs only supported PMD-sized huge folios. However nowadays
> with other file systems supporting any sized large folios, and extending
> anonymous to support mTHP, we should not restrict tmpfs to allocating only
> PMD-sized huge folios, making it more special. Instead, we should allow
> tmpfs can allocate any sized large folios.
>
> Considering that tmpfs already has the 'huge=' option to control the huge
> folios allocation, we can extend the 'huge=' option to allow any sized huge
> folios. The semantics of the 'huge=' mount option are:
>
> huge=never: no any sized huge folios
> huge=always: any sized huge folios
> huge=within_size: like 'always' but respect the i_size
> huge=advise: like 'always' if requested with fadvise()/madvise()
>
> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.

So, no fallback to smaller sizes for now in case we fail to allocate a PMD
one? Of course, this can be added later fairly easily.

> Moreover, the 'deny' and 'force' testing options controlled by
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
> semantics. The 'deny' can disable any sized large folios for tmpfs, while
> the 'force' can enable PMD sized large folios for tmpfs.
>
> Any comments and suggestions are appreciated. Thanks.
>
> Hi David,
> I did not add a new Kconfig option to control the default behavior of 'huge='
> in the current version. I have not changed the default behavior at this
> time, and let's see if there is a need for this.

Likely we want to change the default at some point so people might get a
benefit in more scenarios automatically. But I did not investigate how /tmp
is mapped as default by Fedora, for example.
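For readers following the thread, the 'huge=' table from the cover letter can be restated as a small decision function. This is only a toy model in Python, not kernel code: the names `Huge`, `large_folio_allowed`, `folio_end` and the 4 KiB `PAGE_SIZE` are assumptions made for illustration, and the reading of "respect the i_size" as "the folio must not extend past the page-rounded file size" is a simplification of whatever the patches actually do.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumed base page size, for illustration only


class Huge(Enum):
    """The four 'huge=' mount option values listed in the cover letter."""
    NEVER = "never"
    ALWAYS = "always"
    WITHIN_SIZE = "within_size"
    ADVISE = "advise"


def large_folio_allowed(huge, folio_end, i_size, advised=False):
    """Return True if a large folio ending at byte offset folio_end may be tried.

    Toy restatement of the cover letter's table; the kernel's real checks
    are more involved.
    """
    if huge is Huge.NEVER:
        return False
    if huge is Huge.ALWAYS:
        return True
    if huge is Huge.WITHIN_SIZE:
        # "respect the i_size": modelled here as "the folio must not extend
        # past the file size rounded up to a whole page".
        rounded_size = -(-i_size // PAGE_SIZE) * PAGE_SIZE
        return folio_end <= rounded_size
    # Huge.ADVISE: only when requested via fadvise()/madvise().
    return advised


if __name__ == "__main__":
    for mode in Huge:
        print(mode.value, large_folio_allowed(mode, 64 * 1024, 60000))
```

In this model a 64 KiB folio at offset 0 is rejected under 'within_size' for a 60000-byte file, because the file only rounds up to 61440 bytes.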
On 2024/11/8 23:30, David Hildenbrand wrote:
> On 08.11.24 05:12, Baolin Wang wrote:
>> Traditionally, tmpfs only supported PMD-sized huge folios. However nowadays
>> with other file systems supporting any sized large folios, and extending
>> anonymous to support mTHP, we should not restrict tmpfs to allocating only
>> PMD-sized huge folios, making it more special. Instead, we should allow
>> tmpfs can allocate any sized large folios.
>>
>> Considering that tmpfs already has the 'huge=' option to control the huge
>> folios allocation, we can extend the 'huge=' option to allow any sized huge
>> folios. The semantics of the 'huge=' mount option are:
>>
>> huge=never: no any sized huge folios
>> huge=always: any sized huge folios
>> huge=within_size: like 'always' but respect the i_size
>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>
>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
>> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.
>
> So, no fallback to smaller sizes for now in case we fail to allocate a
> PMD one? Of course, this can be added later fairly easily.

Right. I have no strong preference on this. If no one objects, I can add a
fallback to smaller large folios if the PMD sized allocation fails in the
next version.

>> Moreover, the 'deny' and 'force' testing options controlled by
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
>> semantics. The 'deny' can disable any sized large folios for tmpfs, while
>> the 'force' can enable PMD sized large folios for tmpfs.
>>
>> Any comments and suggestions are appreciated. Thanks.
>>
>> Hi David,
>> I did not add a new Kconfig option to control the default behavior of 'huge='
>> in the current version. I have not changed the default behavior at this
>> time, and let's see if there is a need for this.
>
> Likely we want to change the default at some point so people might get a
> benefit in more scenarios automatically. But I did not investigate how
> /tmp is mapped as default by Fedora, for example.

Personally, adding a cmdline to change the default value might be more
useful than the Kconfig. Anyway, I still want to investigate if there is a
real need.
On 09.11.24 08:12, Baolin Wang wrote:
>
>
> On 2024/11/8 23:30, David Hildenbrand wrote:
>> On 08.11.24 05:12, Baolin Wang wrote:
>>> Traditionally, tmpfs only supported PMD-sized huge folios. However nowadays
>>> with other file systems supporting any sized large folios, and extending
>>> anonymous to support mTHP, we should not restrict tmpfs to allocating only
>>> PMD-sized huge folios, making it more special. Instead, we should allow
>>> tmpfs can allocate any sized large folios.
>>>
>>> Considering that tmpfs already has the 'huge=' option to control the huge
>>> folios allocation, we can extend the 'huge=' option to allow any sized huge
>>> folios. The semantics of the 'huge=' mount option are:
>>>
>>> huge=never: no any sized huge folios
>>> huge=always: any sized huge folios
>>> huge=within_size: like 'always' but respect the i_size
>>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>>
>>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
>>> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.
>>
>> So, no fallback to smaller sizes for now in case we fail to allocate a
>> PMD one? Of course, this can be added later fairly easily.
>
> Right. I have no strong preference on this. If no one objects, I can add
> a fallback to smaller large folios if the PMD sized allocation fails in
> the next version.

I'm fine with a staged approach, to perform this change separately.

>>> Moreover, the 'deny' and 'force' testing options controlled by
>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
>>> semantics. The 'deny' can disable any sized large folios for tmpfs, while
>>> the 'force' can enable PMD sized large folios for tmpfs.
>>>
>>> Any comments and suggestions are appreciated. Thanks.
>>>
>>> Hi David,
>>> I did not add a new Kconfig option to control the default behavior of 'huge='
>>> in the current version. I have not changed the default behavior at this
>>> time, and let's see if there is a need for this.
>>
>> Likely we want to change the default at some point so people might get a
>> benefit in more scenarios automatically. But I did not investigate how
>> /tmp is mapped as default by Fedora, for example.
>
> Personally, adding a cmdline to change the default value might be more
> useful than the Kconfig. Anyway, I still want to investigate if there is
> a real need.

Likely both will be reasonable to have.

FWIW, "systemctl cat tmp.mount" on a Fedora40 system tells me
"Options=mode=1777,strictatime,nosuid,nodev,size=50%%,nr_inodes=1m"

To be precise:

$ grep tmpfs /etc/mtab
vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=6511156k,nr_inodes=819200,mode=755,inode64 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
tmpfs /run/user/100813 tmpfs rw,nosuid,nodev,relatime,size=3255576k,nr_inodes=813894,mode=700,uid=100813,gid=100813,inode64 0 0

Having a way to change the default will likely be extremely helpful.
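The point of the listing above is that none of the Fedora tmpfs mounts passes a 'huge=' option, so whatever the default is applies everywhere. A minimal Python sketch of the same check follows, assuming the usual whitespace-separated mtab layout (source, mount point, filesystem type, options). The function name `tmpfs_huge_options` and the shortened `SAMPLE` text are illustrative only; the sample lines are copied from the listing in this message.

```python
def tmpfs_huge_options(mtab_text):
    """Map each tmpfs mount point to its 'huge=' value, or None if not set."""
    result = {}
    for line in mtab_text.splitlines():
        fields = line.split()
        # Field 3 is the filesystem type; skip devtmpfs and anything else.
        if len(fields) < 4 or fields[2] != "tmpfs":
            continue
        # Turn "rw,size=4096k" into {"rw": "", "size": "4096k"}.
        options = dict(opt.partition("=")[::2] for opt in fields[3].split(","))
        result[fields[1]] = options.get("huge")
    return result


# Lines taken from the /etc/mtab listing quoted in this thread.
SAMPLE = """\
vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
"""

if __name__ == "__main__":
    for mount_point, huge in tmpfs_huge_options(SAMPLE).items():
        print(mount_point, huge if huge is not None else "(no huge= option)")
```

To run it against a live system, pass the contents of /etc/mtab or /proc/mounts instead of `SAMPLE`.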
On 2024/11/12 03:47, David Hildenbrand wrote:
> On 09.11.24 08:12, Baolin Wang wrote:
>>
>>
>> On 2024/11/8 23:30, David Hildenbrand wrote:
>>> On 08.11.24 05:12, Baolin Wang wrote:
>>>> Traditionally, tmpfs only supported PMD-sized huge folios. However nowadays
>>>> with other file systems supporting any sized large folios, and extending
>>>> anonymous to support mTHP, we should not restrict tmpfs to allocating only
>>>> PMD-sized huge folios, making it more special. Instead, we should allow
>>>> tmpfs can allocate any sized large folios.
>>>>
>>>> Considering that tmpfs already has the 'huge=' option to control the huge
>>>> folios allocation, we can extend the 'huge=' option to allow any sized huge
>>>> folios. The semantics of the 'huge=' mount option are:
>>>>
>>>> huge=never: no any sized huge folios
>>>> huge=always: any sized huge folios
>>>> huge=within_size: like 'always' but respect the i_size
>>>> huge=advise: like 'always' if requested with fadvise()/madvise()
>>>>
>>>> Note: for tmpfs mmap() faults, due to the lack of a write size hint, still
>>>> allocate the PMD-sized huge folios if huge=always/within_size/advise is set.
>>>
>>> So, no fallback to smaller sizes for now in case we fail to allocate a
>>> PMD one? Of course, this can be added later fairly easily.
>>
>> Right. I have no strong preference on this. If no one objects, I can add
>> a fallback to smaller large folios if the PMD sized allocation fails in
>> the next version.
>
> I'm fine with a staged approach, to perform this change separately.

Sure.

>>>> Moreover, the 'deny' and 'force' testing options controlled by
>>>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', still retain the same
>>>> semantics. The 'deny' can disable any sized large folios for tmpfs, while
>>>> the 'force' can enable PMD sized large folios for tmpfs.
>>>>
>>>> Any comments and suggestions are appreciated. Thanks.
>>>>
>>>> Hi David,
>>>> I did not add a new Kconfig option to control the default behavior of 'huge='
>>>> in the current version. I have not changed the default behavior at this
>>>> time, and let's see if there is a need for this.
>>>
>>> Likely we want to change the default at some point so people might get a
>>> benefit in more scenarios automatically. But I did not investigate how
>>> /tmp is mapped as default by Fedora, for example.
>>
>> Personally, adding a cmdline to change the default value might be more
>> useful than the Kconfig. Anyway, I still want to investigate if there is
>> a real need.
>
> Likely both will be reasonable to have.
>
> FWIW, "systemctl cat tmp.mount" on a Fedora40 system tells me
> "Options=mode=1777,strictatime,nosuid,nodev,size=50%%,nr_inodes=1m"
>
> To be precise:
>
> $ grep tmpfs /etc/mtab
> vendorfw /usr/lib/firmware/vendor tmpfs rw,relatime,mode=755,inode64 0 0
> devtmpfs /dev devtmpfs rw,nosuid,size=4096k,nr_inodes=4063361,mode=755,inode64 0 0
> tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
> tmpfs /run tmpfs rw,nosuid,nodev,size=6511156k,nr_inodes=819200,mode=755,inode64 0 0
> tmpfs /tmp tmpfs rw,nosuid,nodev,size=16277892k,nr_inodes=1048576,inode64 0 0
> tmpfs /run/user/100813 tmpfs rw,nosuid,nodev,relatime,size=3255576k,nr_inodes=813894,mode=700,uid=100813,gid=100813,inode64 0 0
>
>
> Having a way to change the default will likely be extremely helpful.

Thanks. I'd like to add a command line option like
'transparent_hugepage_shmem' to control the default value.
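To illustrate how such a boot parameter could be observed from user space, here is a small Python sketch that picks a 'name=value' token out of a kernel command line string. In this thread 'transparent_hugepage_shmem' is only a proposed name, and the value 'within_size' in the example is an assumption borrowed from the 'huge=' values above; the helper `cmdline_param` is hypothetical, and it does not handle quoted values containing spaces.

```python
def cmdline_param(cmdline, name):
    """Return the value of the last 'name=value' token, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # keep scanning so that a later occurrence wins
    return value


if __name__ == "__main__":
    # Hypothetical command line; on a live system this text would come
    # from /proc/cmdline.
    example = "root=/dev/sda1 quiet transparent_hugepage_shmem=within_size"
    print(cmdline_param(example, "transparent_hugepage_shmem"))
```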