Message ID | 1467102671-593138-1-git-send-email-yigal@plexistor.com (mailing list archive) |
---|---|
State | New, archived |
On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
> Before this patch, passing a range that is beyond the physical memory
> range will succeed, the user will see a /dev/pmem0 and will be able to
> access it. Reads will always return 0 and writes will be silently
> ignored.
>
> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
> failing that were eventually tracked down to wrong values passed to
> memmap.
>
> This patch prevents the above issue by, instead of adding a new memory
> range, only updating a RAM memory range with the PRAM type. This way,
> passing the wrong memmap will either not give you a pmem at all or give
> you a smaller one that actually has RAM behind it.
>
> And if someone still needs to fake a pmem that doesn't have RAM behind
> it, they can simply do memmap=XX@YY,XX!YY.
>
> Signed-off-by: Yigal Korman <yigal@plexistor.com>
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
> Tested-by: Boaz Harrosh <boaz@plexistor.com>
> ---

I have some other libnvdimm fixes heading upstream shortly if x86
folks just want to ack this one...
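For concreteness, the two invocations the description refers to look roughly like this on the kernel command line (the 2G/16G values are made-up examples for illustration, not taken from the patch):

    # Carve out 2 GiB at offset 16 GiB as emulated pmem; with this patch
    # the conversion only takes effect where that range is already typed as RAM.
    memmap=2G!16G

    # Fake pmem with no RAM behind it (the XX@YY,XX!YY form from the
    # description): first force the range to be treated as RAM, then
    # retype it to PRAM.
    memmap=2G@16G,2G!16G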
On 06/28/16 09:33, Dan Williams wrote:
> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
>> Before this patch, passing a range that is beyond the physical memory
>> range will succeed, the user will see a /dev/pmem0 and will be able to
>> access it. Reads will always return 0 and writes will be silently
>> ignored.
>>
>> [..]
>
> I have some other libnvdimm fixes heading upstream shortly if x86
> folks just want to ack this one...

I'm concerned about this. This would mean that memory not marked as RAM
because it is persistent and you don't want the OS to accidentally
clobber persistent RAM can't be added. So it seems like The Wrong
Thing. If all you want is simulated pram then it shouldn't care about
addresses in the first place and instead we should just specify it by
quantity.

	-hpa
On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 06/28/16 09:33, Dan Williams wrote:
>> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@plexistor.com> wrote:
>>> [..]
>>
>> I have some other libnvdimm fixes heading upstream shortly if x86
>> folks just want to ack this one...
>
> I'm concerned about this. This would mean that memory not marked as RAM
> because it is persistent and you don't want the OS to accidentally
> clobber persistent RAM can't be added.

Ah true. Specifically you are worried about the case where a
platform-firmware has mis-marked pmem as reserved memory (or some
other type) and would like to correct it to be pram.

> So it seems like The Wrong
> Thing. If all you want is simulated pram then it shouldn't care about
> addresses in the first place and instead we should just specify it by
> quantity.

Yes, agree we need an explicit "simulate pram" option independent of
memmap=, or just continue to educate users that if they try to
simulate pmem and specify an invalid range they get to keep all the
broken pieces.
On Wed, Jun 29, 2016 at 4:09 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 06/28/16 09:33, Dan Williams wrote:
>>> [..]
>>
>> I'm concerned about this. This would mean that memory not marked as RAM
>> because it is persistent and you don't want the OS to accidentally
>> clobber persistent RAM can't be added.
>
> Ah true. Specifically you are worried about the case where a
> platform-firmware has mis-marked pmem as reserved memory (or some
> other type) and would like to correct it to be pram.

As I mentioned in the patch, this is still possible by doing memmap=X@Y,X!Y.
Also, with fixes in grub and the kernel regarding mis-marking NVDIMMs,
this is much less common today.
My purpose was simply to prevent a repeating user error for the common
use case.

>> So it seems like The Wrong
>> Thing. If all you want is simulated pram then it shouldn't care about
>> addresses in the first place and instead we should just specify it by
>> quantity.
>
> Yes, agree we need an explicit "simulate pram" option independent of
> memmap=, or just continue to educate users that if they try to
> simulate pmem and specify an invalid range they get to keep all the
> broken pieces.

I'd love to have a simpler way to specify simulated pram, but quantity
is not good enough.
For my use case, for example, I need the quantity to be spread evenly
over all NUMA nodes, so just getting a range "somewhere" is not good.
And I can imagine other users that want to pin pram to the same socket
where their high-speed NIC sits.
So I'm not sure we can find a better general API than memmap, and I'm
not sure it's worth it.

Thanks,
Yigal
On Tue, Jun 28, 2016 at 10:12 PM, Yigal Korman <yigal@plexistor.com> wrote:
> On Wed, Jun 29, 2016 at 4:09 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> [..]
>>
>> Ah true. Specifically you are worried about the case where a
>> platform-firmware has mis-marked pmem as reserved memory (or some
>> other type) and would like to correct it to be pram.
>
> As I mentioned in the patch, this is still possible by doing memmap=X@Y,X!Y.

Sorry, yes I overlooked this in my response to Peter.

> Also, with fixes in grub and the kernel regarding mis-marking NVDIMMs,
> this is much less common today.
> My purpose was simply to prevent a repeating user error for the common
> use case.

It makes sense, but at the same time it still involves a user
education burden that memmap=X!Y is constrained by default.

>>> So it seems like The Wrong
>>> Thing. If all you want is simulated pram then it shouldn't care about
>>> addresses in the first place and instead we should just specify it by
>>> quantity.
>>
>> Yes, agree we need an explicit "simulate pram" option independent of
>> memmap=, or just continue to educate users that if they try to
>> simulate pmem and specify an invalid range they get to keep all the
>> broken pieces.
>
> I'd love to have a simpler way to specify simulated pram, but quantity
> is not good enough.
> For my use case, for example, I need the quantity to be spread evenly
> over all NUMA nodes, so just getting a range "somewhere" is not good.
> And I can imagine other users that want to pin pram to the same socket
> where their high-speed NIC sits.
> So I'm not sure we can find a better general API than memmap, and I'm
> not sure it's worth it.

I think it would be worthwhile to have something like testpmem= which
takes the same parameters as memmap=, but is constrained to RAM by
default. Then it becomes clear that the user really does want a
simulated configuration on RAM rather than explicitly specifying a new
persistent memory range.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 621b501..4bd4207 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -878,7 +878,7 @@ static int __init parse_memmap_one(char *p)
 		e820_add_region(start_at, mem_size, E820_RESERVED);
 	} else if (*p == '!') {
 		start_at = memparse(p+1, &p);
-		e820_add_region(start_at, mem_size, E820_PRAM);
+		e820_update_range(start_at, mem_size, E820_RAM, E820_PRAM);
 	} else
 		e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
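To make the behavioral difference concrete, here is a minimal user-space sketch of the two strategies. This is not the kernel's e820 code: the toy table, the single-region overlap math, and the fact that nothing is actually retyped are simplifying assumptions; the real e820_update_range() also splits regions that only partially overlap the request.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

enum { E820_RAM = 1, E820_PRAM = 12 };

struct region { uint64_t start, size; int type; };

/* Toy firmware map: a single 4 GiB RAM region starting at address 0. */
static const struct region table[] = {
	{ 0, 4 * GiB, E820_RAM },
};

/* Old behavior: a PRAM region is created for whatever range the user
 * asked for, whether or not any memory exists there. */
static uint64_t add_region(uint64_t start, uint64_t size)
{
	(void)start;		/* the address is never validated */
	return size;		/* all of it shows up as "pmem" */
}

/* New behavior: only the part of the request that overlaps an existing
 * E820_RAM region is converted to E820_PRAM.  (Table mutation is
 * omitted here; we just report how much would be retyped.) */
static uint64_t update_range(uint64_t start, uint64_t size)
{
	uint64_t end = start + size, retyped = 0;
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct region *r = &table[i];
		uint64_t lo = start > r->start ? start : r->start;
		uint64_t hi = end < r->start + r->size ? end : r->start + r->size;

		if (r->type == E820_RAM && hi > lo)
			retyped += hi - lo;
	}
	return retyped;
}

int main(void)
{
	/* memmap=2G!16G on a 4 GiB machine: the address is bogus. */
	printf("old: %llu MiB of 'pmem'\n",
	       (unsigned long long)(add_region(16 * GiB, 2 * GiB) >> 20));
	printf("new: %llu MiB of pmem\n",
	       (unsigned long long)(update_range(16 * GiB, 2 * GiB) >> 20));

	/* memmap=2G!3G: only 1 GiB of the request is backed by RAM. */
	printf("new, partial overlap: %llu MiB of pmem\n",
	       (unsigned long long)(update_range(3 * GiB, 2 * GiB) >> 20));
	return 0;
}

With a bogus address the new path retypes nothing, so no /dev/pmem0 appears; with a partially backed range it retypes only the RAM-backed part, matching the "smaller pmem that actually has RAM behind it" behavior the patch describes.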