Message ID | AM3PR05MB0935AABF569F15EA846B8E72DC000@AM3PR05MB0935.eurprd05.prod.outlook.com (mailing list archive) |
---|---|
State | Rejected |
On Wed, Mar 18, 2015 at 10:39 AM, Shachar Raindel <raindel@mellanox.com> wrote: > Date: Sun, 04 Jan 2015 18:30:32 +0200 > Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic Just so we're clear, this bug has been known since January 4, and it's getting sent upstream now? I assume we want it in 4.0 and -stable? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message-----
> From: Roland Dreier [mailto:roland@purestorage.com]
> Sent: Wednesday, April 01, 2015 8:28 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
>
> On Wed, Mar 18, 2015 at 10:39 AM, Shachar Raindel <raindel@mellanox.com>
> wrote:
> > Date: Sun, 04 Jan 2015 18:30:32 +0200
> > Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get
> address arithmetic
>
> Just so we're clear, this bug has been known since January 4, and it's
> getting sent upstream now?
>

This bug has been known since January, yes. To reduce the window of
vulnerability, we privately reported it to major distributions first,
aligning everyone on the same release date. Such synchronization takes
time. Given that the bug was not, to the best of our knowledge,
exploited in the wild, we believed that 3 months of embargo is not
excessive. This is a common practice in the security industry, called
"responsible disclosure."

Following the kernel security bugs policy [1], we reported it to
the kernel security contacts few days before making the issue public.
Few days after issue became public, we published a clear report to all
of the relevant mailing lists.

> I assume we want it in 4.0 and -stable?

Yes. Should be applied ASAP to all kernels that contain uverbs.
We put some effort to make sure the patch will apply cleanly on all
historical versions of the kernel, so backporting should be simple.

Thanks,
--Shachar

[1] https://www.kernel.org/doc/Documentation/SecurityBugs

-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > Hi, > > It was found that the Linux kernel's InfiniBand/RDMA subsystem did not > properly sanitize input parameters while registering memory regions > from user space via the (u)verbs API. A local user with access to > a /dev/infiniband/uverbsX device could use this flaw to crash the > system or, potentially, escalate their privileges on the system. > > The issue has been assigned CVE-2014-8159. > > The issue exists in the InfiniBand/RDMA/iWARP drivers since Linux > Kernel version 2.6.13. > > Mellanox OFED 2.4-1.0.4 fixes the issue. Available from: > http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers > > RedHat errata: https://access.redhat.com/security/cve/CVE-2014-8159 > Canonical errata: http://people.canonical.com/~ubuntu-security/cve/2014/CVE-2014-8159.html > Novell (Suse) bug tracking: https://bugzilla.novell.com/show_bug.cgi?id=914742 > > > The following patch fixes the issue: > > --------------- 8< ------------------------------ > > From d4d68430d4a12c569e28b4f4468284ea22111186 Mon Sep 17 00:00:00 2001 > From: Shachar Raindel <raindel@mellanox.com> > Date: Sun, 04 Jan 2015 18:30:32 +0200 > Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic > > Properly verify that the resulting page aligned end address is larger > than both the start address and the length of the memory area > requested. > > Both the start and length arguments for ib_umem_get are controlled by > the user. A misbehaving user can provide values which will cause an > integer overflow when calculating the page aligned end address. > > This overflow can cause also miscalculation of the number of pages > mapped, and additional logic issues. > > Signed-off-by: Shachar Raindel <raindel@mellanox.com> > Signed-off-by: Jack Morgenstein <jackm@mellanox.com> > Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> > --- > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c > index aec7a6a..8c014b5 100644 > --- a/drivers/infiniband/core/umem.c > +++ b/drivers/infiniband/core/umem.c > @@ -99,6 +99,14 @@ > if (dmasync) > dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs); > > + /* > + * If the combination of the addr and size requested for this memory > + * region causes an integer overflow, return error. > + */ > + if ((PAGE_ALIGN(addr + size) <= size) || > + (PAGE_ALIGN(addr + size) <= addr)) > + return ERR_PTR(-EINVAL); > + Can access_ok() be used here ? if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, addr, size)) return ERR_PTR(-EINVAL); > if (!can_do_mlock()) > return ERR_PTR(-EPERM); > Regards.
Hi, > -----Original Message----- > From: Yann Droneaud [mailto:ydroneaud@opteya.com] > Sent: Thursday, April 02, 2015 1:05 PM > To: Shachar Raindel > Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org> > (linux-rdma@vger.kernel.org); linux-kernel@vger.kernel.org; > stable@vger.kernel.org > Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected > physical memory access > > Hi, > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > > Hi, > > <snipped long e-mail> > > + /* > > + * If the combination of the addr and size requested for this > memory > > + * region causes an integer overflow, return error. > > + */ > > + if ((PAGE_ALIGN(addr + size) <= size) || > > + (PAGE_ALIGN(addr + size) <= addr)) > > + return ERR_PTR(-EINVAL); > > + > > Can access_ok() be used here ? > > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, > addr, size)) > return ERR_PTR(-EINVAL); > No, this will break the current ODP semantics. ODP allows the user to register memory that is not accessible yet. This is a critical design feature, as it allows avoiding holding a registration cache. Adding this check will break the behavior, forcing memory to be all accessible when registering an ODP MR. Thanks, --Shachar
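To make the ODP semantics described above concrete, here is a minimal userspace sketch (not part of the original thread): the range is registered before anything is mapped at those addresses and only later backed by an mmap(). It assumes a libibverbs build that exposes IBV_ACCESS_ON_DEMAND (older stacks expose ODP through experimental verbs) and omits device, context and PD setup.

#include <infiniband/verbs.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Register an ODP MR over a range that does not have to be mapped yet. */
static struct ibv_mr *register_odp_range(struct ibv_pd *pd, void *base, size_t len)
{
	return ibv_reg_mr(pd, base, len,
			  IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ |
			  IBV_ACCESS_REMOTE_WRITE |
			  IBV_ACCESS_ON_DEMAND);
}

/* Later, back part of the registered range with a file; the HCA faults the
 * pages in on first access instead of requiring them to be pinned at
 * registration time. */
static void *map_file_into_range(void *addr, size_t len, const char *path)
{
	int fd = open(path, O_RDONLY);
	void *p;

	if (fd < 0)
		return MAP_FAILED;
	p = mmap(addr, len, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
	close(fd);
	return p;
}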
Hi, Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit : > > -----Original Message----- > > From: Yann Droneaud [mailto:ydroneaud@opteya.com] > > Sent: Thursday, April 02, 2015 1:05 PM > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > > > + /* > > > + * If the combination of the addr and size requested for this > > memory > > > + * region causes an integer overflow, return error. > > > + */ > > > + if ((PAGE_ALIGN(addr + size) <= size) || > > > + (PAGE_ALIGN(addr + size) <= addr)) > > > + return ERR_PTR(-EINVAL); > > > + > > > > Can access_ok() be used here ? > > > > if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, > > addr, size)) > > return ERR_PTR(-EINVAL); > > > > No, this will break the current ODP semantics. > > ODP allows the user to register memory that is not accessible yet. > This is a critical design feature, as it allows avoiding holding > a registration cache. Adding this check will break the behavior, > forcing memory to be all accessible when registering an ODP MR. > Where's the check for the range being in userspace memory space, especially for the ODP case ? For non ODP case (eg. plain old behavior), does get_user_pages() ensure the requested pages fit in userspace region on all architectures ? I think so. In ODP case, I'm not sure such check is ever done ? (Aside, does it take special mesure to protect shared mapping from being read and/or *written* ?) Regards.
On 02/04/2015 16:30, Yann Droneaud wrote: > Hi, > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit : >>> -----Original Message----- >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com] >>> Sent: Thursday, April 02, 2015 1:05 PM >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > >>>> + /* >>>> + * If the combination of the addr and size requested for this >>> memory >>>> + * region causes an integer overflow, return error. >>>> + */ >>>> + if ((PAGE_ALIGN(addr + size) <= size) || >>>> + (PAGE_ALIGN(addr + size) <= addr)) >>>> + return ERR_PTR(-EINVAL); >>>> + >>> >>> Can access_ok() be used here ? >>> >>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, >>> addr, size)) >>> return ERR_PTR(-EINVAL); >>> >> >> No, this will break the current ODP semantics. >> >> ODP allows the user to register memory that is not accessible yet. >> This is a critical design feature, as it allows avoiding holding >> a registration cache. Adding this check will break the behavior, >> forcing memory to be all accessible when registering an ODP MR. >> > > Where's the check for the range being in userspace memory space, > especially for the ODP case ? > > For non ODP case (eg. plain old behavior), does get_user_pages() > ensure the requested pages fit in userspace region on all > architectures ? I think so. Yes, get_user_pages will return a smaller amount of pages than requested if it encounters an unmapped region (or a region without write permissions for write requests). If this happens, the loop in ib_umem_get calls get_user_pages again with the next set of pages, and this time if it the first page still cannot be mapped an error is returned. > > In ODP case, I'm not sure such check is ever done ? In ODP, we also call get_user_pages, but only when a page fault occurs (see ib_umem_odp_map_dma_pages()). This allows the user to pre-register a memory region that contains unmapped virtual space, and then mmap different files into that area without needing to re-register. > (Aside, does it take special mesure to protect shared mapping from > being read and/or *written* ?) I'm not sure I understand the question. Shared mappings that the process is allowed to read or write are also allowed for the HCA (specifically, to local and remote operations the same process performs using the HCA), provided the application has registered their virtual address space as a memory region. Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 2, 2015 at 12:52 AM, Shachar Raindel <raindel@mellanox.com> wrote: > This is a common practice in the security industry, called > "responsible disclosure." > > Following the kernel security bugs policy [1], we reported it to > the kernel security contacts few days before making the issue public. > Few days after issue became public, we published a clear report to all > of the relevant mailing lists. Isn't the point of responsible disclosure to delay disclosure until a fix is in place? What's the point of sending a notification to the kernel security team if you're going to disclose publicly before the upstream kernel is fixed? - R. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Haggai, Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit : > On 02/04/2015 16:30, Yann Droneaud wrote: > > Hi, > > > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit : > >>> -----Original Message----- > >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com] > >>> Sent: Thursday, April 02, 2015 1:05 PM > >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > > > >>>> + /* > >>>> + * If the combination of the addr and size requested for this > >>> memory > >>>> + * region causes an integer overflow, return error. > >>>> + */ > >>>> + if ((PAGE_ALIGN(addr + size) <= size) || > >>>> + (PAGE_ALIGN(addr + size) <= addr)) > >>>> + return ERR_PTR(-EINVAL); > >>>> + > >>> > >>> Can access_ok() be used here ? > >>> > >>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, > >>> addr, size)) > >>> return ERR_PTR(-EINVAL); > >>> > >> > >> No, this will break the current ODP semantics. > >> > >> ODP allows the user to register memory that is not accessible yet. > >> This is a critical design feature, as it allows avoiding holding > >> a registration cache. Adding this check will break the behavior, > >> forcing memory to be all accessible when registering an ODP MR. > >> > > > > Where's the check for the range being in userspace memory space, > > especially for the ODP case ? > > > > For non ODP case (eg. plain old behavior), does get_user_pages() > > ensure the requested pages fit in userspace region on all > > architectures ? I think so. > > Yes, get_user_pages will return a smaller amount of pages than requested > if it encounters an unmapped region (or a region without write > permissions for write requests). If this happens, the loop in > ib_umem_get calls get_user_pages again with the next set of pages, and > this time if it the first page still cannot be mapped an error is returned. > > > > > In ODP case, I'm not sure such check is ever done ? > > In ODP, we also call get_user_pages, but only when a page fault occurs > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-register > a memory region that contains unmapped virtual space, and then mmap > different files into that area without needing to re-register. > OK, thanks for the description. > > (Aside, does it take special mesure to protect shared mapping from > > being read and/or *written* ?) > > I'm not sure I understand the question. Shared mappings that the process > is allowed to read or write are also allowed for the HCA (specifically, > to local and remote operations the same process performs using the HCA), > provided the application has registered their virtual address space as a > memory region. > I was refering to description of get_user_pages(): http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/gup.c?id=v4.0-rc6#n765 * @force: whether to force access even when user mapping is currently * protected (but never forces write access to shared mapping). But since ib_umem_odp_map_dma_pages() use get_user_pages() with force argument set to 0, it's OK. Another related question: as the large memory range could be registered by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), what's prevent the kernel to map a file as the result of mmap(0, ...) in this region, making it available remotely through IBV_WR_RDMA_READ / IBV_WR_RDMA_WRITE ? Again, thanks for the information I was missing to understand how ODP is checking the memory ranges. Regards.
> -----Original Message----- > From: Roland Dreier [mailto:roland@purestorage.com] > Sent: Thursday, April 02, 2015 7:33 PM > To: Shachar Raindel > Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org> > (linux-rdma@vger.kernel.org); stable@vger.kernel.org > Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected > physical memory access > > On Thu, Apr 2, 2015 at 12:52 AM, Shachar Raindel <raindel@mellanox.com> > wrote: > > This is a common practice in the security industry, called > > "responsible disclosure." > > > > Following the kernel security bugs policy [1], we reported it to > > the kernel security contacts few days before making the issue public. > > Few days after issue became public, we published a clear report to all > > of the relevant mailing lists. > > Isn't the point of responsible disclosure to delay disclosure until a > fix is in place? What's the point of sending a notification to the > kernel security team if you're going to disclose publicly before the > upstream kernel is fixed? > We delayed the disclosure until most major Linux vendors had released a fix for the issue, more or less in synchronization. The Linux security contact list only guarantees secrecy for 7 days. We therefore contacted them only close to the date at which fixes were going to be released, to match their expectations for the period of time between contact and public disclosure. Thanks, --Shachar
> -----Original Message----- > From: Yann Droneaud [mailto:ydroneaud@opteya.com] > Sent: Thursday, April 02, 2015 7:35 PM > To: Haggai Eran > Cc: Shachar Raindel; Sagi Grimberg; oss-security@lists.openwall.com; > <linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org); linux- > kernel@vger.kernel.org; stable@vger.kernel.org > Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected > physical memory access > > Hi Haggai, > > Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit : > > On 02/04/2015 16:30, Yann Droneaud wrote: > > > Hi, > > > > > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit : > > >>> -----Original Message----- > > >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com] > > >>> Sent: Thursday, April 02, 2015 1:05 PM > > >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > > > > > >>>> + /* > > >>>> + * If the combination of the addr and size requested for this > > >>> memory > > >>>> + * region causes an integer overflow, return error. > > >>>> + */ > > >>>> + if ((PAGE_ALIGN(addr + size) <= size) || > > >>>> + (PAGE_ALIGN(addr + size) <= addr)) > > >>>> + return ERR_PTR(-EINVAL); > > >>>> + > > >>> > > >>> Can access_ok() be used here ? > > >>> > > >>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, > > >>> addr, size)) > > >>> return ERR_PTR(-EINVAL); > > >>> > > >> > > >> No, this will break the current ODP semantics. > > >> > > >> ODP allows the user to register memory that is not accessible yet. > > >> This is a critical design feature, as it allows avoiding holding > > >> a registration cache. Adding this check will break the behavior, > > >> forcing memory to be all accessible when registering an ODP MR. > > >> > > > > > > Where's the check for the range being in userspace memory space, > > > especially for the ODP case ? > > > > > > For non ODP case (eg. plain old behavior), does get_user_pages() > > > ensure the requested pages fit in userspace region on all > > > architectures ? I think so. > > > > Yes, get_user_pages will return a smaller amount of pages than > requested > > if it encounters an unmapped region (or a region without write > > permissions for write requests). If this happens, the loop in > > ib_umem_get calls get_user_pages again with the next set of pages, and > > this time if it the first page still cannot be mapped an error is > returned. > > > > > > > > In ODP case, I'm not sure such check is ever done ? > > > > In ODP, we also call get_user_pages, but only when a page fault occurs > > (see ib_umem_odp_map_dma_pages()). This allows the user to pre- > register > > a memory region that contains unmapped virtual space, and then mmap > > different files into that area without needing to re-register. > > > > OK, thanks for the description. > > > > (Aside, does it take special mesure to protect shared mapping from > > > being read and/or *written* ?) > > > > I'm not sure I understand the question. Shared mappings that the > process > > is allowed to read or write are also allowed for the HCA > (specifically, > > to local and remote operations the same process performs using the > HCA), > > provided the application has registered their virtual address space as > a > > memory region. > > > > I was refering to description of get_user_pages(): > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/g > up.c?id=v4.0-rc6#n765 > > * @force: whether to force access even when user mapping is currently > * protected (but never forces write access to shared mapping). 
> > But since ib_umem_odp_map_dma_pages() use get_user_pages() with force > argument set to 0, it's OK. > > Another related question: as the large memory range could be registered > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), > what's prevent the kernel to map a file as the result of mmap(0, ...) > in this region, making it available remotely through IBV_WR_RDMA_READ / > IBV_WR_RDMA_WRITE ? > This is not a bug. This is a feature. Exposing a file through RDMA, using ODP, can be done exactly like this. Given that the application explicitly requested this behavior, I don't see why it is a problem. Actually, some of our tests use such flows. The mmu notifiers mechanism allow us to do this safely. When the page is written back to disk, it is removed from the ODP mapping. When it is accessed by the HCA, it is brought back to RAM. Thanks, --Shachar
On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote: >> -----Original Message----- >> From: Yann Droneaud [mailto:ydroneaud@opteya.com] >> Sent: Thursday, April 02, 2015 7:35 PM >> To: Haggai Eran >> Cc: Shachar Raindel; Sagi Grimberg; oss-security@lists.openwall.com; >> <linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org); linux- >> kernel@vger.kernel.org; stable@vger.kernel.org >> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected >> physical memory access >> >> Hi Haggai, >> >> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit : >> > On 02/04/2015 16:30, Yann Droneaud wrote: >> >> Hi, >> >> >> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit : >> >>>> -----Original Message----- >> >>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com] >> >>>> Sent: Thursday, April 02, 2015 1:05 PM >> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : >> >> >> >>>>> + /* >> >>>>> + * If the combination of the addr and size requested for this >> >>>> memory >> >>>>> + * region causes an integer overflow, return error. >> >>>>> + */ >> >>>>> + if ((PAGE_ALIGN(addr + size) <= size) || >> >>>>> + (PAGE_ALIGN(addr + size) <= addr)) >> >>>>> + return ERR_PTR(-EINVAL); >> >>>>> + >> >>>> >> >>>> Can access_ok() be used here ? >> >>>> >> >>>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, >> >>>> addr, size)) >> >>>> return ERR_PTR(-EINVAL); >> >>>> >> >>> >> >>> No, this will break the current ODP semantics. >> >>> >> >>> ODP allows the user to register memory that is not accessible yet. >> >>> This is a critical design feature, as it allows avoiding holding >> >>> a registration cache. Adding this check will break the behavior, >> >>> forcing memory to be all accessible when registering an ODP MR. >> >>> >> >> >> >> Where's the check for the range being in userspace memory space, >> >> especially for the ODP case ? >> >> >> >> For non ODP case (eg. plain old behavior), does get_user_pages() >> >> ensure the requested pages fit in userspace region on all >> >> architectures ? I think so. >> > >> > Yes, get_user_pages will return a smaller amount of pages than >> requested >> > if it encounters an unmapped region (or a region without write >> > permissions for write requests). If this happens, the loop in >> > ib_umem_get calls get_user_pages again with the next set of pages, and >> > this time if it the first page still cannot be mapped an error is >> returned. >> > >> >> >> >> In ODP case, I'm not sure such check is ever done ? >> > >> > In ODP, we also call get_user_pages, but only when a page fault occurs >> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre- >> register >> > a memory region that contains unmapped virtual space, and then mmap >> > different files into that area without needing to re-register. >> > >> >> OK, thanks for the description. >> >> ... >> >> Another related question: as the large memory range could be registered >> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), >> what's prevent the kernel to map a file as the result of mmap(0, ...) >> in this region, making it available remotely through IBV_WR_RDMA_READ / >> IBV_WR_RDMA_WRITE ? >> > > This is not a bug. This is a feature. > > Exposing a file through RDMA, using ODP, can be done exactly like this. > Given that the application explicitly requested this behavior, I don't > see why it is a problem. Actually, some of our tests use such flows. > The mmu notifiers mechanism allow us to do this safely. 
When the page is > written back to disk, it is removed from the ODP mapping. When it is > accessed by the HCA, it is brought back to RAM. > I want to add that we would like to see users registering a very large memory region (perhaps the entire process address space) for local access, and then enabling remote access only to specific regions using memory windows. However, this isn't supported yet by our driver. Still, there are valid cases where you would still want the results of an mmap(0,...) call to be remotely accessible, in cases where there is enough trust between the local process and the remote process. It may help a middleware communication library register a large portion of the address space in advance, and still work with random pointers given to it by another application module. Regards, Haggai-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit : > > -----Original Message----- > > From: Yann Droneaud [mailto:ydroneaud@opteya.com] > > Sent: Thursday, April 02, 2015 7:35 PM > > Another related question: as the large memory range could be registered > > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), > > what's prevent the kernel to map a file as the result of mmap(0, ...) > > in this region, making it available remotely through IBV_WR_RDMA_READ / > > IBV_WR_RDMA_WRITE ? > > > > This is not a bug. This is a feature. > > Exposing a file through RDMA, using ODP, can be done exactly like this. > Given that the application explicitly requested this behavior, I don't > see why it is a problem. If the application cannot choose what will end up in the region it has registered, it's an issue! What might happen if one library in a program calls mmap(0, size, ...) to load a file storing a secret (a private key), and that file ends up being mapped in a registered but otherwise free region (afaict, the kernel is allowed to do it)? What might happen if one library in a program calls mmap(0, size, ..., MAP_ANONYMOUS, ...) to allocate memory, calls mlock(), then writes a secret (a passphrase) in this location, and that area ends up in the memory region registered for on-demand paging? The application hasn't chosen to disclose these confidential pieces of information, but they are available for reading/writing by a remote peer through RDMA, given that it knows the rkey of the memory region (which is a 32-bit value). I hope I'm missing something, because I'm not confident such behavior is a feature. > Actually, some of our tests use such flows. > The mmu notifiers mechanism allow us to do this safely. When the page is > written back to disk, it is removed from the ODP mapping. When it is > accessed by the HCA, it is brought back to RAM. > I'm not questioning the benefit of On Demand Paging or why it's a very good feature for exposing files through RDMA. I'm trying to understand how the application can choose what is exposed through RDMA if it registers a very large memory region for later use (but does not actually map anything there yet): what are the consequences? void *start = sbrk(0); size_t size = ULONG_MAX - (unsigned long)start; ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND) Regards.
On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud@opteya.com> wrote: > Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit : >> > -----Original Message----- >> > From: Yann Droneaud [mailto:ydroneaud@opteya.com] >> > Sent: Thursday, April 02, 2015 7:35 PM > >> > Another related question: as the large memory range could be registered >> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), >> > what's prevent the kernel to map a file as the result of mmap(0, ...) >> > in this region, making it available remotely through IBV_WR_RDMA_READ / >> > IBV_WR_RDMA_WRITE ? >> > >> >> This is not a bug. This is a feature. >> >> Exposing a file through RDMA, using ODP, can be done exactly like this. >> Given that the application explicitly requested this behavior, I don't >> see why it is a problem. > > If the application cannot choose what will end up in the region it has > registered, it's an issue ! > > What might happen if one library in a program call mmap(0, size, ...) to > load a file storing a secret (a private key), and that file ends up > being mapped in an registered but otherwise free region (afaict, the > kernel is allowed to do it) ? > What might happen if one library in a program call call mmap(0, > size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then > write in this location a secret (a passphrase), and that area ends up > in the memory region registered for on demand paging ? > > The application haven't choose to disclose these confidential piece of > information, but they are available for reading/writing by remote > through RDMA given it knows the rkey of the memory region (which is a > 32bits value). > > I hope I'm missing something, because I'm not feeling confident such > behavor is a feature. What we are aiming for is the possibility to register the entire process' address space for RDMA operations (if the process chooses to use this feature). This is similar to multiple threads accessing the same address space. I'm sure you wouldn't be complaining about the ability of one thread to access the secret passphrase mmapped by another thread in your example. > I'm trying to understand how the application can choose what is exposed > through RDMA if it registers a very large memory region for later use > (but do not actually explicitly map something there yet): what's the > consequences ? > > void *start = sbrk(0); > size_t size = ULONG_MAX - (unsigned long)start; > > ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND) The consequences are exactly as you wrote. Just as giving a non-ODP rkey to a remote node allows the node to access the registered memory behind that rkey, giving an ODP rkey to a remote node allows that node to access the virtual address space behind that rkey. Regards, Haggai-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Dominique, > -----Original Message----- > From: Dominique Martinet [mailto:dominique.martinet@cea.fr] > Sent: Thursday, April 02, 2015 8:44 PM > To: Shachar Raindel > Subject: Re: [oss-security] RE: CVE-2014-8159 kernel: infiniband: > uverbs: unprotected physical memory access > > Hi, > > Shachar Raindel wrote on Thu, Apr 02, 2015: > > From: Yann Droneaud [mailto:ydroneaud@opteya.com] > > > Another related question: as the large memory range could be > registered > > > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), > > > what's prevent the kernel to map a file as the result of mmap(0, > ...) > > > in this region, making it available remotely through > IBV_WR_RDMA_READ / > > > IBV_WR_RDMA_WRITE ? > > > > > > > This is not a bug. This is a feature. > > > > Exposing a file through RDMA, using ODP, can be done exactly like > this. > > Given that the application explicitly requested this behavior, I don't > > see why it is a problem. Actually, some of our tests use such flows. > > The mmu notifiers mechanism allow us to do this safely. When the page > is > > written back to disk, it is removed from the ODP mapping. When it is > > accessed by the HCA, it is brought back to RAM. > > Forgive the private reply, but I've actually tried that a while ago and No problem, better than e-mailing the entire world on RDMA specific topics. As this discussion is relevant to the Linux RDMA users, I'm adding back the linux-rdma mailing list. > couldn't get it to work - ibv_reg_mr would return EINVAL on an address > obtained by mmap. Were you mmaping a normal disk file, or was the mmap targeting an MMIO of another hardware device? mmap of a normal disk file should work also with normal memory registration, assuming you are providing the proper length. mmap of the MMIO area of another hardware device (i.e. interfacing an FPGA, NVRAM, or similar things) requires some code changes on both sides. The current kernel code in the infiniband side is using get_user_pages, which does not support MMIO pages. The proposed PeerDirect patches [1] allows peer device to declare ownership of virtual address ranges, and enable such registration. However, these patches are have not yet been merged upstream. > > Conceptually as well I'm not sure how it's supposed to work, mmap should > only actually issue reads when memory access issues page faults (give or > take suitable readahead logic), but I don't think direct memory access > from the IB/RDMA adapter would issue such page faults ? You are correct. RDMA adapters without ODP support do not issue page faults. Instead, during memory registration, the ib_umem code calls get_user_pages, which ensures all relevant pages are in memory, and pins them as needed. > Likewise on writes, would need the kernel to notice memory has been > written and pages are dirty/needs flushing. > Similarly, when deregistering a writable memory region, the kernel driver marks all pages as dirty before unpinning them. You can see the code doing this in [2]. > So, what am I missing? I'd wager it's that "ODP" you're talking about, > do you have any documentation I could skim through ? > Liran Liss gave a presentation about ODP at OFA [3]. The technology is available for ConnectIB devices using the most recent firmware and kernel versions above 3.19. 
Thanks, --Shachar [1] http://www.spinics.net/lists/linux-rdma/msg21770.html [2] http://lxr.free-electrons.com/source/drivers/infiniband/core/umem.c#L62 [3] https://www.openfabrics.org/images/Workshops_2014/DevWorkshop/presos/Tuesday/pdf/09.30_2014_OFA_Workshop_ODP_update_final.pdf and https://www.youtube.com/watch?v=KbrlsXQbHCw
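For readers following reference [2] above, this is roughly the deregistration path being described — a condensed sketch of the umem release code of that era, not a verbatim copy, and details differ between kernel versions: pages of a writable registration are marked dirty before they are unpinned.

static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
{
	struct scatterlist *sg;
	struct page *page;
	int i;

	if (umem->nmap > 0)
		ib_dma_unmap_sg(dev, umem->sg_head.sgl, umem->nmap,
				DMA_BIDIRECTIONAL);

	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
		page = sg_page(sg);
		/* The HCA may have written to these pages, so tell the VM
		 * they are dirty before dropping the pin. */
		if (umem->writable && dirty)
			set_page_dirty_lock(page);
		put_page(page);
	}

	sg_free_table(&umem->sg_head);
}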
Hi, Le vendredi 03 avril 2015 à 08:39 +0000, Haggai Eran a écrit : > On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud@opteya.com> wrote: > > Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit : > >> > -----Original Message----- > >> > From: Yann Droneaud [mailto:ydroneaud@opteya.com] > >> > Sent: Thursday, April 02, 2015 7:35 PM > > > >> > Another related question: as the large memory range could be registered > >> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), > >> > what's prevent the kernel to map a file as the result of mmap(0, ...) > >> > in this region, making it available remotely through IBV_WR_RDMA_READ / > >> > IBV_WR_RDMA_WRITE ? > >> > > >> > >> This is not a bug. This is a feature. > >> > >> Exposing a file through RDMA, using ODP, can be done exactly like this. > >> Given that the application explicitly requested this behavior, I don't > >> see why it is a problem. > > > > If the application cannot choose what will end up in the region it has > > registered, it's an issue ! > > > > What might happen if one library in a program call mmap(0, size, ...) to > > load a file storing a secret (a private key), and that file ends up > > being mapped in an registered but otherwise free region (afaict, the > > kernel is allowed to do it) ? > > What might happen if one library in a program call call mmap(0, > > size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then > > write in this location a secret (a passphrase), and that area ends up > > in the memory region registered for on demand paging ? > > > > The application haven't choose to disclose these confidential piece of > > information, but they are available for reading/writing by remote > > through RDMA given it knows the rkey of the memory region (which is a > > 32bits value). > > > > I hope I'm missing something, because I'm not feeling confident such > > behavor is a feature. > > What we are aiming for is the possibility to register the entire process' address > space for RDMA operations (if the process chooses to use this feature). > This is similar to multiple threads accessing the same address space. I'm sure > you wouldn't be complaining about the ability of one thread to access the secret > passphrase mmapped by another thread in your example. > > > I'm trying to understand how the application can choose what is exposed > > through RDMA if it registers a very large memory region for later use > > (but do not actually explicitly map something there yet): what's the > > consequences ? > > > > void *start = sbrk(0); > > size_t size = ULONG_MAX - (unsigned long)start; > > > > ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND) > > The consequences are exactly as you wrote. Just as giving a non-ODP rkey > to a remote node allows the node to access the registered memory behind that > rkey, giving an ODP rkey to a remote node allows that node to access the > virtual address space behind that rkey. > There's a difference: it's impossible to give a valid non-ODP rkey that point to a memory region not already mapped (backed by a file for example), so the application *choose* the content of the memory to be made accessible remotely before making it accessible. As I understand the last explanation regarding ODP, at creation time, an ODP rkey can point to a free, unused, unallocated memory portion. 
At this point the kernel can happily map anything the application (and its libraries) wants to map at an (almost) *random* address that could be in (or partially in) the ODP memory region. And I have a problem with such random behavior. Allowing this seems dangerous and should be done with care. I believe the application must keep control of what ends up in its ODP registered memory region. Especially for a multi-threaded program: imagine one thread creating a large memory region for its future purposes, then sending the rkey to a remote peer and waiting for some work to be done. In the meantime another thread calls mmap(0, ...) to map a file at a kernel-chosen address, and that address happens to be in the memory region registered by the other thread: 1) the first thread is deprived of a portion of memory it was planning to use; 2) the data used by the second thread is accessible to the remote peer(s) while not expected. Speculatively registering memory seems dangerous for any use case I can think of. Regards.
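A small sketch of the race described above, assuming an ODP-capable libibverbs and a hypothetical out-of-band helper exchange_rkey_with_peer(); it only illustrates the ordering of events, not a complete RDMA program.

#include <infiniband/verbs.h>
#include <sys/mman.h>
#include <pthread.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

static struct ibv_pd *pd;            /* set up elsewhere */
static struct ibv_mr *big_mr;

/* Hypothetical helper: sends the rkey to the remote peer out of band. */
void exchange_rkey_with_peer(uint32_t rkey, void *base, size_t len);

static void *registering_thread(void *arg)
{
	void *base = (void *)(1UL << 40);    /* large, currently unused range */
	size_t len = 1UL << 32;

	big_mr = ibv_reg_mr(pd, base, len,
			    IBV_ACCESS_LOCAL_WRITE |
			    IBV_ACCESS_REMOTE_READ |
			    IBV_ACCESS_REMOTE_WRITE |
			    IBV_ACCESS_ON_DEMAND);
	if (!big_mr)
		return NULL;
	exchange_rkey_with_peer(big_mr->rkey, base, len);
	return NULL;                         /* ... then waits for remote work */
}

static void *mapping_thread(void *arg)
{
	/* mmap(0, ...) lets the kernel pick the address; nothing prevents it
	 * from landing inside the range registered by the other thread, at
	 * which point the file contents become remotely readable/writable. */
	int fd = open("/path/to/secret", O_RDONLY);
	void *secret = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);

	close(fd);
	return secret;
}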
Hi, Shachar Raindel wrote on Fri, Apr 03, 2015 at 11:49:13AM +0000: > > couldn't get it to work - ibv_reg_mr would return EINVAL on an address > > obtained by mmap. > > Were you mmaping a normal disk file, or was the mmap targeting an MMIO of > another hardware device? mmap of a normal disk file should work also with > normal memory registration, assuming you are providing the proper length. On a proper file. It was a year or two ago, I actually tried again now and it does seem to work alright even on older kernels... I wonder what I did wrong back then! > mmap of the MMIO area of another hardware device (i.e. interfacing an FPGA, > NVRAM, or similar things) requires some code changes on both sides. The > current kernel code in the infiniband side is using get_user_pages, which > does not support MMIO pages. The proposed PeerDirect patches [1] allows peer > device to declare ownership of virtual address ranges, and enable such > registration. However, these patches are have not yet been merged upstream. Interesting, I don't need this right now but it's good to know. > > Conceptually as well I'm not sure how it's supposed to work, mmap should > > only actually issue reads when memory access issues page faults (give or > > take suitable readahead logic), but I don't think direct memory access > > from the IB/RDMA adapter would issue such page faults ? > > You are correct. RDMA adapters without ODP support do not issue page faults. > Instead, during memory registration, the ib_umem code calls get_user_pages, > which ensures all relevant pages are in memory, and pins them as needed. > > > Likewise on writes, would need the kernel to notice memory has been > > written and pages are dirty/needs flushing. > > > > Similarly, when deregistering a writable memory region, the kernel driver > marks all pages as dirty before unpinning them. You can see the code doing > this in [2]. Ok this makes sense, thanks for clearing it up. > Liran Liss gave a presentation about ODP at OFA [3]. The technology is > available for ConnectIB devices using the most recent firmware and kernel > versions above 3.19. I don't have any recent kernel around, but I'll give it a shot next week. (I'm working on a userspace file server, nfs-ganesha, so being able to mmap a file and use it directly for I/O is very promising for me!) > [1] http://www.spinics.net/lists/linux-rdma/msg21770.html > [2] http://lxr.free-electrons.com/source/drivers/infiniband/core/umem.c#L62 > [3] https://www.openfabrics.org/images/Workshops_2014/DevWorkshop/presos/Tuesday/pdf/09.30_2014_OFA_Workshop_ODP_update_final.pdf and https://www.youtube.com/watch?v=KbrlsXQbHCw Thanks for the detailed reply, I'll dig a bit further and come back straight to the list if need to.
Hi, Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit : > On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote: > >> -----Original Message----- > >> From: Yann Droneaud [mailto:ydroneaud@opteya.com] > >> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit : > >> > On 02/04/2015 16:30, Yann Droneaud wrote: > >> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit : > >> >>>> -----Original Message----- > >> >>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com] > >> >>>> Sent: Thursday, April 02, 2015 1:05 PM > >> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit : > >> >> > >> >>>>> + /* > >> >>>>> + * If the combination of the addr and size requested for this > >> >>>> memory > >> >>>>> + * region causes an integer overflow, return error. > >> >>>>> + */ > >> >>>>> + if ((PAGE_ALIGN(addr + size) <= size) || > >> >>>>> + (PAGE_ALIGN(addr + size) <= addr)) > >> >>>>> + return ERR_PTR(-EINVAL); > >> >>>>> + > >> >>>> > >> >>>> Can access_ok() be used here ? > >> >>>> > >> >>>> if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ, > >> >>>> addr, size)) > >> >>>> return ERR_PTR(-EINVAL); > >> >>>> > >> >>> > >> >>> No, this will break the current ODP semantics. > >> >>> > >> >>> ODP allows the user to register memory that is not accessible yet. > >> >>> This is a critical design feature, as it allows avoiding holding > >> >>> a registration cache. Adding this check will break the behavior, > >> >>> forcing memory to be all accessible when registering an ODP MR. > >> >>> > >> >> > >> >> Where's the check for the range being in userspace memory space, > >> >> especially for the ODP case ? > >> >> > >> >> For non ODP case (eg. plain old behavior), does get_user_pages() > >> >> ensure the requested pages fit in userspace region on all > >> >> architectures ? I think so. > >> > > >> > Yes, get_user_pages will return a smaller amount of pages than > >> requested > >> > if it encounters an unmapped region (or a region without write > >> > permissions for write requests). If this happens, the loop in > >> > ib_umem_get calls get_user_pages again with the next set of pages, and > >> > this time if it the first page still cannot be mapped an error is > >> returned. > >> > > >> >> > >> >> In ODP case, I'm not sure such check is ever done ? > >> > > >> > In ODP, we also call get_user_pages, but only when a page fault occurs > >> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre- > >> register > >> > a memory region that contains unmapped virtual space, and then mmap > >> > different files into that area without needing to re-register. > >> > > >> > >> OK, thanks for the description. > >> > >> ... > >> > >> Another related question: as the large memory range could be registered > >> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), > >> what's prevent the kernel to map a file as the result of mmap(0, ...) > >> in this region, making it available remotely through IBV_WR_RDMA_READ / > >> IBV_WR_RDMA_WRITE ? > >> > > > > This is not a bug. This is a feature. > > > > Exposing a file through RDMA, using ODP, can be done exactly like this. > > Given that the application explicitly requested this behavior, I don't > > see why it is a problem. Actually, some of our tests use such flows. > > The mmu notifiers mechanism allow us to do this safely. When the page is > > written back to disk, it is removed from the ODP mapping. When it is > > accessed by the HCA, it is brought back to RAM. 
> > > > I want to add that we would like to see users registering a very large > memory region (perhaps the entire process address space) for local > access, and then enabling remote access only to specific regions using > memory windows. However, this isn't supported yet by our driver. In such a scheme, the registration must still be handled "manually": one has to create a memory window to get an rkey to be exchanged with a peer, so why would one want to register such a large memory region (the whole process space)? I guess creating the memory window is faster than registering a memory region. I'd rather say this is not a reason to register a larger memory region (up to the whole process space, current and future), as it sounds like a surprising optimisation: letting the HCA know about too many pages just to be sure it already knows some when the process wants to use them. It seems it would become difficult to handle if there are too many processes. I would prefer creating memory regions to become costless (through ODP :). > Still, there are valid cases where you would still want the results > of an mmap(0,...) call to be remotely accessible, in cases where there > is enough trust between the local process and the remote process. mmap(0, ..., fd) lets the kernel choose where to put the file in the process virtual memory space, so it may, or may not, or may only partially, end up in an ODP pre-registered memory region in a range that is unallocated/unused yet. I don't think one wants that to happen. > It may help a middleware communication library register a large > portion of the address space in advance, and still work with random > pointers given to it by another application module. > But as said at the beginning of your message, the middleware would have to bind a memory window before posting a work request / exposing an rkey for the "random pointers". So I fail to understand how ODP could be used when it comes to registering a memory region not yet backed by anything. Regards.
On 13/04/2015 16:29, Yann Droneaud wrote: > Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit : ... >> >> I want to add that we would like to see users registering a very large >> memory region (perhaps the entire process address space) for local >> access, and then enabling remote access only to specific regions using >> memory windows. However, this isn't supported yet by our driver. > > In such scheme, the registration must still be handled "manually": > one has to create a memory window to get a rkey to be exchanged with > a peer, so why one would want to register such a large memory region > (the whole process space) ? > > I guess creating the memory window is faster than registering memory > region. Right. It takes time to create and fill the hardware's page tables. Using memory windows allows you to reuse the work done previously, while still having a more granular control over the RDMA permissions. The larger MR can be created with only local permissions, and the memory window can add specific remote permissions to a smaller range. The memory window utilizes the page tables created for the memory region. > I'd rather say this is not an excuse to register a larger > memory region (up to the whole process space, current and future) as it > sounds like a surprising optimisation: let the HCA known too many > pages just to be sure it already knows some when the process want to > use them. It seems it would become difficult to handle if there's too > many processes. Are you worried about pinning too many pages? That is an issue we want to solve with ODP :) > > I would prefer creating memory region becoming costless (through ODP :). I agree :) > >> Still, there are valid cases where you would still want the results >> of an mmap(0,...) call to be remotely accessible, in cases where there >> is enough trust between the local process and the remote process. > > mmap(0, ...., fd) let the kernel choose where to put the file in > process virtual memory space, so it may, may not, partially, end up in > an ODP pre registered memory region for a range unallocated/unused yet. > > I don't think one want such to happen. I think that in some cases the benefit of allowing this outweigh the risks. This is why it is an opt-in feature. > >> It may help a middleware communication library register a large >> portion of the address space in advance, and still work with random >> pointers given to it by another application module. >> > > But as said in the beginnig of your message, the middleware would have > bind a memory window before posting work request / exposing rkey for > the "random pointers". > > So I fail to understand how could be used ODP when it comes to > registering a memory region not yet backed by something. In this scenario, the middleware would first register the full address space as an ODP memory region with local permissions only. When it wants to provide remote access to some buffer, it would bind a memory window over the ODP MR. This is possible with multiple processes because it uses the virtual memory system without pinning. It won't cause random mmap regions to be mapped for RDMA without the specific intent of the application. However, we currently don't have support for memory windows over ODP MRs. Even if we did, there is some performance penalty due to binding and invalidating memory windows. Some applications will still need full process address space access for RDMA. 
Regards, Haggai -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
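As a rough illustration of the memory-window flow Haggai outlines (register a large MR with local permissions only, then grant remote access to a small window), here is a sketch using the libibverbs memory-window verbs. It assumes an already created QP and PD, uses a type 1 window for simplicity, and — as stated above — binding windows over ODP MRs is not supported by the driver at this point, so this shows the intended model rather than something that works today.

#include <infiniband/verbs.h>
#include <stdint.h>

static int expose_buffer(struct ibv_pd *pd, struct ibv_qp *qp,
			 struct ibv_mr *big_local_mr,
			 void *buf, size_t len, uint32_t *rkey_out)
{
	struct ibv_mw *mw = ibv_alloc_mw(pd, IBV_MW_TYPE_1);
	struct ibv_mw_bind bind = {
		.wr_id = 1,
		.send_flags = IBV_SEND_SIGNALED,
		.bind_info = {
			.mr = big_local_mr,   /* the large, local-only MR */
			.addr = (uint64_t)(uintptr_t)buf,
			.length = len,
			.mw_access_flags = IBV_ACCESS_REMOTE_READ |
					   IBV_ACCESS_REMOTE_WRITE,
		},
	};
	int ret;

	if (!mw)
		return -1;

	ret = ibv_bind_mw(qp, mw, &bind);     /* completion arrives on the QP's send CQ */
	if (ret) {
		ibv_dealloc_mw(mw);
		return ret;
	}

	*rkey_out = mw->rkey;                 /* only this window is remotely accessible */
	return 0;
}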
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index aec7a6a..8c014b5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -99,6 +99,14 @@
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
 
+	/*
+	 * If the combination of the addr and size requested for this memory
+	 * region causes an integer overflow, return error.
+	 */
+	if ((PAGE_ALIGN(addr + size) <= size) ||
+	    (PAGE_ALIGN(addr + size) <= addr))
+		return ERR_PTR(-EINVAL);
+
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
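For completeness, a small userspace demonstration (assuming a 64-bit unsigned long and a 4 KiB page size, with PAGE_ALIGN reimplemented locally) of the wraparound this check rejects:

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

int main(void)
{
	/* A misbehaving user can pick addr and size so that addr + size wraps. */
	unsigned long addr = 0xfffffffffffff000UL;  /* page aligned, near the top */
	unsigned long size = 0x2000UL;              /* pushes the end past ULONG_MAX */
	unsigned long end  = PAGE_ALIGN(addr + size);

	/* end has wrapped to a tiny value, so the unpatched code would go on to
	 * compute a bogus page count; the new check rejects it because
	 * PAGE_ALIGN(addr + size) <= addr (and <= size) here. */
	printf("end = %#lx, end <= addr: %d, end <= size: %d\n",
	       end, end <= addr, end <= size);
	return 0;
}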