Message ID: 49D9A9D6.10003@suse.de (mailing list archive)
State: New, archived
On Mon, 06 Apr 2009 12:35:58 +0530 Suresh Jayaraman <sjayaraman@suse.de> wrote:

> The upstream commit b363b3304bcf68c4541683b2eff70b29f0446a5b attempted
> to fix memory overwrite during tree connect response processing while
> mounting. However, the memory allocated may still be insufficient as
> UTF-8 string can be upto 4X times as UCS. So, would it be safe to
> allocate memory that is 4X instead of 2X?
>
> Noticed by Marcus Meissner <meissner@suse.de>.
>
> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
> ---
>  fs/cifs/connect.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 0de3b56..b361be0 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
>  		    BCC(smb_buffer_response)) {
>  			kfree(tcon->nativeFileSystem);
>  			tcon->nativeFileSystem =
> -				kzalloc(2*(length + 1), GFP_KERNEL);
> +				kzalloc((4 * length) + 2, GFP_KERNEL);
>  		if (tcon->nativeFileSystem)
>  			cifs_strfromUCS_le(
>  				tcon->nativeFileSystem,

Wait...is this even enough? It looks like nls.h defines this:

/* this value hold the maximum octet of charset */
#define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */

...it really looks like this needs to use the same constant.

There are other places in this code that make this sort of allocation.
Could you audit and fix them too? A better solution is really needed
here.

A helper function that basically does the allocation and buffer-length
limited conversion would be ideal. We have some functions that sort of
do this, but none of them seem to be quite right. Maybe the best thing
is just to fix cifs_strncpy_to_host() so that it's right and fix most
of the places that do this allocation manually to do it using that
function instead.
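Jeff's suggestion to size the buffer from `NLS_MAX_CHARSET_SIZE` rather than a hard-coded 2x or 4x multiplier can be sketched in a few lines. This is a userspace illustration, not kernel code: the constant is copied from nls.h, the helper name is invented here, and `len` is assumed to count 16-bit UCS-2 characters.

```c
#include <stddef.h>

/* Userspace stand-in for the constant in include/linux/nls.h */
#define NLS_MAX_CHARSET_SIZE 6	/* maximum octets per character, for UTF-8 */

/*
 * Worst-case buffer size for converting a UCS-2 string of `len`
 * 16-bit characters into the local charset: every character may
 * expand to NLS_MAX_CHARSET_SIZE octets, plus one octet for the
 * terminating NUL. Compare the 2*(length + 1) and (4 * length) + 2
 * guesses discussed in the thread.
 */
static size_t ucs2_worst_case_bufsize(size_t len)
{
	return len * NLS_MAX_CHARSET_SIZE + 1;
}
```

For a 10-character UCS-2 name this reserves 61 bytes, versus 22 with the original `2*(length + 1)` formula: wasteful, but never too small for any NLS target charset.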
Jeff Layton wrote:
> On Mon, 06 Apr 2009 12:35:58 +0530
> Suresh Jayaraman <sjayaraman@suse.de> wrote:
>
>> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
>> index 0de3b56..b361be0 100644
>> --- a/fs/cifs/connect.c
>> +++ b/fs/cifs/connect.c
>> @@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
>>  		    BCC(smb_buffer_response)) {
>>  			kfree(tcon->nativeFileSystem);
>>  			tcon->nativeFileSystem =
>> -				kzalloc(2*(length + 1), GFP_KERNEL);
>> +				kzalloc((4 * length) + 2, GFP_KERNEL);
>>  		if (tcon->nativeFileSystem)
>>  			cifs_strfromUCS_le(
>>  				tcon->nativeFileSystem,
>
> Wait...is this even enough? It looks like nls.h defines this:
>
> /* this value hold the maximum octet of charset */
> #define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */
>
> ...it really looks like this needs to use the same constant.

True. Seems I was influenced by a comment in fs/cifs/sess.c

313	/* UTF-8 string will not grow more than four times as big as UCS-16 */

> There are other places in this code that make this sort of allocation.
> Could you audit and fix them too? A better solution is really needed
> here.

Yeah, there are other files (for e.g sess.c) as well. Looks like we
intentionally limit maxlen to 512, may be we want to limit in NLS case
too? Wondering what could be the rationale behind this..

> A helper function that basically does the allocation and buffer-length
> limited conversion would be ideal. We have some functions that sort of
> do this, but none of them seem to be quite right. Maybe the best thing
> is just to fix cifs_strncpy_to_host() so that it's right and fix most
> of the places that do this allocation manually to do it using that
> function instead.

I agree. This seems the right thing to me, too. I'll try to fix and
resend this patch.

Thanks,
On Mon, 06 Apr 2009 18:20:21 +0530
Suresh Jayaraman <sjayaraman@suse.de> wrote:

> Jeff Layton wrote:
> > On Mon, 06 Apr 2009 12:35:58 +0530
> > Suresh Jayaraman <sjayaraman@suse.de> wrote:
>
> >> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> >> index 0de3b56..b361be0 100644
> >> --- a/fs/cifs/connect.c
> >> +++ b/fs/cifs/connect.c
> >> @@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
> >>  		    BCC(smb_buffer_response)) {
> >>  			kfree(tcon->nativeFileSystem);
> >>  			tcon->nativeFileSystem =
> >> -				kzalloc(2*(length + 1), GFP_KERNEL);
> >> +				kzalloc((4 * length) + 2, GFP_KERNEL);
> >>  		if (tcon->nativeFileSystem)
> >>  			cifs_strfromUCS_le(
> >>  				tcon->nativeFileSystem,
> >
> > Wait...is this even enough? It looks like nls.h defines this:
> >
> > /* this value hold the maximum octet of charset */
> > #define NLS_MAX_CHARSET_SIZE 6 /* for UTF-8 */
> >
> > ...it really looks like this needs to use the same constant.
>
> True. Seems I was influenced by a comment in fs/cifs/sess.c
>
> 313	/* UTF-8 string will not grow more than four times as big as
> UCS-16 */
>

That looks like that's wrong (or at least potentially so). AFAICT, UTF-8
allows up to 6 bytes per character. I suppose that it's possible that
none of the characters allowed in UCS-16 will ever translate to a
character that's more than 4 bytes, but I'd like to see that confirmed
before we depend on it.

> > There are other places in this code that make this sort of allocation.
> > Could you audit and fix them too? A better solution is really needed
> > here.
>
> Yeah, there are other files (for e.g sess.c) as well. Looks like we
> intentionally limit maxlen to 512, may be we want to limit in NLS case
> too? Wondering what could be the rationale behind this..
>

Good Q. This is the problem with sprinkling "magic numbers" around the
code.

> > A helper function that basically does the allocation and buffer-length
> > limited conversion would be ideal. We have some functions that sort of
> > do this, but none of them seem to be quite right. Maybe the best thing
> > is just to fix cifs_strncpy_to_host() so that it's right and fix most
> > of the places that do this allocation manually to do it using that
> > function instead.
>
> I agree. This seems the right thing to me, too. I'll try to fix and
> resend this patch.
>

Excellent. I look forward to seeing it.
On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
> > True. Seems I was influenced by a comment in fs/cifs/sess.c
> >
> > 313	/* UTF-8 string will not grow more than four times as big as
> > UCS-16 */
> >
>
> That looks like that's wrong (or at least potentially so). AFAICT, UTF-8
> allows up to 6 bytes per character. I suppose that it's possible that
> none of the characters allowed in UCS-16 will ever translate to a
> character that's more than 4 bytes, but I'd like to see that confirmed
> before we depend on it.

I wonder whats UCS-16 tho, UCS-16 does not exist :)

It may be either UTF16 or UCS2.
Both these charsets have a base length of 2 bytes per character. UCS2 is
limited to 65535 values, while UTF-16 is a multi-word charset.

If the comment above is to be read as "UTF-8 string will not grow more
than four times as big as UCS-2/UTF-16" then what it is saying is that
at maximum an UTF-8 char can be 4 words (or 8 bytes) long.
IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
estimate seem correct.

If "length" in the code was the length in bytes, and not the number of
characters, of an UCS-2/UTF-16 string then 4*length should, indeed, be
long enough.

Simo.
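Simo's arithmetic is easy to check with a small table of UTF-8 widths. The function below is an illustrative sketch (the name is made up for this example): every code point UCS-2 can hold (the BMP, U+0000..U+FFFF) fits in at most 3 UTF-8 octets, and a supplementary-plane character, which UTF-16 encodes as a 4-byte surrogate pair, always becomes exactly 4 UTF-8 octets. So valid UTF-16 text never grows by more than 1.5x in bytes when converted to UTF-8.

```c
#include <stdint.h>

/* Illustrative helper (not from the kernel): number of octets
 * needed to encode one Unicode code point in UTF-8. */
static int utf8_octets(uint32_t cp)
{
	if (cp < 0x80)
		return 1;	/* ASCII */
	if (cp < 0x800)
		return 2;	/* Latin supplements, Greek, Cyrillic, ... */
	if (cp < 0x10000)
		return 3;	/* rest of the BMP, i.e. everything UCS-2 can express */
	return 4;		/* supplementary planes; a surrogate pair in UTF-16 */
}
```

The old 5- and 6-byte UTF-8 forms Jeff mentions existed in the original UTF-8 definition, but they encode code points above U+10FFFF, which UTF-16 (and hence anything arriving over SMB) cannot represent.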
On Mon, 06 Apr 2009 14:02:25 +0000
simo <idra@samba.org> wrote:

> On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
> > > True. Seems I was influenced by a comment in fs/cifs/sess.c
> > >
> > > 313	/* UTF-8 string will not grow more than four times as big as
> > > UCS-16 */
> > >
> >
> > That looks like that's wrong (or at least potentially so). AFAICT,
> > UTF-8
> > allows up to 6 bytes per character. I suppose that it's possible that
> > none of the characters allowed in UCS-16 will ever translate to a
> > character that's more than 4 bytes, but I'd like to see that confirmed
> > before we depend on it.
>
> I wonder whats UCS-16 tho, UCS-16 does not exist :)
>
> It may be either UTF16 or UCS2.
> Both these charsets have a base length of 2 bytes per character. UCS2 is
> limited to 65535 values, while UTF-16 is a multi-word charset.
>
> If the comment above is to be read as "UTF-8 string will not grow more
> than four times as big as UCS-2/UTF-16" then what it is saying is that
> at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
> IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
> estimate seem correct.
>
> If "length" in the code was the length in bytes, and not the number of
> characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
> long enough.
>

Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
bytes. I guess though this means that 4*length is allocating too
much...wouldn't 3*length then be right (assuming NULL termination is
also accounted for) ?

Regardless of the math, I'd like to see all of this moved into some
nice, well commented helper functions instead of being open-coded all
over the place. It's just too easy to get this stuff wrong. Let's solve
this in a way that makes it easier in the future.
On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
> On Mon, 06 Apr 2009 14:02:25 +0000
> simo <idra@samba.org> wrote:
>
> > On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
> > > > True. Seems I was influenced by a comment in fs/cifs/sess.c
> > > >
> > > > 313	/* UTF-8 string will not grow more than four times as big as
> > > > UCS-16 */
> > > >
> > >
> > > That looks like that's wrong (or at least potentially so). AFAICT,
> > > UTF-8
> > > allows up to 6 bytes per character. I suppose that it's possible that
> > > none of the characters allowed in UCS-16 will ever translate to a
> > > character that's more than 4 bytes, but I'd like to see that confirmed
> > > before we depend on it.
> >
> > I wonder whats UCS-16 tho, UCS-16 does not exist :)
> >
> > It may be either UTF16 or UCS2.
> > Both these charsets have a base length of 2 bytes per character. UCS2 is
> > limited to 65535 values, while UTF-16 is a multi-word charset.
> >
> > If the comment above is to be read as "UTF-8 string will not grow more
> > than four times as big as UCS-2/UTF-16" then what it is saying is that
> > at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
> > IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
> > estimate seem correct.
> >
> > If "length" in the code was the length in bytes, and not the number of
> > characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
> > long enough.
> >
>
> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
> bytes. I guess though this means that 4*length is allocating too
> much...wouldn't 3*length then be right (assuming NULL termination is
> also accounted for) ?
>
> Regardless of the math, I'd like to see all of this moved into some
> nice, well commented helper functions instead of being open-coded all
> over the place. It's just too easy to get this stuff wrong. Let's solve
> this in a way that makes it easier in the future.

Yes, this is the way to go.

Simo.
I don't think that we should be using these size assumptions (multiples
of UCS stringlen). A new UCS helper function should be created that
calculates how much memory would be needed for a converted string - and
we need to use this before we do the malloc and string conversion. In
effect a strlen and strnlen function that takes a target code page
argument. For strings that will never be more than a hundred bytes this
may not be needed, and we can use the length assumption, but since
mallocs in kernel can be so expensive I would rather calculate the
actual string length needed for the target.

On Mon, Apr 6, 2009 at 9:30 AM, simo <idra@samba.org> wrote:
> On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
>> On Mon, 06 Apr 2009 14:02:25 +0000
>> simo <idra@samba.org> wrote:
>>
>> > On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
>> > > > True. Seems I was influenced by a comment in fs/cifs/sess.c
>> > > >
>> > > > 313	/* UTF-8 string will not grow more than four times as big as
>> > > > UCS-16 */
>> > > >
>> > >
>> > > That looks like that's wrong (or at least potentially so). AFAICT,
>> > > UTF-8
>> > > allows up to 6 bytes per character. I suppose that it's possible that
>> > > none of the characters allowed in UCS-16 will ever translate to a
>> > > character that's more than 4 bytes, but I'd like to see that confirmed
>> > > before we depend on it.
>> >
>> > I wonder whats UCS-16 tho, UCS-16 does not exist :)
>> >
>> > It may be either UTF16 or UCS2.
>> > Both these charsets have a base length of 2 bytes per character. UCS2 is
>> > limited to 65535 values, while UTF-16 is a multi-word charset.
>> >
>> > If the comment above is to be read as "UTF-8 string will not grow more
>> > than four times as big as UCS-2/UTF-16" then what it is saying is that
>> > at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
>> > IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
>> > estimate seem correct.
>> >
>> > If "length" in the code was the length in bytes, and not the number of
>> > characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
>> > long enough.
>> >
>>
>> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
>> bytes. I guess though this means that 4*length is allocating too
>> much...wouldn't 3*length then be right (assuming NULL termination is
>> also accounted for) ?
>>
>> Regardless of the math, I'd like to see all of this moved into some
>> nice, well commented helper functions instead of being open-coded all
>> over the place. It's just too easy to get this stuff wrong. Let's solve
>> this in a way that makes it easier in the future.
>
> Yes, this is the way to go.
>
> Simo.
>
> --
> Simo Sorce
> Samba Team GPL Compliance Officer <simo@samba.org>
> Principal Software Engineer at Red Hat, Inc. <simo@redhat.com>
>
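Steve's exact-length idea might look roughly like the sketch below, with UTF-8 hard-coded as the target for simplicity. A real kernel helper would presumably take a `struct nls_table *` and consult its conversion op per character instead; the function name here is invented for illustration, and input is assumed to be little-endian UCS-2 (no surrogate handling).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * First pass of a two-pass conversion: how many octets would a
 * NUL-terminated little-endian UCS-2 string occupy as UTF-8?
 * Returns the payload size, not counting the terminating NUL,
 * so the caller allocates the result + 1.
 */
static size_t ucs2le_utf8_bytes(const unsigned char *ucs, size_t max_chars)
{
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < max_chars; i++) {
		uint16_t cp = ucs[2 * i] | (uint16_t)(ucs[2 * i + 1] << 8);

		if (cp == 0)
			break;		/* UCS-2 NUL terminator */
		if (cp < 0x80)
			bytes += 1;	/* ASCII */
		else if (cp < 0x800)
			bytes += 2;
		else
			bytes += 3;	/* any other BMP code point */
	}
	return bytes;
}
```

For the common all-ASCII case (share names like "NTFS") this returns exactly half the UCS-2 byte length, which is where the savings over a 4x or 6x overestimate would come from.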
Steve French wrote:
> I don't think that we should be using these size assumptions
> (multiples of UCS stringlen). A new UCS helper function should be
> created that calculates how much memory would be needed for a
> converted string - and we need to use this before we do the malloc and
> string conversion. In effect a strlen and strnlen function that takes
> a target code page argument. For strings that will never be more than
> a hundred bytes this may not be needed, and we can use the length
> assumption, but since mallocs in kernel can be so expensive I would
> rather calculate the actual string length needed for the target.

Ah, ok. I thought of writing a little function based on
cifs_strncpy_to_host() and adding a comment like below:

	/* UniStrnlen() returns length in 16 bit Unicode characters
	 * (UCS-2) with base length of 2 bytes per character. An UTF-8
	 * character can be up to 8 bytes maximum, so we need to
	 * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
	 * UTF-8 string */

and make the callers use this. Maybe Steve's idea is a better one.
However, I could not find any utility function inside the kernel source
that calculates the length of a UTF-8 string. It's not clear how I can
find the length of a UTF-8 string.

Thanks,

> On Mon, Apr 6, 2009 at 9:30 AM, simo <idra@samba.org> wrote:
>> On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
>>> On Mon, 06 Apr 2009 14:02:25 +0000
>>> simo <idra@samba.org> wrote:
>>>
>>>> On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
>>>>>> True. Seems I was influenced by a comment in fs/cifs/sess.c
>>>>>>
>>>>>> 313	/* UTF-8 string will not grow more than four times as big as
>>>>>> UCS-16 */
>>>>>>
>>>>> That looks like that's wrong (or at least potentially so). AFAICT,
>>>>> UTF-8
>>>>> allows up to 6 bytes per character. I suppose that it's possible that
>>>>> none of the characters allowed in UCS-16 will ever translate to a
>>>>> character that's more than 4 bytes, but I'd like to see that confirmed
>>>>> before we depend on it.
>>>> I wonder whats UCS-16 tho, UCS-16 does not exist :)
>>>>
>>>> It may be either UTF16 or UCS2.
>>>> Both these charsets have a base length of 2 bytes per character. UCS2 is
>>>> limited to 65535 values, while UTF-16 is a multi-word charset.
>>>>
>>>> If the comment above is to be read as "UTF-8 string will not grow more
>>>> than four times as big as UCS-2/UTF-16" then what it is saying is that
>>>> at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
>>>> IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
>>>> estimate seem correct.
>>>>
>>>> If "length" in the code was the length in bytes, and not the number of
>>>> characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
>>>> long enough.
>>>>
>>> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
>>> bytes. I guess though this means that 4*length is allocating too
>>> much...wouldn't 3*length then be right (assuming NULL termination is
>>> also accounted for) ?
>>>
>>> Regardless of the math, I'd like to see all of this moved into some
>>> nice, well commented helper functions instead of being open-coded all
>>> over the place. It's just too easy to get this stuff wrong. Let's solve
>>> this in a way that makes it easier in the future.
>> Yes, this is the way to go.
>>
On Mon, 06 Apr 2009 22:33:09 +0530
Suresh Jayaraman <sjayaraman@suse.de> wrote:

> Steve French wrote:
> > I don't think that we should be using these size assumptions
> > (multiples of UCS stringlen). A new UCS helper function should be
> > created that calculates how much memory would be needed for a
> > converted string - and we need to use this before we do the malloc and
> > string conversion. In effect a strlen and strnlen function that takes
> > a target code page argument. For strings that will never be more than
> > a hundred bytes this may not be needed, and we can use the length
> > assumption, but since mallocs in kernel can be so expensive I would
> > rather calculate the actual string length needed for the target.
>
> Ah, ok. I thought of writing a little function based on
> cifs_strncpy_to_host() and adding a comment like below:
>
> 	/* UniStrnlen() returns length in 16 bit Unicode characters
> 	 * (UCS-2) with base length of 2 bytes per character. An UTF-8
> 	 * character can be up to 8 bytes maximum, so we need to
> 	 * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
> 	 * UTF-8 string */
>
> and make the callers use this. Maybe Steve's idea is a better one.
> However, I could not find any utility function inside the kernel source
> that calculates the length of a UTF-8 string. It's not clear how I can
> find the length of a UTF-8 string.
>
> Thanks,
>

I think you'll have to basically do the conversion twice. Walk the
string once, convert each character to determine its length, and then
discard it. Get the total and allocate that many bytes (plus the null
termination), and do the conversion again into the buffer.

Another option might be to have a static buffer already allocated
somewhere (maybe even a whole page?) and do the conversion into that
while keeping a count of bytes. Then memcpy it into the destination
buffer after you allocate it.

I'm not truly convinced this is really necessary though. You have to
figure that kmalloc is a power-of-two allocator. If you kmalloc 17
bytes, you get 32 anyway. You'll probably end up using roughly the same
amount of memory that you would have had you just estimated the size.
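Jeff's closing point about allocator rounding can be demonstrated with a toy model of a power-of-two allocator. This is purely an approximation for illustration (kmalloc's real size classes are richer than strict powers of two), but it shows why an exact length calculation often saves no memory at all.

```c
#include <stddef.h>

/* Toy model: the bucket size a power-of-two allocator would
 * actually hand back for a request of n bytes. */
static size_t pow2_bucket(size_t n)
{
	size_t b = 1;

	while (b < n)
		b <<= 1;
	return b;
}
```

Requesting 17 bytes (an exactly-computed length) and requesting 30 bytes (a conservative estimate) both land in the same 32-byte bucket, which is the argument for keeping the simple worst-case multiplier.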
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 0de3b56..b361be0 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -3674,7 +3674,7 @@ CIFSTCon(unsigned int xid, struct cifsSesInfo *ses,
 		    BCC(smb_buffer_response)) {
 			kfree(tcon->nativeFileSystem);
 			tcon->nativeFileSystem =
-				kzalloc(2*(length + 1), GFP_KERNEL);
+				kzalloc((4 * length) + 2, GFP_KERNEL);
 		if (tcon->nativeFileSystem)
 			cifs_strfromUCS_le(
 				tcon->nativeFileSystem,
The upstream commit b363b3304bcf68c4541683b2eff70b29f0446a5b attempted
to fix memory overwrite during tree connect response processing while
mounting. However, the memory allocated may still be insufficient as
UTF-8 string can be upto 4X times as UCS. So, would it be safe to
allocate memory that is 4X instead of 2X?

Noticed by Marcus Meissner <meissner@suse.de>.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/cifs/connect.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)