Message ID | 49E1B0D5.7010600@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, 12 Apr 2009 14:43:57 +0530 Suresh Jayaraman <sjayaraman@suse.de> wrote: > Based on the recent discussions on opencoded, inconsistent allocation of > memory needed for a unicode string conversion, the consensus is to > calculate the memory needed exactly instead of using size assumptions > and to consolidate some code that allocates memory and does conversion > in a helper function and use it widely. > > This patch attempts to do the same. Please note that non-unicode cases > has not been converted now as they are less error-prone than unicode > cases. I've only compile tested this patch and yet to test it > completely, but posting this little premature patch as there are lot > of discussions happening around this code. This patch requires careful > review. > > Thanks to Jeff, Steve and Simo for useful suggestions. > > Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> > --- > > fs/cifs/cifs_unicode.h | 22 ++++++++++++++++++++ > fs/cifs/cifsproto.h | 2 + > fs/cifs/cifssmb.c | 38 +++++++++++++++++++++++++++++++++++ > fs/cifs/connect.c | 51 +++++++++++++++-------------------------------- > fs/cifs/sess.c | 36 +++++++++++---------------------- > 5 files changed, 90 insertions(+), 59 deletions(-) > > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h > index 14eb9a2..43c0438 100644 > --- a/fs/cifs/cifs_unicode.h > +++ b/fs/cifs/cifs_unicode.h > @@ -34,6 +34,7 @@ > #include <asm/byteorder.h> > #include <linux/types.h> > #include <linux/nls.h> > +#include "cifspdu.h" > > #define UNIUPR_NOLOWER /* Example to not expand lower case tables */ > > @@ -63,6 +64,7 @@ int cifs_strfromUCS_le(char *, const __le16 *, int, const struct nls_table *); > int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); > #endif > > +#define UNISTR_MAX_SIZE (MAX_PATHCONF * (NLS_MAX_CHARSET_SIZE + 1)) > /* > * UniStrcat: Concatenate the second string to the first > * > @@ -159,6 +161,26 @@ UniStrnlen(const wchar_t *ucs1, int maxlen) > } > > /* > + * UniStrnlenBytes: Return the length in bytes > + */ > +static inline int > +UniStrnlenBytes(const char *str, int maxlen, const struct nls_table *codepage) > +{ > + int rc; > + wchar_t uni; > + size_t nbytes = 0; > + char buf[UNISTR_MAX_SIZE]; > + > + while (*str++) { > + rc = codepage->char2uni(str, maxlen, &uni); Shouldn't this be uni2char? We're converting from unicode to the local charset, correct? That's what cifs_strfromUCS_le is using so it makes sense that we should be doing the same to estimate the memory that it will use. > + if (rc == -1) > + return -EINVAL; cifs_strfromUCS_le() doesn't return an error if the char can't be converted. It just turns it into a '?' and moves on. We probably want to do the same here. > + nbytes += utf8_wcstombs(buf, &uni, maxlen); I don't think this is needed. I think uni2char function returns the length of the character in bytes. You may be able to just tally up the return values from uni2char. Also, you can't assume that 'uni' will be a NULL terminated string. It's just a wchar_t on the stack. Doing the above may make that function walk off the end of the stack. > + } > + return nbytes; > +} > + > +/* > * UniStrncat: Concatenate length limited string > */ > static inline wchar_t * > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h > index 4167716..d154ed1 100644 > --- a/fs/cifs/cifsproto.h > +++ b/fs/cifs/cifsproto.h > @@ -383,4 +383,6 @@ extern int CIFSSMBSetPosixACL(const int xid, struct cifsTconInfo *tcon, > const struct nls_table *nls_codepage, int remap_special_chars); > extern int CIFSGetExtAttr(const int xid, struct cifsTconInfo *tcon, > const int netfid, __u64 *pExtAttrBits, __u64 *pMask); > +extern int cifs_nls_to_str(char **dst, const char *src, const int maxlen, > + int *plen, const struct nls_table *nls_codepage); > #endif /* _CIFSPROTO_H */ > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c > index bc09c99..963f4f9 100644 > --- a/fs/cifs/cifssmb.c > +++ b/fs/cifs/cifssmb.c > @@ -81,6 +81,44 @@ static struct { > #endif /* CONFIG_CIFS_WEAK_PW_HASH */ > #endif /* CIFS_POSIX */ > > + > +/* > + * Calculates, allocates memory for nls and converts it to character string. > + * Note: caller is responsible for freeing dst if function returned 0. > + * returns: > + * on success - 0 > + * on failure - errno > + */ > +int > +cifs_nls_to_str(char **dst, const char *src, const int maxlen, int *plen, > + const struct nls_table *nls_codepage) > +{ > + size_t nbytes; > + > + *plen = UniStrnlen((wchar_t *)src, maxlen); > + nbytes = UniStrnlenBytes(src, maxlen, nls_codepage); It would be nice not to have to walk the string twice like this. Maybe we need a combined function that returns both the length in bytes and in characters and only walks the string once? I'm not too worried about performance with the current users, but we need to consider it in case we want to use these routines in the future for more performance-sensitive purposes (for instance, maybe with readdir?) > + > + /* We look for obvious messed up bcc or strings in response so we do > + * not go off the end since (at least) WIN2K and * Windows XP have a > + * major bug in not null terminating last Unicode string in response > + */ > + kfree(*dst); Are you sure this is going to be a valid or null pointer that gets passed in here? Even if that's the case now, it seems reasonable that at some point in the future someone could just declare a pointer on the stack and pass it into this function uninitialized. That could be bad. I'd recommend removing this line and expect the caller to free memory at this pointer beforehand if it's needed. If you don't want to do that, then I think you need to make it clear in the comments that dst needs to either be a pointer to freeable memory or NULL. > + > + *dst = kzalloc(nbytes + 2, GFP_KERNEL); > + if (!*dst) > + goto err_exit; > + > + cifs_strfromUCS_le(*dst, (__le16 *)src, *plen, nls_codepage); > + (*dst)[*plen] = 0; > + (*dst)[*plen + 1] = 0; /* harmless for ASCII case, needed for Unicode */ > + > + return 0; > + > +err_exit: > + cERROR(1, ("Failed to allocate buffer for string\n")); > + return -ENOMEM; > +} > + > /* Allocates buffer into dst and copies smb string from src to it. > * caller is responsible for freeing dst if function returned 0. > * returns: > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > index 0de3b56..1880537 100644 > --- a/fs/cifs/connect.c > +++ b/fs/cifs/connect.c > @@ -2667,36 +2667,23 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses, > remaining_words = > BCC(smb_buffer_response) / 2; > } > - len = > - UniStrnlen((wchar_t *) bcc_ptr, > - remaining_words - 1); > -/* We look for obvious messed up bcc or strings in response so we do not go off > - the end since (at least) WIN2K and Windows XP have a major bug in not null > - terminating last Unicode string in response */ > - if (ses->serverOS) > - kfree(ses->serverOS); > - ses->serverOS = kzalloc(2 * (len + 1), > - GFP_KERNEL); > - if (ses->serverOS == NULL) > + rc = cifs_nls_to_str(&(ses->serverOS), > + bcc_ptr, remaining_words - 1, > + &len, nls_codepage); > + if (rc) > goto sesssetup_nomem; > - cifs_strfromUCS_le(ses->serverOS, > - (__le16 *)bcc_ptr, > - len, nls_codepage); > bcc_ptr += 2 * (len + 1); > remaining_words -= len + 1; > ses->serverOS[2 * len] = 0; > ses->serverOS[1 + (2 * len)] = 0; > if (remaining_words > 0) { > - len = UniStrnlen((wchar_t *)bcc_ptr, > - remaining_words-1); > - kfree(ses->serverNOS); > - ses->serverNOS = kzalloc(2 * (len + 1), > - GFP_KERNEL); > - if (ses->serverNOS == NULL) > + rc = cifs_nls_to_str( > + &(ses->serverNOS), > + bcc_ptr, > + remaining_words - 1, > + &len, nls_codepage); > + if (rc) > goto sesssetup_nomem; > - cifs_strfromUCS_le(ses->serverNOS, > - (__le16 *)bcc_ptr, > - len, nls_codepage); > bcc_ptr += 2 * (len + 1); > ses->serverNOS[2 * len] = 0; > ses->serverNOS[1 + (2 * len)] = 0; > @@ -2707,19 +2694,13 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses, > } > remaining_words -= len + 1; > if (remaining_words > 0) { > - len = UniStrnlen((wchar_t *) bcc_ptr, remaining_words); > - /* last string is not always null terminated > - (for e.g. for Windows XP & 2000) */ > - if (ses->serverDomain) > - kfree(ses->serverDomain); > - ses->serverDomain = > - kzalloc(2*(len+1), > - GFP_KERNEL); > - if (ses->serverDomain == NULL) > + rc = cifs_nls_to_str( > + &(ses->serverDomain), > + bcc_ptr, > + remaining_words, &len, > + nls_codepage); > + if (rc) > goto sesssetup_nomem; > - cifs_strfromUCS_le(ses->serverDomain, > - (__le16 *)bcc_ptr, > - len, nls_codepage); > bcc_ptr += 2 * (len + 1); > ses->serverDomain[2*len] = 0; > ses->serverDomain[1+(2*len)] = 0; > diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c > index 5c68b42..1e82b22 100644 > --- a/fs/cifs/sess.c > +++ b/fs/cifs/sess.c > @@ -301,33 +301,26 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft, > words_left = bleft / 2; > > /* save off server operating system */ > - len = UniStrnlen((wchar_t *) data, words_left); > + rc = cifs_nls_to_str(&(ses->serverOS), data, words_left, &len, > + nls_cp); > + if (rc) > + return rc; > > -/* We look for obvious messed up bcc or strings in response so we do not go off > - the end since (at least) WIN2K and Windows XP have a major bug in not null > - terminating last Unicode string in response */ > if (len >= words_left) > return rc; > > - kfree(ses->serverOS); > - /* UTF-8 string will not grow more than four times as big as UCS-16 */ > - ses->serverOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL); > - if (ses->serverOS != NULL) > - cifs_strfromUCS_le(ses->serverOS, (__le16 *)data, len, nls_cp); > data += 2 * (len + 1); > words_left -= len + 1; > > /* save off server network operating system */ > - len = UniStrnlen((wchar_t *) data, words_left); > + rc = cifs_nls_to_str(&(ses->serverNOS), data, words_left, &len, > + nls_cp); > + if (rc) > + return rc; > > if (len >= words_left) > return rc; > - > - kfree(ses->serverNOS); > - ses->serverNOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL); > if (ses->serverNOS != NULL) { > - cifs_strfromUCS_le(ses->serverNOS, (__le16 *)data, len, > - nls_cp); > if (strncmp(ses->serverNOS, "NT LAN Manager 4", 16) == 0) { > cFYI(1, ("NT4 server")); > ses->flags |= CIFS_SES_NT4; > @@ -337,19 +330,14 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft, > words_left -= len + 1; > > /* save off server domain */ > - len = UniStrnlen((wchar_t *) data, words_left); > + rc = cifs_nls_to_str(&(ses->serverDomain), data, words_left, > + &len, nls_cp); > + if (rc) > + return rc; > > if (len > words_left) > return rc; > > - kfree(ses->serverDomain); > - ses->serverDomain = kzalloc(2 * (len + 1), GFP_KERNEL); /* BB FIXME wrong length */ > - if (ses->serverDomain != NULL) { > - cifs_strfromUCS_le(ses->serverDomain, (__le16 *)data, len, > - nls_cp); > - ses->serverDomain[2*len] = 0; > - ses->serverDomain[(2*len) + 1] = 0; > - } > data += 2 * (len + 1); > words_left -= len + 1; >
On Sun, Apr 12, 2009 at 4:13 AM, Suresh Jayaraman <sjayaraman@suse.de> wrote: > Based on the recent discussions on opencoded, inconsistent allocation of > memory needed for a unicode string conversion, the consensus is to > calculate the memory needed exactly instead of using size assumptions > and to consolidate some code that allocates memory and does conversion > in a helper function and use it widely. > > +#define UNISTR_MAX_SIZE     (MAX_PATHCONF * (NLS_MAX_CHARSET_SIZE + 1)) >  /* >  * UniStrcat:  Concatenate the second string to the first >  * > @@ -159,6 +161,26 @@ UniStrnlen(const wchar_t *ucs1, int maxlen) >  } > >  /* > + * UniStrnlenBytes: Return the length in bytes > + */ > +static inline int > +UniStrnlenBytes(const char *str, int maxlen, const struct nls_table *codepage) > +{ > +    int rc; > +    wchar_t uni; > +    size_t nbytes = 0; > +    char buf[UNISTR_MAX_SIZE]; This is a huge amount to put on the stack (over 1700 bytes). Are you sure that you need to allocate space for the whole string, just to go through it one character a time while calculating its length? We probably shouldn't have any functions in common paths with more than a few hundred bytes allocated off the stack. > +    while (*str++) { > +        rc = codepage->char2uni(str, maxlen, &uni); > +        if (rc == -1) > +            return -EINVAL; > +        nbytes += utf8_wcstombs(buf, &uni, maxlen); > +    } > +    return nbytes; > +} > + > +/* >  * UniStrncat:  Concatenate length limited string >  */ >  static inline wchar_t * > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h > index 4167716..d154ed1 100644 > --- a/fs/cifs/cifsproto.h > +++ b/fs/cifs/cifsproto.h > @@ -383,4 +383,6 @@ extern int CIFSSMBSetPosixACL(const int xid, struct cifsTconInfo *tcon, >         const struct nls_table *nls_codepage, int remap_special_chars); >  extern int CIFSGetExtAttr(const int xid, struct cifsTconInfo *tcon, >             const int netfid, __u64 *pExtAttrBits, __u64 *pMask); > +extern int cifs_nls_to_str(char **dst, const char *src, const int maxlen, > +              int *plen, const struct nls_table *nls_codepage); >  #endif         /* _CIFSPROTO_H */ > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c > index bc09c99..963f4f9 100644 > --- a/fs/cifs/cifssmb.c > +++ b/fs/cifs/cifssmb.c > @@ -81,6 +81,44 @@ static struct { >  #endif /* CONFIG_CIFS_WEAK_PW_HASH */ >  #endif /* CIFS_POSIX */ > > + > +/* > + * Calculates, allocates memory for nls and converts it to character string. > + * Note: caller is responsible for freeing dst if function returned 0. > + * returns: > + *   on success - 0 > + *   on failure - errno > + */ > +int > +cifs_nls_to_str(char **dst, const char *src, const int maxlen, int *plen, > +            const struct nls_table *nls_codepage) > +{ > +    size_t nbytes; > + > +    *plen = UniStrnlen((wchar_t *)src, maxlen); > +    nbytes = UniStrnlenBytes(src, maxlen, nls_codepage); > + > +    /* We look for obvious messed up bcc or strings in response so we do > +     * not go off the end since (at least) WIN2K and * Windows XP have a > +     * major bug in not null terminating last Unicode string in response > +     */ > +    kfree(*dst); > + > +    *dst = kzalloc(nbytes + 2, GFP_KERNEL); > +    if (!*dst) > +        goto err_exit; > + > +    cifs_strfromUCS_le(*dst, (__le16 *)src, *plen, nls_codepage); > +    (*dst)[*plen] = 0; > +    (*dst)[*plen + 1] = 0; /* harmless for ASCII case, needed for Unicode */ > + > +    return 0; > + > +err_exit: > +    cERROR(1, ("Failed to allocate buffer for string\n")); > +    return -ENOMEM; > +} I doubt that we need to lock against rename of the path while we are calculating the length, but it would seem safer if we pass the length of the target string (nbytes+2) to a modified cifs_strfromUCS_le so we don't overflow if the patch changes (in build_path_from_dentry we do something similar in case the path changes underneath us). >  /* Allocates buffer into dst and copies smb string from src to it. >  * caller is responsible for freeing dst if function returned 0. >  * returns: > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c > index 0de3b56..1880537 100644 > --- a/fs/cifs/connect.c > +++ b/fs/cifs/connect.c > @@ -2667,36 +2667,23 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses, >                     remaining_words = >                         BCC(smb_buffer_response) / 2; >                 } > -                len = > -                  UniStrnlen((wchar_t *) bcc_ptr, > -                        remaining_words - 1); > -/* We look for obvious messed up bcc or strings in response so we do not go off > -  the end since (at least) WIN2K and Windows XP have a major bug in not null > -  terminating last Unicode string in response  */ > -                if (ses->serverOS) > -                    kfree(ses->serverOS); > -                ses->serverOS = kzalloc(2 * (len + 1), > -                            GFP_KERNEL); > -                if (ses->serverOS == NULL) > +                rc = cifs_nls_to_str(&(ses->serverOS), > +                        bcc_ptr, remaining_words - 1, > +                        &len, nls_codepage); > +                if (rc) >                     goto sesssetup_nomem; > -                cifs_strfromUCS_le(ses->serverOS, > -                          (__le16 *)bcc_ptr, > -                          len, nls_codepage); >                 bcc_ptr += 2 * (len + 1); >                 remaining_words -= len + 1; >                 ses->serverOS[2 * len] = 0; >                 ses->serverOS[1 + (2 * len)] = 0; >                 if (remaining_words > 0) { > -                    len = UniStrnlen((wchar_t *)bcc_ptr, > -                             remaining_words-1); > -                    kfree(ses->serverNOS); > -                    ses->serverNOS = kzalloc(2 * (len + 1), > -                                 GFP_KERNEL); > -                    if (ses->serverNOS == NULL) > +                    rc = cifs_nls_to_str( > +                            &(ses->serverNOS), > +                            bcc_ptr, > +                            remaining_words - 1, > +                            &len, nls_codepage); > +                    if (rc) >                         goto sesssetup_nomem; > -                    cifs_strfromUCS_le(ses->serverNOS, > -                              (__le16 *)bcc_ptr, > -                              len, nls_codepage); >                     bcc_ptr += 2 * (len + 1); >                     ses->serverNOS[2 * len] = 0; >                     ses->serverNOS[1 + (2 * len)] = 0; > @@ -2707,19 +2694,13 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses, >                     } >                     remaining_words -= len + 1; >                     if (remaining_words > 0) { > -                        len = UniStrnlen((wchar_t *) bcc_ptr, remaining_words); > -                /* last string is not always null terminated > -                  (for e.g. for Windows XP & 2000) */ > -                        if (ses->serverDomain) > -                            kfree(ses->serverDomain); > -                        ses->serverDomain = > -                          kzalloc(2*(len+1), > -                              GFP_KERNEL); > -                        if (ses->serverDomain == NULL) > +                        rc = cifs_nls_to_str( > +                            &(ses->serverDomain), > +                            bcc_ptr, > +                            remaining_words, &len, > +                            nls_codepage); > +                        if (rc) >                             goto sesssetup_nomem; > -                        cifs_strfromUCS_le(ses->serverDomain, > -                            (__le16 *)bcc_ptr, > -                            len, nls_codepage); >                         bcc_ptr += 2 * (len + 1); >                         ses->serverDomain[2*len] = 0; >                         ses->serverDomain[1+(2*len)] = 0; > diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c > index 5c68b42..1e82b22 100644 > --- a/fs/cifs/sess.c > +++ b/fs/cifs/sess.c > @@ -301,33 +301,26 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft, >     words_left = bleft / 2; > >     /* save off server operating system */ > -    len = UniStrnlen((wchar_t *) data, words_left); > +    rc = cifs_nls_to_str(&(ses->serverOS), data, words_left, &len, > +            nls_cp); > +    if (rc) > +        return rc; > > -/* We look for obvious messed up bcc or strings in response so we do not go off > -  the end since (at least) WIN2K and Windows XP have a major bug in not null > -  terminating last Unicode string in response  */ >     if (len >= words_left) >         return rc; > > -    kfree(ses->serverOS); > -    /* UTF-8 string will not grow more than four times as big as UCS-16 */ > -    ses->serverOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL); > -    if (ses->serverOS != NULL) > -        cifs_strfromUCS_le(ses->serverOS, (__le16 *)data, len, nls_cp); >     data += 2 * (len + 1); >     words_left -= len + 1; > >     /* save off server network operating system */ > -    len = UniStrnlen((wchar_t *) data, words_left); > +    rc = cifs_nls_to_str(&(ses->serverNOS), data, words_left, &len, > +                nls_cp); > +    if (rc) > +        return rc; > >     if (len >= words_left) >         return rc; > - > -    kfree(ses->serverNOS); > -    ses->serverNOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL); >     if (ses->serverNOS != NULL) { > -        cifs_strfromUCS_le(ses->serverNOS, (__le16 *)data, len, > -                  nls_cp); >         if (strncmp(ses->serverNOS, "NT LAN Manager 4", 16) == 0) { >             cFYI(1, ("NT4 server")); >             ses->flags |= CIFS_SES_NT4; > @@ -337,19 +330,14 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft, >     words_left -= len + 1; > >     /* save off server domain */ > -    len = UniStrnlen((wchar_t *) data, words_left); > +    rc = cifs_nls_to_str(&(ses->serverDomain), data, words_left, > +                  &len, nls_cp); > +    if (rc) > +        return rc; > >     if (len > words_left) >         return rc; > > -    kfree(ses->serverDomain); > -    ses->serverDomain = kzalloc(2 * (len + 1), GFP_KERNEL); /* BB FIXME wrong length */ > -    if (ses->serverDomain != NULL) { > -        cifs_strfromUCS_le(ses->serverDomain, (__le16 *)data, len, > -                  nls_cp); > -        ses->serverDomain[2*len] = 0; > -        ses->serverDomain[(2*len) + 1] = 0; > -    } >     data += 2 * (len + 1); >     words_left -= len + 1; > >
Suresh Jayaraman schrieb: > Based on the recent discussions on opencoded, inconsistent allocation of > memory needed for a unicode string conversion, the consensus is to > calculate the memory needed exactly instead of using size assumptions > and to consolidate some code that allocates memory and does conversion > in a helper function and use it widely. > > This patch attempts to do the same. Please note that non-unicode cases > has not been converted now as they are less error-prone than unicode > cases. I've only compile tested this patch and yet to test it > completely, but posting this little premature patch as there are lot > of discussions happening around this code. This patch requires careful > review. > > Thanks to Jeff, Steve and Simo for useful suggestions. > > Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> > --- > First of all, it should be clear what means "Unicode" when we talk of "Unicode". There are variants with a fixed character length and a variable character length. When talking about Unicode for CIFS, there is meant UCS-2 or UTF-16. UCS-2 is fixed with 2 bytes per character. UTF-16 can be 2 bytes or 4 bytes long. That is the _input_. And now the conversion to the native codepage on the system is done. If the NLS-Codepage is e.g. iso-8859-1, all characters are 1 byte long The first 128 characters are like ASCII, and the upper 128 characters are specific. In that case we could theoretically take the length from Unistrlen and use this value and handle it as if it were bytes. But now we are at the point. Imagine, the system uses UTF-8. UTF-8, which is a form of Unicode, can be one byte long, or it can be 4 bytes long (or 2 or 3 bytes). 5-8 bytes are not used now. We do not know before. We can say, a UTF-8 string is at least the number of characters long (in bytes), and at maximum 4 times the number of characters (worst case). I'm not familiar with the linux kernel - is there a way to determine the name of the NLS that is set on the system? > fs/cifs/cifs_unicode.h | 22 ++++++++++++++++++++ > fs/cifs/cifsproto.h | 2 + > fs/cifs/cifssmb.c | 38 +++++++++++++++++++++++++++++++++++ > fs/cifs/connect.c | 51 +++++++++++++++-------------------------------- > fs/cifs/sess.c | 36 +++++++++++---------------------- > 5 files changed, 90 insertions(+), 59 deletions(-) > > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h > index 14eb9a2..43c0438 100644 > --- a/fs/cifs/cifs_unicode.h > +++ b/fs/cifs/cifs_unicode.h > @@ -34,6 +34,7 @@ > #include <asm/byteorder.h> > #include <linux/types.h> > #include <linux/nls.h> > +#include "cifspdu.h" > > #define UNIUPR_NOLOWER /* Example to not expand lower case tables */ > > @@ -63,6 +64,7 @@ int cifs_strfromUCS_le(char *, const __le16 *, int, const struct nls_table *); > int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); > #endif > > +#define UNISTR_MAX_SIZE (MAX_PATHCONF * (NLS_MAX_CHARSET_SIZE + 1)) > /* > * UniStrcat: Concatenate the second string to the first > * > @@ -159,6 +161,26 @@ UniStrnlen(const wchar_t *ucs1, int maxlen) > } > > /* > + * UniStrnlenBytes: Return the length in bytes > + */ > The length in bytes in NLS format. That should be pointed out here. > +static inline int > +UniStrnlenBytes(const char *str, int maxlen, const struct nls_table *codepage) > +{ > + int rc; > + wchar_t uni; > + size_t nbytes = 0; > + char buf[UNISTR_MAX_SIZE]; > > Nice, but what if src is longer? buf would overflow, doesn't it? > + while (*str++) { > + rc = codepage->char2uni(str, maxlen, &uni); > Uni2char > + if (rc == -1) > + return -EINVAL; > + nbytes += utf8_wcstombs(buf, &uni, maxlen); > This function internally uses uni2char, so one time should be enough. > + } > + return nbytes; > +} > + > +/* > * UniStrncat: Concatenate length limited string > */ > static inline wchar_t * > diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h > index 4167716..d154ed1 100644 > --- a/fs/cifs/cifsproto.h > +++ b/fs/cifs/cifsproto.h > @@ -383,4 +383,6 @@ extern int CIFSSMBSetPosixACL(const int xid, struct cifsTconInfo *tcon, > const struct nls_table *nls_codepage, int remap_special_chars); > extern int CIFSGetExtAttr(const int xid, struct cifsTconInfo *tcon, > const int netfid, __u64 *pExtAttrBits, __u64 *pMask); > +extern int cifs_nls_to_str(char **dst, const char *src, const int maxlen, > + int *plen, const struct nls_table *nls_codepage); > #endif /* _CIFSPROTO_H */ > diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c > index bc09c99..963f4f9 100644 > --- a/fs/cifs/cifssmb.c > +++ b/fs/cifs/cifssmb.c > @@ -81,6 +81,44 @@ static struct { > #endif /* CONFIG_CIFS_WEAK_PW_HASH */ > #endif /* CIFS_POSIX */ > > + > +/* > + * Calculates, allocates memory for nls and converts it to character string. > + * Note: caller is responsible for freeing dst if function returned 0. > + * returns: > + * on success - 0 > + * on failure - errno > + */ > +int > +cifs_nls_to_str(char **dst, const char *src, const int maxlen, int *plen, > + const struct nls_table *nls_codepage) > Should it be called cifs_utf16_to_nls ? > +{ > + size_t nbytes; > + > + *plen = UniStrnlen((wchar_t *)src, maxlen); > + nbytes = UniStrnlenBytes(src, maxlen, nls_codepage); > + > + /* We look for obvious messed up bcc or strings in response so we do > + * not go off the end since (at least) WIN2K and * Windows XP have a > + * major bug in not null terminating last Unicode string in response > + */ > + kfree(*dst); > + > + *dst = kzalloc(nbytes + 2, GFP_KERNEL); > + if (!*dst) > + goto err_exit; > > Is src null-terminated here? > + cifs_strfromUCS_le(*dst, (__le16 *)src, *plen, nls_codepage); > + (*dst)[*plen] = 0; > + (*dst)[*plen + 1] = 0; /* harmless for ASCII case, needed for Unicode */ > This ugly piece of code should be replaced. In ASCII (and all it's compatibles like ISO-8859, UTF-8!), a 00-octet means end of string. Why should we use two bytes, when only one is needed? Perhaps we can modify cifs_strfromUCS_le so that it leaves the null-termination and converts it to native. Another common question: Is it possible to have UTF-16 as NLS? Otherwise an NLS-String is always terminated with only one zero-byte. Peter
On Sun, Apr 12, 2009 at 9:23 AM, Peter Hudec <PeterHudec@web.de> wrote: > Suresh Jayaraman schrieb: >> >> Based on the recent discussions on opencoded, inconsistent allocation of >> memory needed for a unicode string conversion, the consensus is to >> calculate the memory needed exactly instead of using size assumptions >> and to consolidate some code that allocates memory and does conversion >> in a helper function and use it widely. >> > First of all, it should be clear what means "Unicode" when we talk of > "Unicode". > > There are variants with a fixed character length and a variable character > length. > > When talking about Unicode for CIFS, there is meant UCS-2 or UTF-16. > UCS-2 is fixed with 2 bytes per character. > UTF-16 can be 2 bytes or 4 bytes long. CIFS (and presumably NTFS, JFS and in the future SMB2 and perhaps a few others) use Unicode (this used to be described as UCS-2, but you are probably right that this is actually UTF-16 for certain newer servers such as Windows 2003, Windows 2008, Vista). Windows moved to UTF-16 internally and it wouldn't make sense for this to be specific to the network protocol. I don't know if in practice the distinction between UCS-2 vs. UTF-16 would make a difference in the more limited form of the mappings done here. The WSPP has a large companion document describing Unicode which may describe the more important details (for someone very familiar with internationalization): http://download.microsoft.com/download/9/5/E/95EF66AF-9026-4BB0-A41D-A4F81802D92C/%5BMS-UCODEREF%5D.pdf
On Sun, Apr 12, 2009 at 02:43:57PM +0530, Suresh Jayaraman wrote: > Based on the recent discussions on opencoded, inconsistent allocation of > memory needed for a unicode string conversion, the consensus is to > calculate the memory needed exactly instead of using size assumptions > and to consolidate some code that allocates memory and does conversion > in a helper function and use it widely. > > This patch attempts to do the same. Please note that non-unicode cases > has not been converted now as they are less error-prone than unicode > cases. I've only compile tested this patch and yet to test it > completely, but posting this little premature patch as there are lot > of discussions happening around this code. This patch requires careful > review. > > Thanks to Jeff, Steve and Simo for useful suggestions. One little comment before even looking at the code: would it be useful to have some of those helpers in common code? We have an awful lot of filesystems dealing with utf8 or ucs16 encoding with a lot of ad-hoc helpers or even worse opencoded handling. I started wondering about consolidating this when reviewing Barry's patches for utf8 case insensitive support in xfs which I'll need to revisit at some point.
Christoph Hellwig wrote: > On Sun, Apr 12, 2009 at 02:43:57PM +0530, Suresh Jayaraman wrote: >> Based on the recent discussions on opencoded, inconsistent allocation of >> memory needed for a unicode string conversion, the consensus is to >> calculate the memory needed exactly instead of using size assumptions >> and to consolidate some code that allocates memory and does conversion >> in a helper function and use it widely. >> >> This patch attempts to do the same. Please note that non-unicode cases >> has not been converted now as they are less error-prone than unicode >> cases. I've only compile tested this patch and yet to test it >> completely, but posting this little premature patch as there are lot >> of discussions happening around this code. This patch requires careful >> review. >> >> Thanks to Jeff, Steve and Simo for useful suggestions. > > One little comment before even looking at the code: would it be useful > to have some of those helpers in common code? We have an awful lot of > filesystems dealing with utf8 or ucs16 encoding with a lot of ad-hoc > helpers or even worse opencoded handling. Yes, I think. My quick look at some of the fs/* code suggests that it needs consistent handling. I'll see whether this helper can me made generic so that other filesystems could use them too. What other filesystems (apart from xfs) which you think can benefit from this? If there are enough canditates, may be we could consider adding this helper to fs/nls/ ? Thanks,
On Mon, Apr 13, 2009 at 11:34 PM, Suresh Jayaraman <sjayaraman@suse.de> wrote: > Christoph Hellwig wrote: >> On Sun, Apr 12, 2009 at 02:43:57PM +0530, Suresh Jayaraman wrote: >>> Based on the recent discussions on opencoded, inconsistent allocation of >>> memory needed for a unicode string conversion, the consensus is to >>> calculate the memory needed exactly instead of using size assumptions >>> and to consolidate some code that allocates memory and does conversion >>> in a helper function and use it widely. >>> >>> This patch attempts to do the same. Please note that non-unicode cases >>> has not been converted now as they are less error-prone than unicode >>> cases. I've only compile tested this patch and yet to test it >>> completely, but posting this little premature patch as there are lot >>> of discussions happening around this code. This patch requires careful >>> review. >>> >>> Thanks to Jeff, Steve and Simo for useful suggestions. >> >> One little comment before even looking at the code: Â would it be useful >> to have some of those helpers in common code? Â We have an awful lot of >> filesystems dealing with utf8 or ucs16 encoding with a lot of ad-hoc >> helpers or even worse opencoded handling. > > Yes, I think. My quick look at some of the fs/* code suggests that it > needs consistent handling. I'll see whether this helper can me made > generic so that other filesystems could use them too. What other > filesystems (apart from xfs) which you think can benefit from this? > > If there are enough canditates, may be we could consider adding this > helper to fs/nls/ ? I agree. This makes sense to move at least some of these helpers to fs/nls (besides making the SMB2 prototype a little smaller, it would likely provide benefit for NTFS, cifs and others to use consistent handling)
diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h index 14eb9a2..43c0438 100644 --- a/fs/cifs/cifs_unicode.h +++ b/fs/cifs/cifs_unicode.h @@ -34,6 +34,7 @@ #include <asm/byteorder.h> #include <linux/types.h> #include <linux/nls.h> +#include "cifspdu.h" #define UNIUPR_NOLOWER /* Example to not expand lower case tables */ @@ -63,6 +64,7 @@ int cifs_strfromUCS_le(char *, const __le16 *, int, const struct nls_table *); int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); #endif +#define UNISTR_MAX_SIZE (MAX_PATHCONF * (NLS_MAX_CHARSET_SIZE + 1)) /* * UniStrcat: Concatenate the second string to the first * @@ -159,6 +161,26 @@ UniStrnlen(const wchar_t *ucs1, int maxlen) } /* + * UniStrnlenBytes: Return the length in bytes + */ +static inline int +UniStrnlenBytes(const char *str, int maxlen, const struct nls_table *codepage) +{ + int rc; + wchar_t uni; + size_t nbytes = 0; + char buf[UNISTR_MAX_SIZE]; + + while (*str++) { + rc = codepage->char2uni(str, maxlen, &uni); + if (rc == -1) + return -EINVAL; + nbytes += utf8_wcstombs(buf, &uni, maxlen); + } + return nbytes; +} + +/* * UniStrncat: Concatenate length limited string */ static inline wchar_t * diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index 4167716..d154ed1 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -383,4 +383,6 @@ extern int CIFSSMBSetPosixACL(const int xid, struct cifsTconInfo *tcon, const struct nls_table *nls_codepage, int remap_special_chars); extern int CIFSGetExtAttr(const int xid, struct cifsTconInfo *tcon, const int netfid, __u64 *pExtAttrBits, __u64 *pMask); +extern int cifs_nls_to_str(char **dst, const char *src, const int maxlen, + int *plen, const struct nls_table *nls_codepage); #endif /* _CIFSPROTO_H */ diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index bc09c99..963f4f9 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -81,6 +81,44 @@ static struct { #endif /* CONFIG_CIFS_WEAK_PW_HASH */ #endif /* CIFS_POSIX */ + +/* + * Calculates, allocates memory for nls and converts it to character string. + * Note: caller is responsible for freeing dst if function returned 0. + * returns: + * on success - 0 + * on failure - errno + */ +int +cifs_nls_to_str(char **dst, const char *src, const int maxlen, int *plen, + const struct nls_table *nls_codepage) +{ + size_t nbytes; + + *plen = UniStrnlen((wchar_t *)src, maxlen); + nbytes = UniStrnlenBytes(src, maxlen, nls_codepage); + + /* We look for obvious messed up bcc or strings in response so we do + * not go off the end since (at least) WIN2K and * Windows XP have a + * major bug in not null terminating last Unicode string in response + */ + kfree(*dst); + + *dst = kzalloc(nbytes + 2, GFP_KERNEL); + if (!*dst) + goto err_exit; + + cifs_strfromUCS_le(*dst, (__le16 *)src, *plen, nls_codepage); + (*dst)[*plen] = 0; + (*dst)[*plen + 1] = 0; /* harmless for ASCII case, needed for Unicode */ + + return 0; + +err_exit: + cERROR(1, ("Failed to allocate buffer for string\n")); + return -ENOMEM; +} + /* Allocates buffer into dst and copies smb string from src to it. * caller is responsible for freeing dst if function returned 0. * returns: diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 0de3b56..1880537 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -2667,36 +2667,23 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses, remaining_words = BCC(smb_buffer_response) / 2; } - len = - UniStrnlen((wchar_t *) bcc_ptr, - remaining_words - 1); -/* We look for obvious messed up bcc or strings in response so we do not go off - the end since (at least) WIN2K and Windows XP have a major bug in not null - terminating last Unicode string in response */ - if (ses->serverOS) - kfree(ses->serverOS); - ses->serverOS = kzalloc(2 * (len + 1), - GFP_KERNEL); - if (ses->serverOS == NULL) + rc = cifs_nls_to_str(&(ses->serverOS), + bcc_ptr, remaining_words - 1, + &len, nls_codepage); + if (rc) goto sesssetup_nomem; - cifs_strfromUCS_le(ses->serverOS, - (__le16 *)bcc_ptr, - len, nls_codepage); bcc_ptr += 2 * (len + 1); remaining_words -= len + 1; ses->serverOS[2 * len] = 0; ses->serverOS[1 + (2 * len)] = 0; if (remaining_words > 0) { - len = UniStrnlen((wchar_t *)bcc_ptr, - remaining_words-1); - kfree(ses->serverNOS); - ses->serverNOS = kzalloc(2 * (len + 1), - GFP_KERNEL); - if (ses->serverNOS == NULL) + rc = cifs_nls_to_str( + &(ses->serverNOS), + bcc_ptr, + remaining_words - 1, + &len, nls_codepage); + if (rc) goto sesssetup_nomem; - cifs_strfromUCS_le(ses->serverNOS, - (__le16 *)bcc_ptr, - len, nls_codepage); bcc_ptr += 2 * (len + 1); ses->serverNOS[2 * len] = 0; ses->serverNOS[1 + (2 * len)] = 0; @@ -2707,19 +2694,13 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses, } remaining_words -= len + 1; if (remaining_words > 0) { - len = UniStrnlen((wchar_t *) bcc_ptr, remaining_words); - /* last string is not always null terminated - (for e.g. for Windows XP & 2000) */ - if (ses->serverDomain) - kfree(ses->serverDomain); - ses->serverDomain = - kzalloc(2*(len+1), - GFP_KERNEL); - if (ses->serverDomain == NULL) + rc = cifs_nls_to_str( + &(ses->serverDomain), + bcc_ptr, + remaining_words, &len, + nls_codepage); + if (rc) goto sesssetup_nomem; - cifs_strfromUCS_le(ses->serverDomain, - (__le16 *)bcc_ptr, - len, nls_codepage); bcc_ptr += 2 * (len + 1); ses->serverDomain[2*len] = 0; ses->serverDomain[1+(2*len)] = 0; diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c index 5c68b42..1e82b22 100644 --- a/fs/cifs/sess.c +++ b/fs/cifs/sess.c @@ -301,33 +301,26 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft, words_left = bleft / 2; /* save off server operating system */ - len = UniStrnlen((wchar_t *) data, words_left); + rc = cifs_nls_to_str(&(ses->serverOS), data, words_left, &len, + nls_cp); + if (rc) + return rc; -/* We look for obvious messed up bcc or strings in response so we do not go off - the end since (at least) WIN2K and Windows XP have a major bug in not null - terminating last Unicode string in response */ if (len >= words_left) return rc; - kfree(ses->serverOS); - /* UTF-8 string will not grow more than four times as big as UCS-16 */ - ses->serverOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL); - if (ses->serverOS != NULL) - cifs_strfromUCS_le(ses->serverOS, (__le16 *)data, len, nls_cp); data += 2 * (len + 1); words_left -= len + 1; /* save off server network operating system */ - len = UniStrnlen((wchar_t *) data, words_left); + rc = cifs_nls_to_str(&(ses->serverNOS), data, words_left, &len, + nls_cp); + if (rc) + return rc; if (len >= words_left) return rc; - - kfree(ses->serverNOS); - ses->serverNOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL); if (ses->serverNOS != NULL) { - cifs_strfromUCS_le(ses->serverNOS, (__le16 *)data, len, - nls_cp); if (strncmp(ses->serverNOS, "NT LAN Manager 4", 16) == 0) { cFYI(1, ("NT4 server")); ses->flags |= CIFS_SES_NT4; @@ -337,19 +330,14 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft, words_left -= len + 1; /* save off server domain */ - len = UniStrnlen((wchar_t *) data, words_left); + rc = cifs_nls_to_str(&(ses->serverDomain), data, words_left, + &len, nls_cp); + if (rc) + return rc; if (len > words_left) return rc; - kfree(ses->serverDomain); - ses->serverDomain = kzalloc(2 * (len + 1), GFP_KERNEL); /* BB FIXME wrong length */ - if (ses->serverDomain != NULL) { - cifs_strfromUCS_le(ses->serverDomain, (__le16 *)data, len, - nls_cp); - ses->serverDomain[2*len] = 0; - ses->serverDomain[(2*len) + 1] = 0; - } data += 2 * (len + 1); words_left -= len + 1;
Based on the recent discussions on opencoded, inconsistent allocation of memory needed for a unicode string conversion, the consensus is to calculate the memory needed exactly instead of using size assumptions and to consolidate some code that allocates memory and does conversion in a helper function and use it widely. This patch attempts to do the same. Please note that non-unicode cases has not been converted now as they are less error-prone than unicode cases. I've only compile tested this patch and yet to test it completely, but posting this little premature patch as there are lot of discussions happening around this code. This patch requires careful review. Thanks to Jeff, Steve and Simo for useful suggestions. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> --- fs/cifs/cifs_unicode.h | 22 ++++++++++++++++++++ fs/cifs/cifsproto.h | 2 + fs/cifs/cifssmb.c | 38 +++++++++++++++++++++++++++++++++++ fs/cifs/connect.c | 51 +++++++++++++++-------------------------------- fs/cifs/sess.c | 36 +++++++++++---------------------- 5 files changed, 90 insertions(+), 59 deletions(-)