Message ID | 1241011759-7632-2-git-send-email-jlayton@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote: > It's possible to have the null terminator for a charset be a single or > multiple character. Add a function to tell us how long it should be. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/cifs/cifs_unicode.h | 19 +++++++++++++++++++ > 1 files changed, 19 insertions(+), 0 deletions(-) > > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h > index 14eb9a2..6bffab5 100644 > --- a/fs/cifs/cifs_unicode.h > +++ b/fs/cifs/cifs_unicode.h > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); > #endif > > /* > + * null_charlen - return length of null character for codepage > + * @codepage - codepage for which to return length of NULL terminator > + * > + * Since we can't guarantee that the null terminator will be a particular > + * length, we have to check against the codepage. If there's a problem > + * determining it, assume a single-byte NULL terminator. > + */ > +static inline int > +null_charlen(const struct nls_table *codepage) > +{ > + int charlen; > + char tmp[NLS_MAX_CHARSET_SIZE]; > + > + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); > + > + return charlen > 0 ? charlen : 1; > +} > + > +/* > * UniStrcat: Concatenate the second string to the first > * > * Returns: > -- > 1.6.0.6 > > _______________________________________________ > linux-cifs-client mailing list > linux-cifs-client@lists.samba.org > https://lists.samba.org/mailman/listinfo/linux-cifs-client > For some of the charsets I looked at under fs/nls, it looks like uni2char always returns 1, I think to indicate the function succeeded as opposed to sending an error. Are there any charsets that you might have looked at whose uni2char function returns more than 1 byte as size of the null character?
On Wed, 29 Apr 2009 08:58:22 -0500 Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote: > On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote: > > It's possible to have the null terminator for a charset be a single or > > multiple character. Add a function to tell us how long it should be. > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > --- > > fs/cifs/cifs_unicode.h | 19 +++++++++++++++++++ > > 1 files changed, 19 insertions(+), 0 deletions(-) > > > > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h > > index 14eb9a2..6bffab5 100644 > > --- a/fs/cifs/cifs_unicode.h > > +++ b/fs/cifs/cifs_unicode.h > > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); > > #endif > > > > /* > > + * null_charlen - return length of null character for codepage > > + * @codepage - codepage for which to return length of NULL terminator > > + * > > + * Since we can't guarantee that the null terminator will be a particular > > + * length, we have to check against the codepage. If there's a problem > > + * determining it, assume a single-byte NULL terminator. > > + */ > > +static inline int > > +null_charlen(const struct nls_table *codepage) > > +{ > > + int charlen; > > + char tmp[NLS_MAX_CHARSET_SIZE]; > > + > > + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); > > + > > + return charlen > 0 ? charlen : 1; > > +} > > + > > +/* > > * UniStrcat: Concatenate the second string to the first > > * > > * Returns: > > -- > > 1.6.0.6 > > > > _______________________________________________ > > linux-cifs-client mailing list > > linux-cifs-client@lists.samba.org > > https://lists.samba.org/mailman/listinfo/linux-cifs-client > > > > For some of the charsets I looked at under fs/nls, it looks like uni2char > always returns 1, I think to indicate the function succeeded as opposed > to sending an error. > Are there any charsets that you might have looked at whose > uni2char function returns more than 1 byte as size of the null character? No, I haven't see any, but I didn't do an exhaustive search. Given the number of problems we've had in this area, I'm leery of making any assumptions about these lengths. It's also possible that at some point in the future we could have an in-kernel version of UTF-16 or UTF-32. In the event of that we'll need to deal with multibyte null termination. So I think it makes sense to use a helper function for determining this rather than sprinkling "+1" to lengths all over the code. The overhead looks pretty minimal anyway.
On Wed, Apr 29, 2009 at 9:12 AM, Jeff Layton <jlayton@redhat.com> wrote: > On Wed, 29 Apr 2009 08:58:22 -0500 > Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote: > >> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote: >> > It's possible to have the null terminator for a charset be a single or >> > multiple character. Add a function to tell us how long it should be. >> > >> > Signed-off-by: Jeff Layton <jlayton@redhat.com> >> > --- >> > fs/cifs/cifs_unicode.h | 19 +++++++++++++++++++ >> > 1 files changed, 19 insertions(+), 0 deletions(-) >> > >> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h >> > index 14eb9a2..6bffab5 100644 >> > --- a/fs/cifs/cifs_unicode.h >> > +++ b/fs/cifs/cifs_unicode.h >> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); >> > #endif >> > >> > /* >> > + * null_charlen - return length of null character for codepage >> > + * @codepage - codepage for which to return length of NULL terminator >> > + * >> > + * Since we can't guarantee that the null terminator will be a particular >> > + * length, we have to check against the codepage. If there's a problem >> > + * determining it, assume a single-byte NULL terminator. >> > + */ >> > +static inline int >> > +null_charlen(const struct nls_table *codepage) >> > +{ >> > + int charlen; >> > + char tmp[NLS_MAX_CHARSET_SIZE]; >> > + >> > + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); >> > + >> > + return charlen > 0 ? charlen : 1; >> > +} >> > + >> > +/* >> > * UniStrcat: Concatenate the second string to the first >> > * >> > * Returns: >> > -- >> > 1.6.0.6 >> > >> > _______________________________________________ >> > linux-cifs-client mailing list >> > linux-cifs-client@lists.samba.org >> > https://lists.samba.org/mailman/listinfo/linux-cifs-client >> > >> >> For some of the charsets I looked at under fs/nls, it looks like uni2char >> always returns 1, I think to indicate the function succeeded as opposed >> to sending an error. >> Are there any charsets that you might have looked at whose >> uni2char function returns more than 1 byte as size of the null character? > > No, I haven't see any, but I didn't do an exhaustive search. Given the > number of problems we've had in this area, I'm leery of making any > assumptions about these lengths. It's also possible that at some point > in the future we could have an in-kernel version of UTF-16 or UTF-32. > In the event of that we'll need to deal with multibyte null termination. > > So I think it makes sense to use a helper function for determining this > rather than sprinkling "+1" to lengths all over the code. The overhead > looks pretty minimal anyway. > > -- > Jeff Layton <jlayton@redhat.com> > Jeff, I do not think uni2char returns the lenght of a character is bytes. But what you are saying as I understand is, in the future null_charlen may call some other function than uni2char to get the size of a null character to get an accurate value, so interface will remain same, just the functionility (of null_charlen) may change.
On Wed, 29 Apr 2009 09:39:58 -0500 Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote: > On Wed, Apr 29, 2009 at 9:12 AM, Jeff Layton <jlayton@redhat.com> wrote: > > On Wed, 29 Apr 2009 08:58:22 -0500 > > Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote: > > > >> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote: > >> > It's possible to have the null terminator for a charset be a single or > >> > multiple character. Add a function to tell us how long it should be. > >> > > >> > Signed-off-by: Jeff Layton <jlayton@redhat.com> > >> > --- > >> > fs/cifs/cifs_unicode.h | 19 +++++++++++++++++++ > >> > 1 files changed, 19 insertions(+), 0 deletions(-) > >> > > >> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h > >> > index 14eb9a2..6bffab5 100644 > >> > --- a/fs/cifs/cifs_unicode.h > >> > +++ b/fs/cifs/cifs_unicode.h > >> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); > >> > #endif > >> > > >> > /* > >> > + * null_charlen - return length of null character for codepage > >> > + * @codepage - codepage for which to return length of NULL terminator > >> > + * > >> > + * Since we can't guarantee that the null terminator will be a particular > >> > + * length, we have to check against the codepage. If there's a problem > >> > + * determining it, assume a single-byte NULL terminator. > >> > + */ > >> > +static inline int > >> > +null_charlen(const struct nls_table *codepage) > >> > +{ > >> > + int charlen; > >> > + char tmp[NLS_MAX_CHARSET_SIZE]; > >> > + > >> > + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); > >> > + > >> > + return charlen > 0 ? charlen : 1; > >> > +} > >> > + > >> > +/* > >> > * UniStrcat: Concatenate the second string to the first > >> > * > >> > * Returns: > >> > -- > >> > 1.6.0.6 > >> > > >> > _______________________________________________ > >> > linux-cifs-client mailing list > >> > linux-cifs-client@lists.samba.org > >> > https://lists.samba.org/mailman/listinfo/linux-cifs-client > >> > > >> > >> For some of the charsets I looked at under fs/nls, it looks like uni2char > >> always returns 1, I think to indicate the function succeeded as opposed > >> to sending an error. > >> Are there any charsets that you might have looked at whose > >> uni2char function returns more than 1 byte as size of the null character? > > > > No, I haven't see any, but I didn't do an exhaustive search. Given the > > number of problems we've had in this area, I'm leery of making any > > assumptions about these lengths. It's also possible that at some point > > in the future we could have an in-kernel version of UTF-16 or UTF-32. > > In the event of that we'll need to deal with multibyte null termination. > > > > So I think it makes sense to use a helper function for determining this > > rather than sprinkling "+1" to lengths all over the code. The overhead > > looks pretty minimal anyway. > > > > -- > > Jeff Layton <jlayton@redhat.com> > > > > Jeff, I do not think uni2char returns the lenght of a character is bytes. Yes, it does. > But what you are saying as I understand is, in the future null_charlen may > call some other function than uni2char to get the size of a null character > to get an accurate value, so interface will remain same, just the functionility > (of null_charlen) may change. Correct. We could (in principle) make that just return 1 for now, but then we'd have to worry about fixing that in the future if a multibyte null terminator were ever needed.
On Wed, Apr 29, 2009 at 09:29:11AM -0400, Jeff Layton wrote: > /* > + * null_charlen - return length of null character for codepage > + * @codepage - codepage for which to return length of NULL terminator > + * > + * Since we can't guarantee that the null terminator will be a particular > + * length, we have to check against the codepage. If there's a problem > + * determining it, assume a single-byte NULL terminator. > + */ > +static inline int > +null_charlen(const struct nls_table *codepage) > +{ > + int charlen; > + char tmp[NLS_MAX_CHARSET_SIZE]; > + > + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); > + > + return charlen > 0 ? charlen : 1; > +} Add this to include/linux/nls.h instead?
On Wed, Apr 29, 2009 at 9:56 AM, Jeff Layton <jlayton@redhat.com> wrote: > On Wed, 29 Apr 2009 09:39:58 -0500 > Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote: > >> On Wed, Apr 29, 2009 at 9:12 AM, Jeff Layton <jlayton@redhat.com> wrote: >> > On Wed, 29 Apr 2009 08:58:22 -0500 >> > Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote: >> > >> >> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote: >> >> > It's possible to have the null terminator for a charset be a single or >> >> > multiple character. Add a function to tell us how long it should be. >> >> > >> >> > Signed-off-by: Jeff Layton <jlayton@redhat.com> >> >> > --- >> >> > Â fs/cifs/cifs_unicode.h | Â 19 +++++++++++++++++++ >> >> > Â 1 files changed, 19 insertions(+), 0 deletions(-) >> >> > >> >> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h >> >> > index 14eb9a2..6bffab5 100644 >> >> > --- a/fs/cifs/cifs_unicode.h >> >> > +++ b/fs/cifs/cifs_unicode.h >> >> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); >> >> > Â #endif >> >> > >> >> > Â /* >> >> > + * null_charlen - return length of null character for codepage >> >> > + * @codepage - codepage for which to return length of NULL terminator >> >> > + * >> >> > + * Since we can't guarantee that the null terminator will be a particular >> >> > + * length, we have to check against the codepage. If there's a problem >> >> > + * determining it, assume a single-byte NULL terminator. >> >> > + */ >> >> > +static inline int >> >> > +null_charlen(const struct nls_table *codepage) >> >> > +{ >> >> > + Â Â Â int charlen; >> >> > + Â Â Â char tmp[NLS_MAX_CHARSET_SIZE]; >> >> > + >> >> > + Â Â Â charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); >> >> > + >> >> > + Â Â Â return charlen > 0 ? charlen : 1; >> >> > +} >> >> > + >> >> > +/* >> >> > Â * UniStrcat: Â Concatenate the second string to the first >> >> > Â * >> >> > Â * Returns: >> >> > -- >> >> > 1.6.0.6 >> >> > >> >> > _______________________________________________ >> >> > linux-cifs-client mailing list >> >> > linux-cifs-client@lists.samba.org >> >> > https://lists.samba.org/mailman/listinfo/linux-cifs-client >> >> > >> >> >> >> For some of the charsets I looked at under fs/nls, it looks like uni2char >> >> always returns 1, I think to indicate the function succeeded as opposed >> >> to sending an error. >> >> Are there any charsets that you might have looked at whose >> >> uni2char function returns more than 1 byte as size of the null character? >> > >> > No, I haven't see any, but I didn't do an exhaustive search. Given the >> > number of problems we've had in this area, I'm leery of making any >> > assumptions about these lengths. It's also possible that at some point >> > in the future we could have an in-kernel version of UTF-16 or UTF-32. >> > In the event of that we'll need to deal with multibyte null termination. >> > >> > So I think it makes sense to use a helper function for determining this >> > rather than sprinkling "+1" to lengths all over the code. The overhead >> > looks pretty minimal anyway. >> > >> > -- >> > Jeff Layton <jlayton@redhat.com> >> > >> >> Jeff, I do not think uni2char returns the lenght of a character is bytes. > > Yes, it does. Alright but the reason I had doubt was, even char2uni returns 1, now that is not correct when a code point from a charset is mapped/encoded to a mutlibyte code value using UTF-16. > >> But what you are saying as I understand is, in the future null_charlen may >> call some other function than uni2char to get the size of a null character >> to get an accurate value, so interface will remain same, just the functionility >> (of null_charlen) may change. > > Correct. We could (in principle) make that just return 1 for now, but > then we'd have to worry about fixing that in the future if a multibyte > null terminator were ever needed. > > -- > Jeff Layton <jlayton@redhat.com> >
On Wed, 29 Apr 2009 11:02:37 -0400 Christoph Hellwig <hch@infradead.org> wrote: > On Wed, Apr 29, 2009 at 09:29:11AM -0400, Jeff Layton wrote: > > /* > > + * null_charlen - return length of null character for codepage > > + * @codepage - codepage for which to return length of NULL terminator > > + * > > + * Since we can't guarantee that the null terminator will be a particular > > + * length, we have to check against the codepage. If there's a problem > > + * determining it, assume a single-byte NULL terminator. > > + */ > > +static inline int > > +null_charlen(const struct nls_table *codepage) > > +{ > > + int charlen; > > + char tmp[NLS_MAX_CHARSET_SIZE]; > > + > > + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); > > + > > + return charlen > 0 ? charlen : 1; > > +} > > Add this to include/linux/nls.h instead? > Makes sense. What I may do is move it there and then hardwire this function to just return '1' for now. It looks like all of the existing charsets that are in the kernel use single-byte null termination. At a later point in time, if multibyte null terminators are needed, we can change this to do something different and the callers should "just work".
diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h index 14eb9a2..6bffab5 100644 --- a/fs/cifs/cifs_unicode.h +++ b/fs/cifs/cifs_unicode.h @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *); #endif /* + * null_charlen - return length of null character for codepage + * @codepage - codepage for which to return length of NULL terminator + * + * Since we can't guarantee that the null terminator will be a particular + * length, we have to check against the codepage. If there's a problem + * determining it, assume a single-byte NULL terminator. + */ +static inline int +null_charlen(const struct nls_table *codepage) +{ + int charlen; + char tmp[NLS_MAX_CHARSET_SIZE]; + + charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE); + + return charlen > 0 ? charlen : 1; +} + +/* * UniStrcat: Concatenate the second string to the first * * Returns:
It's possible to have the null terminator for a charset be a single or multiple character. Add a function to tell us how long it should be. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/cifs/cifs_unicode.h | 19 +++++++++++++++++++ 1 files changed, 19 insertions(+), 0 deletions(-)