diff mbox

[linux-cifs-client,01/10] cifs: add function to get length of NULL termination in bytes

Message ID 1241011759-7632-2-git-send-email-jlayton@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton April 29, 2009, 1:29 p.m. UTC
It's possible to have the null terminator for a charset be a single or
multiple character. Add a function to tell us how long it should be.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/cifs_unicode.h |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

Comments

Shirish Pargaonkar April 29, 2009, 1:58 p.m. UTC | #1
On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote:
> It's possible to have the null terminator for a charset be a single or
> multiple character. Add a function to tell us how long it should be.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/cifs/cifs_unicode.h |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
>
> diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> index 14eb9a2..6bffab5 100644
> --- a/fs/cifs/cifs_unicode.h
> +++ b/fs/cifs/cifs_unicode.h
> @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
>  #endif
>
>  /*
> + * null_charlen - return length of null character for codepage
> + * @codepage - codepage for which to return length of NULL terminator
> + *
> + * Since we can't guarantee that the null terminator will be a particular
> + * length, we have to check against the codepage. If there's a problem
> + * determining it, assume a single-byte NULL terminator.
> + */
> +static inline int
> +null_charlen(const struct nls_table *codepage)
> +{
> +       int charlen;
> +       char tmp[NLS_MAX_CHARSET_SIZE];
> +
> +       charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
> +
> +       return charlen > 0 ? charlen : 1;
> +}
> +
> +/*
>  * UniStrcat:  Concatenate the second string to the first
>  *
>  * Returns:
> --
> 1.6.0.6
>
> _______________________________________________
> linux-cifs-client mailing list
> linux-cifs-client@lists.samba.org
> https://lists.samba.org/mailman/listinfo/linux-cifs-client
>

For some of the charsets I looked at under fs/nls, it looks like uni2char
always returns 1, I think to indicate the function succeeded as opposed
to sending an error.
Are there any charsets that you might have looked at whose
uni2char function returns more than 1 byte as size of the null character?
Jeff Layton April 29, 2009, 2:12 p.m. UTC | #2
On Wed, 29 Apr 2009 08:58:22 -0500
Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:

> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > It's possible to have the null terminator for a charset be a single or
> > multiple character. Add a function to tell us how long it should be.
> >
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/cifs/cifs_unicode.h |   19 +++++++++++++++++++
> >  1 files changed, 19 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> > index 14eb9a2..6bffab5 100644
> > --- a/fs/cifs/cifs_unicode.h
> > +++ b/fs/cifs/cifs_unicode.h
> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
> >  #endif
> >
> >  /*
> > + * null_charlen - return length of null character for codepage
> > + * @codepage - codepage for which to return length of NULL terminator
> > + *
> > + * Since we can't guarantee that the null terminator will be a particular
> > + * length, we have to check against the codepage. If there's a problem
> > + * determining it, assume a single-byte NULL terminator.
> > + */
> > +static inline int
> > +null_charlen(const struct nls_table *codepage)
> > +{
> > +       int charlen;
> > +       char tmp[NLS_MAX_CHARSET_SIZE];
> > +
> > +       charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
> > +
> > +       return charlen > 0 ? charlen : 1;
> > +}
> > +
> > +/*
> >  * UniStrcat:  Concatenate the second string to the first
> >  *
> >  * Returns:
> > --
> > 1.6.0.6
> >
> > _______________________________________________
> > linux-cifs-client mailing list
> > linux-cifs-client@lists.samba.org
> > https://lists.samba.org/mailman/listinfo/linux-cifs-client
> >
> 
> For some of the charsets I looked at under fs/nls, it looks like uni2char
> always returns 1, I think to indicate the function succeeded as opposed
> to sending an error.
> Are there any charsets that you might have looked at whose
> uni2char function returns more than 1 byte as size of the null character?

No, I haven't see any, but I didn't do an exhaustive search. Given the
number of problems we've had in this area, I'm leery of making any
assumptions about these lengths. It's also possible that at some point
in the future we could have an in-kernel version of UTF-16 or UTF-32.
In the event of that we'll need to deal with multibyte null termination.

So I think it makes sense to use a helper function for determining this
rather than sprinkling "+1" to lengths all over the code. The overhead
looks pretty minimal anyway.
Shirish Pargaonkar April 29, 2009, 2:39 p.m. UTC | #3
On Wed, Apr 29, 2009 at 9:12 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Wed, 29 Apr 2009 08:58:22 -0500
> Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:
>
>> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> > It's possible to have the null terminator for a charset be a single or
>> > multiple character. Add a function to tell us how long it should be.
>> >
>> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> > ---
>> >  fs/cifs/cifs_unicode.h |   19 +++++++++++++++++++
>> >  1 files changed, 19 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
>> > index 14eb9a2..6bffab5 100644
>> > --- a/fs/cifs/cifs_unicode.h
>> > +++ b/fs/cifs/cifs_unicode.h
>> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
>> >  #endif
>> >
>> >  /*
>> > + * null_charlen - return length of null character for codepage
>> > + * @codepage - codepage for which to return length of NULL terminator
>> > + *
>> > + * Since we can't guarantee that the null terminator will be a particular
>> > + * length, we have to check against the codepage. If there's a problem
>> > + * determining it, assume a single-byte NULL terminator.
>> > + */
>> > +static inline int
>> > +null_charlen(const struct nls_table *codepage)
>> > +{
>> > +       int charlen;
>> > +       char tmp[NLS_MAX_CHARSET_SIZE];
>> > +
>> > +       charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
>> > +
>> > +       return charlen > 0 ? charlen : 1;
>> > +}
>> > +
>> > +/*
>> >  * UniStrcat:  Concatenate the second string to the first
>> >  *
>> >  * Returns:
>> > --
>> > 1.6.0.6
>> >
>> > _______________________________________________
>> > linux-cifs-client mailing list
>> > linux-cifs-client@lists.samba.org
>> > https://lists.samba.org/mailman/listinfo/linux-cifs-client
>> >
>>
>> For some of the charsets I looked at under fs/nls, it looks like uni2char
>> always returns 1, I think to indicate the function succeeded as opposed
>> to sending an error.
>> Are there any charsets that you might have looked at whose
>> uni2char function returns more than 1 byte as size of the null character?
>
> No, I haven't see any, but I didn't do an exhaustive search. Given the
> number of problems we've had in this area, I'm leery of making any
> assumptions about these lengths. It's also possible that at some point
> in the future we could have an in-kernel version of UTF-16 or UTF-32.
> In the event of that we'll need to deal with multibyte null termination.
>
> So I think it makes sense to use a helper function for determining this
> rather than sprinkling "+1" to lengths all over the code. The overhead
> looks pretty minimal anyway.
>
> --
> Jeff Layton <jlayton@redhat.com>
>

Jeff, I do not think uni2char returns the lenght of a character is bytes.
But what you are saying as I understand is, in the future null_charlen may
call some other function than uni2char to get the size of a null character
to get an accurate value, so interface will remain same, just the functionility
(of null_charlen) may change.
Jeff Layton April 29, 2009, 2:56 p.m. UTC | #4
On Wed, 29 Apr 2009 09:39:58 -0500
Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:

> On Wed, Apr 29, 2009 at 9:12 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > On Wed, 29 Apr 2009 08:58:22 -0500
> > Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:
> >
> >> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote:
> >> > It's possible to have the null terminator for a charset be a single or
> >> > multiple character. Add a function to tell us how long it should be.
> >> >
> >> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> >> > ---
> >> >  fs/cifs/cifs_unicode.h |   19 +++++++++++++++++++
> >> >  1 files changed, 19 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> >> > index 14eb9a2..6bffab5 100644
> >> > --- a/fs/cifs/cifs_unicode.h
> >> > +++ b/fs/cifs/cifs_unicode.h
> >> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
> >> >  #endif
> >> >
> >> >  /*
> >> > + * null_charlen - return length of null character for codepage
> >> > + * @codepage - codepage for which to return length of NULL terminator
> >> > + *
> >> > + * Since we can't guarantee that the null terminator will be a particular
> >> > + * length, we have to check against the codepage. If there's a problem
> >> > + * determining it, assume a single-byte NULL terminator.
> >> > + */
> >> > +static inline int
> >> > +null_charlen(const struct nls_table *codepage)
> >> > +{
> >> > +       int charlen;
> >> > +       char tmp[NLS_MAX_CHARSET_SIZE];
> >> > +
> >> > +       charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
> >> > +
> >> > +       return charlen > 0 ? charlen : 1;
> >> > +}
> >> > +
> >> > +/*
> >> >  * UniStrcat:  Concatenate the second string to the first
> >> >  *
> >> >  * Returns:
> >> > --
> >> > 1.6.0.6
> >> >
> >> > _______________________________________________
> >> > linux-cifs-client mailing list
> >> > linux-cifs-client@lists.samba.org
> >> > https://lists.samba.org/mailman/listinfo/linux-cifs-client
> >> >
> >>
> >> For some of the charsets I looked at under fs/nls, it looks like uni2char
> >> always returns 1, I think to indicate the function succeeded as opposed
> >> to sending an error.
> >> Are there any charsets that you might have looked at whose
> >> uni2char function returns more than 1 byte as size of the null character?
> >
> > No, I haven't see any, but I didn't do an exhaustive search. Given the
> > number of problems we've had in this area, I'm leery of making any
> > assumptions about these lengths. It's also possible that at some point
> > in the future we could have an in-kernel version of UTF-16 or UTF-32.
> > In the event of that we'll need to deal with multibyte null termination.
> >
> > So I think it makes sense to use a helper function for determining this
> > rather than sprinkling "+1" to lengths all over the code. The overhead
> > looks pretty minimal anyway.
> >
> > --
> > Jeff Layton <jlayton@redhat.com>
> >
> 
> Jeff, I do not think uni2char returns the lenght of a character is bytes.

Yes, it does.

> But what you are saying as I understand is, in the future null_charlen may
> call some other function than uni2char to get the size of a null character
> to get an accurate value, so interface will remain same, just the functionility
> (of null_charlen) may change.

Correct. We could (in principle) make that just return 1 for now, but
then we'd have to worry about fixing that in the future if a multibyte
null terminator were ever needed.
Christoph Hellwig April 29, 2009, 3:02 p.m. UTC | #5
On Wed, Apr 29, 2009 at 09:29:11AM -0400, Jeff Layton wrote:
>  /*
> + * null_charlen - return length of null character for codepage
> + * @codepage - codepage for which to return length of NULL terminator
> + *
> + * Since we can't guarantee that the null terminator will be a particular
> + * length, we have to check against the codepage. If there's a problem
> + * determining it, assume a single-byte NULL terminator.
> + */
> +static inline int
> +null_charlen(const struct nls_table *codepage)
> +{
> +	int charlen;
> +	char tmp[NLS_MAX_CHARSET_SIZE];
> +
> +	charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
> +
> +	return charlen > 0 ? charlen : 1;
> +}

Add this to include/linux/nls.h instead?
Shirish Pargaonkar April 29, 2009, 3:03 p.m. UTC | #6
On Wed, Apr 29, 2009 at 9:56 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Wed, 29 Apr 2009 09:39:58 -0500
> Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:
>
>> On Wed, Apr 29, 2009 at 9:12 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> > On Wed, 29 Apr 2009 08:58:22 -0500
>> > Shirish Pargaonkar <shirishpargaonkar@gmail.com> wrote:
>> >
>> >> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton@redhat.com> wrote:
>> >> > It's possible to have the null terminator for a charset be a single or
>> >> > multiple character. Add a function to tell us how long it should be.
>> >> >
>> >> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> >> > ---
>> >> >  fs/cifs/cifs_unicode.h |   19 +++++++++++++++++++
>> >> >  1 files changed, 19 insertions(+), 0 deletions(-)
>> >> >
>> >> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
>> >> > index 14eb9a2..6bffab5 100644
>> >> > --- a/fs/cifs/cifs_unicode.h
>> >> > +++ b/fs/cifs/cifs_unicode.h
>> >> > @@ -64,6 +64,25 @@ int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
>> >> >  #endif
>> >> >
>> >> >  /*
>> >> > + * null_charlen - return length of null character for codepage
>> >> > + * @codepage - codepage for which to return length of NULL terminator
>> >> > + *
>> >> > + * Since we can't guarantee that the null terminator will be a particular
>> >> > + * length, we have to check against the codepage. If there's a problem
>> >> > + * determining it, assume a single-byte NULL terminator.
>> >> > + */
>> >> > +static inline int
>> >> > +null_charlen(const struct nls_table *codepage)
>> >> > +{
>> >> > +       int charlen;
>> >> > +       char tmp[NLS_MAX_CHARSET_SIZE];
>> >> > +
>> >> > +       charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
>> >> > +
>> >> > +       return charlen > 0 ? charlen : 1;
>> >> > +}
>> >> > +
>> >> > +/*
>> >> >  * UniStrcat:  Concatenate the second string to the first
>> >> >  *
>> >> >  * Returns:
>> >> > --
>> >> > 1.6.0.6
>> >> >
>> >> > _______________________________________________
>> >> > linux-cifs-client mailing list
>> >> > linux-cifs-client@lists.samba.org
>> >> > https://lists.samba.org/mailman/listinfo/linux-cifs-client
>> >> >
>> >>
>> >> For some of the charsets I looked at under fs/nls, it looks like uni2char
>> >> always returns 1, I think to indicate the function succeeded as opposed
>> >> to sending an error.
>> >> Are there any charsets that you might have looked at whose
>> >> uni2char function returns more than 1 byte as size of the null character?
>> >
>> > No, I haven't see any, but I didn't do an exhaustive search. Given the
>> > number of problems we've had in this area, I'm leery of making any
>> > assumptions about these lengths. It's also possible that at some point
>> > in the future we could have an in-kernel version of UTF-16 or UTF-32.
>> > In the event of that we'll need to deal with multibyte null termination.
>> >
>> > So I think it makes sense to use a helper function for determining this
>> > rather than sprinkling "+1" to lengths all over the code. The overhead
>> > looks pretty minimal anyway.
>> >
>> > --
>> > Jeff Layton <jlayton@redhat.com>
>> >
>>
>> Jeff, I do not think uni2char returns the lenght of a character is bytes.
>
> Yes, it does.

Alright but the reason I had doubt was, even char2uni returns 1, now that is
not correct when a code point from a charset is mapped/encoded to a mutlibyte
code value using UTF-16.

>
>> But what you are saying as I understand is, in the future null_charlen may
>> call some other function than uni2char to get the size of a null character
>> to get an accurate value, so interface will remain same, just the functionility
>> (of null_charlen) may change.
>
> Correct. We could (in principle) make that just return 1 for now, but
> then we'd have to worry about fixing that in the future if a multibyte
> null terminator were ever needed.
>
> --
> Jeff Layton <jlayton@redhat.com>
>
Jeff Layton April 29, 2009, 3:17 p.m. UTC | #7
On Wed, 29 Apr 2009 11:02:37 -0400
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Apr 29, 2009 at 09:29:11AM -0400, Jeff Layton wrote:
> >  /*
> > + * null_charlen - return length of null character for codepage
> > + * @codepage - codepage for which to return length of NULL terminator
> > + *
> > + * Since we can't guarantee that the null terminator will be a particular
> > + * length, we have to check against the codepage. If there's a problem
> > + * determining it, assume a single-byte NULL terminator.
> > + */
> > +static inline int
> > +null_charlen(const struct nls_table *codepage)
> > +{
> > +	int charlen;
> > +	char tmp[NLS_MAX_CHARSET_SIZE];
> > +
> > +	charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
> > +
> > +	return charlen > 0 ? charlen : 1;
> > +}
> 
> Add this to include/linux/nls.h instead?
> 

Makes sense. What I may do is move it there and then hardwire this
function to just return '1' for now. It looks like all of the existing
charsets that are in the kernel use single-byte null termination.

At a later point in time, if multibyte null terminators are needed, we
can change this to do something different and the callers should "just
work".
diff mbox

Patch

diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
index 14eb9a2..6bffab5 100644
--- a/fs/cifs/cifs_unicode.h
+++ b/fs/cifs/cifs_unicode.h
@@ -64,6 +64,25 @@  int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
 #endif
 
 /*
+ * null_charlen - return length of null character for codepage
+ * @codepage - codepage for which to return length of NULL terminator
+ *
+ * Since we can't guarantee that the null terminator will be a particular
+ * length, we have to check against the codepage. If there's a problem
+ * determining it, assume a single-byte NULL terminator.
+ */
+static inline int
+null_charlen(const struct nls_table *codepage)
+{
+	int charlen;
+	char tmp[NLS_MAX_CHARSET_SIZE];
+
+	charlen = codepage->uni2char(0, tmp, NLS_MAX_CHARSET_SIZE);
+
+	return charlen > 0 ? charlen : 1;
+}
+
+/*
  * UniStrcat:  Concatenate the second string to the first
  *
  * Returns: