diff mbox series

Kernel memory corruption in CIPSO labeled TCP packets processing.

Message ID 16659801547571984@sas1-890ba5c2334a.qloud-c.yandex.net (mailing list archive)
State New, archived
Headers show
Series Kernel memory corruption in CIPSO labeled TCP packets processing. | expand

Commit Message

Sergey Nazarov Jan. 15, 2019, 5:06 p.m. UTC
Hello!
Security modules (selinux, smack) use icmp_send for discarded incorrectly labeled network packets.
This could be on TCP level too (security_sock_rcv_skb -> cipso_v4_error for INET stream connection, for example).
icmp_send calls ip_option_echo, which uses IPCB to take compiled IP options.
After moving IP header data to the end of the struct tcp_skb_cb (since 3.18 kernel), this could lead to
kernel memory corruption when IP options copying.
This patch fix a bug, but I'm not sure, that this is a best solution. Perhaps someone more familiar with the
linux TCP/IP stack will offer a better one.
Thanks.

Comments

Casey Schaufler Jan. 15, 2019, 5:55 p.m. UTC | #1
On 1/15/2019 9:06 AM, Nazarov Sergey wrote:
> Hello!
> Security modules (selinux, smack) use icmp_send for discarded incorrectly labeled network packets.
> This could be on TCP level too (security_sock_rcv_skb -> cipso_v4_error for INET stream connection, for example).
> icmp_send calls ip_option_echo, which uses IPCB to take compiled IP options.
> After moving IP header data to the end of the struct tcp_skb_cb (since 3.18 kernel), this could lead to
> kernel memory corruption when IP options copying.

Can you explain how that corruption might occur?
Do you have a test case?

> This patch fix a bug, but I'm not sure, that this is a best solution. Perhaps someone more familiar with the
> linux TCP/IP stack will offer a better one.
> Thanks.
>
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -679,7 +679,8 @@ void icmp_send(struct sk_buff *skb_in, i
>  					  iph->tos;
>  	mark = IP4_REPLY_MARK(net, skb_in->mark);
>  
> -	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in))
> +	if (__ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in,
> +			ip_hdr(skb_in)->protocol == IPPROTO_TCP ? &TCP_SKB_CB(skb_in)->header.h4.opt : &IPCB(skb_in)->opt))
>  		goto out_unlock;
>
>
Paul Moore Jan. 15, 2019, 7:52 p.m. UTC | #2
On Tue, Jan 15, 2019 at 12:55 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 1/15/2019 9:06 AM, Nazarov Sergey wrote:
> > Hello!
> > Security modules (selinux, smack) use icmp_send for discarded incorrectly labeled network packets.
> > This could be on TCP level too (security_sock_rcv_skb -> cipso_v4_error for INET stream connection, for example).
> > icmp_send calls ip_option_echo, which uses IPCB to take compiled IP options.
> > After moving IP header data to the end of the struct tcp_skb_cb (since 3.18 kernel), this could lead to
> > kernel memory corruption when IP options copying.
>
> Can you explain how that corruption might occur?
> Do you have a test case?

Thanks for pointing this out Nazarov.

Presumably we are going to hit a problem whenever icmp_send is called
from outside the IP layer in the stack.  We fixed a similar problem a
few years back with 04f81f0154e4 ("cipso: don't use IPCB() to locate
the CIPSO IP option").

I've CC'd netdev, as I'm guessing they will have some thoughts on
this, but my initial reaction is that your proposed patch isn't as
generic as it should be for code that lives in icmp_send().  I suspect
the safe thing to do would be to call ip_options_compile() again on
skb_in and build a local copy of the ip_options struct that could then
be used in the call to __ip_options_echo(); the code could either live
in icmp_send() or some new ip_options_echo() variant
(ip_options_echo_safe()?  I dunno).  Unfortunately, calling
ip_options_compile() is going to add some overhead, and may be a
non-starter for the netdev folks, but this is error path code, so it
might be acceptable.  Hopefully the netdev folks will have some
better, more clever suggestions.

> > This patch fix a bug, but I'm not sure, that this is a best solution. Perhaps someone more familiar with the
> > linux TCP/IP stack will offer a better one.
> > Thanks.
> >
> > --- a/net/ipv4/icmp.c
> > +++ b/net/ipv4/icmp.c
> > @@ -679,7 +679,8 @@ void icmp_send(struct sk_buff *skb_in, i
> >                                         iph->tos;
> >       mark = IP4_REPLY_MARK(net, skb_in->mark);
> >
> > -     if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in))
> > +     if (__ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in,
> > +                     ip_hdr(skb_in)->protocol == IPPROTO_TCP ? &TCP_SKB_CB(skb_in)->header.h4.opt : &IPCB(skb_in)->opt))
> >               goto out_unlock;
> >
> >
Paul Moore Jan. 18, 2019, 2:53 p.m. UTC | #3
On Tue, Jan 15, 2019 at 2:52 PM Paul Moore <paul@paul-moore.com> wrote:
> On Tue, Jan 15, 2019 at 12:55 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> > On 1/15/2019 9:06 AM, Nazarov Sergey wrote:
> > > Hello!
> > > Security modules (selinux, smack) use icmp_send for discarded incorrectly labeled network packets.
> > > This could be on TCP level too (security_sock_rcv_skb -> cipso_v4_error for INET stream connection, for example).
> > > icmp_send calls ip_option_echo, which uses IPCB to take compiled IP options.
> > > After moving IP header data to the end of the struct tcp_skb_cb (since 3.18 kernel), this could lead to
> > > kernel memory corruption when IP options copying.
> >
> > Can you explain how that corruption might occur?
> > Do you have a test case?
>
> Thanks for pointing this out Nazarov.
>
> Presumably we are going to hit a problem whenever icmp_send is called
> from outside the IP layer in the stack.  We fixed a similar problem a
> few years back with 04f81f0154e4 ("cipso: don't use IPCB() to locate
> the CIPSO IP option").
>
> I've CC'd netdev, as I'm guessing they will have some thoughts on
> this, but my initial reaction is that your proposed patch isn't as
> generic as it should be for code that lives in icmp_send().  I suspect
> the safe thing to do would be to call ip_options_compile() again on
> skb_in and build a local copy of the ip_options struct that could then
> be used in the call to __ip_options_echo(); the code could either live
> in icmp_send() or some new ip_options_echo() variant
> (ip_options_echo_safe()?  I dunno).  Unfortunately, calling
> ip_options_compile() is going to add some overhead, and may be a
> non-starter for the netdev folks, but this is error path code, so it
> might be acceptable.  Hopefully the netdev folks will have some
> better, more clever suggestions.

It's been a few days now with no comments from the netdev folks, so I
think it's probably best to start putting together a RFC patch for
review/comment.  Nazarov, would you like to do that?  If not, that's
okay, just let me know.

I'm still concerned about calling ip_options_compile() in icmp_send()
and I'm thinking we might be better off to add a new ip_options
parameter to icmp_send(); if the parameter is NULL we behave as we do
today, but if it is non-NULL we use the given ip_options parameter in
place of calling ip_options_echo().  With that change in place, we
would need to update cipso_v4_error() to do the extra work of calling
ip_options_compile() and __ip_options_echo().  There looks to be maybe
a dozen (or two?) existing icmp_send() callers, but it should be
pretty trivial to update them to pass NULL for the new parameter.

What do you think?

> > > This patch fix a bug, but I'm not sure, that this is a best solution. Perhaps someone more familiar with the
> > > linux TCP/IP stack will offer a better one.
> > > Thanks.
> > >
> > > --- a/net/ipv4/icmp.c
> > > +++ b/net/ipv4/icmp.c
> > > @@ -679,7 +679,8 @@ void icmp_send(struct sk_buff *skb_in, i
> > >                                         iph->tos;
> > >       mark = IP4_REPLY_MARK(net, skb_in->mark);
> > >
> > > -     if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in))
> > > +     if (__ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in,
> > > +                     ip_hdr(skb_in)->protocol == IPPROTO_TCP ? &TCP_SKB_CB(skb_in)->header.h4.opt : &IPCB(skb_in)->opt))
> > >               goto out_unlock;
> > >
> > >
Sergey Nazarov Jan. 18, 2019, 4:34 p.m. UTC | #4
Hi, Paul!
I don't like this. As you said, fix any calls to icmp_send is a trivial.
But in cipso_v4_optptr, we repeating the work already done, and in cipso_v4_error
we will need to do even more again. Any code, working with IP header data on several
levels of TCP/IP stack need to do this again and again. Is looks too overloaded!
In my opinion, this is a problem of TCP/IP stack design, comes from standing
compiled IP header data in different places at different stack layers.
May be better to add some flag or pointer to struct sk_buff?
But, I think, we need netdev developers advice for this.
Regards, Sergey.

18.01.2019, 17:53, "Paul Moore" <paul@paul-moore.com>:
> On Tue, Jan 15, 2019 at 2:52 PM Paul Moore <paul@paul-moore.com> wrote:
>
> It's been a few days now with no comments from the netdev folks, so I
> think it's probably best to start putting together a RFC patch for
> review/comment. Nazarov, would you like to do that? If not, that's
> okay, just let me know.
>
> I'm still concerned about calling ip_options_compile() in icmp_send()
> and I'm thinking we might be better off to add a new ip_options
> parameter to icmp_send(); if the parameter is NULL we behave as we do
> today, but if it is non-NULL we use the given ip_options parameter in
> place of calling ip_options_echo(). With that change in place, we
> would need to update cipso_v4_error() to do the extra work of calling
> ip_options_compile() and __ip_options_echo(). There looks to be maybe
> a dozen (or two?) existing icmp_send() callers, but it should be
> pretty trivial to update them to pass NULL for the new parameter.
>
> What do you think?
>
>
> --
> paul moore
> www.paul-moore.com
Paul Moore Jan. 18, 2019, 5:17 p.m. UTC | #5
On Fri, Jan 18, 2019 at 11:34 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> Hi, Paul!
> I don't like this. As you said, fix any calls to icmp_send is a trivial.
> But in cipso_v4_optptr, we repeating the work already done, and in cipso_v4_error
> we will need to do even more again.

Yes, this is extra overhead, but it is in an error path so I'm not
overly concerned about the performance impact on the given connection.

I will admit that it isn't an ideal solution from a LSM/NetLabel
perspective, but historically the netdev folks have not been overly
receptive to changes which negatively impact the core network stack
regardless of how important it may be for the LSMs.  If we could
modify icmp_send() so that it works regardless of the caller's
location in the stack, that would be great, I'm just not sure it would
be deemed acceptable.  I suggested the approach I did because I think
that has the best chance of getting merged.

Regardless, I would encourage you to put together a patch and post it
for review, I think that's the best way to get a response.  If you
aren't able, or aren't comfortable, submitting a patch please let me
know.

> Any code, working with IP header data on several
> levels of TCP/IP stack need to do this again and again. Is looks too overloaded!
> In my opinion, this is a problem of TCP/IP stack design, comes from standing
> compiled IP header data in different places at different stack layers.
> May be better to add some flag or pointer to struct sk_buff?
> But, I think, we need netdev developers advice for this.
> Regards, Sergey.
>
> 18.01.2019, 17:53, "Paul Moore" <paul@paul-moore.com>:
> > On Tue, Jan 15, 2019 at 2:52 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > It's been a few days now with no comments from the netdev folks, so I
> > think it's probably best to start putting together a RFC patch for
> > review/comment. Nazarov, would you like to do that? If not, that's
> > okay, just let me know.
> >
> > I'm still concerned about calling ip_options_compile() in icmp_send()
> > and I'm thinking we might be better off to add a new ip_options
> > parameter to icmp_send(); if the parameter is NULL we behave as we do
> > today, but if it is non-NULL we use the given ip_options parameter in
> > place of calling ip_options_echo(). With that change in place, we
> > would need to update cipso_v4_error() to do the extra work of calling
> > ip_options_compile() and __ip_options_echo(). There looks to be maybe
> > a dozen (or two?) existing icmp_send() callers, but it should be
> > pretty trivial to update them to pass NULL for the new parameter.
> >
> > What do you think?
Sergey Nazarov Jan. 21, 2019, 5:11 p.m. UTC | #6
18.01.2019, 20:17, "Paul Moore" <paul@paul-moore.com>:
> Yes, this is extra overhead, but it is in an error path so I'm not
> overly concerned about the performance impact on the given connection.
>
> I will admit that it isn't an ideal solution from a LSM/NetLabel
> perspective, but historically the netdev folks have not been overly
> receptive to changes which negatively impact the core network stack
> regardless of how important it may be for the LSMs. If we could
> modify icmp_send() so that it works regardless of the caller's
> location in the stack, that would be great, I'm just not sure it would
> be deemed acceptable. I suggested the approach I did because I think
> that has the best chance of getting merged.
>
> Regardless, I would encourage you to put together a patch and post it
> for review, I think that's the best way to get a response. If you
> aren't able, or aren't comfortable, submitting a patch please let me
> know.
> --
> paul moore
> www.paul-moore.com

May be something like that?

[PATCH 1/2] Add IP options parameter to icmp_send
---
 include/net/icmp.h    |    9 ++++++++-
 net/ipv4/icmp.c       |    7 ++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/net/icmp.h b/include/net/icmp.h
index 6ac3a5b..e0f709d 100644
--- a/include/net/icmp.h
+++ b/include/net/icmp.h
@@ -22,6 +22,7 @@
 
 #include <net/inet_sock.h>
 #include <net/snmp.h>
+#include <net/ip.h>
 
 struct icmp_err {
   int		errno;
@@ -39,7 +40,13 @@ struct icmp_err {
 struct sk_buff;
 struct net;
 
-void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);
+void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
+		 const struct ip_options *opt);
+static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
+{
+	__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
+}
+
 int icmp_rcv(struct sk_buff *skb);
 int icmp_err(struct sk_buff *skb, u32 info);
 int icmp_init(void);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 065997f..3f24414 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -570,7 +570,8 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
  *			MUST reply to only the first fragment.
  */
 
-void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
+void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
+		 const struct ip_options *opt)
 {
 	struct iphdr *iph;
 	int room;
@@ -691,7 +692,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 					  iph->tos;
 	mark = IP4_REPLY_MARK(net, skb_in->mark);
 
-	if (ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in))
+	if (__ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in, opt))
 		goto out_unlock;
 
 
@@ -742,7 +743,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	local_bh_enable();
 out:;
 }
-EXPORT_SYMBOL(icmp_send);
+EXPORT_SYMBOL(__icmp_send);
 
 
 static void icmp_socket_deliver(struct sk_buff *skb, u32 info)

---
[PATCH 2/2] Extract IP options in cipso_v4_error
---
 include/net/ip.h      |    2 ++
 net/ipv4/cipso_ipv4.c |   11 +++++++++--
 net/ipv4/ip_options.c |   22 +++++++++++++++++-----
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 8866bfc..f0e8d06 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -667,6 +667,8 @@ static inline int ip_options_echo(struct net *net, struct ip_options *dopt,
 }
 
 void ip_options_fragment(struct sk_buff *skb);
+int __ip_options_compile(struct net *net, struct ip_options *opt,
+			 struct sk_buff *skb, __be32 *info);
 int ip_options_compile(struct net *net, struct ip_options *opt,
 		       struct sk_buff *skb);
 int ip_options_get(struct net *net, struct ip_options_rcu **optp,
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 777fa3b..86599e9 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1735,13 +1735,20 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
  */
 void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 {
+	unsigned char optbuf[sizeof(struct ip_options) + 40];
+	struct ip_options *opt = (struct ip_options *)optbuf;
+
 	if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
 		return;
 
+	opt->optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
+	if (__ip_options_compile(dev_net(skb->dev), opt, skb, NULL))
+		return;
+
 	if (gateway)
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
+		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, opt);
 	else
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
+		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, opt);
 }
 
 /**
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index ed194d4..32a3504 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -251,8 +251,9 @@ static void spec_dst_fill(__be32 *spec_dst, struct sk_buff *skb)
  * If opt == NULL, then skb->data should point to IP header.
  */
 
-int ip_options_compile(struct net *net,
-		       struct ip_options *opt, struct sk_buff *skb)
+int __ip_options_compile(struct net *net,
+			 struct ip_options *opt, struct sk_buff *skb,
+			 __be32 *info)
 {
 	__be32 spec_dst = htonl(INADDR_ANY);
 	unsigned char *pp_ptr = NULL;
@@ -468,11 +469,22 @@ int ip_options_compile(struct net *net,
 		return 0;
 
 error:
-	if (skb) {
-		icmp_send(skb, ICMP_PARAMETERPROB, 0, htonl((pp_ptr-iph)<<24));
-	}
+	if (info)
+		*info = htonl((pp_ptr-iph)<<24);
 	return -EINVAL;
 }
+
+int ip_options_compile(struct net *net,
+		       struct ip_options *opt, struct sk_buff *skb)
+{
+	int ret;
+	__be32 info;
+
+	ret = __ip_options_compile(net, opt, skb, &info);
+	if (ret != 0 && skb)
+		icmp_send(skb, ICMP_PARAMETERPROB, 0, info);
+	return ret;
+}
 EXPORT_SYMBOL(ip_options_compile);
 
 /*

---
Paul Moore Jan. 22, 2019, 4:49 p.m. UTC | #7
On Mon, Jan 21, 2019 at 12:11 PM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 18.01.2019, 20:17, "Paul Moore" <paul@paul-moore.com>:
> > Yes, this is extra overhead, but it is in an error path so I'm not
> > overly concerned about the performance impact on the given connection.
> >
> > I will admit that it isn't an ideal solution from a LSM/NetLabel
> > perspective, but historically the netdev folks have not been overly
> > receptive to changes which negatively impact the core network stack
> > regardless of how important it may be for the LSMs. If we could
> > modify icmp_send() so that it works regardless of the caller's
> > location in the stack, that would be great, I'm just not sure it would
> > be deemed acceptable. I suggested the approach I did because I think
> > that has the best chance of getting merged.
> >
> > Regardless, I would encourage you to put together a patch and post it
> > for review, I think that's the best way to get a response. If you
> > aren't able, or aren't comfortable, submitting a patch please let me
> > know.
> > --
> > paul moore
> > www.paul-moore.com
>
> May be something like that?
>
> [PATCH 1/2] Add IP options parameter to icmp_send
> ---
>  include/net/icmp.h    |    9 ++++++++-
>  net/ipv4/icmp.c       |    7 ++++---
>  2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/icmp.h b/include/net/icmp.h
> index 6ac3a5b..e0f709d 100644
> --- a/include/net/icmp.h
> +++ b/include/net/icmp.h
> @@ -22,6 +22,7 @@
>
>  #include <net/inet_sock.h>
>  #include <net/snmp.h>
> +#include <net/ip.h>
>
>  struct icmp_err {
>    int          errno;
> @@ -39,7 +40,13 @@ struct icmp_err {
>  struct sk_buff;
>  struct net;
>
> -void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);
> +void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> +                const struct ip_options *opt);
> +static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> +{
> +       __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
> +}
> +
>  int icmp_rcv(struct sk_buff *skb);
>  int icmp_err(struct sk_buff *skb, u32 info);
>  int icmp_init(void);
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 065997f..3f24414 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -570,7 +570,8 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
>   *                     MUST reply to only the first fragment.
>   */
>
> -void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> +void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> +                const struct ip_options *opt)
>  {
>         struct iphdr *iph;
>         int room;
> @@ -691,7 +692,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>                                           iph->tos;
>         mark = IP4_REPLY_MARK(net, skb_in->mark);
>
> -       if (ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in))
> +       if (__ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in, opt))
>                 goto out_unlock;
>
>
> @@ -742,7 +743,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>         local_bh_enable();
>  out:;
>  }
> -EXPORT_SYMBOL(icmp_send);
> +EXPORT_SYMBOL(__icmp_send);

This looks reasonable, and I think this should be acceptable to the
netdev folks.  Although a minor heads-up that somebody might object to
icmp_send() no longer being exported.

More below...

> ---
> [PATCH 2/2] Extract IP options in cipso_v4_error
> ---
>  include/net/ip.h      |    2 ++
>  net/ipv4/cipso_ipv4.c |   11 +++++++++--
>  net/ipv4/ip_options.c |   22 +++++++++++++++++-----
>  3 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 8866bfc..f0e8d06 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -667,6 +667,8 @@ static inline int ip_options_echo(struct net *net, struct ip_options *dopt,
>  }
>
>  void ip_options_fragment(struct sk_buff *skb);
> +int __ip_options_compile(struct net *net, struct ip_options *opt,
> +                        struct sk_buff *skb, __be32 *info);
>  int ip_options_compile(struct net *net, struct ip_options *opt,
>                        struct sk_buff *skb);
>  int ip_options_get(struct net *net, struct ip_options_rcu **optp,
> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 777fa3b..86599e9 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1735,13 +1735,20 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
>   */
>  void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
>  {
> +       unsigned char optbuf[sizeof(struct ip_options) + 40];
> +       struct ip_options *opt = (struct ip_options *)optbuf;
> +
>         if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
>                 return;
>
> +       opt->optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
> +       if (__ip_options_compile(dev_net(skb->dev), opt, skb, NULL))
> +               return;

I would suggest adding a comment to the code above explaining that
this is necessary because we might be called above the IP layer and we
can't safely use IPCB() here.

>         if (gateway)
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, opt);
>         else
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, opt);
>  }
>
>  /**
> diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
> index ed194d4..32a3504 100644
> --- a/net/ipv4/ip_options.c
> +++ b/net/ipv4/ip_options.c
> @@ -251,8 +251,9 @@ static void spec_dst_fill(__be32 *spec_dst, struct sk_buff *skb)
>   * If opt == NULL, then skb->data should point to IP header.
>   */
>
> -int ip_options_compile(struct net *net,
> -                      struct ip_options *opt, struct sk_buff *skb)
> +int __ip_options_compile(struct net *net,
> +                        struct ip_options *opt, struct sk_buff *skb,
> +                        __be32 *info)
>  {
>         __be32 spec_dst = htonl(INADDR_ANY);
>         unsigned char *pp_ptr = NULL;
> @@ -468,11 +469,22 @@ int ip_options_compile(struct net *net,
>                 return 0;
>
>  error:
> -       if (skb) {
> -               icmp_send(skb, ICMP_PARAMETERPROB, 0, htonl((pp_ptr-iph)<<24));
> -       }
> +       if (info)
> +               *info = htonl((pp_ptr-iph)<<24);
>         return -EINVAL;
>  }
> +
> +int ip_options_compile(struct net *net,
> +                      struct ip_options *opt, struct sk_buff *skb)
> +{
> +       int ret;
> +       __be32 info;
> +
> +       ret = __ip_options_compile(net, opt, skb, &info);
> +       if (ret != 0 && skb)
> +               icmp_send(skb, ICMP_PARAMETERPROB, 0, info);
> +       return ret;
> +}
>  EXPORT_SYMBOL(ip_options_compile);

Granted I'm looking at this rather quickly, so I may be missing
something, but why the changes to ip_options_compile()?  Couldn't you
simply set opt->data manually (set the ptr) in cipso_v4_error() before
calling ip_options_compile() and arrive at the same result without
having to modify ip_options_compile()?  I suppose there is the rtable
value to worry about, but ip_options_echo() should take care of that,
yes?  No?
Sergey Nazarov Jan. 22, 2019, 5:35 p.m. UTC | #8
22.01.2019, 19:49, "Paul Moore" <paul@paul-moore.com>:
>
> Granted I'm looking at this rather quickly, so I may be missing
> something, but why the changes to ip_options_compile()? Couldn't you
> simply set opt->data manually (set the ptr) in cipso_v4_error() before
> calling ip_options_compile() and arrive at the same result without
> having to modify ip_options_compile()? I suppose there is the rtable
> value to worry about, but ip_options_echo() should take care of that,
> yes? No?

ip_options_compile calls icmp_send, if someting wrong. So, we'll go back
to trying to fix. ip_options_compile changes needed to avoid this.
Paul Moore Jan. 22, 2019, 5:48 p.m. UTC | #9
On Tue, Jan 22, 2019 at 12:35 PM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 22.01.2019, 19:49, "Paul Moore" <paul@paul-moore.com>:
> >
> > Granted I'm looking at this rather quickly, so I may be missing
> > something, but why the changes to ip_options_compile()? Couldn't you
> > simply set opt->data manually (set the ptr) in cipso_v4_error() before
> > calling ip_options_compile() and arrive at the same result without
> > having to modify ip_options_compile()? I suppose there is the rtable
> > value to worry about, but ip_options_echo() should take care of that,
> > yes? No?
>
> ip_options_compile calls icmp_send, if someting wrong. So, we'll go back
> to trying to fix. ip_options_compile changes needed to avoid this.

Yes, exactly.  If you don't pass the skb it shouldn't attempt to call
icmp_send() in case of error.
Sergey Nazarov Jan. 24, 2019, 2:46 p.m. UTC | #10
22.01.2019, 20:48, "Paul Moore" <paul@paul-moore.com>:
>
> Yes, exactly. If you don't pass the skb it shouldn't attempt to call
> icmp_send() in case of error.
>
> --
> paul moore
> www.paul-moore.com

You are right, sorry. We can do that without ip_options_compile modification.
Simplified patch 2:
---
 net/ipv4/cipso_ipv4.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 777fa3b..a4ed0a4 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1735,13 +1735,27 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
  */
 void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 {
+       unsigned char optbuf[sizeof(struct ip_options) + 40];
+       struct ip_options *opt = (struct ip_options *)optbuf;
+
        if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
                return;
 
+       /* 
+        * We might be called above the IP layer,
+        * so we can not use icmp_send and IPCB here.
+        */
+
+       memset(opt, 0, sizeof(struct ip_options));
+       opt->optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
+       memcpy(opt->__data, (unsigned char *)&(ip_hdr(skb)[1]), opt->optlen);
+       if (ip_options_compile(dev_net(skb->dev), opt, NULL))
+               return;
+
        if (gateway)
-               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
+               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, opt);
        else
-               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
+               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, opt);
 }
 
 /**
Paul Moore Jan. 25, 2019, 4:45 p.m. UTC | #11
On Thu, Jan 24, 2019 at 9:46 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 22.01.2019, 20:48, "Paul Moore" <paul@paul-moore.com>:
> >
> > Yes, exactly. If you don't pass the skb it shouldn't attempt to call
> > icmp_send() in case of error.
> >
> > --
> > paul moore
> > www.paul-moore.com
>
> You are right, sorry. We can do that without ip_options_compile modification.
> Simplified patch 2:
> ---
>  net/ipv4/cipso_ipv4.c |   18 ++++++++++++++++--
>  1 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 777fa3b..a4ed0a4 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1735,13 +1735,27 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
>   */
>  void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
>  {
> +       unsigned char optbuf[sizeof(struct ip_options) + 40];
> +       struct ip_options *opt = (struct ip_options *)optbuf;
> +
>         if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
>                 return;
>
> +       /*
> +        * We might be called above the IP layer,
> +        * so we can not use icmp_send and IPCB here.
> +        */
> +
> +       memset(opt, 0, sizeof(struct ip_options));
> +       opt->optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);

Hmm, I think the above calculation should take into account the actual
length of the IP options, and not just the max size (calculate it
based on iphdr->ihl).

Beyond that fix, I think it's time to put together a proper patchset
and post it to the lists for formal review/merging.

Thanks for your work on this.

> +       memcpy(opt->__data, (unsigned char *)&(ip_hdr(skb)[1]), opt->optlen);
> +       if (ip_options_compile(dev_net(skb->dev), opt, NULL))
> +               return;
> +
>         if (gateway)
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, opt);
>         else
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, opt);
>  }
Sergey Nazarov Jan. 28, 2019, 1:10 p.m. UTC | #12
25.01.2019, 19:46, "Paul Moore" <paul@paul-moore.com>:
> Hmm, I think the above calculation should take into account the actual
> length of the IP options, and not just the max size (calculate it
> based on iphdr->ihl).
>
> Beyond that fix, I think it's time to put together a proper patchset
> and post it to the lists for formal review/merging.
>
> Thanks for your work on this.
>
> --
> paul moore
> www.paul-moore.com

Where we can take actual IP options length? Sorry, I'm not so familiar with linux network stack.
And also, ip_options_compile could change IP options data (SSRR, LSRR, RR, TIMESTAMP options),
so, we can't use ip_options_compile again for these options. Am I right?
Paul Moore Jan. 28, 2019, 10:18 p.m. UTC | #13
On Mon, Jan 28, 2019 at 8:10 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 25.01.2019, 19:46, "Paul Moore" <paul@paul-moore.com>:
> > Hmm, I think the above calculation should take into account the actual
> > length of the IP options, and not just the max size (calculate it
> > based on iphdr->ihl).
> >
> > Beyond that fix, I think it's time to put together a proper patchset
> > and post it to the lists for formal review/merging.
> >
> > Thanks for your work on this.
> >
> > --
> > paul moore
> > www.paul-moore.com
>
> Where we can take actual IP options length? Sorry, I'm not so familiar with linux network stack.

I'm the one who needs to apologize; you're doing it correctly.  Not
sure what I was thinking there, sorry about that.

> And also, ip_options_compile could change IP options data (SSRR, LSRR, RR, TIMESTAMP options),
> so, we can't use ip_options_compile again for these options. Am I right?

If we don't pass a skb into ip_options_compile(), meaning both "skb"
and "rt" will be NULL, then I don't believe the option data will
change.  Am I missing something?
Sergey Nazarov Jan. 29, 2019, 7:23 a.m. UTC | #14
29.01.2019, 01:18, "Paul Moore" <paul@paul-moore.com>:
> If we don't pass a skb into ip_options_compile(), meaning both "skb"
> and "rt" will be NULL, then I don't believe the option data will
> change. Am I missing something?
>
> --
> paul moore
> www.paul-moore.com

I mean, in cipso_v4_error we copy option data from skb before ip_options_compile call:
+       memcpy(opt->__data, (unsigned char *)&(ip_hdr(skb)[1]), opt->optlen);
But skb IP header data could be already changed by first call of ip_options_compile
when packet received.
Paul Moore Jan. 29, 2019, 10:42 p.m. UTC | #15
On Tue, Jan 29, 2019 at 2:23 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 29.01.2019, 01:18, "Paul Moore" <paul@paul-moore.com>:
> > If we don't pass a skb into ip_options_compile(), meaning both "skb"
> > and "rt" will be NULL, then I don't believe the option data will
> > change. Am I missing something?
>
> I mean, in cipso_v4_error we copy option data from skb before ip_options_compile call:
> +       memcpy(opt->__data, (unsigned char *)&(ip_hdr(skb)[1]), opt->optlen);
> But skb IP header data could be already changed by first call of ip_options_compile
> when packet received.

There are several cases where the stack ends up calling icmp_send()
after the skb has been through ip_options_compile(), that should be
okay.
Sergey Nazarov Jan. 30, 2019, 1:11 p.m. UTC | #16
30.01.2019, 01:42, "Paul Moore" <paul@paul-moore.com>:
> There are several cases where the stack ends up calling icmp_send()
> after the skb has been through ip_options_compile(), that should be
> okay.
>
> --
> paul moore
> www.paul-moore.com

In those cases precompiled ip_options struct used, without the need to reuse ip_options_compile.
I think, for error ICMP packet, we can discard all other options except CIPSO. It will be better, than
send packet, contains wrong option's data. Modified patch 2:
---
 net/ipv4/cipso_ipv4.c |   24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 777fa3b..797826c 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1735,13 +1735,33 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
  */
 void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 {
+       struct ip_options opt;
+       unsigned char *optptr;
+
        if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
                return;
 
+       /* 
+        * We might be called above the IP layer,
+        * so we can not use icmp_send and IPCB here.
+        *
+        * For the generated ICMP packet, we create a
+        * temporary ip _options structure, contains
+        * the CIPSO option only, since the other options data
+        * could be modified when the original packet receiving.
+        */
+
+       memset(&opt, 0, sizeof(struct ip_options));
+       optptr = cipso_v4_optptr(skb);
+       if (optptr) {
+               opt.optlen = optptr[1];
+               opt.cipso = optptr - skb_network_header(skb);
+       }
+
        if (gateway)
-               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
+               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &opt);
        else
-               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
+               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &opt);
 }
 
 /**
Paul Moore Jan. 31, 2019, 2:10 a.m. UTC | #17
On Wed, Jan 30, 2019 at 8:11 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 30.01.2019, 01:42, "Paul Moore" <paul@paul-moore.com>:
> > There are several cases where the stack ends up calling icmp_send()
> > after the skb has been through ip_options_compile(), that should be
> > okay.
> >
> > --
> > paul moore
> > www.paul-moore.com
>
> In those cases precompiled ip_options struct used, without the need to reuse ip_options_compile.
> I think, for error ICMP packet, we can discard all other options except CIPSO. It will be better, than
> send packet, contains wrong option's data. Modified patch 2:
> ---
>  net/ipv4/cipso_ipv4.c |   24 ++++++++++++++++++++++--
>  1 files changed, 22 insertions(+), 2 deletions(-)

This isn't how the rest of the stack works, look at
ip_local_deliver_finish() for one example.  Perhaps the behavior you
are proposing is correct, but please show me where in the various RFC
specs it is defined so that I can better understand why it should work
this way.

> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 777fa3b..797826c 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -1735,13 +1735,33 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
>   */
>  void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
>  {
> +       struct ip_options opt;
> +       unsigned char *optptr;
> +
>         if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
>                 return;
>
> +       /*
> +        * We might be called above the IP layer,
> +        * so we can not use icmp_send and IPCB here.
> +        *
> +        * For the generated ICMP packet, we create a
> +        * temporary ip _options structure, contains
> +        * the CIPSO option only, since the other options data
> +        * could be modified when the original packet receiving.
> +        */
> +
> +       memset(&opt, 0, sizeof(struct ip_options));
> +       optptr = cipso_v4_optptr(skb);
> +       if (optptr) {
> +               opt.optlen = optptr[1];
> +               opt.cipso = optptr - skb_network_header(skb);
> +       }
> +
>         if (gateway)
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &opt);
>         else
> -               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0);
> +               __icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &opt);
>  }
>
>  /**
>
Sergey Nazarov Jan. 31, 2019, 1:20 p.m. UTC | #18
31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
> This isn't how the rest of the stack works, look at
> ip_local_deliver_finish() for one example. Perhaps the behavior you
> are proposing is correct, but please show me where in the various RFC
> specs it is defined so that I can better understand why it should work
> this way.
> --
> paul moore
> www.paul-moore.com

Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
skb is NULL. My last message could be ignored.
Paul Moore Feb. 11, 2019, 8:37 p.m. UTC | #19
On Thu, Jan 31, 2019 at 8:20 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> 31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
> > This isn't how the rest of the stack works, look at
> > ip_local_deliver_finish() for one example. Perhaps the behavior you
> > are proposing is correct, but please show me where in the various RFC
> > specs it is defined so that I can better understand why it should work
> > this way.
> > --
> > paul moore
> > www.paul-moore.com
>
> Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
> skb is NULL. My last message could be ignored.

Hi Nazarov,

Do you plan on submitting these patches as a proper patchset for
review and merging?
Sergey Nazarov Feb. 11, 2019, 9:21 p.m. UTC | #20
Hi, Paul!
What I need to do for this?

11.02.2019, 23:37, "Paul Moore" <paul@paul-moore.com>:
> On Thu, Jan 31, 2019 at 8:20 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
>>  31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
>>  > This isn't how the rest of the stack works, look at
>>  > ip_local_deliver_finish() for one example. Perhaps the behavior you
>>  > are proposing is correct, but please show me where in the various RFC
>>  > specs it is defined so that I can better understand why it should work
>>  > this way.
>>  > --
>>  > paul moore
>>  > www.paul-moore.com
>>
>>  Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
>>  skb is NULL. My last message could be ignored.
>
> Hi Nazarov,
>
> Do you plan on submitting these patches as a proper patchset for
> review and merging?
>
> --
> paul moore
> www.paul-moore.com
Paul Moore Feb. 11, 2019, 11:43 p.m. UTC | #21
On Mon, Feb 11, 2019 at 4:21 PM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> Hi, Paul!
> What I need to do for this?

If you haven't already done so, go read
Documentation/process/submitting-patches.rst, that should guide you
through the process.  I would also suggest looking at both the git log
and the mailing list archives to see what others have done in terms of
commit descriptions, etc.

After that, if you have any questions let me know and I can help you out.

Thanks.

> 11.02.2019, 23:37, "Paul Moore" <paul@paul-moore.com>:
> > On Thu, Jan 31, 2019 at 8:20 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> >>  31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
> >>  > This isn't how the rest of the stack works, look at
> >>  > ip_local_deliver_finish() for one example. Perhaps the behavior you
> >>  > are proposing is correct, but please show me where in the various RFC
> >>  > specs it is defined so that I can better understand why it should work
> >>  > this way.
> >>  > --
> >>  > paul moore
> >>  > www.paul-moore.com
> >>
> >>  Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
> >>  skb is NULL. My last message could be ignored.
> >
> > Hi Nazarov,
> >
> > Do you plan on submitting these patches as a proper patchset for
> > review and merging?
diff mbox series

Patch

--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -679,7 +679,8 @@  void icmp_send(struct sk_buff *skb_in, i
 					  iph->tos;
 	mark = IP4_REPLY_MARK(net, skb_in->mark);
 
-	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in))
+	if (__ip_options_echo(&icmp_param->replyopts.opt.opt, skb_in,
+			ip_hdr(skb_in)->protocol == IPPROTO_TCP ? &TCP_SKB_CB(skb_in)->header.h4.opt : &IPCB(skb_in)->opt))
 		goto out_unlock;