Message ID | 20201008012154.11149-1-xiyou.wangcong@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [net] ip_gre: set dev->hard_header_len properly | expand |
On Wed, Oct 7, 2020 at 9:22 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > conditionally. When it is set, it assumes the outer IP header is > already created before ipgre_xmit(). > > This is not true when we send packets through a raw packet socket, > where L2 headers are supposed to be constructed by user. Packet > socket calls dev_validate_header() to validate the header. But > GRE tunnel does not set dev->hard_header_len, so that check can > be simply bypassed, therefore uninit memory could be passed down > to ipgre_xmit(). If dev->hard_header_len is zero, the packet socket will not reserve room for the link layer header, so skb->data points to network_header. But I don't see any uninitialized packet data? > Fix this by setting dev->hard_header_len whenever sets header_ops, > as dev->hard_header_len is supposed to be the length of the header > created by dev->header_ops->create() anyway. Agreed. Xie has made similar changes to lapbether recently in commit c7ca03c216ac. > Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com > Cc: William Tu <u9012063@gmail.com> > Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> > --- > net/ipv4/ip_gre.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 4e31f23e4117..43b62095559e 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -987,10 +987,12 @@ static int ipgre_tunnel_init(struct net_device *dev) > return -EINVAL; > dev->flags = IFF_BROADCAST; > dev->header_ops = &ipgre_header_ops; > + dev->hard_header_len = tunnel->hlen + sizeof(*iph); > } > #endif > } else if (!tunnel->collect_md) { > dev->header_ops = &ipgre_header_ops; > + dev->hard_header_len = tunnel->hlen + sizeof(*iph); Unrelated to this patch, I do wonder if ipgre_header does the right thing when tunnel->hlen > sizeof(gre_base_hdr), i.e., for gre tunnels with optional fields. Currently it appears to not initialize those. > } > > return ip_tunnel_init(dev); > -- > 2.28.0 >
On Thu, Oct 8, 2020 at 4:49 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 9:22 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > > conditionally. When it is set, it assumes the outer IP header is > > already created before ipgre_xmit(). > > > > This is not true when we send packets through a raw packet socket, > > where L2 headers are supposed to be constructed by user. Packet > > socket calls dev_validate_header() to validate the header. But > > GRE tunnel does not set dev->hard_header_len, so that check can > > be simply bypassed, therefore uninit memory could be passed down > > to ipgre_xmit(). > > If dev->hard_header_len is zero, the packet socket will not reserve > room for the link layer header, so skb->data points to network_header. > But I don't see any uninitialized packet data? The uninit data is allocated by packet_alloc_skb(), if dev->hard_header_len is 0 and 'len' is anything between [0, tunnel->hlen + sizeof(struct iphdr)), dev_validate_header() still returns true obviously but only 'len' bytes are copied from user-space by skb_copy_datagram_from_iter(). Therefore, those bytes within range (len, tunnel->hlen + sizeof(struct iphdr)] are uninitialized. Thanks.
On Thu, Oct 8, 2020 at 1:34 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 4:49 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Wed, Oct 7, 2020 at 9:22 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > > > conditionally. When it is set, it assumes the outer IP header is > > > already created before ipgre_xmit(). > > > > > > This is not true when we send packets through a raw packet socket, > > > where L2 headers are supposed to be constructed by user. Packet > > > socket calls dev_validate_header() to validate the header. But > > > GRE tunnel does not set dev->hard_header_len, so that check can > > > be simply bypassed, therefore uninit memory could be passed down > > > to ipgre_xmit(). > > > > If dev->hard_header_len is zero, the packet socket will not reserve > > room for the link layer header, so skb->data points to network_header. > > But I don't see any uninitialized packet data? > > The uninit data is allocated by packet_alloc_skb(), if dev->hard_header_len > is 0 and 'len' is anything between [0, tunnel->hlen + sizeof(struct iphdr)), > dev_validate_header() still returns true obviously but only 'len' > bytes are copied > from user-space by skb_copy_datagram_from_iter(). Therefore, those bytes > within range (len, tunnel->hlen + sizeof(struct iphdr)] are uninitialized. With dev->hard_header_len of zero, packet_alloc_skb() only allocates len bytes. With SOCK_RAW, the writer is expected to write the ip and gre header and include these in the send len argument. The only difference I see is that with hard_header_len the data starts reserve bytes before skb_network_header, and an additional tail has been allocated that is not used. But this also fixes a potentially more serious bug. With SOCK_DGRAM, dev_hard_header/ipgre_header just assumes that there is enough room in the packet to skb_push(skb, t->hlen + sizeof(*iph)). Which may be false if this header length had not been reserved. Though I've mainly looked at packet_snd. Perhaps you are referring to tpacket_snd?
On Thu, Oct 8, 2020 at 12:04 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 1:34 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > The uninit data is allocated by packet_alloc_skb(), if dev->hard_header_len > > is 0 and 'len' is anything between [0, tunnel->hlen + sizeof(struct iphdr)), > > dev_validate_header() still returns true obviously but only 'len' > > bytes are copied > > from user-space by skb_copy_datagram_from_iter(). Therefore, those bytes > > within range (len, tunnel->hlen + sizeof(struct iphdr)] are uninitialized. > > With dev->hard_header_len of zero, packet_alloc_skb() only allocates len bytes. > > With SOCK_RAW, the writer is expected to write the ip and gre header > and include these in the send len argument. The only difference I see > is that with hard_header_len the data starts reserve bytes before > skb_network_header, and an additional tail has been allocated that is > not used. > > But this also fixes a potentially more serious bug. With SOCK_DGRAM, > dev_hard_header/ipgre_header just assumes that there is enough room in > the packet to skb_push(skb, t->hlen + sizeof(*iph)). Which may be > false if this header length had not been reserved. > > Though I've mainly looked at packet_snd. Perhaps you are referring to > tpacket_snd? I think what Cong means is that hard_header_len has to be set properly to prevent an AF_PACKET/RAW user from sending a frame that is too short (shorter than the header length). When an AF_PACKET/RAW user sends a frame shorter than the header length, and the code on the sending path still expects a full header, it will read uninitialized data. If my understanding is right, I agree on this part. However, there's something I don't understand in the GRE code. The ipgre_header function only creates an IP header (20 bytes) + a GRE base header (4 bytes), but pushes and returns "t->hlen + sizeof(*iph)". What is t->hlen? It seems to me it is the sum of t->tun_hlen and t->encap_hlen. What are these two?
On Thu, Oct 8, 2020 at 3:04 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 1:34 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 4:49 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > On Wed, Oct 7, 2020 at 9:22 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > > > > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > > > > conditionally. When it is set, it assumes the outer IP header is > > > > already created before ipgre_xmit(). > > > > > > > > This is not true when we send packets through a raw packet socket, > > > > where L2 headers are supposed to be constructed by user. Packet > > > > socket calls dev_validate_header() to validate the header. But > > > > GRE tunnel does not set dev->hard_header_len, so that check can > > > > be simply bypassed, therefore uninit memory could be passed down > > > > to ipgre_xmit(). > > > > > > If dev->hard_header_len is zero, the packet socket will not reserve > > > room for the link layer header, so skb->data points to network_header. > > > But I don't see any uninitialized packet data? > > > > The uninit data is allocated by packet_alloc_skb(), if dev->hard_header_len > > is 0 and 'len' is anything between [0, tunnel->hlen + sizeof(struct iphdr)), > > dev_validate_header() still returns true obviously but only 'len' > > bytes are copied > > from user-space by skb_copy_datagram_from_iter(). Therefore, those bytes > > within range (len, tunnel->hlen + sizeof(struct iphdr)] are uninitialized. Oh, do you mean that ipgre_xmit will process undefined data because it calls skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)) ?
On Thu, Oct 8, 2020 at 3:17 PM Xie He <xie.he.0141@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 12:04 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 1:34 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > > > The uninit data is allocated by packet_alloc_skb(), if dev->hard_header_len > > > is 0 and 'len' is anything between [0, tunnel->hlen + sizeof(struct iphdr)), > > > dev_validate_header() still returns true obviously but only 'len' > > > bytes are copied > > > from user-space by skb_copy_datagram_from_iter(). Therefore, those bytes > > > within range (len, tunnel->hlen + sizeof(struct iphdr)] are uninitialized. > > > > With dev->hard_header_len of zero, packet_alloc_skb() only allocates len bytes. > > > > With SOCK_RAW, the writer is expected to write the ip and gre header > > and include these in the send len argument. The only difference I see > > is that with hard_header_len the data starts reserve bytes before > > skb_network_header, and an additional tail has been allocated that is > > not used. > > > > But this also fixes a potentially more serious bug. With SOCK_DGRAM, > > dev_hard_header/ipgre_header just assumes that there is enough room in > > the packet to skb_push(skb, t->hlen + sizeof(*iph)). Which may be > > false if this header length had not been reserved. > > > > Though I've mainly looked at packet_snd. Perhaps you are referring to > > tpacket_snd? > > I think what Cong means is that hard_header_len has to be set properly > to prevent an AF_PACKET/RAW user from sending a frame that is too > short (shorter than the header length). When an AF_PACKET/RAW user > sends a frame shorter than the header length, and the code on the > sending path still expects a full header, it will read uninitialized > data. Yes, that makes sense, thanks. Our message crossed :) > If my understanding is right, I agree on this part. > > However, there's something I don't understand in the GRE code. The > ipgre_header function only creates an IP header (20 bytes) + a GRE > base header (4 bytes), but pushes and returns "t->hlen + > sizeof(*iph)". What is t->hlen? GRE is variable length depending on flags: tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags); > It seems to me it is the sum of > t->tun_hlen and t->encap_hlen. What are these two?
On Thu, Oct 8, 2020 at 12:18 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 3:04 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 1:34 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > > > On Thu, Oct 8, 2020 at 4:49 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > On Wed, Oct 7, 2020 at 9:22 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > > > > > > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > > > > > conditionally. When it is set, it assumes the outer IP header is > > > > > already created before ipgre_xmit(). > > > > > > > > > > This is not true when we send packets through a raw packet socket, > > > > > where L2 headers are supposed to be constructed by user. Packet > > > > > socket calls dev_validate_header() to validate the header. But > > > > > GRE tunnel does not set dev->hard_header_len, so that check can > > > > > be simply bypassed, therefore uninit memory could be passed down > > > > > to ipgre_xmit(). > > > > > > > > If dev->hard_header_len is zero, the packet socket will not reserve > > > > room for the link layer header, so skb->data points to network_header. > > > > But I don't see any uninitialized packet data? > > > > > > The uninit data is allocated by packet_alloc_skb(), if dev->hard_header_len > > > is 0 and 'len' is anything between [0, tunnel->hlen + sizeof(struct iphdr)), > > > dev_validate_header() still returns true obviously but only 'len' > > > bytes are copied > > > from user-space by skb_copy_datagram_from_iter(). Therefore, those bytes > > > within range (len, tunnel->hlen + sizeof(struct iphdr)] are uninitialized. > > Oh, do you mean that ipgre_xmit will process undefined data because it > calls skb_pull(skb, tunnel->hlen + sizeof(struct iphdr)) ? The syzbot report has the information for both of your questions: https://syzkaller.appspot.com/text?tag=CrashReport&x=11845568500000 It clearly shows packet_snd() and ipgre_xmit(). Thanks.
On Thu, Oct 8, 2020 at 12:20 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 3:17 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > However, there's something I don't understand in the GRE code. The > > ipgre_header function only creates an IP header (20 bytes) + a GRE > > base header (4 bytes), but pushes and returns "t->hlen + > > sizeof(*iph)". What is t->hlen? > > GRE is variable length depending on flags: > > tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags); > > > > It seems to me it is the sum of > > t->tun_hlen and t->encap_hlen. What are these two? OK. I understand that t->tun_hlen is the GRE header length. What is t->encap_hlen?
On Wed, Oct 7, 2020 at 9:22 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > GRE tunnel has its own header_ops, ipgre_header_ops, and sets it > conditionally. When it is set, it assumes the outer IP header is > already created before ipgre_xmit(). > > This is not true when we send packets through a raw packet socket, > where L2 headers are supposed to be constructed by user. Packet > socket calls dev_validate_header() to validate the header. But > GRE tunnel does not set dev->hard_header_len, so that check can > be simply bypassed, therefore uninit memory could be passed down > to ipgre_xmit(). > > Fix this by setting dev->hard_header_len whenever sets header_ops, > as dev->hard_header_len is supposed to be the length of the header > created by dev->header_ops->create() anyway. > > Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com > Cc: William Tu <u9012063@gmail.com> > Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> > The syzbot report has the information for both of your questions: > https://syzkaller.appspot.com/text?tag=CrashReport&x=11845568500000 > > It clearly shows packet_snd() and ipgre_xmit(). Thanks. I hadn't thought to check that (clearly).
On Thu, Oct 8, 2020 at 4:11 PM Xie He <xie.he.0141@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 12:20 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 3:17 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > > > However, there's something I don't understand in the GRE code. The > > > ipgre_header function only creates an IP header (20 bytes) + a GRE > > > base header (4 bytes), but pushes and returns "t->hlen + > > > sizeof(*iph)". What is t->hlen? > > > > GRE is variable length depending on flags: > > > > tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags); > > > > > > > It seems to me it is the sum of > > > t->tun_hlen and t->encap_hlen. What are these two? > > OK. I understand that t->tun_hlen is the GRE header length. What is > t->encap_hlen? I've looked at that closely either. Appears to be to account for additional FOU/GUE encap: " commit 56328486539ddd07cbaafec7a542a2c8a3043623 Author: Tom Herbert <therbert@google.com> Date: Wed Sep 17 12:25:58 2014 -0700 net: Changes to ip_tunnel to support foo-over-udp encapsulation This patch changes IP tunnel to support (secondary) encapsulation, Foo-over-UDP. Changes include: 1) Adding tun_hlen as the tunnel header length, encap_hlen as the encapsulation header length, and hlen becomes the grand total of these. 2) Added common netlink define to support FOU encapsulation. 3) Routines to perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> "
On Thu, Oct 8, 2020 at 1:32 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 4:11 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > OK. I understand that t->tun_hlen is the GRE header length. What is > > t->encap_hlen? > > I've looked at that closely either. > > Appears to be to account for additional FOU/GUE encap: > > " > commit 56328486539ddd07cbaafec7a542a2c8a3043623 > Author: Tom Herbert <therbert@google.com> > Date: Wed Sep 17 12:25:58 2014 -0700 > net: Changes to ip_tunnel to support foo-over-udp encapsulation > > This patch changes IP tunnel to support (secondary) encapsulation, > Foo-over-UDP. Changes include: > > 1) Adding tun_hlen as the tunnel header length, encap_hlen as the > encapsulation header length, and hlen becomes the grand total > of these. > 2) Added common netlink define to support FOU encapsulation. > 3) Routines to perform FOU encapsulation. > > Signed-off-by: Tom Herbert <therbert@google.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > " I see the ipgre_xmit function would pull the header our header_ops creates, and then call __gre_xmit. __gre_xmit will call gre_build_header to complete the GRE header. gre_build_header expects to find the base GRE header after pushing tunnel->tun_hlen. However, if tunnel->encap_hlen is not 0, it couldn't find the base GRE header there. Is there a problem? Where exactly should we put the tunnel->encap_hlen header? Before the GRE header or after?
On Thu, Oct 8, 2020 at 5:36 PM Xie He <xie.he.0141@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 1:32 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 4:11 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > > > OK. I understand that t->tun_hlen is the GRE header length. What is > > > t->encap_hlen? > > > > I've looked at that closely either. > > > > Appears to be to account for additional FOU/GUE encap: > > > > " > > commit 56328486539ddd07cbaafec7a542a2c8a3043623 > > Author: Tom Herbert <therbert@google.com> > > Date: Wed Sep 17 12:25:58 2014 -0700 > > net: Changes to ip_tunnel to support foo-over-udp encapsulation > > > > This patch changes IP tunnel to support (secondary) encapsulation, > > Foo-over-UDP. Changes include: > > > > 1) Adding tun_hlen as the tunnel header length, encap_hlen as the > > encapsulation header length, and hlen becomes the grand total > > of these. > > 2) Added common netlink define to support FOU encapsulation. > > 3) Routines to perform FOU encapsulation. > > > > Signed-off-by: Tom Herbert <therbert@google.com> > > Signed-off-by: David S. Miller <davem@davemloft.net> > > " > > I see the ipgre_xmit function would pull the header our header_ops > creates, and then call __gre_xmit. __gre_xmit will call > gre_build_header to complete the GRE header. gre_build_header expects > to find the base GRE header after pushing tunnel->tun_hlen. However, > if tunnel->encap_hlen is not 0, it couldn't find the base GRE header > there. Is there a problem? > > Where exactly should we put the tunnel->encap_hlen header? Before the > GRE header or after? The L4 tunnel infra uses the two callbacks encap_hlen (e.g., fou_encap_hlen) and build_header (fou_build_header) in struct ip_tunnel_encap_ops to first allocate more space and later call back into the specific implementation to fill that data. build_header is called from __gre6_xmit -> ip6_tnl_xmit (or its ipv4 equivalent). This happens after gre has pushed its header, so the headers will come before that.
On Thu, Oct 8, 2020 at 2:48 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > > I see the ipgre_xmit function would pull the header our header_ops > > creates, and then call __gre_xmit. __gre_xmit will call > > gre_build_header to complete the GRE header. gre_build_header expects > > to find the base GRE header after pushing tunnel->tun_hlen. However, > > if tunnel->encap_hlen is not 0, it couldn't find the base GRE header > > there. Is there a problem? > > > > Where exactly should we put the tunnel->encap_hlen header? Before the > > GRE header or after? > > The L4 tunnel infra uses the two callbacks encap_hlen (e.g., > fou_encap_hlen) and build_header (fou_build_header) in struct > ip_tunnel_encap_ops to first allocate more space and later call back > into the specific implementation to fill that data. build_header is > called from __gre6_xmit -> ip6_tnl_xmit (or its ipv4 equivalent). > This happens after gre has pushed its header, so the headers > will come before that. OK. If the t->encap_hlen header needs to be placed before the GRE header, then I think the ipgre_header function should leave some space before the GRE header to place the t->encap_hlen header, rather than leaving space after the GRE header.
On Thu, Oct 8, 2020 at 2:54 PM Xie He <xie.he.0141@gmail.com> wrote: > > OK. If the t->encap_hlen header needs to be placed before the GRE > header, then I think the ipgre_header function should leave some space > before the GRE header to place the t->encap_hlen header, rather than > leaving space after the GRE header. I found another possible issue. Shouldn't we update hard_header_len every time t->tun_hlen and t->hlen are updated in ipgre_link_update?
On Thu, Oct 8, 2020 at 4:40 PM Xie He <xie.he.0141@gmail.com> wrote: > > I found another possible issue. Shouldn't we update hard_header_len > every time t->tun_hlen and t->hlen are updated in ipgre_link_update? Good catch. It should be updated there like ->needed_headroom. I will update my patch. Thanks.
On Fri, Oct 9, 2020 at 10:44 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 4:40 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > I found another possible issue. Shouldn't we update hard_header_len > > every time t->tun_hlen and t->hlen are updated in ipgre_link_update? > > Good catch. It should be updated there like ->needed_headroom. > I will update my patch. Thanks. But there is still something that I don't understand. What is needed_headroom used for? If we are requesting space for t->encap_hlen and t->tun_hlen in hard_header_len. Do we still need to use needed_headroom? Also, if we update hard_header_len or needed_headroom in ipgre_link_update, would there be racing issues if they are updated while skbs are being sent? If these are indeed issues, it might not be easy to fix this driver. Willem, do you have any thoughts?
On Fri, Oct 9, 2020 at 12:41 PM Xie He <xie.he.0141@gmail.com> wrote: > > Thanks. But there is still something that I don't understand. What is > needed_headroom used for? If we are requesting space for t->encap_hlen > and t->tun_hlen in hard_header_len. Do we still need to use > needed_headroom? It seems to me that the original code includes t->encap_hlen, t->tun_hlen, and the IP header length in needed_headroom. (Right?) If we are including these in hard_header_len, we need to move them out of needed_headroom. > Also, if we update hard_header_len or needed_headroom in > ipgre_link_update, would there be racing issues if they are updated > while skbs are being sent? > > If these are indeed issues, it might not be easy to fix this driver. > Willem, do you have any thoughts?
On Fri, Oct 9, 2020 at 12:51 PM Xie He <xie.he.0141@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 12:41 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > Thanks. But there is still something that I don't understand. What is > > needed_headroom used for? If we are requesting space for t->encap_hlen > > and t->tun_hlen in hard_header_len. Do we still need to use > > needed_headroom? > > It seems to me that the original code includes t->encap_hlen, > t->tun_hlen, and the IP header length in needed_headroom. (Right?) If > we are including these in hard_header_len, we need to move them out of > needed_headroom. Interesting point. I think needed_headroom is 0 until we call ipgre_changelink(), but needed_headroom is already being used in multiple places for skb_cow_head() in the same file, I guess they should be replaced with hard_head_len because for GRE tunnel those are its link-layer header. What makes it more complicated is that header_ops is actually set conditionally, so should be hard_header_len/needed_head_room accordingly. Thanks.
On Fri, Oct 9, 2020 at 1:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > Interesting point. I think needed_headroom is 0 until we call > ipgre_changelink(), but needed_headroom is already being used > in multiple places for skb_cow_head() in the same file, I guess > they should be replaced with hard_head_len because for GRE tunnel > those are its link-layer header. What makes it more complicated > is that header_ops is actually set conditionally, so should be > hard_header_len/needed_head_room accordingly. Looking a bit deeper, I doubt the ipgre_header_ops->create is necessary, because 1) all other tunnels devices do not have it (ip_tunnel_header_ops only contains ->parse_protocol); 2) GRE headers are pushed in xmit anyway, so at least SOCK_DGRAM does not need it; 3) ipgre_header() creates the GRE header, later ipgre_xmit() pulls it back, then __gre_xmit() builds GRE header again... Please correct me if I am wrong. Thanks.
On Fri, Oct 9, 2020 at 6:07 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > Looking a bit deeper, I doubt the ipgre_header_ops->create is necessary, > because 1) all other tunnels devices do not have it (ip_tunnel_header_ops > only contains ->parse_protocol); 2) GRE headers are pushed in xmit > anyway, so at least SOCK_DGRAM does not need it; 3) ipgre_header() > creates the GRE header, later ipgre_xmit() pulls it back, then __gre_xmit() > builds GRE header again... Haha, I also don't like ipgre_header_ops->create and want to get rid of it. We are thinking the same :) From what I see, the flow when sending skbs (when header_ops is used) is like this: 1) First ipgre_header creates the IP header and the GRE base header, leaving space for the GRE optional fields and the "encap" header (the space for the "encap" header should be left before the GRE header, but it is incorrectly left after the GRE header). 2) Then ipgre_xmit pulls the header created by ipgre_header (but leaves the data there). Then it calls __gre_xmit. 3) __gre_xmit calls gre_build_header. gre_build_header will push back the GRE header, read the GRE base header and build the GRE optional fields. 4) Then __gre_xmit calls ip_tunnel_xmit. ip_tunnel_xmit will build the "encap" header by calling ip_tunnel_encap. So what ipgre_header does is actually creating the IP header and the GRE header, and leaving some holes for the GRE optional fields and the "encap" header to be built later. This seems so weird to me. If a user is using an AF_PACKET/RAW socket, the user is supposed to do what the header_ops->create function does (that is, creating two headers and leaving two holes to be filled in later). I think no user would actually do that. That is so weird. Also you said, all other tunnel devices didn't have header_ops->create. I think that is a good argument for getting rid of header_ops->create, too. Also, for IP GRE TAP devices (as I understand, these devices are just like IP GRE devices except it tunnels Ethernet frames instead of network-layer packets), its header_ops will be Ethernet's eth_header_ops (ipgre_tap_setup calls ether_setup, which sets up the Ethernet header_ops). In this case, I think it is logical to keep the header_ops for IP GRE devices as NULL. So that the header_ops of IP GRE devices is consistent with that of IP GRE TAP devices.
On Fri, Oct 9, 2020 at 8:10 PM Xie He <xie.he.0141@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 6:07 PM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > Looking a bit deeper, I doubt the ipgre_header_ops->create is necessary, > > because 1) all other tunnels devices do not have it (ip_tunnel_header_ops > > only contains ->parse_protocol); 2) GRE headers are pushed in xmit > > anyway, so at least SOCK_DGRAM does not need it; 3) ipgre_header() > > creates the GRE header, later ipgre_xmit() pulls it back, then __gre_xmit() > > builds GRE header again... > > Haha, I also don't like ipgre_header_ops->create and want to get rid > of it. We are thinking the same :) Great! > > From what I see, the flow when sending skbs (when header_ops is used) > is like this: > > 1) First ipgre_header creates the IP header and the GRE base header, > leaving space for the GRE optional fields and the "encap" header (the > space for the "encap" header should be left before the GRE header, but > it is incorrectly left after the GRE header). > > 2) Then ipgre_xmit pulls the header created by ipgre_header (but > leaves the data there). Then it calls __gre_xmit. > > 3) __gre_xmit calls gre_build_header. gre_build_header will push back > the GRE header, read the GRE base header and build the GRE optional > fields. > > 4) Then __gre_xmit calls ip_tunnel_xmit. ip_tunnel_xmit will build the > "encap" header by calling ip_tunnel_encap. > > So what ipgre_header does is actually creating the IP header and the > GRE header, and leaving some holes for the GRE optional fields and the > "encap" header to be built later. Right. For the encap header, I'd guess it is because this is relatively new. > > This seems so weird to me. If a user is using an AF_PACKET/RAW socket, > the user is supposed to do what the header_ops->create function does > (that is, creating two headers and leaving two holes to be filled in > later). I think no user would actually do that. That is so weird. Well, AF_PACKET RAW socket is supposed to construct L2 headers from the user buffer, and for a tunnel device these headers are indeed its L2. If users do not want to do this, they can switch to DGRAM anyway. I know how inconvenient it is to construct a GRE tunnel header, I guess this is why all other tunnel devices do not provide a header_ops::create. GRE tunnel is not a special case, this is why I agree on removing its ->create() although it does look like all tunnel devices should let users construct headers by definition of RAW. Of course, it may be too late to change this behavior even if we really want, users may already assume there is always no tunnel header needed to construct. Thanks.
On Sat, Oct 10, 2020 at 11:58 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 8:10 PM Xie He <xie.he.0141@gmail.com> wrote: > > > > This seems so weird to me. If a user is using an AF_PACKET/RAW socket, > > the user is supposed to do what the header_ops->create function does > > (that is, creating two headers and leaving two holes to be filled in > > later). I think no user would actually do that. That is so weird. > > Well, AF_PACKET RAW socket is supposed to construct L2 headers > from the user buffer, and for a tunnel device these headers are indeed its > L2. If users do not want to do this, they can switch to DGRAM anyway. > > I know how inconvenient it is to construct a GRE tunnel header, I guess > this is why all other tunnel devices do not provide a header_ops::create. > GRE tunnel is not a special case, this is why I agree on removing its > ->create() although it does look like all tunnel devices should let users > construct headers by definition of RAW. > > Of course, it may be too late to change this behavior even if we really > want, users may already assume there is always no tunnel header needed > to construct. Another reason why tunnel devices usually don't provide header_ops->create might be to keep consistency with TAP devices. TAP devices are just like tunnel (TUN) devices except that they use an additional Ethernet header after the tunnel headers. I guess that TAP devices would usually use Ethernet's header_ops to expose only the Ethernet header to the user, but hide the outer tunnel headers from the user and let them be constructed on the transmission path (so that TAP devices would appear exactly like Ethernet devices). If we want to keep TUN devices consistent with TAP devices, we should hide the tunnel headers from the user for TUN devices, too. Actually, a lot of devices expose a fake L2 header in header_ops, instead of their real L2 header anyway. Wi-Fi devices usually pretend to be Ethernet devices and expose an Ethernet header in header_ops. So I think it is acceptable to not expose the real L2 header in header_ops. (This is also what needed_headroom is created for - to request additional header space for headers not exposed in header_ops.)
On Sat, Oct 10, 2020 at 2:49 PM Xie He <xie.he.0141@gmail.com> wrote: > > Another reason why tunnel devices usually don't provide > header_ops->create might be to keep consistency with TAP devices. TAP > devices are just like tunnel (TUN) devices except that they use an > additional Ethernet header after the tunnel headers. I guess that TAP > devices would usually use Ethernet's header_ops to expose only the > Ethernet header to the user, but hide the outer tunnel headers from > the user and let them be constructed on the transmission path (so that > TAP devices would appear exactly like Ethernet devices). If we want to > keep TUN devices consistent with TAP devices, we should hide the > tunnel headers from the user for TUN devices, too. Actually there's a "Universal TUN/TAP driver" in the kernel, which passes L3 packets or Ethernet frames (that are being sent) back to the user space and lets a user space program add the tunnel headers. https://www.kernel.org/doc/Documentation/networking/tuntap.rst In this case, we are not able to construct the tunnel headers until we pass the packets / Ethernet frames back to the user space to the user space program. We can only hide the tunnel headers. To keep other TUN/TAP devices in the kernel consistent with the "Universal TUN/TAP driver", we should hide the tunnel headers from the user for those devices, too. > Actually, a lot of devices expose a fake L2 header in header_ops, > instead of their real L2 header anyway. Wi-Fi devices usually pretend > to be Ethernet devices and expose an Ethernet header in header_ops. So > I think it is acceptable to not expose the real L2 header in > header_ops. (This is also what needed_headroom is created for - to > request additional header space for headers not exposed in > header_ops.)
On Sat, Oct 10, 2020 at 11:58 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > Well, AF_PACKET RAW socket is supposed to construct L2 headers > from the user buffer, and for a tunnel device these headers are indeed its > L2. If users do not want to do this, they can switch to DGRAM anyway. > > I know how inconvenient it is to construct a GRE tunnel header, I guess > this is why all other tunnel devices do not provide a header_ops::create. > GRE tunnel is not a special case, this is why I agree on removing its > ->create() although it does look like all tunnel devices should let users > construct headers by definition of RAW. > > Of course, it may be too late to change this behavior even if we really > want, users may already assume there is always no tunnel header needed > to construct. Sorry. I was re-thinking about whether we should expose certain headers in header_ops or not. Now my thoughts are a little bit different, so I want to re-reply to your email. 1. The first reason why I think we should not expose the GRE header in header_ops, is that in this way we can keep the un-exposed headers of GRE devices consistent with those of GRETAP devices. (This is just like the consistency between "universal TUN devices" and "universal TAP devices".) This is similar to what I wrote in my previous replies. However, after thinking I think this only applies to GRE/GRETAP but not to some other tunnel devices. For example, for PPP, we can't actually transmit Ethernet frames directly over PPP (but need to encapsulate them in BCP over PPP), so there's no way to keep the un-exposed headers of PPP devices consistent with those of "PPP TAP" (PPP bridging) devices anyway. 2. Second, I think the headers before the GRE header (the outer IP header and the "encap" header) should actually belong to the underlying network being tunneled through, and should not be viewed as L2 headers of the tunneling network. As for the GRE header, it might be better viewed as a "bridge" between the tunneled network and the tunneling network, and belongs neither of the two, so it should not be viewed as the L2 header of the tunneling network either. However, for PPP tunnels, I think the PPP header clearly belongs to the tunneling network's L2 header (because PPP can run directly over a hardware link, and even if it is used for a tunnel, it needs a specific underlying network's protocol, such as SSH or PPPoE, to transport). So I think it is better to expose the PPP header in header_ops.
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 4e31f23e4117..43b62095559e 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -987,10 +987,12 @@ static int ipgre_tunnel_init(struct net_device *dev) return -EINVAL; dev->flags = IFF_BROADCAST; dev->header_ops = &ipgre_header_ops; + dev->hard_header_len = tunnel->hlen + sizeof(*iph); } #endif } else if (!tunnel->collect_md) { dev->header_ops = &ipgre_header_ops; + dev->hard_header_len = tunnel->hlen + sizeof(*iph); } return ip_tunnel_init(dev);
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it conditionally. When it is set, it assumes the outer IP header is already created before ipgre_xmit(). This is not true when we send packets through a raw packet socket, where L2 headers are supposed to be constructed by user. Packet socket calls dev_validate_header() to validate the header. But GRE tunnel does not set dev->hard_header_len, so that check can be simply bypassed, therefore uninit memory could be passed down to ipgre_xmit(). Fix this by setting dev->hard_header_len whenever sets header_ops, as dev->hard_header_len is supposed to be the length of the header created by dev->header_ops->create() anyway. Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com Cc: William Tu <u9012063@gmail.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> --- net/ipv4/ip_gre.c | 2 ++ 1 file changed, 2 insertions(+)