  1. Dec 27, 2021
  2. Dec 25, 2021
    • sctp: use call_rcu to free endpoint · 5ec7d18d
      Xin Long authored
      
      This patch delays freeing the endpoint by using call_rcu(), to fix
      another use-after-free issue in sctp_sock_dump():
      
        BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
        Call Trace:
          __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
          lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
          __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
          _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
          spin_lock_bh include/linux/spinlock.h:334 [inline]
          __lock_sock+0x203/0x350 net/core/sock.c:2253
          lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
          lock_sock include/net/sock.h:1492 [inline]
          sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
          sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
          sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
          __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
          inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
          netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
          __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
          netlink_dump_start include/linux/netlink.h:216 [inline]
          inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
          __sock_diag_cmd net/core/sock_diag.c:232 [inline]
          sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
          netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
          sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
      
      This issue occurs when the asoc is peeled off and the old sk is freed
      after it was obtained via asoc->base.sk but before lock_sock(sk) is called.
      
      To prevent the sk from being freed, the ep, which holds the sk, must stay
      alive while lock_sock() is called. This patch uses call_rcu() and moves
      sock_put() and the ep free into sctp_endpoint_destroy_rcu(), so that it
      is safe to try to hold the ep under rcu_read_lock() in
      sctp_transport_traverse_process().
      
      If sctp_endpoint_hold() returns true, this ep is still alive and we now
      hold it, so we can continue to dump it. If it returns false, this ep is
      dead and may be freed after rcu_read_unlock(), so we should skip it.
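      A minimal userspace sketch of the "hold only if still alive" check, assuming
      the hold is an increment-unless-zero on the endpoint refcount. In-kernel code
      would typically use refcount_inc_not_zero() for this. struct model_ep and the
      model_* helpers are hypothetical. C11 atomics stand in for refcount_t, and
      the RCU-deferred free itself is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an endpoint with a refcount. */
struct model_ep {
	atomic_int refcnt;
};

/* Take a reference only if the object is still alive (refcnt > 0).
 * Returns false if it is already dying, so the caller must skip it. */
static bool model_ep_hold(struct model_ep *ep)
{
	int old = atomic_load(&ep->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ep->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last one was dropped
 * (in the kernel, this is where the free would be deferred via call_rcu()). */
static bool model_ep_put(struct model_ep *ep)
{
	int old = atomic_fetch_sub(&ep->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

      Once the count has reached zero, no new holder can appear. That is what
      makes it safe to try the hold under rcu_read_lock() and skip on failure.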
      
      In sctp_sock_dump(), after locking the sk, if this ep differs from
      tsp->asoc->ep, the asoc was peeled off before lock_sock() was called
      during this dump, and the sk should be skipped. If this ep is the same as
      tsp->asoc->ep, no peeloff has happened on this asoc, and because the sk
      is locked, none can happen until release_sock().
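      The post-lock check reduces to a pointer comparison. This sketch uses
      hypothetical stand-in types (struct model_ep, struct model_asoc), not the
      real SCTP structures, and the socket lock itself is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the SCTP structures. */
struct model_ep { int id; };
struct model_asoc { struct model_ep *ep; };	/* current owner of the asoc */

/* Called with the socket "locked": dump only if the asoc still belongs
 * to the endpoint we took a reference on before locking. */
static bool model_should_dump(const struct model_ep *held_ep,
			      const struct model_asoc *asoc)
{
	assert(held_ep != NULL && asoc != NULL);
	return asoc->ep == held_ep;
}
```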
      
      Note that delaying the endpoint free does not delay the port release,
      as the port release happens in sctp_endpoint_destroy() before call_rcu()
      is called. Also, freeing the endpoint via call_rcu() makes it safe to
      access the sk through asoc->base.sk in sctp_assocs_seq_show() and
      sctp_rcv().
      
      Thanks to Lee Jones for bringing this issue up.
      
      v1->v2:
        - improve the changelog.
        - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
      
      Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com
      Reported-by: Lee Jones <lee.jones@linaro.org>
      Fixes: d25adbeb ("sctp: fix an use-after-free issue in sctp_sock_dump")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Dec 22, 2021
  4. Dec 21, 2021
    • net: skip virtio_net_hdr_set_proto if protocol already set · 1ed1d592
      Willem de Bruijn authored
      
      virtio_net_hdr_set_proto infers skb->protocol from the virtio_net_hdr
      gso_type, to avoid packets getting dropped for lack of a proto type.
      
      Its protocol choice is a guess, especially in the case of UFO, where
      the single VIRTIO_NET_HDR_GSO_UDP label covers both UFOv4 and UFOv6.
      
      Skip this best-effort guess if the field is already initialized, whether
      explicitly from userspace or implicitly by an earlier call to
      dev_parse_header_protocol() (which is more robust, but was introduced
      after this patch).
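      A userspace model of the new early return, in which the gso_type guess is
      applied only while the protocol is still 0. The constants mirror the
      virtio_net and Ethernet UAPI values but are kept in host byte order here.
      struct model_skb and model_set_proto() are hypothetical. The real helper
      operates on struct sk_buff, whose skb->protocol is big-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Host-order copies of UAPI values (the kernel uses big-endian protocol). */
#define GSO_TCPV4 1		/* VIRTIO_NET_HDR_GSO_TCPV4 */
#define GSO_UDP   3		/* VIRTIO_NET_HDR_GSO_UDP */
#define GSO_TCPV6 4		/* VIRTIO_NET_HDR_GSO_TCPV6 */
#define GSO_ECN   0x80		/* VIRTIO_NET_HDR_GSO_ECN */
#define P_IP      0x0800	/* ETH_P_IP */
#define P_IPV6    0x86DD	/* ETH_P_IPV6 */

struct model_skb { uint16_t protocol; };

/* Guess the protocol from gso_type, but only if nobody set it already. */
static int model_set_proto(struct model_skb *skb, uint8_t gso_type)
{
	if (skb->protocol)	/* set by userspace or by header parsing */
		return 0;

	switch (gso_type & ~GSO_ECN) {
	case GSO_TCPV4:
	case GSO_UDP:		/* a guess: UFO may really be IPv4 or IPv6 */
		skb->protocol = P_IP;
		return 0;
	case GSO_TCPV6:
		skb->protocol = P_IPV6;
		return 0;
	default:
		return -1;
	}
}
```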
      
      Fixes: 9d2f67e4 ("net/packet: fix packet drop as of virtio gso")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220145027.2784293-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: accept UFOv6 packages in virtio_net_hdr_to_skb · 7e5cced9
      Willem de Bruijn authored
      An skb whose skb->protocol is 0 at the time of virtio_net_hdr_to_skb()
      may have a protocol inferred from the virtio_net_hdr by
      virtio_net_hdr_set_proto().
      
      Unlike TCP, UDP does not have separate types for IPv4 and IPv6: type
      VIRTIO_NET_HDR_GSO_UDP is guessed to be IPv4/UDP. As of the commit
      below, UFOv6 packets are dropped because they do not match the protocol
      obtained from dev_parse_header_protocol().
      
      Invert the test: take the L2 protocol field as the starting point, and
      accept both UFOv4 and UFOv6 for VIRTIO_NET_HDR_GSO_UDP.
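      A sketch of the inverted test as a userspace model. It starts from the L2
      protocol and asks whether the gso_type is compatible with it.
      model_match_proto() is a hypothetical name, and the constants are
      host-order copies of the UAPI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSO_TCPV4 1		/* VIRTIO_NET_HDR_GSO_TCPV4 */
#define GSO_UDP   3		/* VIRTIO_NET_HDR_GSO_UDP */
#define GSO_TCPV6 4		/* VIRTIO_NET_HDR_GSO_TCPV6 */
#define GSO_ECN   0x80		/* VIRTIO_NET_HDR_GSO_ECN */
#define P_IP      0x0800	/* ETH_P_IP */
#define P_IPV6    0x86DD	/* ETH_P_IPV6 */

/* Is the parsed L2 protocol compatible with the advertised gso_type? */
static bool model_match_proto(uint16_t protocol, uint8_t gso_type)
{
	switch (gso_type & ~GSO_ECN) {
	case GSO_TCPV4:
		return protocol == P_IP;
	case GSO_TCPV6:
		return protocol == P_IPV6;
	case GSO_UDP:		/* one label covers both UFOv4 and UFOv6 */
		return protocol == P_IP || protocol == P_IPV6;
	default:
		return false;
	}
}
```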
      
      Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      Link: https://lore.kernel.org/netdev/CABcq3pG9GRCYqFDBAJ48H1vpnnX=41u+MhQnayF1ztLH4WX0Fw@mail.gmail.com/
      Reported-by: Andrew Melnichenko <andrew@daynix.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220144901.2784030-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • inet: fully convert sk->sk_rx_dst to RCU rules · 8f905c0e
      Eric Dumazet authored
      
      syzbot reported various issues around early demux,
      one of which is included in this changelog [1].

      sk->sk_rx_dst uses RCU protection without clearly
      documenting it.

      Moreover, the following sequence in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      does not follow standard RCU rules:
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      This looks wrong because deleting an RCU-protected pointer
      is supposed to clear the pointer before the
      call_rcu()/synchronize_rcu() that guards the actual memory freeing.
      
      Indeed, in some cases dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
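      The ordering that standard RCU rules call for (unpublish first, then
      release) can be illustrated in userspace with C11 atomics. This is a model
      only: struct model_sock, struct model_dst and model_reset_rx_dst() are
      hypothetical, and no grace period is simulated. atomic_exchange() stands in
      for reading and clearing sk->sk_rx_dst. In-kernel code would typically do
      that with rcu_dereference_protected() followed by RCU_INIT_POINTER(..., NULL).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: a refcounted "dst" and a socket caching it. */
struct model_dst { atomic_int refcnt; };
struct model_sock { _Atomic(struct model_dst *) rx_dst; };

static void model_dst_release(struct model_dst *dst)
{
	/* In the kernel, dropping the last ref queues an RCU-deferred free. */
	atomic_fetch_sub(&dst->refcnt, 1);
}

/* Correct order: clear the published pointer first, then drop the
 * reference, so no new reader can find a dst whose free is queued. */
static void model_reset_rx_dst(struct model_sock *sk)
{
	struct model_dst *dst = atomic_exchange(&sk->rx_dst, NULL);

	if (dst)
		model_dst_release(dst);
}
```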
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. Dec 18, 2021
  6. Dec 17, 2021
  7. Dec 16, 2021
    • tee: handle lookup of shm with reference count 0 · dfd0743f
      Jens Wiklander authored
      
      Since the tee subsystem does not keep a strong reference to its idle
      shared memory buffers, it races with other threads that try to destroy a
      shared memory buffer by closing its dma-buf fd or by unmapping the
      memory.
      
      In tee_shm_get_from_id(), when a lookup in teedev->idr has been
      successful, it is possible that the tee_shm is in the dma-buf teardown
      path, but that path is blocked by the teedev mutex. Since we don't have
      an API to tell whether the tee_shm is in the dma-buf teardown path, we
      must find another way of detecting this condition.
      
      Fix this by doing the reference counting directly on the tee_shm using a
      new refcount_t refcount field. dma-buf is replaced by anon_inode_getfd(),
      which separates the life-cycle of the underlying file from the tee_shm.
      tee_shm_put() is updated to hold the mutex when decreasing the refcount
      to 0 and to remove the tee_shm from teedev->idr before releasing the
      mutex. This means that the tee_shm can never be found unless its
      refcount is larger than 0.
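      A minimal model of this scheme, assuming a single mutex guards both the
      lookup table and the transition to zero. A fixed array stands in for
      teedev->idr. A plain counter, only touched under the mutex here, stands in
      for refcount_t. Every model_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_IDR_SIZE 8

/* Hypothetical model of a shared-memory object and the id table. */
struct model_shm { int id; unsigned int refcount; };

static pthread_mutex_t model_teedev_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct model_shm *model_idr[MODEL_IDR_SIZE];

/* Lookup takes its reference under the same mutex that put uses, so any
 * entry still in the table has refcount > 0. */
static struct model_shm *model_shm_get_from_id(int id)
{
	struct model_shm *shm = NULL;

	pthread_mutex_lock(&model_teedev_mutex);
	if (id >= 0 && id < MODEL_IDR_SIZE && model_idr[id]) {
		shm = model_idr[id];
		shm->refcount++;
	}
	pthread_mutex_unlock(&model_teedev_mutex);
	return shm;
}

/* Drop a reference. On reaching 0, unpublish under the mutex and free
 * after unlocking. Returns true if the object was freed. */
static bool model_shm_put(struct model_shm *shm)
{
	bool release = false;

	pthread_mutex_lock(&model_teedev_mutex);
	assert(shm->refcount > 0);
	if (--shm->refcount == 0) {
		model_idr[shm->id] = NULL;
		release = true;
	}
	pthread_mutex_unlock(&model_teedev_mutex);

	if (release)
		free(shm);
	return release;
}
```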
      
      Fixes: 967c9cca ("tee: generic TEE subsystem")
      Cc: stable@vger.kernel.org
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Lars Persson <larper@axis.com>
      Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
      Reported-by: Patrik Lantz <patrik.lantz@axis.com>
      Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
    • xen/console: harden hvc_xen against event channel storms · fe415186
      Juergen Gross authored
      
      The Xen console driver is still vulnerable to an attack via an excessive
      number of events sent by the backend. Fix that by using a lateeoi event
      channel.
      
      For the normal domU initial console, this requires the introduction of
      bind_evtchn_to_irq_lateeoi(), as there is no xenbus device available
      at the time the event channel is bound to the irq.
      
      As deciding whether an interrupt was spurious requires testing whether
      any bytes were read from the backend, move sending the event into the if
      statement: sending an event without having found any bytes to read makes
      no sense.
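      The spurious-interrupt decision can be sketched as follows. This is a
      hypothetical userspace model; model_console_irq() and its struct are
      invented for illustration. In the driver, the returned flag would feed the
      lateeoi mechanism: xen_irq_lateeoi(), with XEN_EOI_FLAG_SPURIOUS when
      nothing was read.

```c
#include <assert.h>

/* Hypothetical model: the outcome that would be reported at EOI time. */
enum model_eoi { MODEL_EOI_NORMAL = 0, MODEL_EOI_SPURIOUS = 1 };

struct model_console {
	int pending_bytes;	/* data the backend has placed in the ring */
	int events_to_backend;	/* notifications we sent back */
};

static enum model_eoi model_console_irq(struct model_console *c)
{
	int recv = c->pending_bytes;	/* "read" everything available */

	c->pending_bytes = 0;
	if (recv > 0) {
		c->events_to_backend++;	/* notify only after consuming data */
		return MODEL_EOI_NORMAL;
	}
	return MODEL_EOI_SPURIOUS;	/* nothing read: count as spurious */
}
```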
      
      This is part of XSA-391
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      ---
      V2:
      - slightly adapt spurious irq detection (Jan Beulich)
      V3:
      - fix spurious irq detection (Jan Beulich)
  8. Dec 15, 2021
  9. Dec 14, 2021
  10. Dec 11, 2021
    • timers: implement usleep_idle_range() · e4779015
      SeongJae Park authored
      Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.
      
      This patchset fixes DAMON's fake load report issue.  The first patch
      adds yet another variant of usleep_range() for this fix, and the second
      patch fixes DAMON's issue by making it use the newly introduced
      function.
      
      This patch (of 2):
      
      Some kernel threads, such as DAMON, may need to repeatedly sleep at
      microsecond granularity.  However, because usleep_range() sleeps in the
      uninterruptible state, such threads make /proc/loadavg report fake load.
      
      To help such cases, this commit implements a variant of usleep_range()
      called usleep_idle_range().  It is the same as usleep_range(), but sets
      the state of the current task to TASK_IDLE while sleeping.
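      A simplified version of the scheduler's "contributes to load" test shows why
      the task state matters for /proc/loadavg. The bit values below are assumed
      from include/linux/sched.h of this era, and the function ignores the freezer
      check. TASK_IDLE is TASK_UNINTERRUPTIBLE combined with TASK_NOLOAD, so a
      task in that state sleeps uninterruptibly without adding to the load
      average.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, mirroring include/linux/sched.h around v5.16. */
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_NOLOAD          0x0400
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Simplified loadavg test: uninterruptible sleepers count, unless they
 * also carry TASK_NOLOAD. */
static bool model_contributes_to_load(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) != 0 &&
	       (state & TASK_NOLOAD) == 0;
}
```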
      
      Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Increase default MLOCK_LIMIT to 8 MiB · 9dcc38e2
      Drew DeVault authored
      This limit has not been updated since 2008, when it was increased to 64
      KiB at the request of GnuPG.  Until recently, the main use-cases for this
      feature were (1) preventing sensitive memory from being swapped, as in
      GnuPG's use-case; and (2) real-time use-cases.  In the first case, little
      memory is called for, and in the second case, the user is generally in a
      position to increase it if they need more.
      
      The introduction of IORING_REGISTER_BUFFERS adds a third use-case:
      preparing fixed buffers for high-performance I/O.  This use-case will take
      as much of this memory as it can get, but is still limited to 64 KiB by
      default, which is very little.  This patch increases the limit to 8 MiB,
      which was chosen fairly arbitrarily as a more generous, but still
      conservative, default value.
      
      It is also possible to raise this limit in userspace.  This is easily
      done, for example, in the use-case of a network daemon: systemd, for
      instance, provides for this via LimitMEMLOCK in the service file; OpenRC
      via the rc_ulimit variables.  However, there is no established userspace
      facility for configuring this outside of daemons: end-user applications do
      not presently have access to a convenient means of raising their limits.
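      For comparison, here is roughly what raising the limit from userspace looks
      like with getrlimit()/setrlimit(). An unprivileged process can raise its
      soft limit only up to its hard limit. raise_memlock_soft() is a
      hypothetical helper, and the 8 MiB target matches the new default.

```c
#include <assert.h>
#include <sys/resource.h>

#define WANT_MEMLOCK (8UL * 1024 * 1024)	/* 8 MiB */

/* Raise the soft RLIMIT_MEMLOCK toward 'want', capped at the hard limit.
 * Stores the resulting soft limit in *result; returns 0 on success. */
static int raise_memlock_soft(rlim_t want, rlim_t *result)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < want) {
		rlim_t target = want;

		/* Without privileges, the soft limit cannot exceed the hard one. */
		if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
			target = rl.rlim_max;
		rl.rlim_cur = target;
		if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
			return -1;
	}
	*result = rl.rlim_cur;
	return 0;
}
```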
      
      The buck, as it were, stops with the kernel.  It's much easier to address
      it here than it is to bring it to hundreds of distributions, and end-user
      software can only realistically rely on the limit being high enough if
      that is more or less ubiquitous.  Most distros don't change this
      particular rlimit from the kernel-supplied default value, so a change here
      will easily provide that ubiquity.
      
      Link: https://lkml.kernel.org/r/20211028080813.15966-1-sir@cmpwn.com
      Signed-off-by: Drew DeVault <sir@cmpwn.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Cyril Hrubis <chrubis@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Andrew Dona-Couch <andrew@donacou.ch>
      Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. Dec 09, 2021
  12. Dec 08, 2021
    • PM: runtime: Fix pm_runtime_active() kerneldoc comment · 444dd878
      Rafael J. Wysocki authored
      
      The kerneldoc comment of pm_runtime_active() does not reflect the
      behavior of the function, so update it accordingly.
      
      Fixes: 403d2d11 ("PM: runtime: Add kerneldoc comments to multiple helpers")
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
    • net: phy: Remove unnecessary indentation in the comments of phy_device · a97770cc
      Yanteng Si authored
      
      Fix the following warnings:
      
      linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation.
      linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent.
      linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation.
      
      Suggested-by: Akira Yokosawa <akiyks@gmail.com>
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: conntrack: annotate data-races around ct->timeout · 802a7dc5
      Eric Dumazet authored
      
      (struct nf_conn)->timeout can be read and written locklessly;
      add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing.
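      A userspace analogue of the annotation: relaxed C11 atomic loads and stores
      give the single, untorn access that READ_ONCE()/WRITE_ONCE() are used for
      here. The expiry test follows the wraparound-safe signed-difference style of
      nf_ct_is_expired(). struct model_conn and the model_* helpers are
      hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical conntrack entry: absolute expiry time in ticks. */
struct model_conn { _Atomic uint32_t timeout; };

/* Writer side: one untorn store (analogue of WRITE_ONCE()). */
static void model_set_timeout(struct model_conn *ct, uint32_t expires)
{
	atomic_store_explicit(&ct->timeout, expires, memory_order_relaxed);
}

/* Lockless reader: one untorn load (analogue of READ_ONCE()), then a
 * signed difference so the check survives tick-counter wraparound. */
static bool model_is_expired(struct model_conn *ct, uint32_t now)
{
	uint32_t t = atomic_load_explicit(&ct->timeout, memory_order_relaxed);

	return (int32_t)(t - now) <= 0;
}
```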
      
      BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get
      
      write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0:
       __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563
       init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635
       resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746
       nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
       ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
       nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
       nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
       nf_hook include/linux/netfilter.h:262 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
       inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
       tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
       tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
       __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
       tcp_push_pending_frames include/net/tcp.h:1897 [inline]
       tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452
       tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947
       tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0xf2/0x270 net/core/sock.c:2768
       release_sock+0x40/0x110 net/core/sock.c:3300
       sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145
       tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402
       tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       __sys_sendto+0x21e/0x2c0 net/socket.c:2036
       __do_sys_sendto net/socket.c:2048 [inline]
       __se_sys_sendto net/socket.c:2044 [inline]
       __x64_sys_sendto+0x74/0x90 net/socket.c:2044
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1:
       nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline]
       ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline]
       __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807
       resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734
       nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
       ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
       nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
       nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
       nf_hook include/linux/netfilter.h:262 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
       inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
       __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956
       tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962
       __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478
       tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline]
       tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948
       tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0xf2/0x270 net/core/sock.c:2768
       release_sock+0x40/0x110 net/core/sock.c:3300
       tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114
       inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
       rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118
       rds_send_xmit+0xbed/0x1500 net/rds/send.c:367
       rds_send_worker+0x43/0x200 net/rds/threads.c:200
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00027cc2 -> 0x00000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G        W         5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: krdsd rds_send_worker
      
      Note: I chose an arbitrary commit for the Fixes: tag,
      because I do not think we need to backport this fix to very old kernels.
      
      Fixes: e37542ba ("netfilter: conntrack: avoid possible false sharing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. Dec 03, 2021
    • treewide: Add missing includes masked by cgroup -> bpf dependency · 8581fd40
      Jakub Kicinski authored
      
      cgroup.h (therefore swap.h, therefore half of the universe)
      includes bpf.h, which in turn includes module.h and slab.h.
      Since we're about to get rid of that dependency, we need
      to clean things up.
      
      v2: drop the cpu.h include from cacheinfo.h; it's not necessary,
      and it makes riscv sensitive to the ordering of include files.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Krzysztof Wilczyński <kw@linux.com>
      Acked-by: Peter Chen <peter.chen@kernel.org>
      Acked-by: SeongJae Park <sj@kernel.org>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/  # v1
      Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
      Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
    • bonding: make tx_rebalance_counter an atomic · dac8e00f
      Eric Dumazet authored
      
      KCSAN reported a data-race [1] around tx_rebalance_counter,
      which can be accessed from different contexts without
      the protection of a lock/mutex.
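      As a rough illustration of the idea, here is a userspace sketch that
      models tx_rebalance_counter with C11 atomics rather than the kernel's
      atomic_t. The names request_rebalance() and monitor_tick() and the
      REBALANCE_TICKS threshold are hypothetical; only the access pattern
      (the enslave path forces a rebalance, the periodic monitor counts and
      resets) follows the report below.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold; the driver uses its own tick constant. */
#define REBALANCE_TICKS 10

/* Shared by the periodic monitor and the enslave path. */
static atomic_int tx_rebalance_counter;

/* Enslave path: force a rebalance on the next monitor tick. */
static void request_rebalance(void)
{
	atomic_store(&tx_rebalance_counter, REBALANCE_TICKS);
}

/* Periodic monitor: returns true when a rebalance should run now. */
static bool monitor_tick(void)
{
	/* atomic_fetch_add() returns the old value; +1 gives the new one. */
	int now = atomic_fetch_add(&tx_rebalance_counter, 1) + 1;

	if (now >= REBALANCE_TICKS) {
		atomic_store(&tx_rebalance_counter, 0);
		return true;
	}
	return false;
}
```

      Every individual access is now free of data races, but the increment,
      the comparison and the reset remain separate steps. A request that
      lands between them is overwritten only when a rebalance is already
      about to run, which is acceptable for a heuristic like this.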
      
      [1]
      BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor
      
      write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
       bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
       bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
       do_set_master net/core/rtnetlink.c:2521 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3475 [inline]
       rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
       bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000001 -> 0x00000064
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bond1 bond_alb_monitor
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix another uninit-value (sk_rx_queue_mapping) · 03cfda4f
      Eric Dumazet authored
      
      KMSAN is still not happy [1].
      
      I missed that passive connections do not inherit their
      sk_rx_queue_mapping values from the request socket;
      instead, tcp_child_process() calls
      sk_mark_napi_id(child, skb).
      
      We have many sk_mark_napi_id() callers, so I am providing
      a new helper that forces the setting of sk_rx_queue_mapping
      and sk_napi_id.
      
      Note that we had no KMSAN report for sk_napi_id because
      passive connections got a copy of this field from the listener.
      sk_rx_queue_mapping, on the other hand, is inside the
      sk_dontcopy_begin/sk_dontcopy_end region, so sk_clone_lock()
      leaves this field uninitialized.
      
      We might remove dead code populating req->sk_rx_queue_mapping
      in the future.
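      A minimal sketch of the difference between an update-style setter and
      a forcing one, assuming a reduced socket structure. struct sock_sketch,
      rx_queue_set() and mark_napi_id_set() are illustrative stand-ins, not
      the kernel's definitions. The update-style path reads the old value
      before writing, and that read is what KMSAN flags when the field was
      never initialized; the forcing path skips the read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket subset; field names mirror the commit text. */
struct sock_sketch {
	unsigned short rx_queue_mapping;
	unsigned int napi_id;
};

/*
 * Update-style setter: compares before writing so an unchanged value
 * does not dirty the cache line. With force == true, the short-circuit
 * skips the comparison, so a never-initialized field is never read.
 */
static void rx_queue_set(struct sock_sketch *sk, unsigned short queue,
			 bool force)
{
	if (force || sk->rx_queue_mapping != queue)
		sk->rx_queue_mapping = queue;
}

/* Hypothetical forcing helper for freshly cloned child sockets. */
static void mark_napi_id_set(struct sock_sketch *sk, unsigned int napi_id,
			     unsigned short queue)
{
	sk->napi_id = napi_id;
	rx_queue_set(sk, queue, true);
}
```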
      
      [1]
      
      BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
      BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
      BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
      BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       __sk_rx_queue_set include/net/sock.h:1924 [inline]
       sk_rx_queue_update include/net/sock.h:1938 [inline]
       sk_mark_napi_id include/net/busy_poll.h:136 [inline]
       tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       run_ksoftirqd+0x33/0x50 kernel/softirq.c:920
       smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164
       kthread+0x721/0x850 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914
       sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118
       inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956
       tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453
       tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563
       tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Fixes: a37a0ee4 ("net: avoid uninit-value from tcp_conn_request")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Tested-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. Dec 02, 2021
    • bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL · d9847eb8
      Kumar Kartikeya Dwivedi authored
      
      Vinicius Costa Gomes reported [0] that the build fails when
      CONFIG_DEBUG_INFO_BTF is enabled and CONFIG_BPF_SYSCALL is disabled.
      This leads to btf.c not being compiled, and then no symbol being present
      in vmlinux for the declarations in btf.h. Since BTF is not useful
      without the BPF subsystem enabled, disallow this combination.
      
      However, in theory, disabling both could now still fail, as the
      symbols for the kfunc_btf_id_list variables are not available. This
      isn't usually a problem, as the compiler optimizes away the whole
      register/unregister call, but at lower optimization levels it can
      fail the build at the linking stage.
      
      Fix that by adding dummy variables so that modules taking the address
      of them still work, but the whole thing is a no-op.
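      The dummy-variable approach can be sketched roughly as follows.
      CONFIG_FEATURE_ENABLED, struct id_list and register_on() are
      hypothetical stand-ins for the real config symbol, kfunc_btf_id_list
      and its registration helpers. The point is only that the disabled
      branch still defines an object whose address can be taken, while the
      calls collapse into empty inline functions.

```c
#include <assert.h>

/* Hypothetical list type standing in for kfunc_btf_id_list. */
struct id_list {
	int count;
};

#ifdef CONFIG_FEATURE_ENABLED	/* hypothetical config symbol */
extern struct id_list feature_list;
void register_on(struct id_list *l);
#else
/*
 * Feature disabled: provide a real (dummy) object so code taking
 * &feature_list still links, and turn the call into a no-op.
 */
static struct id_list feature_list;
static inline void register_on(struct id_list *l) { (void)l; }
#endif

/* A "module" that takes the list's address; it links in both configs. */
static void example_module_init(void)
{
	register_on(&feature_list);
}
```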
      
        [0]: https://lore.kernel.org/bpf/20211110205418.332403-1-vinicius.gomes@intel.com
      
      Fixes: 14f267d9 ("bpf: btf: Introduce helpers for dynamic BTF set registration")
      Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20211122144742.477787-2-memxor@gmail.com
    • HID: add hid_is_usb() function to make it simpler for USB detection · f83baa0c
      Greg Kroah-Hartman authored
      
      A number of HID drivers already call hid_is_using_ll_driver(), but only
      to detect whether this is a USB device or not.  Make this more
      obvious by creating hid_is_usb() and calling that function instead.
      
      Also convert the existing hid_is_using_ll_driver() calls to use the
      new helper.
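      A self-contained sketch of the helper, using minimal stand-in types:
      the real struct hid_device and struct hid_ll_driver are far larger,
      and usb_hid_driver here is a placeholder object. It only shows how
      the new name wraps the existing low-level driver comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins; the real types live in the HID core headers. */
struct hid_ll_driver { const char *name; };
struct hid_device { const struct hid_ll_driver *ll_driver; };

/* Placeholder low-level driver object for USB-attached devices. */
static const struct hid_ll_driver usb_hid_driver = { "usbhid" };

static bool hid_is_using_ll_driver(const struct hid_device *hdev,
				   const struct hid_ll_driver *driver)
{
	return hdev->ll_driver == driver;
}

/* Named helper: callers state intent instead of comparing drivers. */
static bool hid_is_usb(const struct hid_device *hdev)
{
	return hid_is_using_ll_driver(hdev, &usb_hid_driver);
}
```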
      
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Cc: linux-input@vger.kernel.org
      Cc: stable@vger.kernel.org
      Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Link: https://lore.kernel.org/r/20211201183503.2373082-1-gregkh@linuxfoundation.org
    • sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full · e7f2be11
      Frederic Weisbecker authored
      
      getrusage(RUSAGE_THREAD) with nohz_full may return shorter utime/stime
      than the actual time.
      
      task_cputime_adjusted() snapshots utime and stime and then adjusts their
      sum to match the scheduler-maintained cputime.sum_exec_runtime.
      Unfortunately, in nohz_full, sum_exec_runtime is only updated once per
      second in the worst case, causing a discrepancy against utime and stime,
      which can be updated at any time by the reader using vtime.
      
      To fix this situation, perform an update of cputime.sum_exec_runtime
      when the cputime snapshot reports the task as actually running while
      the tick is disabled. The related overhead is then contained within the
      relevant situations.
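      To make the discrepancy concrete, here is a simplified model of the
      scaling step, assuming utime and stime are rescaled proportionally so
      that they sum to sum_exec_runtime (rtime). cputime_scale() is a
      hypothetical name; the kernel's version uses overflow-safe arithmetic
      and extra checks that this sketch omits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale sampled utime/stime so their sum equals rtime while keeping
 * their ratio. Simplified: stime * rtime may overflow for large values.
 */
static void cputime_scale(uint64_t utime, uint64_t stime, uint64_t rtime,
			  uint64_t *ut, uint64_t *st)
{
	uint64_t total = utime + stime;

	if (total == 0) {
		/* Nothing sampled yet: attribute everything to utime. */
		*ut = rtime;
		*st = 0;
		return;
	}
	*st = stime * rtime / total;
	*ut = rtime - *st;
}
```

      For example, with fresh snapshots of 600 and 400 time units but a
      stale rtime of 500, the sketch reports 300 and 200; with an up-to-date
      rtime of 1000, it returns the original 600 and 400.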
      
      Reported-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: Phil Auld <pauld@redhat.com>
      Link: https://lore.kernel.org/r/20211026141055.57358-3-frederic@kernel.org
    • Fix Comment of ETH_P_802_3_MIN · 72f6a452
      Xiayu Zhang authored
      
      The description of ETH_P_802_3_MIN is misleading.
      The EtherType value in an Ethernet II frame is at least 0x0600,
      while the Length value in an 802.3 frame is less than 0x0600.
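      The rule the comment describes can be sketched as a classifier for the
      16-bit field that follows the source MAC address. ETH_P_802_3_MIN and
      ETH_DATA_LEN use the same values as <linux/if_ether.h> but are defined
      locally so the sketch stands alone; classify() and the enum are
      illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_802_3_MIN 0x0600	/* same value as <linux/if_ether.h> */
#define ETH_DATA_LEN    1500	/* maximum 802.3 payload length */

enum frame_kind { FRAME_ETHERNET_II, FRAME_802_3, FRAME_INVALID };

/* Classify by the type/length field (already in host byte order). */
static enum frame_kind classify(uint16_t type_or_len)
{
	if (type_or_len >= ETH_P_802_3_MIN)
		return FRAME_ETHERNET_II;	/* field is an EtherType */
	if (type_or_len <= ETH_DATA_LEN)
		return FRAME_802_3;		/* field is a payload length */
	return FRAME_INVALID;			/* 1501..1535 are undefined */
}
```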
      
      Signed-off-by: Xiayu Zhang <Xiayu.Zhang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: convert fib_num_tclassid_users to atomic_t · 213f5f8f
      Eric Dumazet authored
      
      Before commit faa041a4 ("ipv4: Create cleanup helper for fib_nh"),
      changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.
      
      After that change, this is no longer the case, as free_fib_info_rcu()
      runs after an RCU grace period, without RTNL being held.
      
      Fixes: faa041a4 ("ipv4: Create cleanup helper for fib_nh")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid uninit-value from tcp_conn_request · a37a0ee4
      Eric Dumazet authored
      
      A recent change triggers a KMSAN warning, because request
      sockets do not initialize the @sk_rx_queue_mapping field.
      
      Add sk_rx_queue_update() helper to make our intent clear.
      
      BUG: KMSAN: uninit-value in sk_rx_queue_set include/net/sock.h:1922 [inline]
      BUG: KMSAN: uninit-value in tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       sk_rx_queue_set include/net/sock.h:1922 [inline]
       tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       invoke_softirq+0xa4/0x130 kernel/softirq.c:432
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x76/0x130 kernel/softirq.c:648
       common_interrupt+0xb6/0xd0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40
       smap_restore arch/x86/include/asm/smap.h:67 [inline]
       get_shadow_origin_ptr mm/kmsan/instrumentation.c:31 [inline]
       __msan_metadata_ptr_for_load_1+0x28/0x30 mm/kmsan/instrumentation.c:63
       tomoyo_check_acl+0x1b0/0x630 security/tomoyo/domain.c:173
       tomoyo_path_permission security/tomoyo/file.c:586 [inline]
       tomoyo_check_open_permission+0x61f/0xe10 security/tomoyo/file.c:777
       tomoyo_file_open+0x24f/0x2d0 security/tomoyo/tomoyo.c:311
       security_file_open+0xb1/0x1f0 security/security.c:1635
       do_dentry_open+0x4e4/0x1bf0 fs/open.c:809
       vfs_open+0xaf/0xe0 fs/open.c:957
       do_open fs/namei.c:3426 [inline]
       path_openat+0x52f1/0x5dd0 fs/namei.c:3559
       do_filp_open+0x306/0x760 fs/namei.c:3586
       do_sys_openat2+0x263/0x8f0 fs/open.c:1212
       do_sys_open fs/open.c:1228 [inline]
       __do_sys_open fs/open.c:1236 [inline]
       __se_sys_open fs/open.c:1232 [inline]
       __x64_sys_open+0x314/0x380 fs/open.c:1232
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       reqsk_alloc include/net/request_sock.h:91 [inline]
       inet_reqsk_alloc+0xaf/0x8b0 net/ipv4/tcp_input.c:6712
       tcp_conn_request+0x910/0x4dc0 net/ipv4/tcp_input.c:6852
       tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
       tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
       tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
       tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130182939.2584764-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: annotate data-races on txq->xmit_lock_owner · 7a10d8c8
      Eric Dumazet authored
      
      syzbot found that __dev_queue_xmit() reads txq->xmit_lock_owner
      without annotations.
      
      This is not a serious issue; let's document what is happening there.
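      A userspace analogue of the annotation, assuming C11 relaxed atomics
      as a rough stand-in for the kernel's READ_ONCE()/WRITE_ONCE() (they
      are not identical). The owner field holds the CPU that holds the
      transmit lock, or -1 when it is free, which matches the
      0x00000000 -> 0xffffffff change in the report below. The function
      names are hypothetical and the real spinlock is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* CPU id of the current lock holder, or -1 when unlocked. */
static atomic_int xmit_lock_owner = -1;

/* Called right after the real transmit lock is taken (lock omitted). */
static void tx_lock_acquired(int cpu)
{
	atomic_store_explicit(&xmit_lock_owner, cpu, memory_order_relaxed);
}

/* Called right before the real transmit lock is released. */
static void tx_lock_releasing(void)
{
	atomic_store_explicit(&xmit_lock_owner, -1, memory_order_relaxed);
}

/* Lockless check: is this CPU already inside the transmit path? */
static bool xmit_recursion(int this_cpu)
{
	return atomic_load_explicit(&xmit_lock_owner,
				    memory_order_relaxed) == this_cpu;
}
```

      A racy read here is benign: another CPU can only ever observe "not
      me", so the annotation documents the race rather than preventing it.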
      
      BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit
      
      write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
       __netif_tx_unlock include/linux/netdevice.h:4437 [inline]
       __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_hh_output include/net/neighbour.h:511 [inline]
       neigh_output include/net/neighbour.h:525 [inline]
       ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
       __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
       dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
       macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
       macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
       __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
       netdev_start_xmit include/linux/netdevice.h:5001 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3590
       dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
       sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
       __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
       __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
       neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
       neigh_output include/net/neighbour.h:527 [inline]
       ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
       __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
       ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
       dst_output include/net/dst.h:450 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
       ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
       addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
       call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
       expire_timers+0x116/0x240 kernel/time/timer.c:1466
       __run_timers+0x368/0x410 kernel/time/timer.c:1734
       run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       __irq_exit_rcu kernel/softirq.c:636 [inline]
       irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
       sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
       asm_sysvec_apic_timer_interrupt+0x12/0x20
       kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
       folio_test_anon include/linux/page-flags.h:581 [inline]
       PageAnon include/linux/page-flags.h:586 [inline]
       zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
       zap_pmd_range mm/memory.c:1467 [inline]
       zap_pud_range mm/memory.c:1496 [inline]
       zap_p4d_range mm/memory.c:1517 [inline]
       unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
       unmap_single_vma+0x157/0x210 mm/memory.c:1583
       unmap_vmas+0xd0/0x180 mm/memory.c:1615
       exit_mmap+0x23d/0x470 mm/mmap.c:3170
       __mmput+0x27/0x1b0 kernel/fork.c:1113
       mmput+0x3d/0x50 kernel/fork.c:1134
       exit_mm+0xdb/0x170 kernel/exit.c:507
       do_exit+0x608/0x17a0 kernel/exit.c:819
       do_group_exit+0xce/0x180 kernel/exit.c:929
       get_signal+0xfc3/0x1550 kernel/signal.c:2852
       arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
       handle_signal_work kernel/entry/common.c:148 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
       exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0xffffffff
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G        W         5.16.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • kprobes: Limit max data_size of the kretprobe instances · 6bbfa441
      Masami Hiramatsu authored
      
      The 'kretprobe::data_size' field is unsigned, so it cannot be negative.
      But if a user sets it to a large enough value (e.g. (size_t)-8), the
      result of 'data_size + sizeof(struct kretprobe_instance)' wraps around
      and becomes smaller than sizeof(struct kretprobe_instance), or zero. As
      a result, the kretprobe_instance objects are allocated without enough
      memory, and the kretprobe accesses memory outside of the allocation.
      
      To avoid this issue, introduce an upper limit on
      kretprobe::data_size. 4KB per instance should be OK.
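      The wraparound and the proposed limit can be demonstrated with a small
      sketch. struct instance_sketch stands in for struct kretprobe_instance
      (its real layout differs), and MAX_DATA_SIZE is an illustrative
      constant matching the 4KB mentioned above, not necessarily the name
      used in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct kretprobe_instance; the real layout differs. */
struct instance_sketch {
	void *task;
	void *ret_addr;
	char data[];	/* data_size bytes of per-instance storage */
};

/* Illustrative cap matching "4KB per instance". */
#define MAX_DATA_SIZE 4096

/* Unchecked: a data_size near SIZE_MAX wraps the sum around. */
static size_t alloc_size_unchecked(size_t data_size)
{
	return sizeof(struct instance_sketch) + data_size;
}

/* Checked: reject oversized requests before doing the arithmetic. */
static bool alloc_size_checked(size_t data_size, size_t *out)
{
	if (data_size > MAX_DATA_SIZE)
		return false;
	*out = sizeof(struct instance_sketch) + data_size;
	return true;
}
```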
      
      Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2
      
      Cc: stable@vger.kernel.org
      Fixes: f47cd9b5 ("kprobes: kretprobe user entry-handler")
      Reported-by: zhangyue <zhangyue1@kylinos.cn>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  15. Dec 01, 2021
  16. Nov 30, 2021
    • bpf: Make sure bpf_disable_instrumentation() is safe vs preemption. · 79364031
      Sebastian Andrzej Siewior authored
      
      The initial implementation of migrate_disable() for mainline was a
      wrapper around preempt_disable(). RT kernels substituted this with a
      real migrate disable implementation.
      
      Later on, mainline gained true migrate disable support, but neither
      the documentation nor the affected code was updated.
      
      Remove stale comments claiming that migrate_disable() is PREEMPT_RT only.
      
      Don't use __this_cpu_inc() in the !PREEMPT_RT path because preemption is
      not disabled and the RMW operation can be preempted.
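      Why a non-atomic read-modify-write is unsafe without disabled
      preemption can be shown with a deterministic replay, here in plain C
      with a hypothetical global counter in place of the per-CPU variable.
      migrate_disable() keeps the task on its CPU, but another task on the
      same CPU can still run between the load and the store; the
      this_cpu_inc() family is the preemption-safe counterpart of
      __this_cpu_inc().

```c
#include <assert.h>

/* Hypothetical stand-in for a per-CPU counter on a single CPU. */
static int counter;

/*
 * Replays `counter++` split into load/add/store, with task A preempted
 * between its load and its store while task B runs on the same CPU.
 */
static int replay_preempted_increment(void)
{
	counter = 0;

	int a_tmp = counter;	/* task A: load, sees 0 */
				/* ...task A is preempted here... */
	int b_tmp = counter;	/* task B: load, sees 0 */
	counter = b_tmp + 1;	/* task B: store 1 */
				/* ...task A resumes... */
	counter = a_tmp + 1;	/* task A: store 1, B's update is lost */

	return counter;		/* two increments, final value 1 */
}
```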
      
      Fixes: 74d862b6 ("sched: Make migrate_disable/enable() independent of RT")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211127163200.10466-3-bigeasy@linutronix.de
    • siphash: use _unaligned version by default · f7e5b9bf
      Arnd Bergmann authored
      
      On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
      because the ordinary load/store instructions (ldr, ldrh, ldrb) can
      tolerate any misalignment of the memory address. However, load/store
      double and load/store multiple instructions (ldrd, ldm) may still only
      be used on memory addresses that are 32-bit aligned, and so we have to
      use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
      may end up with a severe performance hit due to alignment traps that
      require fixups by the kernel. Testing shows that this currently happens
      with clang-13 but not gcc-11. In theory, any compiler version can
      produce this bug or other problems, as we are dealing with undefined
      behavior in C99 even on architectures that support this in hardware,
      see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
      
      Fortunately, the get_unaligned() accessors do the right thing: when
      building for ARMv6 or later, the compiler will emit unaligned accesses
      using the ordinary load/store instructions (but avoid the ones that
      require 32-bit alignment). When building for older ARM, those accessors
      will emit the appropriate sequence of ldrb/mov/orr instructions. And on
      architectures that can truly tolerate any kind of misalignment, the
      get_unaligned() accessors resolve to the leXX_to_cpup accessors that
      operate on aligned addresses.
      
      Since the compiler will in fact emit ldrd or ldm instructions when
      building this code for ARM v6 or later, the solution is to use the
      unaligned accessors unconditionally on architectures where this is
      known to be fast. The _aligned version of the hash function is
      however still needed to get the best performance on architectures
      that cannot do any unaligned access in hardware.
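
      The selection logic described above can be sketched as follows. This is
      a hedged model, not the kernel's siphash code:
      HAVE_EFFICIENT_UNALIGNED_ACCESS is a hypothetical stand-in for the
      Kconfig symbol, and the toy_hash_*() functions are a non-cryptographic
      word sum standing in for SipHash, so that only the dispatch between the
      _aligned and _unaligned variants is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. */
#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
#define HAVE_EFFICIENT_UNALIGNED_ACCESS 0
#endif

#define HASH_ALIGNMENT _Alignof(uint64_t)

/* Records which variant handled the last call, so the dispatch is visible. */
static int last_path_aligned;

/* Toy word-sum "hash" (not SipHash): safe for any alignment of data. */
static uint64_t toy_hash_unaligned(const void *data, size_t nwords)
{
	const unsigned char *p = data;
	uint64_t sum = 0, w;

	for (size_t i = 0; i < nwords; i++) {
		memcpy(&w, p + i * sizeof(w), sizeof(w)); /* alignment-safe load */
		sum += w;
	}
	last_path_aligned = 0;
	return sum;
}

#if !HAVE_EFFICIENT_UNALIGNED_ACCESS
/* Built only where unaligned access is slow; requires aligned input. */
static uint64_t toy_hash_aligned(const uint64_t *data, size_t nwords)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += data[i];
	last_path_aligned = 1;
	return sum;
}
#endif

static uint64_t toy_hash(const void *data, size_t nwords)
{
#if HAVE_EFFICIENT_UNALIGNED_ACCESS
	/* Cheap unaligned loads: always take the unaligned variant. */
	return toy_hash_unaligned(data, nwords);
#else
	/* Otherwise use the aligned variant only when the pointer allows it. */
	if ((uintptr_t)data % HASH_ALIGNMENT != 0)
		return toy_hash_unaligned(data, nwords);
	return toy_hash_aligned(data, nwords);
#endif
}
```

      The aligned variant is compiled only where unaligned access is slow;
      everywhere else every call takes the unaligned variant, which the
      compiler is free to turn into ordinary loads.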
      
      This new version avoids the undefined behavior and should produce
      the fastest hash on all architectures we support.
      
      Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/
      Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/

      Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Fixes: 2c956a60 ("siphash: add cryptographically secure PRF")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wireguard: device: reset peer src endpoint when netns exits · 20ae1d6a
      Jason A. Donenfeld authored
      
      Each peer's endpoint contains a dst_cache entry that takes a reference
      to another netdev. When the containing namespace exits, we take down the
      socket and prevent future sockets from being created (by setting
      creating_net to NULL), which removes that potential reference on the
      netns. However, it doesn't release references to the netns that a netdev
      cached in dst_cache might be taking, so the netns still might fail to
      exit. Since the socket is unusable anyway, we can simply clear all the
      dst_caches (by way of clearing the endpoint src), which will release all
      references.
      
      However, the current dst_cache_reset function only releases those
      references lazily. But it turns out that all of our calls to
      wg_socket_clear_peer_endpoint_src come from contexts that are not
      exactly high-speed or bottlenecked: for example, when there's
      connection difficulty, when userspace is reconfiguring the interface,
      and, in particular for this patch, when the netns is exiting. So for
      those cases, it makes more sense to call dst_release immediately. For
      that, we add a small helper function to dst_cache.
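
      To make the lazy-versus-immediate distinction concrete, here is a small
      userspace model. It is only a sketch under stated assumptions: struct
      cache, cache_reset() and cache_reset_now() are hypothetical names
      loosely modelled on a per-CPU dst_cache, and struct obj stands in for a
      dst_entry that pins a netdev (and therefore its netns).

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of possible CPUs */

/* Refcounted object standing in for a dst_entry that pins a netdev. */
struct obj {
	int refcnt;
};

static void obj_hold(struct obj *o)
{
	if (o)
		o->refcnt++;
}

static void obj_put(struct obj *o)
{
	if (o)
		o->refcnt--;
}

/* One cached pointer per slot, loosely modelled on a per-CPU dst_cache. */
struct cache {
	struct obj *slot[NR_SLOTS];
	unsigned long slot_ts[NR_SLOTS];	/* reset generation when filled */
	unsigned long reset_ts;			/* bumped on every reset */
};

static void cache_set(struct cache *c, int i, struct obj *o)
{
	assert(i >= 0 && i < NR_SLOTS);
	obj_hold(o);		/* take the new reference first ... */
	obj_put(c->slot[i]);	/* ... then drop the old one */
	c->slot[i] = o;
	c->slot_ts[i] = c->reset_ts;
}

/* Returns the cached object, releasing it first if a reset made it stale. */
static struct obj *cache_get(struct cache *c, int i)
{
	assert(i >= 0 && i < NR_SLOTS);
	if (c->slot[i] && c->slot_ts[i] < c->reset_ts) {
		obj_put(c->slot[i]);	/* stale entry released only now */
		c->slot[i] = NULL;
	}
	return c->slot[i];
}

/* Lazy reset: invalidate only; references linger until the next get. */
static void cache_reset(struct cache *c)
{
	c->reset_ts++;
}

/* Immediate reset: drop every cached reference right away. */
static void cache_reset_now(struct cache *c)
{
	c->reset_ts++;
	for (int i = 0; i < NR_SLOTS; i++) {
		obj_put(c->slot[i]);
		c->slot[i] = NULL;
	}
}
```

      With the lazy reset, a slot that is never read again keeps its
      reference indefinitely, which is how a cached entry can keep a dying
      netns alive.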
      
      This patch also adds a test to netns.sh from Hangbin Liu to ensure this
      doesn't regress.
      
      Tested-by: Hangbin Liu <liuhangbin@gmail.com>
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Fixes: 900575aa ("wireguard: device: avoid circular netns references")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  17. Nov 29, 2021
    • ipv6: fix memory leak in fib6_rule_suppress · cdef4852
      msizanoen1 authored
      The kernel leaks memory when a `fib` rule is present in IPv6 nftables
      firewall rules and a suppress_prefixlength rule is present in the IPv6
      routing rules (used by certain tools such as wg-quick). In such
      scenarios, every incoming packet will leak an allocation in the
      `ip6_dst_cache` slab cache.
      
      After some hours of `bpftrace`-ing and source code reading, I tracked
      down the issue to ca7a03c4 ("ipv6: do not free rt if
      FIB_LOOKUP_NOREF is set on suppress rule").
      
      The problem with that change is that the generic `args->flags` always has
      `FIB_LOOKUP_NOREF` set[1][2], but the IPv6-specific flag
      `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
      decreasing the refcount when needed.
      
      How to reproduce:
       - Add the following nftables rule to a prerouting chain:
           meta nfproto ipv6 fib saddr . mark . iif oif missing drop
         This can be done with:
           sudo nft create table inet test
           sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
           sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
       - Run:
           sudo ip -6 rule add table main suppress_prefixlength 0
       - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
         with every incoming IPv6 packet.
      
      This patch exposes the protocol-specific flags to the protocol-specific
      `suppress` function and checks the protocol-specific `flags` argument
      for RT6_LOOKUP_F_DST_NOREF, instead of the generic FIB_LOOKUP_NOREF,
      when deciding whether to decrease the refcount.
      
      [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
      [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105

      Fixes: ca7a03c4 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>