  1. Aug 25, 2015
  2. Aug 07, 2015
  3. Aug 03, 2015
  4. Jul 14, 2015
    • rds: rds_ib_device.refcount overflow · 4fabb594
      Wengang Wang authored
      
      Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")
      
      The reference taken on rds_ib_device.refcount is not dropped when
      rds_ib_alloc_fmr fails (MR pool running out). This leads to a refcount overflow.
      
      The BUG_ON at line 117 (see below) was triggered. From the vmcore:
      s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
      That is evidence that the MR pool was used up, so rds_ib_alloc_fmr is very likely
      to have returned ERR_PTR(-EAGAIN).
      
      115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
      116 {
      117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
      118         if (atomic_dec_and_test(&rds_ibdev->refcount))
      119                 queue_work(rds_wq, &rds_ibdev->free_work);
      120 }
      
      The fix is to drop the refcount when rds_ib_alloc_fmr fails.
      
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
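The failure-path fix described in this commit can be illustrated with a minimal user-space sketch. Everything named `demo_*` is hypothetical and only stands in for the kernel structures; a plain `int` replaces the kernel's `atomic_t`, and the pool is reduced to a counter. The sketch shows only the pattern: a reference taken before an allocation has to be dropped again if the allocation fails.

```c
#include <errno.h>

/* Hypothetical stand-in for struct rds_ib_device; not the real RDS code. */
struct demo_dev {
    int refcount;      /* the kernel uses atomic_t here */
    int mrs_available; /* free entries left in the MR pool */
};

static void demo_dev_get(struct demo_dev *dev) { dev->refcount++; }
static void demo_dev_put(struct demo_dev *dev) { dev->refcount--; }

/* Fails with -EAGAIN once the pool is depleted. */
static int demo_alloc_mr(struct demo_dev *dev)
{
    if (dev->mrs_available == 0)
        return -EAGAIN;
    dev->mrs_available--;
    return 0;
}

/* Takes a reference for the MR and drops it again on the failure path. */
static int demo_get_mr(struct demo_dev *dev)
{
    int ret;

    demo_dev_get(dev);
    ret = demo_alloc_mr(dev);
    if (ret) {
        demo_dev_put(dev); /* without this drop, every failure leaks a reference */
        return ret;
    }
    return 0; /* the reference is kept for the lifetime of the MR */
}
```

With the drop in place, repeated failures against a depleted pool leave the count unchanged, so it can no longer creep toward the signed-integer limit.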
  5. Jul 03, 2015
  6. Jun 21, 2015
  7. Jun 12, 2015
  8. Jun 02, 2015
    • rds: re-entry of rds_ib_xmit/rds_iw_xmit · d655a9fb
      Wengang Wang authored
      
      The BUG_ON at lines 452/453 is triggered in function rds_send_xmit.
      
       441                         while (ret) {
       442                                 tmp = min_t(int, ret, sg->length -
       443                                                       conn->c_xmit_data_off);
       444                                 conn->c_xmit_data_off += tmp;
       445                                 ret -= tmp;
       446                                 if (conn->c_xmit_data_off == sg->length) {
       447                                         conn->c_xmit_data_off = 0;
       448                                         sg++;
       449                                         conn->c_xmit_sg++;
       450                                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
       451                                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
       452                                         BUG_ON(ret != 0 &&
       453                                                conn->c_xmit_sg == rm->data.op_nents);
       454                                 }
       455                         }
      
      It complains that the total length sent is bigger than what we wanted to send.
      
      rds_ib_xmit() returns a wrong value on its second entry for the same
      rds_message.
      
      The sg and off passed by rds_send_xmit to rds_ib_xmit are based on
      scatterlist.offset/length, but rds_ib_xmit operates on
      scatterlist.dma_address/dma_length. When dma_length is larger than length
      there is a problem: for the second and later entries of rds_ib_xmit for the
      same rds_message, at least one of the following two is wrong:
      
      1) the scatterlist to start with; the chosen one can be far beyond the
         correct one.
      2) the offset to start with within the scatterlist.
      
      Fix:
      Add op_dmasg and op_dmaoff to the rm_data_op structure, indicating the
      scatterlist and the offset within it to start with for rds_ib_xmit,
      respectively. op_dmasg and op_dmaoff are initialized to zero when the DMA
      mapping is done the first time the message is seen, and are updated when
      filling send slots.
      
      The same applies to rds_iw_xmit.
      
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
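The idea of keeping a separate resume position in DMA terms can be sketched as follows. This is a simplified user-space model, not the kernel code: `demo_sg`, `demo_op`, and `demo_xmit` are hypothetical names, and only the fields `dmasg` and `dmaoff` mirror the `op_dmasg`/`op_dmaoff` members named in the commit message. Each call consumes a limited number of bytes and remembers where it stopped, so a later call for the same message resumes at the right entry and offset.

```c
/* Hypothetical DMA-mapped entry: only the mapped length matters here. */
struct demo_sg {
    unsigned int dma_len;
};

struct demo_op {
    const struct demo_sg *sg;
    unsigned int nents;  /* number of DMA-mapped entries */
    unsigned int dmasg;  /* entry to resume from on the next call */
    unsigned int dmaoff; /* offset within that entry */
};

/*
 * Consume up to `budget` bytes starting at the saved DMA position and
 * advance that position. Returns the number of bytes consumed.
 */
static unsigned int demo_xmit(struct demo_op *op, unsigned int budget)
{
    unsigned int sent = 0;

    while (budget > 0 && op->dmasg < op->nents) {
        unsigned int left = op->sg[op->dmasg].dma_len - op->dmaoff;
        unsigned int n = left < budget ? left : budget;

        op->dmaoff += n;
        sent += n;
        budget -= n;
        if (op->dmaoff == op->sg[op->dmasg].dma_len) {
            op->dmasg++;    /* entry fully consumed, move to the next one */
            op->dmaoff = 0;
        }
    }
    return sent;
}
```

Because the position is stored in the same units the transport actually walks (mapped lengths), re-entry does not depend on the caller's offset/length bookkeeping.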
  9. Jun 01, 2015
  10. May 18, 2015
  11. May 11, 2015
  12. May 09, 2015
    • net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket. · c82ac7e6
      Sowmini Varadhan authored
      
      When the peer of an RDS-TCP connection restarts, a reconnect
      attempt should only be made from the active side of the TCP
      connection, i.e. the side that has a transient TCP port
      number. Do not add the passive side of the TCP connection
      to the c_hash_node and thus avoid triggering rds_queue_reconnect()
      for passive rds connections.
      
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection. · f711a6ae
      Sowmini Varadhan authored
      
      When running RDS over TCP, the active (client) side connects to the
      listening ("passive") side at the RDS_TCP_PORT.  After the connection
      is established, if the client side reboots (potentially without even
      sending a FIN) the server still has a TCP socket in the established
      state.  If the server side now gets a new SYN from the client
      with a different client port, TCP will create a new socket-pair, but
      the RDS layer will incorrectly pull up the old rds_connection (which
      is still associated with the stale t_sock and RDS socket state).
      
      This patch corrects this behavior by having rds_tcp_accept_one()
      always create a new connection for an incoming TCP SYN.
      The rds and tcp state associated with the old socket-pair is cleaned
      up via the rds_tcp_state_change() callback which would typically be
      invoked in most cases when the client-TCP sends a FIN on TCP restart,
      triggering a transition to CLOSE_WAIT state. In the rarer event of client
      death without a FIN, TCP_KEEPALIVE probes on the socket will detect
      the stale socket, and the TCP transition to CLOSE state will trigger
      the RDS state cleanup.
      
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. May 04, 2015
    • net/rds: Fix new sparse warning · e2783717
      David Ahern authored
      
      c0adf54a introduced new sparse warnings:
        CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
      net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
      net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
      net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
      net/rds/ib_cm.c:194:51: warning: cast to restricted __be64
      
      The temporary variable for sequence number should have been declared as __be64
      rather than u64. Make it so.
      
      Signed-off-by: David Ahern <david.ahern@oracle.com>
      Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/rds: fix unaligned memory access · c0adf54a
      shamir rabinovitch authored
      
      rdma_conn_param private data is copied using memcpy after headers such
      as cma_hdr (see cma_resolve_ib_udp for an example), so the start of the
      private data is aligned to the end of the structure that comes before it.
      If that structure ends with a u32, the start of the private data will be
      4-byte aligned. Structures that use u8/u16/u32/u64 are naturally aligned,
      but if the structure start is not 8-byte aligned, none of the u64 members
      of the structure will be aligned. To solve this issue we must use the
      special macros that allow unaligned access to those unaligned members.
      
      Addresses the following kernel log seen when attempting to use RDMA:
      
      Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]
      
      Acked-by: Chien Yen <chien.yen@oracle.com>
      Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
      [Minor tweaks for top of tree by:]
      Signed-off-by: David Ahern <david.ahern@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
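The alignment problem described in this commit can be shown in portable user-space C. The helpers below are hypothetical (`demo_*`) and are only similar in spirit to the kernel's unaligned-access macros; they copy bytes with `memcpy` instead of dereferencing a possibly misaligned `uint64_t` pointer, which is the usual portable way to read a 64-bit field that sits at a 4-byte-aligned offset.

```c
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an address that may not be 8-byte aligned. */
static uint64_t demo_get_unaligned_u64(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v)); /* byte copy, no alignment requirement */
    return v;
}

/* Store a 64-bit value to an address that may not be 8-byte aligned. */
static void demo_put_unaligned_u64(uint64_t v, void *p)
{
    memcpy(p, &v, sizeof(v));
}
```

On architectures that trap or log on misaligned loads, as in the kernel message quoted above, this kind of access avoids the fault; on architectures that tolerate misalignment, compilers typically reduce the copy to a plain load or store.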
  14. Apr 08, 2015
    • RDS: make sure not to loop forever inside rds_send_xmit · 443be0e5
      Sowmini Varadhan authored
      
      If a determined set of concurrent senders keep the send queue full,
      we can loop forever inside rds_send_xmit.  This fix has two parts.
      
      First we are dropping out of the while(1) loop after we've processed a
      large batch of messages.
      
      Second we add a generation number that gets bumped each time the
      xmit bit lock is acquired.  If someone else has jumped in and
      made progress in the queue, we skip our goto restart.
      
      Original patch by Chris Mason.
      
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
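The two parts of this fix, a batch limit and a generation number, can be modelled in a single-threaded sketch. All names are hypothetical, the batch size of 1024 is an assumption made for illustration, and the "lock" is reduced to a counter bump. The sketch only shows the decision logic: stop after a large batch, and restart only if nobody else has taken the transmit lock since we did.

```c
#include <stdbool.h>

#define DEMO_BATCH 1024 /* assumed batch limit, for illustration only */

struct demo_conn {
    unsigned long send_gen; /* bumped each time the xmit lock is acquired */
    unsigned int queued;    /* messages waiting on the send queue */
};

/*
 * Restart only if work remains, the batch is not exhausted, and no other
 * sender has acquired the lock (and bumped send_gen) in the meantime.
 */
static bool demo_should_restart(const struct demo_conn *c,
                                unsigned long my_gen,
                                unsigned int batch_count)
{
    return c->queued > 0 && batch_count < DEMO_BATCH &&
           my_gen == c->send_gen;
}

/* Sends at most DEMO_BATCH messages and returns how many were sent. */
static unsigned int demo_send_xmit(struct demo_conn *c)
{
    unsigned int batch_count = 0;
    unsigned long my_gen;

restart:
    my_gen = ++c->send_gen; /* models acquiring the xmit lock */
    while (c->queued > 0 && batch_count < DEMO_BATCH) {
        c->queued--;
        batch_count++;
    }
    /* The lock would be released here; other senders may queue more work. */
    if (demo_should_restart(c, my_gen, batch_count))
        goto restart;
    return batch_count;
}
```

In this single-threaded model the restart branch is never taken after the loop, since the loop only exits when the queue is empty or the batch is used up; in a concurrent setting the generation check is what lets a sender skip the restart when another sender has already made progress.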
    • RDS: only use passive connections when addresses match · 1789b2c0
      Sowmini Varadhan authored
      
      Passive connections were added for the case where one loopback IB
      connection between identical addresses needs another connection to store
      the second QP.  Unfortunately, they were also created in the case where
      the addresses differ and we already have both QPs.
      
      This led to a message reordering bug.
      
      - two different IB interfaces and addresses on a machine: A B
      - traffic is sent from A to B
      - connection from A-B is created, connect request sent
      - listening accepts connect request, B-A is created
      - traffic flows, next_rx is incremented
      - unacked messages exist on the retrans list
      - connection A-B is shut down, new connect request sent
      - listen sees existing loopback B-A, creates new passive B-A
      - retrans messages are sent and delivered because of 0 next_rx
      
      The problem is that the second connection request saw the previously
      existing parent connection.  Instead of using it, and using the existing
      next_rx_seq state for the traffic between those IPs, it mistakenly
      thought that it had to create a passive connection.
      
      We fix this by only using passive connections in the special case where
      laddr and faddr match.  In this case we'll only ever have one parent
      sending connection requests and one passive connection created as the
      listening path sees the existing parent connection which initiated the
      request.
      
      Original patch by Zach Brown
      
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
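The rule stated in this commit reduces to a small predicate. The function below is a hypothetical sketch with assumed parameter names, not the actual check in the RDS connection code: a passive connection is only selected for an incoming request when a parent connection already exists and the local and foreign addresses are identical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: use the passive connection only for an incoming
 * request on an existing parent whose two endpoints share one address.
 */
static bool demo_use_passive(uint32_t laddr, uint32_t faddr,
                             bool is_outgoing, bool parent_exists)
{
    return !is_outgoing && parent_exists && laddr == faddr;
}
```

With differing addresses (the A and B case in the list above) the predicate is false, so the existing connection and its receive sequence state are reused instead of being shadowed by a fresh passive connection.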
  15. Mar 12, 2015
    • rds: avoid potential stack overflow · f862e07c
      Arnd Bergmann authored
      
      The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
      on the stack in order to pass a pair of addresses. This happens to just
      fit within the 1024 byte stack size warning limit on x86, but just
      exceeds that limit on ARM, which gives us this warning:
      
      net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      As the use of this large variable is basically bogus, we can rearrange
      the code to not do that. Instead of passing an rds socket into
      rds_iw_get_device, we now just pass the two addresses that we have
      available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
      to create two address structures on the stack there.
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. Mar 02, 2015
  17. Feb 11, 2015
    • rds: rds_cong_queue_updates needs to defer the congestion update transmission · 80ad0d4a
      Sowmini Varadhan authored
      
      When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
      from rds_cong_queue_update because
      (a) we are already holding the sock_lock in the recv path, and
          will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
          lock
      (b) cong_queue_update does an irqsave on the rds_cong_lock, and this
          will trigger warnings (for a good reason) from functions called
          out of sock_lock.
      
      This patch reverts the change introduced by
      2fa57129 ("RDS: Bypass workqueue when queueing cong updates").
      
      The patch has been verified for both RDS/TCP and RDS/RDMA
      to ensure that there are no regressions for either transport:
       - for verification of RDS/TCP, a client-server unit test was used,
         with the server blocked in gdb and thus unable to drain its rcvbuf,
         eventually triggering an RDS congestion update.
       - for RDS/RDMA, the standard IB regression tests were used.
      
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
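The difference between transmitting inline and deferring the transmission can be sketched in a small user-space model. All names here are hypothetical and a boolean flag stands in for work queued to a kernel workqueue; the sketch only illustrates why the caller that holds a lock records the request and lets a worker do the send later.

```c
#include <stdbool.h>

struct demo_conn {
    bool lock_held;    /* stands in for the socket lock held in the recv path */
    bool send_pending; /* stands in for work queued on a workqueue */
    int xmit_calls;    /* number of successful transmit passes */
};

/* Hypothetical transmit routine: must not run while the lock is held. */
static int demo_send_xmit(struct demo_conn *c)
{
    if (c->lock_held)
        return -1; /* in the real TCP transport this would deadlock */
    c->xmit_calls++;
    return 0;
}

/* Called from the receive path with the lock held: only record the request. */
static void demo_cong_queue_update(struct demo_conn *c)
{
    c->send_pending = true;
}

/* Runs later in worker context, after the lock has been released. */
static int demo_send_worker(struct demo_conn *c)
{
    if (!c->send_pending)
        return 0;
    c->send_pending = false;
    return demo_send_xmit(c);
}
```

The cost of deferring is a little extra latency before the congestion update goes out; the benefit is that the transmit path never runs in a context that already holds the lock it needs.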
  18. Feb 08, 2015
  19. Feb 05, 2015
  20. Dec 15, 2014