[pve-devel] ceph rdma mellanox ceph 12.2.1

Gerhard W. Recher gerhard.recher at net4sec.com
Mon Oct 9 14:00:39 CEST 2017


Hi Fellows,

Fabian suggested me to open a thread on dev-list :
based on bug-report: https://bugzilla.proxmox.com/show_bug.cgi?id=1521

 pveversion -v
proxmox-ve: 5.0-23 (running kernel: 4.13.3-1-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.13.3-1-pve: 4.13.3-2
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.11-1-pve: 4.10.11-9
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-14
qemu-server: 5.0-15
pve-firmware: 2.0-3
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-12
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.1-1
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve18~bpo90
ceph: 12.2.1-pve1


if i enable rdma in ceph.conf
example node#4

 showgids
DEV     PORT    INDEX   GID                                    
IPv4            VER     DEV
---     ----    -----   ---                                    
------------    ---     ---
mlx4_0  1       0      
fe80:0000:0000:0000:268a:07ff:fee2:5d80                 v1      ens1
mlx4_0  1       1      
fe80:0000:0000:0000:268a:07ff:fee2:5d80                 v2      ens1
mlx4_0  1       2       0000:0000:0000:0000:0000:ffff:c0a8:dd90
192.168.221.144         v1      vmbr0
mlx4_0  1       3       0000:0000:0000:0000:0000:ffff:c0a8:dd90
192.168.221.144         v2      vmbr0
mlx4_0  2       0      
fe80:0000:0000:0000:268a:07ff:fee2:5d81                 v1      ens1d1
mlx4_0  2       1      
fe80:0000:0000:0000:268a:07ff:fee2:5d81                 v2      ens1d1
mlx4_0  2       2       0000:0000:0000:0000:0000:ffff:c0a8:6490
192.168.100.144         v1      ens1d1
mlx4_0  2       3       0000:0000:0000:0000:0000:ffff:c0a8:6490
192.168.100.144         v2      ens1d1
n_gids_found=8


[global]
#ms_type=xio
ms_type=async+rdma
ms_cluster_type = async+rdma
ms_async_rdma_port_num=2
ms_async_rdma_device_name=mlx4_0
#1
#ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:648d
#fe80:0000:0000:0000:268a:07ff:fee2:6071
#2
#ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:648e
#fe80:0000:0000:0000:268a:07ff:fef9:8731
#3
#ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:648f
#fe80:0000:0000:0000:268a:07ff:fef9:8791
#4
ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:6490
#fe80:0000:0000:0000:268a:07ff:fee2:5d81


I will end up in not operating ceph, osd,mgr and mon processes are up,
but no operations will have success because of asserting!

Mon log:

2017-10-09 13:00:34.307728 7fae5ddeef80  0 set uid:gid to 64045:64045
(ceph:ceph)
2017-10-09 13:00:34.307738 7fae5ddeef80  0 ceph version 12.2.1
(fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous (stable), process
(unknown), pid 20268
2017-10-09 13:00:34.307802 7fae5ddeef80  0 pidfile_write: ignore empty
--pid-file
2017-10-09 13:00:34.314461 7fae5ddeef80  0 load: jerasure load: lrc
load: isa
2017-10-09 13:00:34.315570 7fae5ddeef80  1 leveldb: Recovering log #73065
2017-10-09 13:00:34.369901 7fae5ddeef80  1 leveldb: Level-0 table
#73083: started
2017-10-09 13:00:34.469844 7fae5ddeef80  1 leveldb: Level-0 table
#73083: 27766571 bytes OK
2017-10-09 13:00:34.474225 7fae5ddeef80  1 leveldb: Delete type=0 #73065

2017-10-09 13:00:34.479041 7fae5ddeef80  1 leveldb: Delete type=3 #72279

2017-10-09 13:00:34.479496 7fae5ddeef80  1 RDMAStack RDMAStack
ms_async_rdma_enable_hugepage value is: 0
2017-10-09 13:00:34.479509 7fae5ddeef80 20 RDMAStack RDMAStack
constructing RDMAStack...
2017-10-09 13:00:34.479521 7fae5ddeef80 20 RDMAStack  creating
RDMAStack:0x5558ff76fe70 with dispatcher:0x5558ff7b48c0
2017-10-09 13:00:34.479558 7fae545dd700  2 Event(0x5558ffa98600
nevent=5000 time_id=1).set_owner idx=0 owner=140386716473088
2017-10-09 13:00:34.479574 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=11
mask=1 original mask is 0
2017-10-09 13:00:34.479576 7fae545dd700 20 EpollDriver.add_event add
event fd=11 cur_mask=0 add_mask=1 to 10
2017-10-09 13:00:34.479581 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=11 mask=1
original mask is 1
2017-10-09 13:00:34.479582 7fae545dd700 10 stack operator() starting
2017-10-09 13:00:34.479584 7fae53ddc700  2 Event(0x5558ffa988c0
nevent=5000 time_id=1).set_owner idx=1 owner=140386708080384
2017-10-09 13:00:34.479592 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=14
mask=1 original mask is 0
2017-10-09 13:00:34.479595 7fae53ddc700 20 EpollDriver.add_event add
event fd=14 cur_mask=0 add_mask=1 to 13
2017-10-09 13:00:34.479598 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=14 mask=1
original mask is 1
2017-10-09 13:00:34.479601 7fae53ddc700 10 stack operator() starting
2017-10-09 13:00:34.479600 7fae535db700  2 Event(0x5558ffa98b80
nevent=5000 time_id=1).set_owner idx=2 owner=140386699687680
2017-10-09 13:00:34.479603 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event started fd=17
mask=1 original mask is 0
2017-10-09 13:00:34.479604 7fae535db700 20 EpollDriver.add_event add
event fd=17 cur_mask=0 add_mask=1 to 16
2017-10-09 13:00:34.479606 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event end fd=17 mask=1
original mask is 1
2017-10-09 13:00:34.479607 7fae535db700 10 stack operator() starting
2017-10-09 13:00:34.479625 7fae5ddeef80  0 starting mon.3 rank 3 at
public addr 192.168.100.144:6789/0 at bind addr 192.168.100.144:6789/0
mon_data /var/lib/ceph/mon/ceph-3 fsid cb0aba69-bad9-4d30-b163-c19f0fd1ec53
2017-10-09 13:00:34.479636 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
2017-10-09 13:00:34.479638 7fae5ddeef80 10 -- - bind Network Stack is
not ready for bind yet - postponed
2017-10-09 13:00:34.479645 7fae5ddeef80  0 starting mon.3 rank 3 at
192.168.100.144:6789/0 mon_data /var/lib/ceph/mon/ceph-3 fsid
cb0aba69-bad9-4d30-b163-c19f0fd1ec53
2017-10-09 13:00:34.480237 7fae5ddeef80  0 mon.3 at -1(probing).mds e1
print_map
e1
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
layout v2}
legacy client fscid: -1

No filesystems configured

2017-10-09 13:00:34.480408 7fae5ddeef80  0 mon.3 at -1(probing).osd e12305
crush map has features 283675107524608, adjusting msgr requires
2017-10-09 13:00:34.480415 7fae5ddeef80  0 mon.3 at -1(probing).osd e12305
crush map has features 283675107524608, adjusting msgr requires
2017-10-09 13:00:34.480417 7fae5ddeef80  0 mon.3 at -1(probing).osd e12305
crush map has features 720859615486820352, adjusting msgr requires
2017-10-09 13:00:34.480418 7fae5ddeef80  0 mon.3 at -1(probing).osd e12305
crush map has features 283675107524608, adjusting msgr requires
2017-10-09 13:00:34.481692 7fae5ddeef80 10 -- - create_connect
192.168.100.144:6828/21406, creating connection and registering
2017-10-09 13:00:34.481697 7fae5ddeef80 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_NONE pgs=0
cs=0 l=0)._connect csq=0
2017-10-09 13:00:34.481701 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.481706 7fae5ddeef80 10 -- - get_connection
mgr.17664152 192.168.100.144:6828/21406 new 0x555900b4b800
2017-10-09 13:00:34.481709 7fae5ddeef80  1 -- - -->
192.168.100.144:6828/21406 -- mgropen(unknown.3) v2 -- 0x5558ffa99080 con 0
2017-10-09 13:00:34.481713 7fae5ddeef80 15 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).send_message inline write is denied, reschedule
m=0x5558ffa99080
2017-10-09 13:00:34.481740 7fae535db700 20 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.481855 7fae5ddeef80  1 -- - start start
2017-10-09 13:00:34.481859 7fae5ddeef80  1 -- - start start
2017-10-09 13:00:34.482042 7fae5ddeef80 10 -- - ready -
2017-10-09 13:00:34.482045 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
2017-10-09 13:00:34.482048 7fae5ddeef80 10  Processor -- bind
2017-10-09 13:00:34.482049 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.482427 7fae535db700 10 Infiniband binding_port port
1 is not what we want. state: 4)
2017-10-09 13:00:34.482522 7fae535db700  1 Infiniband binding_port found
active port 2
2017-10-09 13:00:34.482530 7fae535db700  1 Infiniband init assigning:
1024 receive buffers
2017-10-09 13:00:34.482532 7fae535db700  1 Infiniband init assigning:
1024 send buffers
2017-10-09 13:00:34.482533 7fae535db700  1 Infiniband init device allow
4194303 completion entries
2017-10-09 13:00:34.565741 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.565783 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.567858 7fae535db700 20 Infiniband init successfully
create cq=0x5558ff77ccb0
2017-10-09 13:00:34.569959 7fae535db700 20 Infiniband init successfully
create cq=0x5558ff77cd80
2017-10-09 13:00:34.570039 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.570050 7fae4eb2c700 20 RDMAStack polling going to
poll tx cq: 0x555900e7c540 rx cq: 0x555900e7c570
2017-10-09 13:00:34.570088 7fae545dd700 20  RDMAServerSocketImpl listen
bind to 192.168.100.144:6789 on port 6789
2017-10-09 13:00:34.570130 7fae535db700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7caf00
2017-10-09 13:00:34.570145 7fae5ddeef80 10  Processor -- bind bound to
192.168.100.144:6789/0
2017-10-09 13:00:34.570169 7fae5ddeef80  1 -- 192.168.100.144:6789/0
learned_addr learned my addr 192.168.100.144:6789/0
2017-10-09 13:00:34.570173 7fae5ddeef80  1 -- 192.168.100.144:6789/0
_finish_bind bind my_inst.addr is 192.168.100.144:6789/0
2017-10-09 13:00:34.570177 7fae5ddeef80  1  Processor -- start
2017-10-09 13:00:34.570179 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570194 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=23
mask=1 original mask is 0
2017-10-09 13:00:34.570202 7fae545dd700 20 EpollDriver.add_event add
event fd=23 cur_mask=0 add_mask=1 to 10
2017-10-09 13:00:34.570206 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=23 mask=1
original mask is 1
2017-10-09 13:00:34.570308 7fae5ddeef80 10 -- - ready -
2017-10-09 13:00:34.570317 7fae5ddeef80  1  Processor -- start
2017-10-09 13:00:34.570347 7fae535db700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7caf00
2017-10-09 13:00:34.570360 7fae5ddeef80  0 mon.3 at -1(probing) e5  my rank
is now 3 (was -1)
2017-10-09 13:00:34.570362 7fae535db700 20  RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.570363 7fae5ddeef80  1 -- 192.168.100.144:6789/0
shutdown_connections
2017-10-09 13:00:34.570379 7fae5ddeef80  1 -- 192.168.100.144:6789/0
_send_message--> mon.0 192.168.100.141:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0 0x5558ff7c4900
2017-10-09 13:00:34.570392 7fae5ddeef80 20 -- 192.168.100.144:6789/0
submit_message mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name
3) v6 remote, 192.168.100.141:6789/0, new connection.
2017-10-09 13:00:34.570385 7fae535db700 10 NetHandler generic_connect
connect: (111) Connection refused
2017-10-09 13:00:34.570395 7fae5ddeef80 10 -- 192.168.100.144:6789/0
create_connect 192.168.100.141:6789/0, creating connection and registering
2017-10-09 13:00:34.570400 7fae535db700  1 RDMAStack connect try
connecting failed.
2017-10-09 13:00:34.570403 7fae535db700 20  RDMAConnectedSocketImpl
~RDMAConnectedSocketImpl destruct.
2017-10-09 13:00:34.570410 7fae4eb2c700 10 RDMAStack polling finally
delete qp=0x55590059d480
2017-10-09 13:00:34.570414 7fae4eb2c700 20 Infiniband ~QueuePair destroy
qp=0x5558ff7caf00
2017-10-09 13:00:34.570408 7fae5ddeef80 10 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._connect csq=0
2017-10-09 13:00:34.570418 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570416 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).fault waiting 0.200000
2017-10-09 13:00:34.570425 7fae545dd700 20 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.570423 7fae5ddeef80  1 -- 192.168.100.144:6789/0 -->
192.168.100.141:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4900 con 0
2017-10-09 13:00:34.570434 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).handle_write
2017-10-09 13:00:34.570437 7fae545dd700 20 Infiniband init started.
2017-10-09 13:00:34.570436 7fae5ddeef80 15 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).send_message inline write is denied, reschedule m=0x5558ff7c4900
2017-10-09 13:00:34.570440 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570443 7fae5ddeef80  1 -- 192.168.100.144:6789/0
_send_message--> mon.1 192.168.100.142:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0 0x5558ff7c4600
2017-10-09 13:00:34.570446 7fae5ddeef80 20 -- 192.168.100.144:6789/0
submit_message mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name
3) v6 remote, 192.168.100.142:6789/0, new connection.
2017-10-09 13:00:34.570448 7fae5ddeef80 10 -- 192.168.100.144:6789/0
create_connect 192.168.100.142:6789/0, creating connection and registering
2017-10-09 13:00:34.570454 7fae5ddeef80 10 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._connect csq=0
2017-10-09 13:00:34.570456 7fae5ddeef80 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570462 7fae5ddeef80  1 -- 192.168.100.144:6789/0 -->
192.168.100.142:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4600 con 0
2017-10-09 13:00:34.570467 7fae5ddeef80 15 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).send_message inline write is denied, reschedule m=0x5558ff7c4600
2017-10-09 13:00:34.570470 7fae5ddeef80  1 -- 192.168.100.144:6789/0
_send_message--> mon.2 192.168.100.143:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0 0x5558ff7c4c00
2017-10-09 13:00:34.570473 7fae5ddeef80 20 -- 192.168.100.144:6789/0
submit_message mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name
3) v6 remote, 192.168.100.143:6789/0, new connection.
2017-10-09 13:00:34.570475 7fae5ddeef80 10 -- 192.168.100.144:6789/0
create_connect 192.168.100.143:6789/0, creating connection and registering
2017-10-09 13:00:34.570481 7fae5ddeef80 10 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._connect csq=0
2017-10-09 13:00:34.570474 7fae53ddc700 20 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.570485 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=2).wakeup
2017-10-09 13:00:34.570490 7fae53ddc700 20 Infiniband init started.
2017-10-09 13:00:34.570491 7fae5ddeef80  1 -- 192.168.100.144:6789/0 -->
192.168.100.143:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4c00 con 0
2017-10-09 13:00:34.570493 7fae5ddeef80 15 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).send_message inline write is denied, reschedule m=0x5558ff7c4c00
2017-10-09 13:00:34.570504 7fae535db700 20 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.570520 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.570616 7fae545dd700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7cb2c0
2017-10-09 13:00:34.570664 7fae53ddc700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7cb400
2017-10-09 13:00:34.570676 7fae535db700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7cb540
2017-10-09 13:00:34.570861 7fae545dd700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7cb2c0
2017-10-09 13:00:34.570884 7fae545dd700 20  RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.570945 7fae545dd700 10 NetHandler generic_connect
connect: (111) Connection refused
2017-10-09 13:00:34.570956 7fae545dd700  1 RDMAStack connect try
connecting failed.
2017-10-09 13:00:34.570959 7fae545dd700 20  RDMAConnectedSocketImpl
~RDMAConnectedSocketImpl destruct.
2017-10-09 13:00:34.570969 7fae4eb2c700 10 RDMAStack polling finally
delete qp=0x55590059d560
2017-10-09 13:00:34.570969 7fae545dd700 10 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).fault waiting 0.200000
2017-10-09 13:00:34.570974 7fae4eb2c700 20 Infiniband ~QueuePair destroy
qp=0x5558ff7cb2c0
2017-10-09 13:00:34.570981 7fae545dd700 10 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).handle_write
2017-10-09 13:00:34.570999 7fae53ddc700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7cb400
2017-10-09 13:00:34.571102 7fae535db700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7cb540
2017-10-09 13:00:34.571244 7fae4eb2c700 20 Infiniband rearm_notify started.
2017-10-09 13:00:34.571265 7fae4eb2c700 20 Infiniband rearm_notify started.
2017-10-09 13:00:34.571274 7fae53ddc700 20  RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.571278 7fae535db700 20  RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.571378 7fae53ddc700 20  RDMAConnectedSocketImpl
try_connect tcp_fd: 35
2017-10-09 13:00:34.571390 7fae53ddc700 10 Infiniband send_msg sending:
0, 4543, 5515815, 0, fe80000000000000268a07fffee25d81
2017-10-09 13:00:34.571396 7fae535db700 10 NetHandler generic_connect
connect: (111) Connection refused
2017-10-09 13:00:34.571402 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=35
mask=1 original mask is 0
2017-10-09 13:00:34.571406 7fae53ddc700 20 EpollDriver.add_event add
event fd=35 cur_mask=0 add_mask=1 to 13
2017-10-09 13:00:34.571411 7fae535db700  1 RDMAStack connect try
connecting failed.
2017-10-09 13:00:34.571409 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=35 mask=1
original mask is 1
2017-10-09 13:00:34.571415 7fae535db700 20  RDMAConnectedSocketImpl
~RDMAConnectedSocketImpl destruct.
2017-10-09 13:00:34.571416 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=32
mask=1 original mask is 0
2017-10-09 13:00:34.571419 7fae53ddc700 20 EpollDriver.add_event add
event fd=32 cur_mask=0 add_mask=1 to 13
2017-10-09 13:00:34.571421 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=32 mask=1
original mask is 1
2017-10-09 13:00:34.571419 7fae535db700 10 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).fault waiting 0.200000
2017-10-09 13:00:34.571423 7fae53ddc700 20 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING_RE
pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.571426 7fae535db700 10 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).handle_write
2017-10-09 13:00:34.571430 7fae53ddc700 10 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING_RE
pgs=0 cs=0 l=0)._process_connection nonblock connect inprogress
2017-10-09 13:00:34.571436 7fae53ddc700 10 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING_RE
pgs=0 cs=0 l=0).handle_write
2017-10-09 13:00:34.571994 7fae53ddc700 20  RDMAConnectedSocketImpl
handle_connection QP: 4543 tcp_fd: 35 notify_fd: 32
2017-10-09 13:00:34.572010 7fae53ddc700  5 Infiniband recv_msg recevd:
0, 3643, 3062654, 0, fe80000000000000268a07fffef98731
2017-10-09 13:00:34.572014 7fae53ddc700 20  RDMAConnectedSocketImpl
handle_connection peer msg :  < 3643, 3062654, 0, 0>
2017-10-09 13:00:34.572016 7fae53ddc700 20  RDMAConnectedSocketImpl
activate Choosing gid_index 24, sl 3
2017-10-09 13:00:34.572021 7fae53ddc700 20  RDMAConnectedSocketImpl
activate transition to RTR state successfully.
2017-10-09 13:00:34.572022 7fae53ddc700 -1  RDMAConnectedSocketImpl
activate failed to transition to RTS state: (22) Invalid argument
2017-10-09 13:00:34.573257 7fae53ddc700 -1
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
In function 'void RDMAConnectedSocketImpl::handle_connection()' thread
7fae53ddc700 time 2017-10-09 13:00:34.572032
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
221: FAILED assert(!r)

 ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5558f5c809f2]
 2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x5558f5fd656a]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned
long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x5558f5d46028]
 4: (()+0x697128) [0x5558f5d4a128]
 5: (()+0xb9e6f) [0x7fae5af0fe6f]
 6: (()+0x7494) [0x7fae5d19e494]
 7: (clone()+0x3f) [0x7fae5a684aff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- begin dump of recent events ---
  -173> 2017-10-09 13:00:34.302946 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perfcounters_dump hook 0x5558ff7321b0
  -172> 2017-10-09 13:00:34.302953 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command 1 hook 0x5558ff7321b0
  -171> 2017-10-09 13:00:34.302955 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perf dump hook 0x5558ff7321b0
  -170> 2017-10-09 13:00:34.302957 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perfcounters_schema hook 0x5558ff7321b0
  -169> 2017-10-09 13:00:34.302958 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perf histogram dump hook 0x5558ff7321b0
  -168> 2017-10-09 13:00:34.302960 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command 2 hook 0x5558ff7321b0
  -167> 2017-10-09 13:00:34.302961 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perf schema hook 0x5558ff7321b0
  -166> 2017-10-09 13:00:34.302962 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perf histogram schema hook 0x5558ff7321b0
  -165> 2017-10-09 13:00:34.302964 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command perf reset hook 0x5558ff7321b0
  -164> 2017-10-09 13:00:34.302965 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command config show hook 0x5558ff7321b0
  -163> 2017-10-09 13:00:34.302967 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command config help hook 0x5558ff7321b0
  -162> 2017-10-09 13:00:34.302968 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command config set hook 0x5558ff7321b0
  -161> 2017-10-09 13:00:34.302969 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command config get hook 0x5558ff7321b0
  -160> 2017-10-09 13:00:34.302971 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command config diff hook 0x5558ff7321b0
  -159> 2017-10-09 13:00:34.302972 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command config diff get hook 0x5558ff7321b0
  -158> 2017-10-09 13:00:34.302975 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command log flush hook 0x5558ff7321b0
  -157> 2017-10-09 13:00:34.302979 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command log dump hook 0x5558ff7321b0
  -156> 2017-10-09 13:00:34.302981 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command log reopen hook 0x5558ff7321b0
  -155> 2017-10-09 13:00:34.302988 7fae5ddeef80  5 asok(0x5558ff7af2c0)
register_command dump_mempools hook 0x5558ffa33188
  -154> 2017-10-09 13:00:34.307728 7fae5ddeef80  0 set uid:gid to
64045:64045 (ceph:ceph)
  -153> 2017-10-09 13:00:34.307738 7fae5ddeef80  0 ceph version 12.2.1
(fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous (stable), process
(unknown), pid 20268
  -152> 2017-10-09 13:00:34.307802 7fae5ddeef80  0 pidfile_write: ignore
empty --pid-file
  -151> 2017-10-09 13:00:34.314461 7fae5ddeef80  0 load: jerasure load:
lrc load: isa
  -150> 2017-10-09 13:00:34.315570 7fae5ddeef80  1 leveldb: Recovering
log #73065
  -149> 2017-10-09 13:00:34.369901 7fae5ddeef80  1 leveldb: Level-0
table #73083: started
  -148> 2017-10-09 13:00:34.469844 7fae5ddeef80  1 leveldb: Level-0
table #73083: 27766571 bytes OK
  -147> 2017-10-09 13:00:34.474225 7fae5ddeef80  1 leveldb: Delete
type=0 #73065

  -146> 2017-10-09 13:00:34.479041 7fae5ddeef80  1 leveldb: Delete
type=3 #72279

  -145> 2017-10-09 13:00:34.479496 7fae5ddeef80  1 RDMAStack RDMAStack
ms_async_rdma_enable_hugepage value is: 0
  -144> 2017-10-09 13:00:34.479509 7fae5ddeef80 20 RDMAStack RDMAStack
constructing RDMAStack...
  -143> 2017-10-09 13:00:34.479521 7fae5ddeef80 20 RDMAStack  creating
RDMAStack:0x5558ff76fe70 with dispatcher:0x5558ff7b48c0
  -142> 2017-10-09 13:00:34.479558 7fae545dd700  2 Event(0x5558ffa98600
nevent=5000 time_id=1).set_owner idx=0 owner=140386716473088
  -141> 2017-10-09 13:00:34.479574 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=11
mask=1 original mask is 0
  -140> 2017-10-09 13:00:34.479576 7fae545dd700 20 EpollDriver.add_event
add event fd=11 cur_mask=0 add_mask=1 to 10
  -139> 2017-10-09 13:00:34.479581 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=11 mask=1
original mask is 1
  -138> 2017-10-09 13:00:34.479582 7fae545dd700 10 stack operator() starting
  -137> 2017-10-09 13:00:34.479584 7fae53ddc700  2 Event(0x5558ffa988c0
nevent=5000 time_id=1).set_owner idx=1 owner=140386708080384
  -136> 2017-10-09 13:00:34.479592 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=14
mask=1 original mask is 0
  -135> 2017-10-09 13:00:34.479595 7fae53ddc700 20 EpollDriver.add_event
add event fd=14 cur_mask=0 add_mask=1 to 13
  -134> 2017-10-09 13:00:34.479598 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=14 mask=1
original mask is 1
  -133> 2017-10-09 13:00:34.479601 7fae53ddc700 10 stack operator() starting
  -132> 2017-10-09 13:00:34.479600 7fae535db700  2 Event(0x5558ffa98b80
nevent=5000 time_id=1).set_owner idx=2 owner=140386699687680
  -131> 2017-10-09 13:00:34.479603 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event started fd=17
mask=1 original mask is 0
  -130> 2017-10-09 13:00:34.479604 7fae535db700 20 EpollDriver.add_event
add event fd=17 cur_mask=0 add_mask=1 to 16
  -129> 2017-10-09 13:00:34.479606 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event end fd=17 mask=1
original mask is 1
  -128> 2017-10-09 13:00:34.479607 7fae535db700 10 stack operator() starting
  -127> 2017-10-09 13:00:34.479625 7fae5ddeef80  0 starting mon.3 rank 3
at public addr 192.168.100.144:6789/0 at bind addr
192.168.100.144:6789/0 mon_data /var/lib/ceph/mon/ceph-3 fsid
cb0aba69-bad9-4d30-b163-c19f0fd1ec53
  -126> 2017-10-09 13:00:34.479636 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
  -125> 2017-10-09 13:00:34.479638 7fae5ddeef80 10 -- - bind Network
Stack is not ready for bind yet - postponed
  -124> 2017-10-09 13:00:34.479645 7fae5ddeef80  0 starting mon.3 rank 3
at 192.168.100.144:6789/0 mon_data /var/lib/ceph/mon/ceph-3 fsid
cb0aba69-bad9-4d30-b163-c19f0fd1ec53
  -123> 2017-10-09 13:00:34.480237 7fae5ddeef80  0 mon.3 at -1(probing).mds
e1 print_map
e1
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
layout v2}
legacy client fscid: -1

No filesystems configured

  -122> 2017-10-09 13:00:34.480408 7fae5ddeef80  0 mon.3 at -1(probing).osd
e12305 crush map has features 283675107524608, adjusting msgr requires
  -121> 2017-10-09 13:00:34.480415 7fae5ddeef80  0 mon.3 at -1(probing).osd
e12305 crush map has features 283675107524608, adjusting msgr requires
  -120> 2017-10-09 13:00:34.480417 7fae5ddeef80  0 mon.3 at -1(probing).osd
e12305 crush map has features 720859615486820352, adjusting msgr requires
  -119> 2017-10-09 13:00:34.480418 7fae5ddeef80  0 mon.3 at -1(probing).osd
e12305 crush map has features 283675107524608, adjusting msgr requires
  -118> 2017-10-09 13:00:34.481682 7fae5ddeef80  4 mgrc handle_mgr_map
Got map version 458808
  -117> 2017-10-09 13:00:34.481687 7fae5ddeef80  4 mgrc handle_mgr_map
Active mgr is now 192.168.100.144:6828/21406
  -116> 2017-10-09 13:00:34.481689 7fae5ddeef80  4 mgrc reconnect
Starting new session with 192.168.100.144:6828/21406
  -115> 2017-10-09 13:00:34.481692 7fae5ddeef80 10 -- - create_connect
192.168.100.144:6828/21406, creating connection and registering
  -114> 2017-10-09 13:00:34.481697 7fae5ddeef80 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_NONE pgs=0
cs=0 l=0)._connect csq=0
  -113> 2017-10-09 13:00:34.481701 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).wakeup
  -112> 2017-10-09 13:00:34.481706 7fae5ddeef80 10 -- - get_connection
mgr.17664152 192.168.100.144:6828/21406 new 0x555900b4b800
  -111> 2017-10-09 13:00:34.481709 7fae5ddeef80  1 -- - -->
192.168.100.144:6828/21406 -- mgropen(unknown.3) v2 -- 0x5558ffa99080 con 0
  -110> 2017-10-09 13:00:34.481713 7fae5ddeef80 15 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).send_message inline write is denied, reschedule
m=0x5558ffa99080
  -109> 2017-10-09 13:00:34.481740 7fae535db700 20 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
  -108> 2017-10-09 13:00:34.481855 7fae5ddeef80  1 -- - start start
  -107> 2017-10-09 13:00:34.481859 7fae5ddeef80  1 -- - start start
  -106> 2017-10-09 13:00:34.482042 7fae5ddeef80 10 -- - ready -
  -105> 2017-10-09 13:00:34.482045 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
  -104> 2017-10-09 13:00:34.482048 7fae5ddeef80 10  Processor -- bind
  -103> 2017-10-09 13:00:34.482049 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
  -102> 2017-10-09 13:00:34.482427 7fae535db700 10 Infiniband
binding_port port 1 is not what we want. state: 4)
  -101> 2017-10-09 13:00:34.482522 7fae535db700  1 Infiniband
binding_port found active port 2
  -100> 2017-10-09 13:00:34.482530 7fae535db700  1 Infiniband init
assigning: 1024 receive buffers
   -99> 2017-10-09 13:00:34.482532 7fae535db700  1 Infiniband init
assigning: 1024 send buffers
   -98> 2017-10-09 13:00:34.482533 7fae535db700  1 Infiniband init
device allow 4194303 completion entries
   -97> 2017-10-09 13:00:34.565741 7fae535db700 20 Infiniband init started.
   -96> 2017-10-09 13:00:34.565783 7fae535db700 20 Infiniband init started.
   -95> 2017-10-09 13:00:34.567858 7fae535db700 20 Infiniband init
successfully create cq=0x5558ff77ccb0
   -94> 2017-10-09 13:00:34.569959 7fae535db700 20 Infiniband init
successfully create cq=0x5558ff77cd80
   -93> 2017-10-09 13:00:34.570039 7fae535db700 20 Infiniband init started.
   -92> 2017-10-09 13:00:34.570050 7fae4eb2c700 20 RDMAStack polling
going to poll tx cq: 0x555900e7c540 rx cq: 0x555900e7c570
   -91> 2017-10-09 13:00:34.570088 7fae545dd700 20  RDMAServerSocketImpl
listen bind to 192.168.100.144:6789 on port 6789
   -90> 2017-10-09 13:00:34.570130 7fae535db700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7caf00
   -89> 2017-10-09 13:00:34.570145 7fae5ddeef80 10  Processor -- bind
bound to 192.168.100.144:6789/0
   -88> 2017-10-09 13:00:34.570169 7fae5ddeef80  1 --
192.168.100.144:6789/0 learned_addr learned my addr 192.168.100.144:6789/0
   -87> 2017-10-09 13:00:34.570173 7fae5ddeef80  1 --
192.168.100.144:6789/0 _finish_bind bind my_inst.addr is
192.168.100.144:6789/0
   -86> 2017-10-09 13:00:34.570177 7fae5ddeef80  1  Processor -- start
   -85> 2017-10-09 13:00:34.570179 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
   -84> 2017-10-09 13:00:34.570194 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=23
mask=1 original mask is 0
   -83> 2017-10-09 13:00:34.570202 7fae545dd700 20 EpollDriver.add_event
add event fd=23 cur_mask=0 add_mask=1 to 10
   -82> 2017-10-09 13:00:34.570206 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=23 mask=1
original mask is 1
   -81> 2017-10-09 13:00:34.570308 7fae5ddeef80 10 -- - ready -
   -80> 2017-10-09 13:00:34.570317 7fae5ddeef80  1  Processor -- start
   -79> 2017-10-09 13:00:34.570347 7fae535db700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7caf00
   -78> 2017-10-09 13:00:34.570360 7fae5ddeef80  0 mon.3 at -1(probing) e5 
my rank is now 3 (was -1)
   -77> 2017-10-09 13:00:34.570362 7fae535db700 20 
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
   -76> 2017-10-09 13:00:34.570363 7fae5ddeef80  1 --
192.168.100.144:6789/0 shutdown_connections
   -75> 2017-10-09 13:00:34.570379 7fae5ddeef80  1 --
192.168.100.144:6789/0 _send_message--> mon.0 192.168.100.141:6789/0 --
mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0
0x5558ff7c4900
   -74> 2017-10-09 13:00:34.570392 7fae5ddeef80 20 --
192.168.100.144:6789/0 submit_message mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 remote,
192.168.100.141:6789/0, new connection.
   -73> 2017-10-09 13:00:34.570385 7fae535db700 10 NetHandler
generic_connect connect: (111) Connection refused
   -72> 2017-10-09 13:00:34.570395 7fae5ddeef80 10 --
192.168.100.144:6789/0 create_connect 192.168.100.141:6789/0, creating
connection and registering
   -71> 2017-10-09 13:00:34.570400 7fae535db700  1 RDMAStack connect try
connecting failed.
   -70> 2017-10-09 13:00:34.570403 7fae535db700 20 
RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
   -69> 2017-10-09 13:00:34.570410 7fae4eb2c700 10 RDMAStack polling
finally delete qp=0x55590059d480
   -68> 2017-10-09 13:00:34.570414 7fae4eb2c700 20 Infiniband ~QueuePair
destroy qp=0x5558ff7caf00
   -67> 2017-10-09 13:00:34.570408 7fae5ddeef80 10 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
   -66> 2017-10-09 13:00:34.570418 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
   -65> 2017-10-09 13:00:34.570416 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).fault waiting 0.200000
   -64> 2017-10-09 13:00:34.570425 7fae545dd700 20 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
   -63> 2017-10-09 13:00:34.570423 7fae5ddeef80  1 --
192.168.100.144:6789/0 --> 192.168.100.141:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4900 con 0
   -62> 2017-10-09 13:00:34.570434 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).handle_write
   -61> 2017-10-09 13:00:34.570437 7fae545dd700 20 Infiniband init started.
   -60> 2017-10-09 13:00:34.570436 7fae5ddeef80 15 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).send_message inline write is denied,
reschedule m=0x5558ff7c4900
   -59> 2017-10-09 13:00:34.570440 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
   -58> 2017-10-09 13:00:34.570443 7fae5ddeef80  1 --
192.168.100.144:6789/0 _send_message--> mon.1 192.168.100.142:6789/0 --
mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0
0x5558ff7c4600
   -57> 2017-10-09 13:00:34.570446 7fae5ddeef80 20 --
192.168.100.144:6789/0 submit_message mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 remote,
192.168.100.142:6789/0, new connection.
   -56> 2017-10-09 13:00:34.570448 7fae5ddeef80 10 --
192.168.100.144:6789/0 create_connect 192.168.100.142:6789/0, creating
connection and registering
   -55> 2017-10-09 13:00:34.570454 7fae5ddeef80 10 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
   -54> 2017-10-09 13:00:34.570456 7fae5ddeef80 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).wakeup
   -53> 2017-10-09 13:00:34.570462 7fae5ddeef80  1 --
192.168.100.144:6789/0 --> 192.168.100.142:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4600 con 0
   -52> 2017-10-09 13:00:34.570467 7fae5ddeef80 15 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).send_message inline write is denied,
reschedule m=0x5558ff7c4600
   -51> 2017-10-09 13:00:34.570470 7fae5ddeef80  1 --
192.168.100.144:6789/0 _send_message--> mon.2 192.168.100.143:6789/0 --
mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0
0x5558ff7c4c00
   -50> 2017-10-09 13:00:34.570473 7fae5ddeef80 20 --
192.168.100.144:6789/0 submit_message mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 remote,
192.168.100.143:6789/0, new connection.
   -49> 2017-10-09 13:00:34.570475 7fae5ddeef80 10 --
192.168.100.144:6789/0 create_connect 192.168.100.143:6789/0, creating
connection and registering
   -48> 2017-10-09 13:00:34.570481 7fae5ddeef80 10 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
   -47> 2017-10-09 13:00:34.570474 7fae53ddc700 20 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
   -46> 2017-10-09 13:00:34.570485 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=2).wakeup
   -45> 2017-10-09 13:00:34.570490 7fae53ddc700 20 Infiniband init started.
   -44> 2017-10-09 13:00:34.570491 7fae5ddeef80  1 --
192.168.100.144:6789/0 --> 192.168.100.143:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4c00 con 0
   -43> 2017-10-09 13:00:34.570493 7fae5ddeef80 15 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).send_message inline write is denied,
reschedule m=0x5558ff7c4c00
   -42> 2017-10-09 13:00:34.570504 7fae535db700 20 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
   -41> 2017-10-09 13:00:34.570520 7fae535db700 20 Infiniband init started.
   -40> 2017-10-09 13:00:34.570616 7fae545dd700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7cb2c0
   -39> 2017-10-09 13:00:34.570664 7fae53ddc700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7cb400
   -38> 2017-10-09 13:00:34.570676 7fae535db700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7cb540
   -37> 2017-10-09 13:00:34.570861 7fae545dd700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7cb2c0
   -36> 2017-10-09 13:00:34.570884 7fae545dd700 20 
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
   -35> 2017-10-09 13:00:34.570945 7fae545dd700 10 NetHandler
generic_connect connect: (111) Connection refused
   -34> 2017-10-09 13:00:34.570956 7fae545dd700  1 RDMAStack connect try
connecting failed.
   -33> 2017-10-09 13:00:34.570959 7fae545dd700 20 
RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
   -32> 2017-10-09 13:00:34.570969 7fae4eb2c700 10 RDMAStack polling
finally delete qp=0x55590059d560
   -31> 2017-10-09 13:00:34.570969 7fae545dd700 10 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).fault waiting 0.200000
   -30> 2017-10-09 13:00:34.570974 7fae4eb2c700 20 Infiniband ~QueuePair
destroy qp=0x5558ff7cb2c0
   -29> 2017-10-09 13:00:34.570981 7fae545dd700 10 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).handle_write
   -28> 2017-10-09 13:00:34.570999 7fae53ddc700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7cb400
   -27> 2017-10-09 13:00:34.571102 7fae535db700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7cb540
   -26> 2017-10-09 13:00:34.571244 7fae4eb2c700 20 Infiniband
rearm_notify started.
   -25> 2017-10-09 13:00:34.571265 7fae4eb2c700 20 Infiniband
rearm_notify started.
   -24> 2017-10-09 13:00:34.571274 7fae53ddc700 20 
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
   -23> 2017-10-09 13:00:34.571278 7fae535db700 20 
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
   -22> 2017-10-09 13:00:34.571378 7fae53ddc700 20 
RDMAConnectedSocketImpl try_connect tcp_fd: 35
   -21> 2017-10-09 13:00:34.571390 7fae53ddc700 10 Infiniband send_msg
sending: 0, 4543, 5515815, 0, fe80000000000000268a07fffee25d81
   -20> 2017-10-09 13:00:34.571396 7fae535db700 10 NetHandler
generic_connect connect: (111) Connection refused
   -19> 2017-10-09 13:00:34.571402 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=35
mask=1 original mask is 0
   -18> 2017-10-09 13:00:34.571406 7fae53ddc700 20 EpollDriver.add_event
add event fd=35 cur_mask=0 add_mask=1 to 13
   -17> 2017-10-09 13:00:34.571411 7fae535db700  1 RDMAStack connect try
connecting failed.
   -16> 2017-10-09 13:00:34.571409 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=35 mask=1
original mask is 1
   -15> 2017-10-09 13:00:34.571415 7fae535db700 20 
RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
   -14> 2017-10-09 13:00:34.571416 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=32
mask=1 original mask is 0
   -13> 2017-10-09 13:00:34.571419 7fae53ddc700 20 EpollDriver.add_event
add event fd=32 cur_mask=0 add_mask=1 to 13
   -12> 2017-10-09 13:00:34.571421 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=32 mask=1
original mask is 1
   -11> 2017-10-09 13:00:34.571419 7fae535db700 10 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).fault waiting 0.200000
   -10> 2017-10-09 13:00:34.571423 7fae53ddc700 20 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING_RE pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
    -9> 2017-10-09 13:00:34.571426 7fae535db700 10 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).handle_write
    -8> 2017-10-09 13:00:34.571430 7fae53ddc700 10 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection nonblock
connect inprogress
    -7> 2017-10-09 13:00:34.571436 7fae53ddc700 10 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING_RE pgs=0 cs=0 l=0).handle_write
    -6> 2017-10-09 13:00:34.571994 7fae53ddc700 20 
RDMAConnectedSocketImpl handle_connection QP: 4543 tcp_fd: 35 notify_fd: 32
    -5> 2017-10-09 13:00:34.572010 7fae53ddc700  5 Infiniband recv_msg
recevd: 0, 3643, 3062654, 0, fe80000000000000268a07fffef98731
    -4> 2017-10-09 13:00:34.572014 7fae53ddc700 20 
RDMAConnectedSocketImpl handle_connection peer msg :  < 3643, 3062654, 0, 0>
    -3> 2017-10-09 13:00:34.572016 7fae53ddc700 20 
RDMAConnectedSocketImpl activate Choosing gid_index 24, sl 3
    -2> 2017-10-09 13:00:34.572021 7fae53ddc700 20 
RDMAConnectedSocketImpl activate transition to RTR state successfully.
    -1> 2017-10-09 13:00:34.572022 7fae53ddc700 -1 
RDMAConnectedSocketImpl activate failed to transition to RTS state: (22)
Invalid argument
     0> 2017-10-09 13:00:34.573257 7fae53ddc700 -1
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
In function 'void RDMAConnectedSocketImpl::handle_connection()' thread
7fae53ddc700 time 2017-10-09 13:00:34.572032
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
221: FAILED assert(!r)

 ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5558f5c809f2]
 2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x5558f5fd656a]
 3: (EventCenter::process_events(int, std::chrono::duration<unsigned
long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x5558f5d46028]
 4: (()+0x697128) [0x5558f5d4a128]
 5: (()+0xb9e6f) [0x7fae5af0fe6f]
 6: (()+0x7494) [0x7fae5d19e494]
 7: (clone()+0x3f) [0x7fae5a684aff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.



-- 
Gerhard W. Recher

net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing

+49 171 4802507




More information about the pve-devel mailing list