[pve-devel] ceph rdma mellanox ceph 12.2.1
Gerhard W. Recher
gerhard.recher at net4sec.com
Mon Oct 9 14:00:39 CEST 2017
Hi Fellows,
Fabian suggested me to open a thread on dev-list :
based on bug-report: https://bugzilla.proxmox.com/show_bug.cgi?id=1521
pveversion -v
proxmox-ve: 5.0-23 (running kernel: 4.13.3-1-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.13.3-1-pve: 4.13.3-2
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.11-1-pve: 4.10.11-9
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-14
qemu-server: 5.0-15
pve-firmware: 2.0-3
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-12
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.1-1
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve18~bpo90
ceph: 12.2.1-pve1
if i enable rdma in ceph.conf
example node#4
showgids
DEV PORT INDEX GID
IPv4 VER DEV
--- ---- ----- ---
------------ --- ---
mlx4_0 1 0
fe80:0000:0000:0000:268a:07ff:fee2:5d80 v1 ens1
mlx4_0 1 1
fe80:0000:0000:0000:268a:07ff:fee2:5d80 v2 ens1
mlx4_0 1 2 0000:0000:0000:0000:0000:ffff:c0a8:dd90
192.168.221.144 v1 vmbr0
mlx4_0 1 3 0000:0000:0000:0000:0000:ffff:c0a8:dd90
192.168.221.144 v2 vmbr0
mlx4_0 2 0
fe80:0000:0000:0000:268a:07ff:fee2:5d81 v1 ens1d1
mlx4_0 2 1
fe80:0000:0000:0000:268a:07ff:fee2:5d81 v2 ens1d1
mlx4_0 2 2 0000:0000:0000:0000:0000:ffff:c0a8:6490
192.168.100.144 v1 ens1d1
mlx4_0 2 3 0000:0000:0000:0000:0000:ffff:c0a8:6490
192.168.100.144 v2 ens1d1
n_gids_found=8
[global]
#ms_type=xio
ms_type=async+rdma
ms_cluster_type = async+rdma
ms_async_rdma_port_num=2
ms_async_rdma_device_name=mlx4_0
#1
#ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:648d
#fe80:0000:0000:0000:268a:07ff:fee2:6071
#2
#ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:648e
#fe80:0000:0000:0000:268a:07ff:fef9:8731
#3
#ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:648f
#fe80:0000:0000:0000:268a:07ff:fef9:8791
#4
ms_async_rdma_local_gid=0000:0000:0000:0000:0000:ffff:c0a8:6490
#fe80:0000:0000:0000:268a:07ff:fee2:5d81
I will end up in not operating ceph, osd,mgr and mon processes are up,
but no operations will have success because of asserting!
Mon log:
2017-10-09 13:00:34.307728 7fae5ddeef80 0 set uid:gid to 64045:64045
(ceph:ceph)
2017-10-09 13:00:34.307738 7fae5ddeef80 0 ceph version 12.2.1
(fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous (stable), process
(unknown), pid 20268
2017-10-09 13:00:34.307802 7fae5ddeef80 0 pidfile_write: ignore empty
--pid-file
2017-10-09 13:00:34.314461 7fae5ddeef80 0 load: jerasure load: lrc
load: isa
2017-10-09 13:00:34.315570 7fae5ddeef80 1 leveldb: Recovering log #73065
2017-10-09 13:00:34.369901 7fae5ddeef80 1 leveldb: Level-0 table
#73083: started
2017-10-09 13:00:34.469844 7fae5ddeef80 1 leveldb: Level-0 table
#73083: 27766571 bytes OK
2017-10-09 13:00:34.474225 7fae5ddeef80 1 leveldb: Delete type=0 #73065
2017-10-09 13:00:34.479041 7fae5ddeef80 1 leveldb: Delete type=3 #72279
2017-10-09 13:00:34.479496 7fae5ddeef80 1 RDMAStack RDMAStack
ms_async_rdma_enable_hugepage value is: 0
2017-10-09 13:00:34.479509 7fae5ddeef80 20 RDMAStack RDMAStack
constructing RDMAStack...
2017-10-09 13:00:34.479521 7fae5ddeef80 20 RDMAStack creating
RDMAStack:0x5558ff76fe70 with dispatcher:0x5558ff7b48c0
2017-10-09 13:00:34.479558 7fae545dd700 2 Event(0x5558ffa98600
nevent=5000 time_id=1).set_owner idx=0 owner=140386716473088
2017-10-09 13:00:34.479574 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=11
mask=1 original mask is 0
2017-10-09 13:00:34.479576 7fae545dd700 20 EpollDriver.add_event add
event fd=11 cur_mask=0 add_mask=1 to 10
2017-10-09 13:00:34.479581 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=11 mask=1
original mask is 1
2017-10-09 13:00:34.479582 7fae545dd700 10 stack operator() starting
2017-10-09 13:00:34.479584 7fae53ddc700 2 Event(0x5558ffa988c0
nevent=5000 time_id=1).set_owner idx=1 owner=140386708080384
2017-10-09 13:00:34.479592 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=14
mask=1 original mask is 0
2017-10-09 13:00:34.479595 7fae53ddc700 20 EpollDriver.add_event add
event fd=14 cur_mask=0 add_mask=1 to 13
2017-10-09 13:00:34.479598 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=14 mask=1
original mask is 1
2017-10-09 13:00:34.479601 7fae53ddc700 10 stack operator() starting
2017-10-09 13:00:34.479600 7fae535db700 2 Event(0x5558ffa98b80
nevent=5000 time_id=1).set_owner idx=2 owner=140386699687680
2017-10-09 13:00:34.479603 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event started fd=17
mask=1 original mask is 0
2017-10-09 13:00:34.479604 7fae535db700 20 EpollDriver.add_event add
event fd=17 cur_mask=0 add_mask=1 to 16
2017-10-09 13:00:34.479606 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event end fd=17 mask=1
original mask is 1
2017-10-09 13:00:34.479607 7fae535db700 10 stack operator() starting
2017-10-09 13:00:34.479625 7fae5ddeef80 0 starting mon.3 rank 3 at
public addr 192.168.100.144:6789/0 at bind addr 192.168.100.144:6789/0
mon_data /var/lib/ceph/mon/ceph-3 fsid cb0aba69-bad9-4d30-b163-c19f0fd1ec53
2017-10-09 13:00:34.479636 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
2017-10-09 13:00:34.479638 7fae5ddeef80 10 -- - bind Network Stack is
not ready for bind yet - postponed
2017-10-09 13:00:34.479645 7fae5ddeef80 0 starting mon.3 rank 3 at
192.168.100.144:6789/0 mon_data /var/lib/ceph/mon/ceph-3 fsid
cb0aba69-bad9-4d30-b163-c19f0fd1ec53
2017-10-09 13:00:34.480237 7fae5ddeef80 0 mon.3 at -1(probing).mds e1
print_map
e1
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
layout v2}
legacy client fscid: -1
No filesystems configured
2017-10-09 13:00:34.480408 7fae5ddeef80 0 mon.3 at -1(probing).osd e12305
crush map has features 283675107524608, adjusting msgr requires
2017-10-09 13:00:34.480415 7fae5ddeef80 0 mon.3 at -1(probing).osd e12305
crush map has features 283675107524608, adjusting msgr requires
2017-10-09 13:00:34.480417 7fae5ddeef80 0 mon.3 at -1(probing).osd e12305
crush map has features 720859615486820352, adjusting msgr requires
2017-10-09 13:00:34.480418 7fae5ddeef80 0 mon.3 at -1(probing).osd e12305
crush map has features 283675107524608, adjusting msgr requires
2017-10-09 13:00:34.481692 7fae5ddeef80 10 -- - create_connect
192.168.100.144:6828/21406, creating connection and registering
2017-10-09 13:00:34.481697 7fae5ddeef80 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_NONE pgs=0
cs=0 l=0)._connect csq=0
2017-10-09 13:00:34.481701 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.481706 7fae5ddeef80 10 -- - get_connection
mgr.17664152 192.168.100.144:6828/21406 new 0x555900b4b800
2017-10-09 13:00:34.481709 7fae5ddeef80 1 -- - -->
192.168.100.144:6828/21406 -- mgropen(unknown.3) v2 -- 0x5558ffa99080 con 0
2017-10-09 13:00:34.481713 7fae5ddeef80 15 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).send_message inline write is denied, reschedule
m=0x5558ffa99080
2017-10-09 13:00:34.481740 7fae535db700 20 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.481855 7fae5ddeef80 1 -- - start start
2017-10-09 13:00:34.481859 7fae5ddeef80 1 -- - start start
2017-10-09 13:00:34.482042 7fae5ddeef80 10 -- - ready -
2017-10-09 13:00:34.482045 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
2017-10-09 13:00:34.482048 7fae5ddeef80 10 Processor -- bind
2017-10-09 13:00:34.482049 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.482427 7fae535db700 10 Infiniband binding_port port
1 is not what we want. state: 4)
2017-10-09 13:00:34.482522 7fae535db700 1 Infiniband binding_port found
active port 2
2017-10-09 13:00:34.482530 7fae535db700 1 Infiniband init assigning:
1024 receive buffers
2017-10-09 13:00:34.482532 7fae535db700 1 Infiniband init assigning:
1024 send buffers
2017-10-09 13:00:34.482533 7fae535db700 1 Infiniband init device allow
4194303 completion entries
2017-10-09 13:00:34.565741 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.565783 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.567858 7fae535db700 20 Infiniband init successfully
create cq=0x5558ff77ccb0
2017-10-09 13:00:34.569959 7fae535db700 20 Infiniband init successfully
create cq=0x5558ff77cd80
2017-10-09 13:00:34.570039 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.570050 7fae4eb2c700 20 RDMAStack polling going to
poll tx cq: 0x555900e7c540 rx cq: 0x555900e7c570
2017-10-09 13:00:34.570088 7fae545dd700 20 RDMAServerSocketImpl listen
bind to 192.168.100.144:6789 on port 6789
2017-10-09 13:00:34.570130 7fae535db700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7caf00
2017-10-09 13:00:34.570145 7fae5ddeef80 10 Processor -- bind bound to
192.168.100.144:6789/0
2017-10-09 13:00:34.570169 7fae5ddeef80 1 -- 192.168.100.144:6789/0
learned_addr learned my addr 192.168.100.144:6789/0
2017-10-09 13:00:34.570173 7fae5ddeef80 1 -- 192.168.100.144:6789/0
_finish_bind bind my_inst.addr is 192.168.100.144:6789/0
2017-10-09 13:00:34.570177 7fae5ddeef80 1 Processor -- start
2017-10-09 13:00:34.570179 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570194 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=23
mask=1 original mask is 0
2017-10-09 13:00:34.570202 7fae545dd700 20 EpollDriver.add_event add
event fd=23 cur_mask=0 add_mask=1 to 10
2017-10-09 13:00:34.570206 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=23 mask=1
original mask is 1
2017-10-09 13:00:34.570308 7fae5ddeef80 10 -- - ready -
2017-10-09 13:00:34.570317 7fae5ddeef80 1 Processor -- start
2017-10-09 13:00:34.570347 7fae535db700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7caf00
2017-10-09 13:00:34.570360 7fae5ddeef80 0 mon.3 at -1(probing) e5 my rank
is now 3 (was -1)
2017-10-09 13:00:34.570362 7fae535db700 20 RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.570363 7fae5ddeef80 1 -- 192.168.100.144:6789/0
shutdown_connections
2017-10-09 13:00:34.570379 7fae5ddeef80 1 -- 192.168.100.144:6789/0
_send_message--> mon.0 192.168.100.141:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0 0x5558ff7c4900
2017-10-09 13:00:34.570392 7fae5ddeef80 20 -- 192.168.100.144:6789/0
submit_message mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name
3) v6 remote, 192.168.100.141:6789/0, new connection.
2017-10-09 13:00:34.570385 7fae535db700 10 NetHandler generic_connect
connect: (111) Connection refused
2017-10-09 13:00:34.570395 7fae5ddeef80 10 -- 192.168.100.144:6789/0
create_connect 192.168.100.141:6789/0, creating connection and registering
2017-10-09 13:00:34.570400 7fae535db700 1 RDMAStack connect try
connecting failed.
2017-10-09 13:00:34.570403 7fae535db700 20 RDMAConnectedSocketImpl
~RDMAConnectedSocketImpl destruct.
2017-10-09 13:00:34.570410 7fae4eb2c700 10 RDMAStack polling finally
delete qp=0x55590059d480
2017-10-09 13:00:34.570414 7fae4eb2c700 20 Infiniband ~QueuePair destroy
qp=0x5558ff7caf00
2017-10-09 13:00:34.570408 7fae5ddeef80 10 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._connect csq=0
2017-10-09 13:00:34.570418 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570416 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).fault waiting 0.200000
2017-10-09 13:00:34.570425 7fae545dd700 20 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.570423 7fae5ddeef80 1 -- 192.168.100.144:6789/0 -->
192.168.100.141:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4900 con 0
2017-10-09 13:00:34.570434 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).handle_write
2017-10-09 13:00:34.570437 7fae545dd700 20 Infiniband init started.
2017-10-09 13:00:34.570436 7fae5ddeef80 15 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).send_message inline write is denied, reschedule m=0x5558ff7c4900
2017-10-09 13:00:34.570440 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570443 7fae5ddeef80 1 -- 192.168.100.144:6789/0
_send_message--> mon.1 192.168.100.142:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0 0x5558ff7c4600
2017-10-09 13:00:34.570446 7fae5ddeef80 20 -- 192.168.100.144:6789/0
submit_message mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name
3) v6 remote, 192.168.100.142:6789/0, new connection.
2017-10-09 13:00:34.570448 7fae5ddeef80 10 -- 192.168.100.144:6789/0
create_connect 192.168.100.142:6789/0, creating connection and registering
2017-10-09 13:00:34.570454 7fae5ddeef80 10 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._connect csq=0
2017-10-09 13:00:34.570456 7fae5ddeef80 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).wakeup
2017-10-09 13:00:34.570462 7fae5ddeef80 1 -- 192.168.100.144:6789/0 -->
192.168.100.142:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4600 con 0
2017-10-09 13:00:34.570467 7fae5ddeef80 15 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).send_message inline write is denied, reschedule m=0x5558ff7c4600
2017-10-09 13:00:34.570470 7fae5ddeef80 1 -- 192.168.100.144:6789/0
_send_message--> mon.2 192.168.100.143:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0 0x5558ff7c4c00
2017-10-09 13:00:34.570473 7fae5ddeef80 20 -- 192.168.100.144:6789/0
submit_message mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name
3) v6 remote, 192.168.100.143:6789/0, new connection.
2017-10-09 13:00:34.570475 7fae5ddeef80 10 -- 192.168.100.144:6789/0
create_connect 192.168.100.143:6789/0, creating connection and registering
2017-10-09 13:00:34.570481 7fae5ddeef80 10 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._connect csq=0
2017-10-09 13:00:34.570474 7fae53ddc700 20 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.570485 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=2).wakeup
2017-10-09 13:00:34.570490 7fae53ddc700 20 Infiniband init started.
2017-10-09 13:00:34.570491 7fae5ddeef80 1 -- 192.168.100.144:6789/0 -->
192.168.100.143:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4c00 con 0
2017-10-09 13:00:34.570493 7fae5ddeef80 15 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).send_message inline write is denied, reschedule m=0x5558ff7c4c00
2017-10-09 13:00:34.570504 7fae535db700 20 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.570520 7fae535db700 20 Infiniband init started.
2017-10-09 13:00:34.570616 7fae545dd700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7cb2c0
2017-10-09 13:00:34.570664 7fae53ddc700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7cb400
2017-10-09 13:00:34.570676 7fae535db700 20 Infiniband init successfully
create queue pair: qp=0x5558ff7cb540
2017-10-09 13:00:34.570861 7fae545dd700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7cb2c0
2017-10-09 13:00:34.570884 7fae545dd700 20 RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.570945 7fae545dd700 10 NetHandler generic_connect
connect: (111) Connection refused
2017-10-09 13:00:34.570956 7fae545dd700 1 RDMAStack connect try
connecting failed.
2017-10-09 13:00:34.570959 7fae545dd700 20 RDMAConnectedSocketImpl
~RDMAConnectedSocketImpl destruct.
2017-10-09 13:00:34.570969 7fae4eb2c700 10 RDMAStack polling finally
delete qp=0x55590059d560
2017-10-09 13:00:34.570969 7fae545dd700 10 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).fault waiting 0.200000
2017-10-09 13:00:34.570974 7fae4eb2c700 20 Infiniband ~QueuePair destroy
qp=0x5558ff7cb2c0
2017-10-09 13:00:34.570981 7fae545dd700 10 -- 192.168.100.144:6789/0 >>
192.168.100.141:6789/0 conn(0x555900b4a000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).handle_write
2017-10-09 13:00:34.570999 7fae53ddc700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7cb400
2017-10-09 13:00:34.571102 7fae535db700 20 Infiniband init successfully
change queue pair to INIT: qp=0x5558ff7cb540
2017-10-09 13:00:34.571244 7fae4eb2c700 20 Infiniband rearm_notify started.
2017-10-09 13:00:34.571265 7fae4eb2c700 20 Infiniband rearm_notify started.
2017-10-09 13:00:34.571274 7fae53ddc700 20 RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.571278 7fae535db700 20 RDMAConnectedSocketImpl
try_connect nonblock:1, nodelay:1, rbuf_size: 0
2017-10-09 13:00:34.571378 7fae53ddc700 20 RDMAConnectedSocketImpl
try_connect tcp_fd: 35
2017-10-09 13:00:34.571390 7fae53ddc700 10 Infiniband send_msg sending:
0, 4543, 5515815, 0, fe80000000000000268a07fffee25d81
2017-10-09 13:00:34.571396 7fae535db700 10 NetHandler generic_connect
connect: (111) Connection refused
2017-10-09 13:00:34.571402 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=35
mask=1 original mask is 0
2017-10-09 13:00:34.571406 7fae53ddc700 20 EpollDriver.add_event add
event fd=35 cur_mask=0 add_mask=1 to 13
2017-10-09 13:00:34.571411 7fae535db700 1 RDMAStack connect try
connecting failed.
2017-10-09 13:00:34.571409 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=35 mask=1
original mask is 1
2017-10-09 13:00:34.571415 7fae535db700 20 RDMAConnectedSocketImpl
~RDMAConnectedSocketImpl destruct.
2017-10-09 13:00:34.571416 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=32
mask=1 original mask is 0
2017-10-09 13:00:34.571419 7fae53ddc700 20 EpollDriver.add_event add
event fd=32 cur_mask=0 add_mask=1 to 13
2017-10-09 13:00:34.571421 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=32 mask=1
original mask is 1
2017-10-09 13:00:34.571419 7fae535db700 10 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).fault waiting 0.200000
2017-10-09 13:00:34.571423 7fae53ddc700 20 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING_RE
pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
2017-10-09 13:00:34.571426 7fae535db700 10 -- 192.168.100.144:6789/0 >>
192.168.100.143:6789/0 conn(0x555900add000 :-1 s=STATE_CONNECTING pgs=0
cs=0 l=0).handle_write
2017-10-09 13:00:34.571430 7fae53ddc700 10 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING_RE
pgs=0 cs=0 l=0)._process_connection nonblock connect inprogress
2017-10-09 13:00:34.571436 7fae53ddc700 10 -- 192.168.100.144:6789/0 >>
192.168.100.142:6789/0 conn(0x555900ade800 :-1 s=STATE_CONNECTING_RE
pgs=0 cs=0 l=0).handle_write
2017-10-09 13:00:34.571994 7fae53ddc700 20 RDMAConnectedSocketImpl
handle_connection QP: 4543 tcp_fd: 35 notify_fd: 32
2017-10-09 13:00:34.572010 7fae53ddc700 5 Infiniband recv_msg recevd:
0, 3643, 3062654, 0, fe80000000000000268a07fffef98731
2017-10-09 13:00:34.572014 7fae53ddc700 20 RDMAConnectedSocketImpl
handle_connection peer msg : < 3643, 3062654, 0, 0>
2017-10-09 13:00:34.572016 7fae53ddc700 20 RDMAConnectedSocketImpl
activate Choosing gid_index 24, sl 3
2017-10-09 13:00:34.572021 7fae53ddc700 20 RDMAConnectedSocketImpl
activate transition to RTR state successfully.
2017-10-09 13:00:34.572022 7fae53ddc700 -1 RDMAConnectedSocketImpl
activate failed to transition to RTS state: (22) Invalid argument
2017-10-09 13:00:34.573257 7fae53ddc700 -1
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
In function 'void RDMAConnectedSocketImpl::handle_connection()' thread
7fae53ddc700 time 2017-10-09 13:00:34.572032
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
221: FAILED assert(!r)
ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5558f5c809f2]
2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x5558f5fd656a]
3: (EventCenter::process_events(int, std::chrono::duration<unsigned
long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x5558f5d46028]
4: (()+0x697128) [0x5558f5d4a128]
5: (()+0xb9e6f) [0x7fae5af0fe6f]
6: (()+0x7494) [0x7fae5d19e494]
7: (clone()+0x3f) [0x7fae5a684aff]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
-173> 2017-10-09 13:00:34.302946 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perfcounters_dump hook 0x5558ff7321b0
-172> 2017-10-09 13:00:34.302953 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command 1 hook 0x5558ff7321b0
-171> 2017-10-09 13:00:34.302955 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perf dump hook 0x5558ff7321b0
-170> 2017-10-09 13:00:34.302957 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perfcounters_schema hook 0x5558ff7321b0
-169> 2017-10-09 13:00:34.302958 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perf histogram dump hook 0x5558ff7321b0
-168> 2017-10-09 13:00:34.302960 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command 2 hook 0x5558ff7321b0
-167> 2017-10-09 13:00:34.302961 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perf schema hook 0x5558ff7321b0
-166> 2017-10-09 13:00:34.302962 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perf histogram schema hook 0x5558ff7321b0
-165> 2017-10-09 13:00:34.302964 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command perf reset hook 0x5558ff7321b0
-164> 2017-10-09 13:00:34.302965 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command config show hook 0x5558ff7321b0
-163> 2017-10-09 13:00:34.302967 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command config help hook 0x5558ff7321b0
-162> 2017-10-09 13:00:34.302968 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command config set hook 0x5558ff7321b0
-161> 2017-10-09 13:00:34.302969 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command config get hook 0x5558ff7321b0
-160> 2017-10-09 13:00:34.302971 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command config diff hook 0x5558ff7321b0
-159> 2017-10-09 13:00:34.302972 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command config diff get hook 0x5558ff7321b0
-158> 2017-10-09 13:00:34.302975 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command log flush hook 0x5558ff7321b0
-157> 2017-10-09 13:00:34.302979 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command log dump hook 0x5558ff7321b0
-156> 2017-10-09 13:00:34.302981 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command log reopen hook 0x5558ff7321b0
-155> 2017-10-09 13:00:34.302988 7fae5ddeef80 5 asok(0x5558ff7af2c0)
register_command dump_mempools hook 0x5558ffa33188
-154> 2017-10-09 13:00:34.307728 7fae5ddeef80 0 set uid:gid to
64045:64045 (ceph:ceph)
-153> 2017-10-09 13:00:34.307738 7fae5ddeef80 0 ceph version 12.2.1
(fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous (stable), process
(unknown), pid 20268
-152> 2017-10-09 13:00:34.307802 7fae5ddeef80 0 pidfile_write: ignore
empty --pid-file
-151> 2017-10-09 13:00:34.314461 7fae5ddeef80 0 load: jerasure load:
lrc load: isa
-150> 2017-10-09 13:00:34.315570 7fae5ddeef80 1 leveldb: Recovering
log #73065
-149> 2017-10-09 13:00:34.369901 7fae5ddeef80 1 leveldb: Level-0
table #73083: started
-148> 2017-10-09 13:00:34.469844 7fae5ddeef80 1 leveldb: Level-0
table #73083: 27766571 bytes OK
-147> 2017-10-09 13:00:34.474225 7fae5ddeef80 1 leveldb: Delete
type=0 #73065
-146> 2017-10-09 13:00:34.479041 7fae5ddeef80 1 leveldb: Delete
type=3 #72279
-145> 2017-10-09 13:00:34.479496 7fae5ddeef80 1 RDMAStack RDMAStack
ms_async_rdma_enable_hugepage value is: 0
-144> 2017-10-09 13:00:34.479509 7fae5ddeef80 20 RDMAStack RDMAStack
constructing RDMAStack...
-143> 2017-10-09 13:00:34.479521 7fae5ddeef80 20 RDMAStack creating
RDMAStack:0x5558ff76fe70 with dispatcher:0x5558ff7b48c0
-142> 2017-10-09 13:00:34.479558 7fae545dd700 2 Event(0x5558ffa98600
nevent=5000 time_id=1).set_owner idx=0 owner=140386716473088
-141> 2017-10-09 13:00:34.479574 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=11
mask=1 original mask is 0
-140> 2017-10-09 13:00:34.479576 7fae545dd700 20 EpollDriver.add_event
add event fd=11 cur_mask=0 add_mask=1 to 10
-139> 2017-10-09 13:00:34.479581 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=11 mask=1
original mask is 1
-138> 2017-10-09 13:00:34.479582 7fae545dd700 10 stack operator() starting
-137> 2017-10-09 13:00:34.479584 7fae53ddc700 2 Event(0x5558ffa988c0
nevent=5000 time_id=1).set_owner idx=1 owner=140386708080384
-136> 2017-10-09 13:00:34.479592 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=14
mask=1 original mask is 0
-135> 2017-10-09 13:00:34.479595 7fae53ddc700 20 EpollDriver.add_event
add event fd=14 cur_mask=0 add_mask=1 to 13
-134> 2017-10-09 13:00:34.479598 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=14 mask=1
original mask is 1
-133> 2017-10-09 13:00:34.479601 7fae53ddc700 10 stack operator() starting
-132> 2017-10-09 13:00:34.479600 7fae535db700 2 Event(0x5558ffa98b80
nevent=5000 time_id=1).set_owner idx=2 owner=140386699687680
-131> 2017-10-09 13:00:34.479603 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event started fd=17
mask=1 original mask is 0
-130> 2017-10-09 13:00:34.479604 7fae535db700 20 EpollDriver.add_event
add event fd=17 cur_mask=0 add_mask=1 to 16
-129> 2017-10-09 13:00:34.479606 7fae535db700 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).create_file_event create event end fd=17 mask=1
original mask is 1
-128> 2017-10-09 13:00:34.479607 7fae535db700 10 stack operator() starting
-127> 2017-10-09 13:00:34.479625 7fae5ddeef80 0 starting mon.3 rank 3
at public addr 192.168.100.144:6789/0 at bind addr
192.168.100.144:6789/0 mon_data /var/lib/ceph/mon/ceph-3 fsid
cb0aba69-bad9-4d30-b163-c19f0fd1ec53
-126> 2017-10-09 13:00:34.479636 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
-125> 2017-10-09 13:00:34.479638 7fae5ddeef80 10 -- - bind Network
Stack is not ready for bind yet - postponed
-124> 2017-10-09 13:00:34.479645 7fae5ddeef80 0 starting mon.3 rank 3
at 192.168.100.144:6789/0 mon_data /var/lib/ceph/mon/ceph-3 fsid
cb0aba69-bad9-4d30-b163-c19f0fd1ec53
-123> 2017-10-09 13:00:34.480237 7fae5ddeef80 0 mon.3 at -1(probing).mds
e1 print_map
e1
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file
layout v2}
legacy client fscid: -1
No filesystems configured
-122> 2017-10-09 13:00:34.480408 7fae5ddeef80 0 mon.3 at -1(probing).osd
e12305 crush map has features 283675107524608, adjusting msgr requires
-121> 2017-10-09 13:00:34.480415 7fae5ddeef80 0 mon.3 at -1(probing).osd
e12305 crush map has features 283675107524608, adjusting msgr requires
-120> 2017-10-09 13:00:34.480417 7fae5ddeef80 0 mon.3 at -1(probing).osd
e12305 crush map has features 720859615486820352, adjusting msgr requires
-119> 2017-10-09 13:00:34.480418 7fae5ddeef80 0 mon.3 at -1(probing).osd
e12305 crush map has features 283675107524608, adjusting msgr requires
-118> 2017-10-09 13:00:34.481682 7fae5ddeef80 4 mgrc handle_mgr_map
Got map version 458808
-117> 2017-10-09 13:00:34.481687 7fae5ddeef80 4 mgrc handle_mgr_map
Active mgr is now 192.168.100.144:6828/21406
-116> 2017-10-09 13:00:34.481689 7fae5ddeef80 4 mgrc reconnect
Starting new session with 192.168.100.144:6828/21406
-115> 2017-10-09 13:00:34.481692 7fae5ddeef80 10 -- - create_connect
192.168.100.144:6828/21406, creating connection and registering
-114> 2017-10-09 13:00:34.481697 7fae5ddeef80 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_NONE pgs=0
cs=0 l=0)._connect csq=0
-113> 2017-10-09 13:00:34.481701 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=1).wakeup
-112> 2017-10-09 13:00:34.481706 7fae5ddeef80 10 -- - get_connection
mgr.17664152 192.168.100.144:6828/21406 new 0x555900b4b800
-111> 2017-10-09 13:00:34.481709 7fae5ddeef80 1 -- - -->
192.168.100.144:6828/21406 -- mgropen(unknown.3) v2 -- 0x5558ffa99080 con 0
-110> 2017-10-09 13:00:34.481713 7fae5ddeef80 15 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).send_message inline write is denied, reschedule
m=0x5558ffa99080
-109> 2017-10-09 13:00:34.481740 7fae535db700 20 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
-108> 2017-10-09 13:00:34.481855 7fae5ddeef80 1 -- - start start
-107> 2017-10-09 13:00:34.481859 7fae5ddeef80 1 -- - start start
-106> 2017-10-09 13:00:34.482042 7fae5ddeef80 10 -- - ready -
-105> 2017-10-09 13:00:34.482045 7fae5ddeef80 10 -- - bind bind
192.168.100.144:6789/0
-104> 2017-10-09 13:00:34.482048 7fae5ddeef80 10 Processor -- bind
-103> 2017-10-09 13:00:34.482049 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
-102> 2017-10-09 13:00:34.482427 7fae535db700 10 Infiniband
binding_port port 1 is not what we want. state: 4)
-101> 2017-10-09 13:00:34.482522 7fae535db700 1 Infiniband
binding_port found active port 2
-100> 2017-10-09 13:00:34.482530 7fae535db700 1 Infiniband init
assigning: 1024 receive buffers
-99> 2017-10-09 13:00:34.482532 7fae535db700 1 Infiniband init
assigning: 1024 send buffers
-98> 2017-10-09 13:00:34.482533 7fae535db700 1 Infiniband init
device allow 4194303 completion entries
-97> 2017-10-09 13:00:34.565741 7fae535db700 20 Infiniband init started.
-96> 2017-10-09 13:00:34.565783 7fae535db700 20 Infiniband init started.
-95> 2017-10-09 13:00:34.567858 7fae535db700 20 Infiniband init
successfully create cq=0x5558ff77ccb0
-94> 2017-10-09 13:00:34.569959 7fae535db700 20 Infiniband init
successfully create cq=0x5558ff77cd80
-93> 2017-10-09 13:00:34.570039 7fae535db700 20 Infiniband init started.
-92> 2017-10-09 13:00:34.570050 7fae4eb2c700 20 RDMAStack polling
going to poll tx cq: 0x555900e7c540 rx cq: 0x555900e7c570
-91> 2017-10-09 13:00:34.570088 7fae545dd700 20 RDMAServerSocketImpl
listen bind to 192.168.100.144:6789 on port 6789
-90> 2017-10-09 13:00:34.570130 7fae535db700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7caf00
-89> 2017-10-09 13:00:34.570145 7fae5ddeef80 10 Processor -- bind
bound to 192.168.100.144:6789/0
-88> 2017-10-09 13:00:34.570169 7fae5ddeef80 1 --
192.168.100.144:6789/0 learned_addr learned my addr 192.168.100.144:6789/0
-87> 2017-10-09 13:00:34.570173 7fae5ddeef80 1 --
192.168.100.144:6789/0 _finish_bind bind my_inst.addr is
192.168.100.144:6789/0
-86> 2017-10-09 13:00:34.570177 7fae5ddeef80 1 Processor -- start
-85> 2017-10-09 13:00:34.570179 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
-84> 2017-10-09 13:00:34.570194 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event started fd=23
mask=1 original mask is 0
-83> 2017-10-09 13:00:34.570202 7fae545dd700 20 EpollDriver.add_event
add event fd=23 cur_mask=0 add_mask=1 to 10
-82> 2017-10-09 13:00:34.570206 7fae545dd700 20 Event(0x5558ffa98600
nevent=5000 time_id=1).create_file_event create event end fd=23 mask=1
original mask is 1
-81> 2017-10-09 13:00:34.570308 7fae5ddeef80 10 -- - ready -
-80> 2017-10-09 13:00:34.570317 7fae5ddeef80 1 Processor -- start
-79> 2017-10-09 13:00:34.570347 7fae535db700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7caf00
-78> 2017-10-09 13:00:34.570360 7fae5ddeef80 0 mon.3 at -1(probing) e5
my rank is now 3 (was -1)
-77> 2017-10-09 13:00:34.570362 7fae535db700 20
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
-76> 2017-10-09 13:00:34.570363 7fae5ddeef80 1 --
192.168.100.144:6789/0 shutdown_connections
-75> 2017-10-09 13:00:34.570379 7fae5ddeef80 1 --
192.168.100.144:6789/0 _send_message--> mon.0 192.168.100.141:6789/0 --
mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0
0x5558ff7c4900
-74> 2017-10-09 13:00:34.570392 7fae5ddeef80 20 --
192.168.100.144:6789/0 submit_message mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 remote,
192.168.100.141:6789/0, new connection.
-73> 2017-10-09 13:00:34.570385 7fae535db700 10 NetHandler
generic_connect connect: (111) Connection refused
-72> 2017-10-09 13:00:34.570395 7fae5ddeef80 10 --
192.168.100.144:6789/0 create_connect 192.168.100.141:6789/0, creating
connection and registering
-71> 2017-10-09 13:00:34.570400 7fae535db700 1 RDMAStack connect try
connecting failed.
-70> 2017-10-09 13:00:34.570403 7fae535db700 20
RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
-69> 2017-10-09 13:00:34.570410 7fae4eb2c700 10 RDMAStack polling
finally delete qp=0x55590059d480
-68> 2017-10-09 13:00:34.570414 7fae4eb2c700 20 Infiniband ~QueuePair
destroy qp=0x5558ff7caf00
-67> 2017-10-09 13:00:34.570408 7fae5ddeef80 10 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
-66> 2017-10-09 13:00:34.570418 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
-65> 2017-10-09 13:00:34.570416 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).fault waiting 0.200000
-64> 2017-10-09 13:00:34.570425 7fae545dd700 20 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
-63> 2017-10-09 13:00:34.570423 7fae5ddeef80 1 --
192.168.100.144:6789/0 --> 192.168.100.141:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4900 con 0
-62> 2017-10-09 13:00:34.570434 7fae535db700 10 -- - >>
192.168.100.144:6828/21406 conn(0x555900b4b800 :-1 s=STATE_CONNECTING
pgs=0 cs=0 l=0).handle_write
-61> 2017-10-09 13:00:34.570437 7fae545dd700 20 Infiniband init started.
-60> 2017-10-09 13:00:34.570436 7fae5ddeef80 15 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).send_message inline write is denied,
reschedule m=0x5558ff7c4900
-59> 2017-10-09 13:00:34.570440 7fae5ddeef80 20 Event(0x5558ffa98600
nevent=5000 time_id=1).wakeup
-58> 2017-10-09 13:00:34.570443 7fae5ddeef80 1 --
192.168.100.144:6789/0 _send_message--> mon.1 192.168.100.142:6789/0 --
mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0
0x5558ff7c4600
-57> 2017-10-09 13:00:34.570446 7fae5ddeef80 20 --
192.168.100.144:6789/0 submit_message mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 remote,
192.168.100.142:6789/0, new connection.
-56> 2017-10-09 13:00:34.570448 7fae5ddeef80 10 --
192.168.100.144:6789/0 create_connect 192.168.100.142:6789/0, creating
connection and registering
-55> 2017-10-09 13:00:34.570454 7fae5ddeef80 10 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
-54> 2017-10-09 13:00:34.570456 7fae5ddeef80 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).wakeup
-53> 2017-10-09 13:00:34.570462 7fae5ddeef80 1 --
192.168.100.144:6789/0 --> 192.168.100.142:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4600 con 0
-52> 2017-10-09 13:00:34.570467 7fae5ddeef80 15 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).send_message inline write is denied,
reschedule m=0x5558ff7c4600
-51> 2017-10-09 13:00:34.570470 7fae5ddeef80 1 --
192.168.100.144:6789/0 _send_message--> mon.2 192.168.100.143:6789/0 --
mon_probe(probe cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- ?+0
0x5558ff7c4c00
-50> 2017-10-09 13:00:34.570473 7fae5ddeef80 20 --
192.168.100.144:6789/0 submit_message mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 remote,
192.168.100.143:6789/0, new connection.
-49> 2017-10-09 13:00:34.570475 7fae5ddeef80 10 --
192.168.100.144:6789/0 create_connect 192.168.100.143:6789/0, creating
connection and registering
-48> 2017-10-09 13:00:34.570481 7fae5ddeef80 10 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_NONE pgs=0 cs=0 l=0)._connect csq=0
-47> 2017-10-09 13:00:34.570474 7fae53ddc700 20 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
-46> 2017-10-09 13:00:34.570485 7fae5ddeef80 20 Event(0x5558ffa98b80
nevent=5000 time_id=2).wakeup
-45> 2017-10-09 13:00:34.570490 7fae53ddc700 20 Infiniband init started.
-44> 2017-10-09 13:00:34.570491 7fae5ddeef80 1 --
192.168.100.144:6789/0 --> 192.168.100.143:6789/0 -- mon_probe(probe
cb0aba69-bad9-4d30-b163-c19f0fd1ec53 name 3) v6 -- 0x5558ff7c4c00 con 0
-43> 2017-10-09 13:00:34.570493 7fae5ddeef80 15 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).send_message inline write is denied,
reschedule m=0x5558ff7c4c00
-42> 2017-10-09 13:00:34.570504 7fae535db700 20 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
-41> 2017-10-09 13:00:34.570520 7fae535db700 20 Infiniband init started.
-40> 2017-10-09 13:00:34.570616 7fae545dd700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7cb2c0
-39> 2017-10-09 13:00:34.570664 7fae53ddc700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7cb400
-38> 2017-10-09 13:00:34.570676 7fae535db700 20 Infiniband init
successfully create queue pair: qp=0x5558ff7cb540
-37> 2017-10-09 13:00:34.570861 7fae545dd700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7cb2c0
-36> 2017-10-09 13:00:34.570884 7fae545dd700 20
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
-35> 2017-10-09 13:00:34.570945 7fae545dd700 10 NetHandler
generic_connect connect: (111) Connection refused
-34> 2017-10-09 13:00:34.570956 7fae545dd700 1 RDMAStack connect try
connecting failed.
-33> 2017-10-09 13:00:34.570959 7fae545dd700 20
RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
-32> 2017-10-09 13:00:34.570969 7fae4eb2c700 10 RDMAStack polling
finally delete qp=0x55590059d560
-31> 2017-10-09 13:00:34.570969 7fae545dd700 10 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).fault waiting 0.200000
-30> 2017-10-09 13:00:34.570974 7fae4eb2c700 20 Infiniband ~QueuePair
destroy qp=0x5558ff7cb2c0
-29> 2017-10-09 13:00:34.570981 7fae545dd700 10 --
192.168.100.144:6789/0 >> 192.168.100.141:6789/0 conn(0x555900b4a000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).handle_write
-28> 2017-10-09 13:00:34.570999 7fae53ddc700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7cb400
-27> 2017-10-09 13:00:34.571102 7fae535db700 20 Infiniband init
successfully change queue pair to INIT: qp=0x5558ff7cb540
-26> 2017-10-09 13:00:34.571244 7fae4eb2c700 20 Infiniband
rearm_notify started.
-25> 2017-10-09 13:00:34.571265 7fae4eb2c700 20 Infiniband
rearm_notify started.
-24> 2017-10-09 13:00:34.571274 7fae53ddc700 20
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
-23> 2017-10-09 13:00:34.571278 7fae535db700 20
RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
-22> 2017-10-09 13:00:34.571378 7fae53ddc700 20
RDMAConnectedSocketImpl try_connect tcp_fd: 35
-21> 2017-10-09 13:00:34.571390 7fae53ddc700 10 Infiniband send_msg
sending: 0, 4543, 5515815, 0, fe80000000000000268a07fffee25d81
-20> 2017-10-09 13:00:34.571396 7fae535db700 10 NetHandler
generic_connect connect: (111) Connection refused
-19> 2017-10-09 13:00:34.571402 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=35
mask=1 original mask is 0
-18> 2017-10-09 13:00:34.571406 7fae53ddc700 20 EpollDriver.add_event
add event fd=35 cur_mask=0 add_mask=1 to 13
-17> 2017-10-09 13:00:34.571411 7fae535db700 1 RDMAStack connect try
connecting failed.
-16> 2017-10-09 13:00:34.571409 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=35 mask=1
original mask is 1
-15> 2017-10-09 13:00:34.571415 7fae535db700 20
RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
-14> 2017-10-09 13:00:34.571416 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event started fd=32
mask=1 original mask is 0
-13> 2017-10-09 13:00:34.571419 7fae53ddc700 20 EpollDriver.add_event
add event fd=32 cur_mask=0 add_mask=1 to 13
-12> 2017-10-09 13:00:34.571421 7fae53ddc700 20 Event(0x5558ffa988c0
nevent=5000 time_id=1).create_file_event create event end fd=32 mask=1
original mask is 1
-11> 2017-10-09 13:00:34.571419 7fae535db700 10 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).fault waiting 0.200000
-10> 2017-10-09 13:00:34.571423 7fae53ddc700 20 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING_RE pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING
-9> 2017-10-09 13:00:34.571426 7fae535db700 10 --
192.168.100.144:6789/0 >> 192.168.100.143:6789/0 conn(0x555900add000 :-1
s=STATE_CONNECTING pgs=0 cs=0 l=0).handle_write
-8> 2017-10-09 13:00:34.571430 7fae53ddc700 10 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING_RE pgs=0 cs=0 l=0)._process_connection nonblock
connect inprogress
-7> 2017-10-09 13:00:34.571436 7fae53ddc700 10 --
192.168.100.144:6789/0 >> 192.168.100.142:6789/0 conn(0x555900ade800 :-1
s=STATE_CONNECTING_RE pgs=0 cs=0 l=0).handle_write
-6> 2017-10-09 13:00:34.571994 7fae53ddc700 20
RDMAConnectedSocketImpl handle_connection QP: 4543 tcp_fd: 35 notify_fd: 32
-5> 2017-10-09 13:00:34.572010 7fae53ddc700 5 Infiniband recv_msg
recevd: 0, 3643, 3062654, 0, fe80000000000000268a07fffef98731
-4> 2017-10-09 13:00:34.572014 7fae53ddc700 20
RDMAConnectedSocketImpl handle_connection peer msg : < 3643, 3062654, 0, 0>
-3> 2017-10-09 13:00:34.572016 7fae53ddc700 20
RDMAConnectedSocketImpl activate Choosing gid_index 24, sl 3
-2> 2017-10-09 13:00:34.572021 7fae53ddc700 20
RDMAConnectedSocketImpl activate transition to RTR state successfully.
-1> 2017-10-09 13:00:34.572022 7fae53ddc700 -1
RDMAConnectedSocketImpl activate failed to transition to RTS state: (22)
Invalid argument
0> 2017-10-09 13:00:34.573257 7fae53ddc700 -1
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
In function 'void RDMAConnectedSocketImpl::handle_connection()' thread
7fae53ddc700 time 2017-10-09 13:00:34.572032
/home/builder/source/ceph-12.2.1/src/msg/async/rdma/RDMAConnectedSocketImpl.cc:
221: FAILED assert(!r)
ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x5558f5c809f2]
2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x5558f5fd656a]
3: (EventCenter::process_events(int, std::chrono::duration<unsigned
long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x5558f5d46028]
4: (()+0x697128) [0x5558f5d4a128]
5: (()+0xb9e6f) [0x7fae5af0fe6f]
6: (()+0x7494) [0x7fae5d19e494]
7: (clone()+0x3f) [0x7fae5a684aff]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--
Gerhard W. Recher
net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing
+49 171 4802507
More information about the pve-devel
mailing list