Discussion:
[PVE-User] Ceph Cluster with proxmox failure
Gilberto Nunes
2018-09-28 19:49:48 UTC
Permalink
Hi there
I have a 6-server Ceph cluster built with Proxmox 5.2.
Suddenly, after a power failure, only 3 servers are up, and even with those
3 servers the Ceph cluster doesn't work.
Running "pveceph status" only returns a timeout:

pveceph status got timeout

Any advice?


---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36
Woods, Ken A (DNR)
2018-09-28 20:02:42 UTC
Permalink
Network issues?
Time issues?


> On Sep 28, 2018, at 11:50, Gilberto Nunes <***@gmail.com> wrote:
> [...]
Gilberto Nunes
2018-09-28 20:07:54 UTC
Permalink
pve-ceph01:~# ssh pve-ceph01 date
Fri Sep 28 17:06:34 -03 2018
pve-ceph01:~# ssh pve-ceph02 date
Fri Sep 28 17:06:37 -03 2018
pve-ceph01:~# ssh pve-ceph05 date
Fri Sep 28 17:06:39 -03 2018

pve-ceph01:~# ping -c 1 pve-ceph01
PING pve-ceph01.cepam.com.br (10.10.10.100) 56(84) bytes of data.
64 bytes from pve-ceph01.cepam.com.br (10.10.10.100): icmp_seq=1 ttl=64
time=0.020 ms

--- pve-ceph01.cepam.com.br ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
pve-ceph01:~# ping -c 1 pve-ceph02
PING pve-ceph02.cepam.com.br (10.10.10.110) 56(84) bytes of data.
64 bytes from pve-ceph02.cepam.com.br (10.10.10.110): icmp_seq=1 ttl=64
time=0.120 ms

--- pve-ceph02.cepam.com.br ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.120/0.120/0.120/0.000 ms
pve-ceph01:~# ping -c 1 pve-ceph05
PING pve-ceph05.cepam.com.br (10.10.10.140) 56(84) bytes of data.
64 bytes from pve-ceph05.cepam.com.br (10.10.10.140): icmp_seq=1 ttl=64
time=0.078 ms

--- pve-ceph05.cepam.com.br ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms

I can communicate with the other nodes.
Any ceph command just hangs.
All Ceph services appear to be running...
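A side note (not from the original thread): when ceph commands hang like this, the CLI is usually waiting indefinitely for a monitor session. A minimal sketch, assuming the ceph CLI is installed, that bounds the wait instead of blocking forever:

```shell
# A hanging "ceph" command usually means the client cannot reach a
# monitor quorum; by default it retries indefinitely. Bound the wait:
if command -v ceph >/dev/null 2>&1; then
  status=$(timeout 10 ceph -s 2>&1) \
    || status="ceph status unavailable (monitors not in quorum?)"
else
  status="ceph CLI not installed on this host"
fi
echo "$status"
```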


---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





On Fri, Sep 28, 2018 at 17:02, Woods, Ken A (DNR) <***@alaska.gov> wrote:

> Network issues?
> Time issues?
> [...]
Gilberto Nunes
2018-09-28 20:08:28 UTC
Permalink
Oh, and the cluster is up and running too!

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





On Fri, Sep 28, 2018 at 17:07, Gilberto Nunes <***@gmail.com> wrote:

> [...]
Gilberto Nunes
2018-09-28 20:11:02 UTC
Permalink
I get this error:

ceph-mon
2018-09-28 17:10:49.189979 7f8588d66100 -1 monitor data directory at
'/var/lib/ceph/mon/ceph-admin' does not exist: have you run 'mkfs'?
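This message suggests ceph-mon was invoked without an explicit monitor id, so it went looking for the default directory "ceph-admin". A hedged sketch for finding the monitor id that actually exists on this node (the unit name in the comment is a hypothetical example):

```shell
# ceph-mon looks for /var/lib/ceph/mon/ceph-<id>; run without "-i <id>"
# it can end up probing a wrong id such as "admin". List what exists:
mon_root=/var/lib/ceph/mon
found=$(ls "$mon_root" 2>/dev/null || true)
echo "${found:-no monitor data directories under $mon_root}"

# If a directory like ceph-pve-ceph01 exists, start that systemd unit
# instead of running ceph-mon by hand (the id here is hypothetical):
#   systemctl start ceph-mon@pve-ceph01
```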


---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





On Fri, Sep 28, 2018 at 17:08, Gilberto Nunes <***@gmail.com> wrote:

> [...]
Woods, Ken A (DNR)
2018-09-28 20:13:18 UTC
Permalink
So the times are massively different. Fix that.

And corosync needs multicast, and that's tested using omping, not ping.

Go back through the initial setup documentation and read what is required for the basic network configuration.

If corosync and ceph are both not working, start there.
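As a sketch of the omping test Ken mentions (node names are the ones from this thread; omping must be run on all listed nodes at roughly the same time):

```shell
# Multicast connectivity test for corosync. Run the same command
# simultaneously on every listed node; 100% multicast loss with
# working unicast usually points at IGMP snooping on the switch.
NODES="pve-ceph01 pve-ceph02 pve-ceph05"

if command -v omping >/dev/null 2>&1; then
  omping -c 600 -i 1 -q $NODES   # 600 probes, 1 per second, quiet summary
else
  echo "omping not installed (on Debian/PVE: apt install omping)"
fi
```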


> On Sep 28, 2018, at 12:08, Gilberto Nunes <***@gmail.com> wrote:
> [...]
Gilberto Nunes
2018-09-28 20:19:05 UTC
Permalink
Everything was working until a couple of hours ago!
Due to a power failure, 3 hosts went down, but I assumed that with 3 hosts
the cluster would keep quorum and Ceph would continue to work.
This has happened before, and Ceph worked perfectly with 3 servers.
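Rather than assuming quorum, both layers can be checked directly; a minimal sketch assuming the pvecm and ceph CLIs are present on the node:

```shell
# Proxmox (corosync) quorum and Ceph monitor quorum are independent;
# check both. Each command is skipped if its CLI is missing.
checked=0
for cmd in "pvecm status" "ceph quorum_status --connect-timeout 10"; do
  checked=$((checked + 1))
  set -- $cmd
  if command -v "$1" >/dev/null 2>&1; then
    $cmd || echo "'$cmd' failed (no quorum on that layer?)"
  else
    echo "'$1' not available on this host"
  fi
done
```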

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





On Fri, Sep 28, 2018 at 17:13, Woods, Ken A (DNR) <***@alaska.gov> wrote:

> [...]
Mark Adams
2018-09-28 20:22:53 UTC
Permalink
The exact same 3 servers have been down and everything has worked? Do you
run Ceph mons on every server?

On Fri, 28 Sep 2018, 21:19 Gilberto Nunes, <***@gmail.com>
wrote:

> [...]
Mark Adams
2018-09-28 20:23:41 UTC
Permalink
Also, 3 out of 6 servers is not quorum. You need a majority of the total.
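The majority rule can be made concrete with a quick calculation, using this cluster's numbers (6 nodes, 3 survivors):

```shell
# A monitor quorum needs a strict majority: floor(n/2) + 1 of all mons.
total_mons=6
surviving=3
majority=$(( total_mons / 2 + 1 ))
echo "majority needed: $majority"          # 4 of 6
if [ "$surviving" -ge "$majority" ]; then
  echo "quorum possible"
else
  echo "no quorum: only $surviving of $total_mons mons up"
fi
```

This is also why an even monitor count buys nothing over the next lower odd count: 6 monitors tolerate the same 2 failures that 5 do.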

On Fri, 28 Sep 2018, 21:22 Mark Adams, <***@openvs.co.uk> wrote:

> [...]
Gilberto Nunes
2018-09-28 20:24:59 UTC
Permalink
I think I figured out what happened...
Luckily for me, one of the remaining servers, pve-ceph05, has its OSD CRUSH
weight set to 0.
My environment consists of a mix of different disk sizes...
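A hedged sketch for inspecting and correcting a zero CRUSH weight; the OSD id and weight value in the comment are hypothetical examples, not taken from this cluster:

```shell
# An OSD with CRUSH WEIGHT 0 receives no data even while "up" and "in".
# Show the CRUSH tree if the ceph CLI is present:
tree=$( { command -v ceph >/dev/null 2>&1 && ceph osd tree; } 2>/dev/null \
        || echo "ceph CLI not available on this host" )
echo "$tree"

# CRUSH weight conventionally tracks capacity in TiB, so a 2 TB disk
# gets roughly 1.82. Hypothetical OSD id and value:
#   ceph osd crush reweight osd.5 1.82
```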

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Em sex, 28 de set de 2018 às 17:23, Mark Adams <***@openvs.co.uk> escreveu:

> the exact same 3 servers have been down and everything has worked? do you
> run ceph mons on every server?
>
> On Fri, 28 Sep 2018, 21:19 Gilberto Nunes, <***@gmail.com>
> wrote:
>
> > Everything is working couple of hours ago!
> > Due a power failure, 3 hosts are down, but I suppose with 3 host the
> > cluster garantee the quorum and allow ceph continues to work.
> > This already happen before and ceph works perfectly with 3 servers
> >
> > ---
> > Gilberto Nunes Ferreira
> >
> > (47) 3025-5907
> > (47) 99676-7530 - Whatsapp / Telegram
> >
> > Skype: gilberto.nunes36
> >
> >
> >
> >
> >
> > Em sex, 28 de set de 2018 às 17:13, Woods, Ken A (DNR) <
> > ***@alaska.gov>
> > escreveu:
> >
> > > So the times are massively different. Fix that.
> > >
> > > And corosync needs multicast , and that’s tested using omping, not
> ping.
> > >
> > > Go back through the initial set up documentation and read what is
> > required
> > > for the basic network configuration.
> > >
> > > If corosync and ceph are both not working, start there.
> > >
> > >
> > > > On Sep 28, 2018, at 12:08, Gilberto Nunes <
> ***@gmail.com>
> > > wrote:
> > > >
> > > > pve-ceph01:~# ssh pve-ceph01 date
> > > > Fri Sep 28 17:06:34 -03 2018
> > > > pve-ceph01:~# ssh pve-ceph02 date
> > > > Fri Sep 28 17:06:37 -03 2018
> > > > pve-ceph01:~# ssh pve-ceph05 date
> > > > Fri Sep 28 17:06:39 -03 2018
> > > >
> > > > pve-ceph01:~# ping -c 1 pve-ceph01
> > > > PING pve-ceph01.cepam.com.br (10.10.10.100) 56(84) bytes of data.
> > > > 64 bytes from pve-ceph01.cepam.com.br (10.10.10.100): icmp_seq=1
> > ttl=64
> > > > time=0.020 ms
> > > >
> > > > --- pve-ceph01.cepam.com.br ping statistics ---
> > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > > > rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
> > > > pve-ceph01:~# ping -c 1 pve-ceph02
> > > > once.
> > > > PING pve-ceph02.cepam.com.br (10.10.10.110) 56(84) bytes of data.
> > > > 64 bytes from pve-ceph02.cepam.com.br (10.10.10.110): icmp_seq=1
> > ttl=64
> > > > time=0.120 ms
> > > >
> > > > --- pve-ceph02.cepam.com.br ping statistics ---
> > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > > > rtt min/avg/max/mdev = 0.120/0.120/0.120/0.000 ms
> > > > pve-ceph01:~# ping -c 1 pve-ceph05
> > > > PING pve-ceph05.cepam.com.br (10.10.10.140) 56(84) bytes of data.
> > > > 64 bytes from pve-ceph05.cepam.com.br (10.10.10.140): icmp_seq=1
> > ttl=64
> > > > time=0.078 ms
> > > >
> > > > --- pve-ceph05.cepam.com.br ping statistics ---
> > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > > > rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
> > > >
> > > > I can communicate with the others...
> > > > ceph commands get stuck
> > > > all Ceph services appear to be running...
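[Editor's note: when the `ceph` CLI hangs like this, the monitor's admin socket still answers locally even without quorum. A sketch; the mon id is assumed to be the short hostname, adjust to your setup.]

```shell
# Query the local monitor directly through its admin socket
# (this works even when the cluster has no quorum).
ceph daemon mon.$(hostname -s) mon_status

# Fields worth checking in the JSON output:
#   "state"  - probing / electing / leader / peon
#   "quorum" - mon ranks currently in quorum (empty list = no quorum)
#   "monmap" - how many monitors the map expects in total
```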
> > > >
> > > >
> > > > ---
> > > > Gilberto Nunes Ferreira
> > > >
> > > > (47) 3025-5907
> > > > (47) 99676-7530 - Whatsapp / Telegram
> > > >
> > > > Skype: gilberto.nunes36
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Em sex, 28 de set de 2018 às 17:02, Woods, Ken A (DNR) <
> > > ***@alaska.gov>
> > > > escreveu:
> > > >
> > > >> Network issues?
> > > >> Time issues?
> > > >>
> > > >>
> > > >>> On Sep 28, 2018, at 11:50, Gilberto Nunes <
> > ***@gmail.com>
> > > >> wrote:
> > > >>>
> > > >>> Hi there
> > > >>> I have a 6-server Ceph cluster built with Proxmox 5.2
> > > >>> Suddenly, after a power failure, I have only 3 servers UP, but even
> > > >>> with 3 servers the Ceph cluster doesn't work.
> > > >>> pveceph status gives me a timeout:
> > > >>> pveceph status got timeout
> > > >>>
> > > >>> Any advice?
> > > >>>
> > > >>>
> > > >>> ---
> > > >>> Gilberto Nunes Ferreira
> > > >>>
> > > >>> (47) 3025-5907
> > > >>> (47) 99676-7530 - Whatsapp / Telegram
> > > >>>
> > > >>> Skype: gilberto.nunes36
> > > >>> _______________________________________________
> > > >>> pve-user mailing list
> > > >>> pve-***@pve.proxmox.com
> > > >>>
> > > >>
> > >
> >
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> > > >> _______________________________________________
> > > >> pve-user mailing list
> > > >> pve-***@pve.proxmox.com
> > > >>
> > >
> >
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> > > >>
> > > > _______________________________________________
> > > > pve-user mailing list
> > > > pve-***@pve.proxmox.com
> > > >
> > >
> >
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> > > _______________________________________________
> > > pve-user mailing list
> > > pve-***@pve.proxmox.com
> > > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> > >
> > _______________________________________________
> > pve-user mailing list
> > pve-***@pve.proxmox.com
> > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> >
> _______________________________________________
> pve-user mailing list
> pve-***@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
Gilberto Nunes
2018-09-28 20:29:34 UTC
Permalink
And to make the day truly outstanding, a fiber optic line got ruptured!!!!
😥
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36


Em sex, 28 de set de 2018 às 17:24, Gilberto Nunes <
***@gmail.com> escreveu:

> I think I figured out what happened...
> As luck would have it, one of the remaining servers, pve-ceph05, has its
> OSD CRUSH weight set to 0.
> My environment consists of a mix of different disk sizes....
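[Editor's note: a CRUSH weight of 0 means that OSD stores no data, so it cannot help serve PGs after the other hosts go down. A sketch of how to inspect and correct it; `osd.5` and the weight value are placeholders.]

```shell
# Show the CRUSH tree; a 0 in the WEIGHT column reveals the problem OSD
ceph osd tree

# Reweight the OSD; by convention the CRUSH weight is the disk size in TiB
# (osd.5 and 1.819 are placeholders -- use your own OSD id and size)
ceph osd crush reweight osd.5 1.819
```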
>
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
>
>
>
>
>
> Em sex, 28 de set de 2018 às 17:23, Mark Adams <***@openvs.co.uk>
> escreveu:
>
>> The exact same 3 servers have been down before and everything still worked?
>> Do you run Ceph mons on every server?
>>
>> On Fri, 28 Sep 2018, 21:19 Gilberto Nunes, <***@gmail.com>
>> wrote:
>>
>> > Everything was working until a couple of hours ago!
>> > Due to a power failure, 3 hosts are down, but I supposed that with 3 hosts
>> > the cluster would keep quorum and Ceph would continue to work.
>> > This has already happened before and Ceph worked perfectly with 3 servers
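[Editor's note: monitor quorum needs a strict majority of all monitors in the monmap, not merely "some monitors up". If a mon runs on every one of the 6 hosts, 3 survivors are one short. The arithmetic:]

```python
def mons_needed_for_quorum(total_mons: int) -> int:
    """Ceph monitors use Paxos: quorum requires a strict majority."""
    return total_mons // 2 + 1

# With a mon on each of 6 hosts, 3 survivors cannot form quorum:
print(mons_needed_for_quorum(6))  # -> 4
# With only 3 mons in the monmap, 2 would have sufficed:
print(mons_needed_for_quorum(3))  # -> 2
```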
>> >
>> > ---
>> > Gilberto Nunes Ferreira
>> >
>> > (47) 3025-5907
>> > (47) 99676-7530 - Whatsapp / Telegram
>> >
>> > Skype: gilberto.nunes36
>> >
>> >
>> >
>> >
>> >
>> > Em sex, 28 de set de 2018 às 17:13, Woods, Ken A (DNR) <
>> > ***@alaska.gov>
>> > escreveu:
>> >
>> > > So the times are massively different. Fix that.
>> > >
>> > > And corosync needs multicast, and that’s tested using omping, not
>> > > ping.
>> > >
>> > > Go back through the initial set up documentation and read what is
>> > required
>> > > for the basic network configuration.
>> > >
>> > > If corosync and ceph are both not working, start there.
>> > >
>> > >
>> > > > On Sep 28, 2018, at 12:08, Gilberto Nunes <
>> ***@gmail.com>
>> > > wrote:
>> > > >
>> > > > pve-ceph01:~# ssh pve-ceph01 date
>> > > > Fri Sep 28 17:06:34 -03 2018
>> > > > pve-ceph01:~# ssh pve-ceph02 date
>> > > > Fri Sep 28 17:06:37 -03 2018
>> > > > pve-ceph01:~# ssh pve-ceph05 date
>> > > > Fri Sep 28 17:06:39 -03 2018
>> > > >
>> > > > pve-ceph01:~# ping -c 1 pve-ceph01
>> > > > PING pve-ceph01.cepam.com.br (10.10.10.100) 56(84) bytes of data.
>> > > > 64 bytes from pve-ceph01.cepam.com.br (10.10.10.100): icmp_seq=1
>> > ttl=64
>> > > > time=0.020 ms
>> > > >
>> > > > --- pve-ceph01.cepam.com.br ping statistics ---
>> > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> > > > rtt min/avg/max/mdev = 0.020/0.020/0.020/0.000 ms
>> > > > pve-ceph01:~# ping -c 1 pve-ceph02
>> > > > PING pve-ceph02.cepam.com.br (10.10.10.110) 56(84) bytes of data.
>> > > > 64 bytes from pve-ceph02.cepam.com.br (10.10.10.110): icmp_seq=1
>> > ttl=64
>> > > > time=0.120 ms
>> > > >
>> > > > --- pve-ceph02.cepam.com.br ping statistics ---
>> > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> > > > rtt min/avg/max/mdev = 0.120/0.120/0.120/0.000 ms
>> > > > pve-ceph01:~# ping -c 1 pve-ceph05
>> > > > PING pve-ceph05.cepam.com.br (10.10.10.140) 56(84) bytes of data.
>> > > > 64 bytes from pve-ceph05.cepam.com.br (10.10.10.140): icmp_seq=1
>> > ttl=64
>> > > > time=0.078 ms
>> > > >
>> > > > --- pve-ceph05.cepam.com.br ping statistics ---
>> > > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> > > > rtt min/avg/max/mdev = 0.078/0.078/0.078/0.000 ms
>> > > >
>> > > > I can communicate with the others...
>> > > > ceph commands get stuck
>> > > > all Ceph services appear to be running...
>> > > >
>> > > >
>> > > > ---
>> > > > Gilberto Nunes Ferreira
>> > > >
>> > > > (47) 3025-5907
>> > > > (47) 99676-7530 - Whatsapp / Telegram
>> > > >
>> > > > Skype: gilberto.nunes36
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > Em sex, 28 de set de 2018 às 17:02, Woods, Ken A (DNR) <
>> > > ***@alaska.gov>
>> > > > escreveu:
>> > > >
>> > > >> Network issues?
>> > > >> Time issues?
>> > > >>
>> > > >>
>> > > >>> On Sep 28, 2018, at 11:50, Gilberto Nunes <
>> > ***@gmail.com>
>> > > >> wrote:
>> > > >>>
>> > > >>> Hi there
>> > > >>> I have a 6-server Ceph cluster built with Proxmox 5.2
>> > > >>> Suddenly, after a power failure, I have only 3 servers UP, but even
>> > > >>> with 3 servers the Ceph cluster doesn't work.
>> > > >>> pveceph status gives me a timeout:
>> > > >>> pveceph status got timeout
>> > > >>>
>> > > >>> Any advice?
>> > > >>>
>> > > >>>
>> > > >>> ---
>> > > >>> Gilberto Nunes Ferreira
>> > > >>>
>> > > >>> (47) 3025-5907
>> > > >>> (47) 99676-7530 - Whatsapp / Telegram
>> > > >>>
>> > > >>> Skype: gilberto.nunes36
>> > > >>> _______________________________________________
>> > > >>> pve-user mailing list
>> > > >>> pve-***@pve.proxmox.com
>> > > >>>
>> > > >>
>> > >
>> >
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> > > >> _______________________________________________
>> > > >> pve-user mailing list
>> > > >> pve-***@pve.proxmox.com
>> > > >>
>> > >
>> >
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> > > >>
>> > > > _______________________________________________
>> > > > pve-user mailing list
>> > > > pve-***@pve.proxmox.com
>> > > >
>> > >
>> >
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> > > _______________________________________________
>> > > pve-user mailing list
>> > > pve-***@pve.proxmox.com
>> > > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> > >
>> > _______________________________________________
>> > pve-user mailing list
>> > pve-***@pve.proxmox.com
>> > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> >
>> _______________________________________________
>> pve-user mailing list
>> pve-***@pve.proxmox.com
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
>