Discussion:
[PVE-User] Proxmox CEPH 6 servers failures!
Gilberto Nunes
2018-10-04 20:05:16 UTC
Hi there

I have something like this:

CEPH01 ----|                                             |---- CEPH04
           |                                             |
CEPH02 ----|---------------- Optic Fiber ----------------|---- CEPH05
           |                                             |
CEPH03 ----|                                             |---- CEPH06

Sometimes, when the optic fiber link goes down and just CEPH01, CEPH02 and CEPH03
remain, the entire cluster fails!
I can't figure out the cause!

ceph.conf

[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    cluster network = 10.10.10.0/24
    fsid = e67534b4-0a66-48db-ad6f-aa0868e962d8
    keyring = /etc/pve/priv/$cluster.$name.keyring
    mon allow pool delete = true
    osd journal size = 5120
    osd pool default min size = 2
    osd pool default size = 3
    public network = 10.10.10.0/24

[osd]
    keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.pve-ceph01]
    host = pve-ceph01
    mon addr = 10.10.10.100:6789
    mon osd allow primary affinity = true

[mon.pve-ceph02]
    host = pve-ceph02
    mon addr = 10.10.10.110:6789
    mon osd allow primary affinity = true

[mon.pve-ceph03]
    host = pve-ceph03
    mon addr = 10.10.10.120:6789
    mon osd allow primary affinity = true

[mon.pve-ceph04]
    host = pve-ceph04
    mon addr = 10.10.10.130:6789
    mon osd allow primary affinity = true

[mon.pve-ceph05]
    host = pve-ceph05
    mon addr = 10.10.10.140:6789
    mon osd allow primary affinity = true

[mon.pve-ceph06]
    host = pve-ceph06
    mon addr = 10.10.10.150:6789
    mon osd allow primary affinity = true

Any help will be welcome!

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36
Alwin Antreich
2018-10-04 21:27:49 UTC
Hello Gilberto,

Look into this.
https://forum.proxmox.com/threads/quorum-even-node-number.30005/


--
Cheers,
Alwin

Alexandre DERUMIER
2018-10-05 06:55:01 UTC
Hi,

Can you resend your schema, because it's impossible to read.


But you need to have quorum among the monitors for the cluster to keep working.


Gilberto Nunes
2018-10-05 12:10:10 UTC
Hi
Perhaps this can help:

https://imageshack.com/a/img921/6208/X7ha8R.png

I was thinking about it, and perhaps if I deploy a VM on each side with
Proxmox and add those VMs to the Ceph cluster, maybe that can help!

thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





dorsy
2018-10-05 12:23:19 UTC
Your question has already been answered: you need a majority to have quorum.

Gilberto Nunes
2018-10-05 12:31:20 UTC
Nice. Perhaps if I create a VM on Proxmox01 and another on Proxmox02, and join those
VMs to the Ceph cluster, would that solve the quorum problem?
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





dorsy
2018-10-05 12:38:02 UTC
Moving from 6 to 8 mons and losing 4 of them instead of 3 will not
save you.

Basic maths:
floor((n/2)+1)
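
A quick illustration of that arithmetic, as a throwaway Python sketch (the node counts are just examples):

# Majority rule for monitor / corosync votes: floor(n/2) + 1
for n in range(3, 9):
    needed = n // 2 + 1        # votes required for quorum
    tolerated = n - needed     # nodes you can lose and stay quorate
    print(f"{n} nodes: quorum needs {needed}, tolerates {tolerated} failures")

# 6 nodes tolerate only 2 failures (a 3/3 split leaves no quorate side),
# while 7 nodes tolerate 3.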

Alexandre DERUMIER
2018-10-05 12:43:27 UTC
>>Nice. Perhaps if I create a VM on Proxmox01 and another on Proxmox02, and join those
>>VMs to the Ceph cluster, would that solve the quorum problem?

You could have 1 VM (only one), but you would need to replicate it (synchronously) between the 2 Proxmox servers,
and you'll have a cluster hang until it has failed over.
Maybe with DRBD, for example.
Here is an old blog about this kind of setup:
https://www.sebastien-han.fr/blog/2013/01/28/ceph-geo-replication-sort-of/




The only clean way to have a multi-site Ceph cluster
is to have 3 sites, with 1 monitor on each site (and multiple network links between the sites).

BTW, if your second site has a power failure, your Ceph cluster will hang on the 1st site too (you'll lose quorum too).

And you also need to configure the crushmap to get replication working correctly between the sites.
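
To make the site maths concrete, here is a small sketch (plain Python; the monitor placements are hypothetical examples, not your exact setup):

# Hypothetical monitor counts per site; quorum needs floor(total/2) + 1 votes.
layouts = {
    "2 sites, 3 + 3 mons":                    [3, 3],
    "2 sites, extra VM mon hosted on site B": [3, 4],
    "3 sites, 1 mon each":                    [1, 1, 1],
    "3 sites, 3 + 3 + 1 mons":                [3, 3, 1],
}

for name, sites in layouts.items():
    total = sum(sites)
    needed = total // 2 + 1
    ok = all(total - down >= needed for down in sites)
    print(f"{name}: survives loss of any single site: {ok}")

# Only the true 3-site layouts survive the loss of any one site; the
# "3 + 4" case recovers only after the replicated VM mon fails over.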







Marcus Haarmann
2018-10-05 12:44:36 UTC
Gilberto,

the underlying problem is a Ceph problem and not related to VMs or Proxmox.
The Ceph system requires a majority of monitor nodes to be active.
Your setup seems to have 3 mon nodes on each side, which results in a loss of quorum
when one side's servers are gone.
Check "ceph -s" on each side to see whether Ceph reacts at all.
If not, probably not enough mons are present.

Also, when one side is down you should see some OSD instances reported as missing.
In this case Ceph might be up, but your VMs, which are spread over the OSD disks,
might block because the primary storage is not accessible.
The distribution of data over the OSD instances is steered by the crush map.
You should make sure to have enough copies configured and the crush map set up in a way
that each side of your cluster holds at least one copy.
If the crush map is mis-configured, all copies of your data may be on the wrong side,
resulting in Proxmox not being able to access the VM data.
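
As a rough sketch of why the copy placement matters (plain Python; size=3 / min_size=2 are taken from the posted ceph.conf, the per-room placements are made-up examples): a placement group only stays writable while at least min_size of its copies remain reachable.

SIZE, MIN_SIZE = 3, 2   # osd pool default size / min size from the posted ceph.conf

def pg_writable_after_room_loss(copies_room_a, copies_room_b, lost_room):
    """True if at least min_size copies of the PG survive the room outage."""
    assert copies_room_a + copies_room_b == SIZE
    remaining = copies_room_b if lost_room == "A" else copies_room_a
    return remaining >= MIN_SIZE

print(pg_writable_after_room_loss(3, 0, "A"))  # False: all copies were in the lost room
print(pg_writable_after_room_loss(2, 1, "B"))  # True:  2 copies survive in room A
print(pg_writable_after_room_loss(2, 1, "A"))  # False: 1 surviving copy < min_size

And even when enough copies survive, the monitor quorum still has to hold, as discussed above.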

Marcus Haarmann


Von: "Gilberto Nunes" <***@gmail.com>
An: "pve-user" <pve-***@pve.proxmox.com>
Gesendet: Freitag, 5. Oktober 2018 14:31:20
Betreff: Re: [PVE-User] Proxmox CEPH 6 servers failures!

Nice.. Perhaps if I create a VM in Proxmox01 and Proxmox02, and join this
VM into Cluster Ceph, can I solve to quorum problem?
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Em sex, 5 de out de 2018 às 09:23, dorsy <***@yahoo.com> escreveu:

> Your question has already been answered. You need majority to have quorum.
>
> On 2018. 10. 05. 14:10, Gilberto Nunes wrote:
> > Hi
> > Perhaps this can help:
> >
> > https://imageshack.com/a/img921/6208/X7ha8R.png
> >
> > I was thing about it, and perhaps if I deploy a VM in both side, with
> > Proxmox and add this VM to the CEPH cluster, maybe this can help!
> >
> > thanks
> > ---
> > Gilberto Nunes Ferreira
> >
> > (47) 3025-5907
> > (47) 99676-7530 - Whatsapp / Telegram
> >
> > Skype: gilberto.nunes36
> >
> >
> >
> >
> >
> > Em sex, 5 de out de 2018 às 03:55, Alexandre DERUMIER <
> ***@odiso.com>
> > escreveu:
> >
> >> Hi,
> >>
> >> Can you resend your schema, because it's impossible to read.
> >>
> >>
> >> but you need to have to quorum on monitor to have the cluster working.
> >>
> >>
> >> ----- Mail original -----
> >> De: "Gilberto Nunes" <***@gmail.com>
> >> À: "proxmoxve" <pve-***@pve.proxmox.com>
> >> Envoyé: Jeudi 4 Octobre 2018 22:05:16
> >> Objet: [PVE-User] Proxmox CEPH 6 servers failures!
> >>
> >> Hi there
> >>
> >> I have something like this:
> >>
> >> CEPH01 ----|
> >> |----- CEPH04
> >> |
> >> |
> >> CEPH02 ----|-----------------------------------------------------|----
> >> CEPH05
> >> | Optic Fiber
> >> |
> >> CEPH03 ----|
> >> |--- CEPH06
> >>
> >> Sometime, when Optic Fiber not work, and just CEPH01, CEPH02 and CEPH03
> >> remains, the entire cluster fail!
> >> I find out the cause!
> >>
> >> ceph.conf
> >>
> >> [global] auth client required = cephx auth cluster required = cephx auth
> >> service required = cephx cluster network = 10.10.10.0/24 fsid =
> >> e67534b4-0a66-48db-ad6f-aa0868e962d8 keyring =
> >> /etc/pve/priv/$cluster.$name.keyring mon allow pool delete = true osd
> >> journal size = 5120 osd pool default min size = 2 osd pool default size
> =
> >> 3
> >> public network = 10.10.10.0/24 [osd] keyring =
> >> /var/lib/ceph/osd/ceph-$id/keyring [mon.pve-ceph01] host = pve-ceph01
> mon
> >> addr = 10.10.10.100:6789 mon osd allow primary affinity = true
> >> [mon.pve-ceph02] host = pve-ceph02 mon addr = 10.10.10.110:6789 mon osd
> >> allow primary affinity = true [mon.pve-ceph03] host = pve-ceph03 mon
> addr
> >> =
> >> 10.10.10.120:6789 mon osd allow primary affinity = true
> [mon.pve-ceph04]
> >> host = pve-ceph04 mon addr = 10.10.10.130:6789 mon osd allow primary
> >> affinity = true [mon.pve-ceph05] host = pve-ceph05 mon addr =
> >> 10.10.10.140:6789 mon osd allow primary affinity = true
> [mon.pve-ceph06]
> >> host = pve-ceph06 mon addr = 10.10.10.150:6789 mon osd allow primary
> >> affinity = true
> >>
> >> Any help will be welcome!
> >>
> >> ---
> >> Gilberto Nunes Ferreira
> >>
> >> (47) 3025-5907
> >> (47) 99676-7530 - Whatsapp / Telegram
> >>
> >> Skype: gilberto.nunes36
> >> _______________________________________________
> >> pve-user mailing list
> >> pve-***@pve.proxmox.com
> >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> >>
> >> _______________________________________________
> >> pve-user mailing list
> >> pve-***@pve.proxmox.com
> >> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> >>
> > _______________________________________________
> > pve-user mailing list
> > pve-***@pve.proxmox.com
> > https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> _______________________________________________
> pve-user mailing list
> pve-***@pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
_______________________________________________
pve-user mailing list
pve-***@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Gilberto Nunes
2018-10-05 12:53:48 UTC
Folks...

My Ceph servers are all in the same network: 10.10.10.0/24...
There is an optic channel between the buildings: buildA and buildB, just to
identify them!
When I first created the cluster, 3 servers went down in buildB,
and the remaining Ceph servers continued to work properly...
I do not understand why that can't happen anymore!
Sorry if I sound like a newbie! I am still learning about this!
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Gilberto Nunes
2018-10-05 13:08:24 UTC
Ok! Now I get it!
pvecm shows me:
pve-ceph01:/etc/pve# pvecm status
Quorum information
------------------
Date:             Fri Oct 5 10:04:57 2018
Quorum provider:  corosync_votequorum
Nodes:            6
Node ID:          0x00000001
Ring ID:          1/32764
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      6
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.10.100 (local)
0x00000002          1 10.10.10.110
0x00000003          1 10.10.10.120
0x00000004          1 10.10.10.130
0x00000005          1 10.10.10.140
0x00000006          1 10.10.10.150

*Quorum: 4*
So I need at least 4 servers online!
Now when I lose 3 of 6 I am left, of course, with just 3 and not with the 4
which are required...
I will request a new server to make quorum. Thanks for clarifying this situation!
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Gilberto Nunes
2018-10-05 13:37:31 UTC
And what if the same hardware that runs Proxmox with the VMs were also part
of the Ceph cluster? Could that work?
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Marcus Haarmann
2018-10-05 14:45:56 UTC
This is corosync you are talking about. There, too, a quorum is needed to work properly.
It needs to be configured in the same way as with Ceph.
You will always need a majority (e.g. 4 out of 6; 3 out of 6 won't do).

Your main problem could be that the location you lose is the part which holds
the majority of the servers.
In my opinion, in your situation a 7th server would get you to 7 active servers with 4 needed,
so 3 can be offline (remember to check your crush map so you will still have a working Ceph cluster
on the remaining servers).
Depending on which side goes offline, only one side will be able to operate without the other;
the other side won't.
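
A quick check of that 4-out-of-7 scenario (plain Python; the 4/3 split across the two buildings is an assumption about where the 7th server would go):

building_a, building_b = 4, 3          # assumed placement after adding a 7th server
total = building_a + building_b
needed = total // 2 + 1                # 4 votes required for quorum

for side, alive in (("building A alone", building_a), ("building B alone", building_b)):
    print(f"{side}: {alive} votes, quorate = {alive >= needed}")
# building A alone: 4 votes, quorate = True
# building B alone: 3 votes, quorate = False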

Marcus Haarmann


Von: "Gilberto Nunes" <***@gmail.com>
An: "pve-user" <pve-***@pve.proxmox.com>
Gesendet: Freitag, 5. Oktober 2018 15:08:24
Betreff: Re: [PVE-User] Proxmox CEPH 6 servers failures!

Ok! Now I get it!
pvecm show me
pve-ceph01:/etc/pve# pvecm status
Quorum information
------------------
Date: Fri Oct 5 10:04:57 2018
Quorum provider: corosync_votequorum
Nodes: 6
Node ID: 0x00000001
Ring ID: 1/32764
Quorate: Yes

Votequorum information
----------------------
Expected votes: 6
Highest expected: 6
Total votes: 6
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.10.10.100 (local)
0x00000002 1 10.10.10.110
0x00000003 1 10.10.10.120
0x00000004 1 10.10.10.130
0x00000005 1 10.10.10.140
0x00000006 1 10.10.10.150

*Quorum: 4*
So I need 4 server online, at least!
Now when I loose 3 of 6, I remain, of course, just with 3 and not with 4,
which is required...
I will request new server to make quorum. Thanks for clarify this situation!
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Em sex, 5 de out de 2018 às 09:53, Gilberto Nunes <
***@gmail.com> escreveu:

> Folks...
>
> I CEPH servers are in the same network: 10.10.10.0/24...
> There is a optic channel between the builds: buildA and buildB, just to
> identified!
> When I create the cluster in first time, 3 servers going down in buildB,
> and the remain ceph servers continued to worked properly...
> I do not understand why now this cant happens anymore!
> Sorry if I sound like a newbie! I still learn about it!
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
>
>
>
>
>
> Em sex, 5 de out de 2018 às 09:44, Marcus Haarmann <
> ***@midoco.de> escreveu:
>
>> Gilberto,
>>
>> the underlying problem is a ceph problem and not related to VMs or
>> Proxmox.
>> The ceph system requires a mayority of monitor nodes to be active.
>> Your setup seems to have 3 mon nodes, which results in a loss of quorum
>> when two of these servers are gone.
>> Check "ceph -s" on each side if you see any reaction of ceph.
>> If not, probably not enough mons are present.
>>
>> Also, when one side is down you should see a non-presence of some OSD
>> instances.
>> In this case, ceph might be up but your VMs which are spread over the OSD
>> disks,
>> might block because of the non-accessibility of the primary storage.
>> The distribution of data over the OSD instances is steered by the crush
>> map.
>> You should make sure to have enough copies configured and the crush map
>> set up in a way
>> that on each side of your cluster is minimum one copy.
>> In case the crush map is mis-configured, all copies of your data may be
>> on the wrong side,
>> esulting in proxmox not being able to access the VM data.
>>
>> Marcus Haarmann
>>
>>
>> Von: "Gilberto Nunes" <***@gmail.com>
>> An: "pve-user" <pve-***@pve.proxmox.com>
>> Gesendet: Freitag, 5. Oktober 2018 14:31:20
>> Betreff: Re: [PVE-User] Proxmox CEPH 6 servers failures!
>>
>> Nice.. Perhaps if I create a VM in Proxmox01 and Proxmox02, and join this
>> VM into Cluster Ceph, can I solve to quorum problem?
>> ---
>> Gilberto Nunes Ferreira
>>
>> (47) 3025-5907
>> (47) 99676-7530 - Whatsapp / Telegram
>>
>> Skype: gilberto.nunes36
>>
>>
>>
>>
>>
>> Em sex, 5 de out de 2018 às 09:23, dorsy <***@yahoo.com> escreveu:
>>
>> > Your question has already been answered. You need majority to have
>> quorum.
>> >
>> > On 2018. 10. 05. 14:10, Gilberto Nunes wrote:
>> > > Hi
>> > > Perhaps this can help:
>> > >
>> > > https://imageshack.com/a/img921/6208/X7ha8R.png
>> > >
>> > > I was thing about it, and perhaps if I deploy a VM in both side, with
>> > > Proxmox and add this VM to the CEPH cluster, maybe this can help!
>> > >
>> > > thanks
>> > > ---
>> > > Gilberto Nunes Ferreira
>> > >
>> > > (47) 3025-5907
>> > > (47) 99676-7530 - Whatsapp / Telegram
>> > >
>> > > Skype: gilberto.nunes36
>> > >
>> > >
>> > >
>> > >
>> > >
Gilberto Nunes
2018-10-05 15:48:24 UTC
Permalink
I have 6 monitors.
What if I reduce it to 5? Or 4? Would that help?
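
For reference, the monitors always need a strict majority of whatever is in
the monmap, so the arithmetic looks like this:

  6 mons -> majority 4 -> survives 2 monitors down
  5 mons -> majority 3 -> survives 2 monitors down
  4 mons -> majority 3 -> survives 1 monitor down
  3 mons -> majority 2 -> survives 1 monitor down

With 5 monitors split 3/2 across the buildings, the 3-monitor side keeps
quorum after a fiber cut but the 2-monitor side does not; with 6 (or 4)
split evenly, neither side can reach a majority on its own.
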
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





On Fri, 5 Oct 2018 at 11:46, Marcus Haarmann <
***@midoco.de> wrote:

> This is corosync you are talking about. There, too, a quorum is needed to
> work properly.
> It needs to be configured in the same way as ceph.
> You will always need a majority (e.g. 4 out of 6; 3 out of 6 won't do).
>
> Your main problem can be that you might lose one location and the part
> which has the majority of servers
> is down.
> In my opinion, in your situation a 7th server would get you to 7 active
> servers, 4 needed,
> so 3 can be offline (remember to check your crush map so you will have a
> working ceph cluster
> on the remaining servers).
> Depending on which side goes offline, only one side will be able to
> operate without the other,
> but the other side won't.
>
> Marcus Haarmann
>
>
> Von: "Gilberto Nunes" <***@gmail.com>
> An: "pve-user" <pve-***@pve.proxmox.com>
> Gesendet: Freitag, 5. Oktober 2018 15:08:24
> Betreff: Re: [PVE-User] Proxmox CEPH 6 servers failures!
>
> Ok! Now I get it!
> pvecm shows me:
> pve-ceph01:/etc/pve# pvecm status
> Quorum information
> ------------------
> Date: Fri Oct 5 10:04:57 2018
> Quorum provider: corosync_votequorum
> Nodes: 6
> Node ID: 0x00000001
> Ring ID: 1/32764
> Quorate: Yes
>
> Votequorum information
> ----------------------
> Expected votes: 6
> Highest expected: 6
> Total votes: 6
> Quorum: 4
> Flags: Quorate
>
> Membership information
> ----------------------
> Nodeid Votes Name
> 0x00000001 1 10.10.10.100 (local)
> 0x00000002 1 10.10.10.110
> 0x00000003 1 10.10.10.120
> 0x00000004 1 10.10.10.130
> 0x00000005 1 10.10.10.140
> 0x00000006 1 10.10.10.150
>
> *Quorum: 4*
> So I need at least 4 servers online!
> Now when I lose 3 of 6, I am left, of course, with just 3 and not the 4
> which are required...
> I will request a new server to make quorum. Thanks for clarifying this
> situation!
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
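
As a rough sketch, with a 7th vote added (and no special vote weighting)
that same output should end up looking like:

  Expected votes:   7
  Highest expected: 7
  Total votes:      7
  Quorum:           4

so the 4-node side stays quorate when the 3-node building drops out;
"pvecm status" and "ceph -s" on each side during a test can confirm it.
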
Woods, Ken A (DNR)
2018-10-05 16:35:41 UTC
Permalink
Gilberto,

I have a few questions, which I think many of us have, given your recent and not-so-recent history. Please don't take them as insults; they're not intended as such. I'm just trying to figure out how best to help you solve the problems you keep having.

Have you read any documentation?
At all? Even just a quick-start guide? If so, did you retain any of it? (Odd numbers, quorum, etc.)

Or do you fire off an email to the list without first trying to find the solution yourself?

Additionally, how many times does it take for you to receive the same answer before you believe it?
Have you considered buying a full-service maintenance subscription?

Thanks. I'm pretty sure that if we can figure out how you think about these issues, we can better help you... because at this point, I'm ready to start telling you to STFU&RTFM.

Compassionately,

Ken



Gilberto Nunes
2018-10-05 16:54:01 UTC
Permalink
>> Have you read any documentation?
>> At all? Even just a quick-start guide? If so, did you retain any of
>> it? (Odd numbers, quorum, etc)
Yes, I did some research and read the docs... In fact, I was just missing
the odd-numbers rule!
Sorry if I sent a lot of mail to the list about my inquiries...
Thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36





Lindsay Mathieson
2018-10-05 16:16:21 UTC
Permalink
Your Ceph cluster requires quorum to operate and that is based on your
monitor nodes, not the OSD ones, which your diagram earlier doesn't detail.

How many monitor nodes do you have, and where are they located?

nb. You should only have an odd number of monitor nodes.
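
A quick way to check both views (command names as on a current
Luminous-based PVE node; adjust if your version differs):

  ceph mon stat                       # monitors in the monmap and which are in quorum
  ceph quorum_status -f json-pretty   # quorum members and the current leader
  pvecm status                        # corosync votes and the quorum threshold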


On 5/10/2018 10:53 PM, Gilberto Nunes wrote:
> Folks...
>
> My CEPH servers are all in the same network: 10.10.10.0/24...
> There is an optic-fiber channel between the buildings: buildA and buildB,
> just to identify them!
> When I first created the cluster, 3 servers went down in buildB,
> and the remaining ceph servers continued to work properly...
> I do not understand why this can't happen anymore!
> Sorry if I sound like a newbie! I am still learning about it!
> ---
> Gilberto Nunes Ferreira
>
> (47) 3025-5907
> (47) 99676-7530 - Whatsapp / Telegram
>
> Skype: gilberto.nunes36
>
>
>
>
>


--
Lindsay
Gilberto Nunes
2018-10-05 16:26:39 UTC
Permalink
>> How many monitor nodes do you have, and where are they located?

Before
SIDE-A
pve-ceph01 - 1 mon
pve-ceph02 - 1 mon
pve-ceph03 - 1 mon

SIDE-B
pve-ceph04 - 1 mon
pve-ceph05 - 1 mon
pve-ceph06 - 1 mon

Now
SIDE-A
pve-ceph01 - 1 mon
pve-ceph02 - 1 mon
pve-ceph03 - 1 mon

SIDE-B
pve-ceph04 - 1 mon
pve-ceph05 - 1 mon
pve-ceph06 - < I removed this monitor >

https://imageshack.com/a/img923/4214/i2ugyC.png
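
Rough arithmetic for that layout, assuming the monitor was really removed
from the monmap (i.e. it no longer shows up in "ceph mon stat"):

  5 mons -> majority 3: SIDE-A (3 mons) keeps the Ceph monitor quorum if
  the fiber drops, but SIDE-B (2 mons) does not.

Corosync counts separately: with all 6 nodes still voting it stays at
quorum 4, so neither 3-node side alone is quorate at the pvecm level
unless a 7th vote is added there as well.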


---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36




