Discussion:
[PVE-User] pveceph createosd after destroyed osd
Mark Adams
2018-07-03 00:05:51 UTC
Permalink
Currently running the newest 5.2-1 version, I had a test cluster which was
working fine. I have since added more disks, first stopping, then setting
out, then destroying each OSD so I could recreate it all from scratch.
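
Roughly, those GUI steps map to the following on the CLI (a sketch only;
the OSD id 3 and /dev/sdX are placeholders, not taken from my actual setup):

systemctl stop ceph-osd@3         # "Stop" in the GUI
ceph osd out 3                    # "Out" in the GUI
pveceph destroyosd 3              # "Destroy" in the GUI
pveceph createosd /dev/sdX        # recreate an OSD on the same disk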

However, when adding a new OSD (either via the GUI or the pveceph CLI) the
create appears to succeed, but the OSD does not show up in the GUI under
the host.

It's as if the OSD information is being stored by proxmox/ceph somewhere
else and is not being correctly removed and recreated?

I can see that the newly created disk (after having been destroyed) is
down/out.

Is this by design? Is there a way to force the disk back? Shouldn't it show
in the GUI once you create it again?

Thanks!
Woods, Ken A (DNR)
2018-07-03 00:34:15 UTC
Permalink
http://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-osds/#removing-osds-manual

Are you sure you followed the directions?

Mark Adams
2018-07-03 00:41:39 UTC
Permalink
Hi, Thanks for your response!

No, I didn't do any of that on the CLI - I just did stop in the web GUI,
then out, then destroy.

Note that there were no VMs or data at all on this test ceph cluster - I
had deleted it all before doing this. I was basically just removing
everything so the OSD numbers looked "nicer" for the final setup.

It's not a huge deal - I can just reinstall proxmox. But it concerns me
that doing this via the web GUI seems so fragile, and I want to know where
I went wrong. Is a signature being stored somewhere, so that when you try
to add the same drive again (even though I ticked "remove partitions") it
doesn't get added back into the ceph cluster with the next sequential OSD
number after the last "live" or "valid" drive?

Is it just a rule that you never actually remove drives, and instead just
leave them stopped/out?

Regards,
Mark
Woods, Ken A (DNR)
2018-07-03 00:48:30 UTC
Permalink
You're thinking "proxmox". Try thinking "ceph" instead. Sure, ceph runs with proxmox, but what you're really doing is using a pretty GUI that sits on top of debian, running ceph and kvm.


Anyway, perhaps the GUI does all the steps needed? Perhaps not.


If it were me, I'd NOT reinstall, as that's likely not going to fix the issue.


Follow the directions in the page I linked and see if that helps.
Woods, Ken A (DNR)
2018-07-03 00:50:34 UTC
Permalink
1. Purge the OSD from the cluster (this removes it from the CRUSH map,
removes its authentication key, and removes it from the OSD map):

ceph osd purge {id} --yes-i-really-mean-it


2. Navigate to the host where you keep the master copy of the cluster’s ceph.conf file.

ssh {admin-host}
cd /etc/ceph
vim ceph.conf


3. Remove the OSD entry from your ceph.conf file (if it exists).

[osd.1]
host = {hostname}


Alwin Antreich
2018-07-03 09:07:43 UTC
Permalink
Did you zero your disks after removal? On the first ~200 MB of the disk,
there are leftovers that need to be zeroed before use. After that the
OSD should be added fine.
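
A minimal sketch of that wipe, assuming the disk you just destroyed is
/dev/sdX (double-check the device name first):

ceph-disk zap /dev/sdX                        # clear the old partition table
dd if=/dev/zero of=/dev/sdX bs=1M count=200   # zero the first ~200 MB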

--
Cheers,
Alwin
Mark Adams
2018-07-03 11:18:53 UTC
Permalink
Hi Alwin, please see my response below.

On 3 July 2018 at 10:07, Alwin Antreich <***@proxmox.com> wrote:

> Did you zero your disks after removal? On the first ~200 MB of the disk,
> there are leftovers that need to be zeroed before use. After that the
> OSD should be added fine.
>
>
I hadn't done this, no - it has helped with the majority of disks, thanks,
and I can now re-add them. (I also had to remove the folders under
/var/lib/ceph/osd which had other OSD names - not sure if the destroy
process is supposed to remove those as well?)

However I have a strange problem on the 2nd host, where it will not make
osd.12 ... I get no error output from the gui or pveceph createosd /dev/sda
- it just doesn't appear as an osd.

It successfully partitions the disk, but doesn't create a folder in
/var/lib/ceph/osd/ for the OSD mount. I can see there are lock files in
/var/lib/ceph/tmp/ ... which I would think should only be there whilst the
creation is taking place?

journalctl -xe is showing me the problem I think, "command_with_stdin:
Error EEXIST: entity osd.12 exists but key does not match"

Where is this key? how should I be clearing it out so it will create?

Thanks,
Mark
Alwin Antreich
2018-07-03 15:16:05 UTC
Permalink
On Tue, Jul 03, 2018 at 12:18:53PM +0100, Mark Adams wrote:
> Hi Alwin, please see my response below.
>
> I hadn't done this, no - it has helped with the majority of disks, thanks,
> and I can now re-add them. (I also had to remove the folders under
> /var/lib/ceph/osd which had other OSD names - not sure if the destroy
> process is supposed to remove those as well?)
They will not interfere. ;)

>
> However I have a strange problem on the 2nd host, where it will not make
> osd.12 ... I get no error output from the gui or pveceph createosd /dev/sda
> - it just doesn't appear as an osd.
>
> It successfully partitions the disk, but doesn't create a folder in
> /var/lib/ceph/osd/ for the OSD mount. I can see there are lock files in
> /var/lib/ceph/tmp/ ... which I would think should only be there whilst the
> creation is taking place?
From the OSD creation, or a different lock? They shouldn't cause trouble
either way.

>
> journalctl -xe is showing me the problem I think, "command_with_stdin:
> Error EEXIST: entity osd.12 exists but key does not match"
>
> Where is this key? how should I be clearing it out so it will create?
>
'ceph auth list' will show you all the keys in ceph; there will be an
orphaned osd.12. Remove it with 'ceph auth del <osd.id>'.
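
A short sketch for this particular case (osd.12 taken from the error above):

ceph auth list | grep -A3 osd.12    # confirm the orphaned key is there
ceph auth del osd.12                # remove it, then retry the create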

--
Cheers,
Alwin
Mark Adams
2018-07-05 09:26:34 UTC
Permalink
Hi Alwin,

Thanks for that - It's all working now! Just to confirm though, shouldn't
the destroy button handle some of these actions? or is it left out on
purpose?

Regards,
Mark
Alwin Antreich
2018-07-05 10:04:19 UTC
Permalink
On Thu, Jul 05, 2018 at 10:26:34AM +0100, Mark Adams wrote:
> Hi Alwin,
>
> Thanks for that - It's all working now! Just to confirm though, shouldn't
> the destroy button handle some of these actions? or is it left out on
> purpose?
>
> Regards,
> Mark
>
I am not sure what you mean exactly, but the destroyosd (CLI/GUI) is
doing more than those two steps.


--
Cheers,
Alwin
Mark Adams
2018-07-05 10:05:52 UTC
Permalink
On 5 July 2018 at 11:04, Alwin Antreich <***@proxmox.com> wrote:

> I am not sure what you mean exactly, but the destroyosd (CLI/GUI) is
> doing more than those two steps.
>
>
Yes I realise it is, what I'm saying is should it also be doing those
steps?
Alwin Antreich
2018-07-05 10:53:50 UTC
Permalink
On Thu, Jul 05, 2018 at 11:05:52AM +0100, Mark Adams wrote:
> Yes I realise it is, what I'm saying is should it also be doing those
> steps?
Well, it does those steps too. But after a failed OSD creation not all of
the entries are set, so the destroy might fail on some of them (e.g. no
service, no mount).

The OSD create/destroy is up for a change anyway with the move from
ceph-disk (deprecated in Mimic) to ceph-volume. There is certainly room
for improvement. ;)
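
If a create does fail half-way, a rough sketch of the manual cleanup for a
case like osd.12 above (skip whichever steps do not apply on your node):

systemctl stop ceph-osd@12            # if a service unit was created
umount /var/lib/ceph/osd/ceph-12      # if the OSD directory got mounted
ceph osd crush remove osd.12          # if it made it into the CRUSH map
ceph auth del osd.12                  # drop the stale key
ceph osd rm 12                        # remove the OSD id from the map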


--
Cheers,
Alwin
Yannis Milios
2018-07-05 11:00:58 UTC
Permalink
> Yes I realise it is, what I'm saying is should it also be doing those
> steps?

Usually you don't have to, but as things can often go wrong, you *may*
have to do things manually sometimes.
The GUI is great and saves lots of work; however, knowing how to manually
solve problems via the CLI when they arise is, in my opinion, also a must.
Especially when you deal with a complicated storage system like Ceph ....

Y
Mark Adams
2018-07-05 11:16:00 UTC
Permalink
Hi Alwin, Yannis,

Yes, I definitely agree with you there Yannis - it's good to know how to
resolve things via the CLI when they don't seem right. It's also good to
know that things can sometimes go wrong via the GUI.

Alwin - Any improvements are good of course!

Regards,
Mark