Discussion:
[PVE-User] Custom storage in ProxMox 5
Lindsay Mathieson
2018-03-30 01:20:01 UTC
I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?

Thanks,

--
Lindsay Mathieson
Lindsay Mathieson
2018-03-30 01:35:17 UTC
nb: still no way to integrate them into the WebUI?

--
Lindsay Mathieson

________________________________
From: Lindsay Mathieson <***@gmail.com>
Sent: Friday, March 30, 2018 11:20:01 AM
To: pve-***@pve.proxmox.com
Subject: Custom storage in ProxMox 5

I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?

Thanks,

--
Lindsay Mathieson
a***@extremeshok.com
2018-03-30 01:53:10 UTC
Don't waste your time with lizardfs.

Proxmox 5+ has proper Ceph and ZFS support.

Ceph does everything and more, and ZFS is about the de-facto container
storage medium.


On 03/30/2018 03:20 AM, Lindsay Mathieson wrote:
> I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
>
> Thanks,
>
> --
> Lindsay Mathieson
>
Lindsay Mathieson
2018-03-30 03:05:11 UTC
Ceph has rather larger overheads, is a much bigger PITA to admin, does not perform as well on whitebox hardware (in fact the Ceph crowd's standard reply to issues is to spend big on enterprise hardware), and is far less flexible.

Can Ceph run multiple replication and EC levels for different files on the same volume? Can you change goal settings on the fly? Near-instantaneous snapshots and restores at any level of the filesystem you choose? Run on commodity hardware with trivial adding of disks as required?



ZFS is not a distributed filesystem, so I don't know why you bring it up. Though I am using ZFS as the underlying filesystem.



--
Lindsay Mathieson



________________________________
From: pve-user <pve-user-***@pve.proxmox.com> on behalf of ***@extremeshok.com <***@extremeshok.com>
Sent: Friday, March 30, 2018 11:53:10 AM
To: pve-***@pve.proxmox.com
Subject: Re: [PVE-User] Custom storage in ProxMox 5

Don't waste your time with lizardfs.

Proxmox 5+ has proper Ceph and ZFS support.

Ceph does everything and more, and ZFS is about the de-facto container
storage medium.


On 03/30/2018 03:20 AM, Lindsay Mathieson wrote:
> I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
>
> Thanks,
>
> --
> Lindsay Mathieson
>
Alwin Antreich
2018-03-30 07:47:16 UTC
Hi Lindsay,

On Fri, Mar 30, 2018 at 03:05:11AM +0000, Lindsay Mathieson wrote:
> Ceph has rather larger overheads, is a much bigger PITA to admin, does not perform as well on whitebox hardware (in fact the Ceph crowd's standard reply to issues is to spend big on enterprise hardware), and is far less flexible.
I really don't think that holds true. It is very straightforward to
install and maintain. As with all distributed systems, there is some
complexity involved that an admin has to know.

Our stack has its own toolset, but you can also use ceph-deploy or,
better, ceph-ansible for Ceph installation and maintenance.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html
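
As a rough sketch of how the toolset is used (the network and device names
are just placeholders, and the exact options depend on your PVE 5.x version;
see the chapter above):

  pveceph install                        # install the Ceph packages
  pveceph init --network 10.10.10.0/24   # dedicated Ceph network
  pveceph createmon                      # repeat on each monitor node
  pveceph createosd /dev/sdb             # repeat per disk and node
  pveceph createpool vm-pool --size 3 --min_size 2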

The hardware topic is always a pity. People who are new to Ceph (or
any other system) usually don't understand its workings and have
high expectations about the performance they can achieve. When they
find out that reality is contrary to their expectations, they show up
on the mailing lists and forums.

In the end it comes down to proper planning, hardware purchase and
testing, which leads to the verdict: buy enterprise hardware if you want
enterprise performance, irrespective of the distributed system.

>
> Can Ceph run multiple replication and EC levels for different files on the same volume? Can you change goal settings on the fly? Near-instantaneous snapshots and restores at any level of the filesystem you choose? Run on commodity hardware with trivial adding of disks as required?
Yes, but as Ceph works differently, these terms may not do or mean
the same thing.

>
>
>
> ZFS is not a distributed filesystem, so I don't know why you bring it up. Though I am using ZFS as the underlying filesystem.
While not distributed, in some cases the storage replication (pvesr)
might be worth considering.
https://pve.proxmox.com/pve-docs/chapter-pvesr.html
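
A minimal example, with hypothetical guest and node names (see the chapter
above for the schedule syntax):

  # replicate guest 100 to node pve2 every 15 minutes
  pvesr create-local-job 100-0 pve2 --schedule "*/15"
  pvesr status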

>
>
>
> --
> Lindsay Mathieson
>
>
>
> ________________________________
> From: pve-user <pve-user-***@pve.proxmox.com> on behalf of ***@extremeshok.com <***@extremeshok.com>
> Sent: Friday, March 30, 2018 11:53:10 AM
> To: pve-***@pve.proxmox.com
> Subject: Re: [PVE-User] Custom storage in ProxMox 5
>
> Don't waste your time with lizardfs.
>
> Proxmox 5+ has proper Ceph and ZFS support.
>
> Ceph does everything and more, and ZFS is about the de-facto container
> storage medium.
>
>
> On 03/30/2018 03:20 AM, Lindsay Mathieson wrote:
> > I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
Check out the other storage plugins; AFAIC, there is no specific
documentation.

For completeness, check out the following links for our PVE API.
https://pve.proxmox.com/pve-docs/api-viewer/index.html
https://pve.proxmox.com/wiki/Proxmox_VE_API
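
The shipped plugins are probably the best reference; on a PVE node they
should be under /usr/share/perl5/PVE/Storage/, e.g.:

  ls /usr/share/perl5/PVE/Storage/
  # Plugin.pm is the base class; the smaller ones (DirPlugin.pm,
  # NFSPlugin.pm, GlusterfsPlugin.pm) are reasonable templates to
  # model a custom plugin on
  less /usr/share/perl5/PVE/Storage/Plugin.pm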

> >
> > Thanks,
> >
> > --
> > Lindsay Mathieson
> >
--
Cheers,
Alwin
Alexandre DERUMIER
2018-03-30 09:40:16 UTC
Hi,

>>Ceph has rather larger overheads

Agree, there is overhead, but performance increases with each release.
I think the biggest problem is that you can't reach more than 70-90k IOPS with 1 VM disk currently,
and maybe latency could be improved too.

>>much bigger PITA to admin
I don't agree. I'm running 5 Ceph clusters (around 200TB of SSD), with almost zero maintenance.

>>does not perform as well on whitebox hardware
Define whitebox hardware?
The only thing is to not use consumer SSDs (because they suck with direct I/O).


>>Can Ceph run multiple replication and EC levels for different files on the same volume?
You can manage it by pool (as block storage).
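
For example, something like this (pool names, PG counts and the EC profile
are just placeholders):

  # one 3-way replicated pool, one erasure-coded pool
  ceph osd pool create rbd-replicated 128
  ceph osd pool set rbd-replicated size 3
  ceph osd erasure-code-profile set ec42 k=4 m=2
  ceph osd pool create rbd-ec 128 128 erasure ec42
  # (IIRC, with Luminous an rbd image on an EC pool still needs a
  # replicated pool for its metadata, plus --data-pool for the data)
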
>> Near-instantaneous snapshots
yes

>>and restores

A little bit slower to roll back.

>>at any level of the filesystem you choose?

Why are you talking about a filesystem? Are you mounting lizardfs inside your VM, or using it for hosting VM disks?


>>Run on commodity hardware with trivial adding of disks as required?
yes
----- Original Message -----
From: "Lindsay Mathieson" <***@gmail.com>
To: "proxmoxve" <pve-***@pve.proxmox.com>
Sent: Friday, March 30, 2018 05:05:11
Subject: Re: [PVE-User] Custom storage in ProxMox 5

Ceph has rather larger overheads, is a much bigger PITA to admin, does not perform as well on whitebox hardware (in fact the Ceph crowd's standard reply to issues is to spend big on enterprise hardware), and is far less flexible.

Can Ceph run multiple replication and EC levels for different files on the same volume? Can you change goal settings on the fly? Near-instantaneous snapshots and restores at any level of the filesystem you choose? Run on commodity hardware with trivial adding of disks as required?



ZFS is not a distributed filesystem, so I don't know why you bring it up. Though I am using ZFS as the underlying filesystem.



--
Lindsay Mathieson



________________________________
From: pve-user <pve-user-***@pve.proxmox.com> on behalf of ***@extremeshok.com <***@extremeshok.com>
Sent: Friday, March 30, 2018 11:53:10 AM
To: pve-***@pve.proxmox.com
Subject: Re: [PVE-User] Custom storage in ProxMox 5

Don't waste your time with lizardfs.

Proxmox 5+ has proper Ceph and ZFS support.

Ceph does everything and more, and ZFS is about the de-facto container
storage medium.


On 03/30/2018 03:20 AM, Lindsay Mathieson wrote:
> I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
>
> Thanks,
>
> --
> Lindsay Mathieson
>
Lindsay Mathieson
2018-03-30 23:58:06 UTC
On 30/03/2018 7:40 PM, Alexandre DERUMIER wrote:
> Hi,
>
>>> Ceph has rather larger overheads
> Agree, there is overhead, but performance increases with each release.
> I think the biggest problem is that you can't reach more than 70-90k IOPS with 1 VM disk currently,
> and maybe latency could be improved too.

The performance I got with Ceph was suboptimal - as mentioned earlier,
if you throw lots of money at enterprise hardware & SSDs then it's OK,
but that sort of expenditure was not possible for our SMB. Something not
mentioned is that it does not do well on small setups. A bare minimum is
5 nodes with multiple OSDs.


>
>>> much bigger PITA to admin
> I don't agree. I'm running 5 Ceph clusters (around 200TB of SSD), with almost zero maintenance.


Adding/replacing/removing OSDs can be a nightmare. The potential to
trash your pools is only one misstep away. Adding extra nodes with
lizardfs is trivial and lightweight. It took me minutes to add several
desktop stations as chunkservers in our office. Judicious use of labels
and goals allows us to distribute data amongst them as desired by
performance and space. The three high-performance compute nodes get a
copy of all chunks, which speeds up local reads.

>
>>> does not perform as well on whitebox hardware
> Define whitebox hardware?

Consumer-grade drives and disk controllers. Maybe an Aussie phrase?

> The only thing is to not use consumer SSDs (because they suck with direct I/O).


True, they are shocking. Also the lack of power loss protection.

>
>>> Can Ceph run multiple replication and EC levels for different files on the same volume?
> You can manage it by pool (as block storage).


With lizardfs I can set individual replication levels and EC modes per
file (VM). Immensely useful for different classes of VMs. I have server
VMs on replica level 4, desktop VMs on replica 2 (higher performance), and
archived data on ec(5,2) (fast read, slow write). I use ZFS underneath for
transparent compression.
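
Roughly like this, if I remember the syntax right (goal names and paths are
just examples; the goals themselves are defined in mfsgoals.cfg on the
master):

  # /etc/mfs/mfsgoals.cfg on the master, e.g.:
  #   4 replica4 : _ _ _ _
  #   7 ec52     : $ec(5,2)
  lizardfs setgoal -r replica4 /mnt/lizardfs/images/servers/
  lizardfs setgoal -r ec52     /mnt/lizardfs/archive/
  lizardfs getgoal /mnt/lizardfs/images/servers/vm-100-disk-1.raw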

Halo writes are in testing for the next release, which allows even more
performance trade-offs for VMs where minor data loss is not critical.

As with ZFS, chunks are checksummed and also versioned.

All this in a single namespace (lizardfs fusemount), similar to a single
Ceph pool.

A QEMU block driver is in the works, which should sidestep the FUSE
performance limitations.

>>> Near-instantaneous snapshots
> yes
>
>>> and restores
> A little bit slower to roll back.


A *lot* slower for me - snapshot restores were taking literally hours,
and killed ceph performance in the process. It made snapshots useless
for us.

>
>>> at any level of the filesystem you choose?
> Why are you talking about a filesystem? Are you mounting lizardfs inside your VM, or using it for hosting VM disks?

lizardfs exposes a POSIX filesystem via FUSE, similar to CephFS. Its
replication goals can be set per file, unlike Ceph's, which are per pool.
They can be changed per file on the fly.


lizardfs is not without its limitations of course:

 * It's FUSE based, which is a performance hit. A block-level QEMU
   driver is in the works.
 * Its metadata server is active/passive, not active/active, and
   failover has to be managed by custom scripts using cluster tools
   such as keepalived, corosync etc. A built-in failover tool is in
   the next release.



--
Lindsay
Mark Schouten
2018-04-03 08:43:29 UTC
On Sat, 2018-03-31 at 09:58 +1000, Lindsay Mathieson wrote:
> The performance I got with Ceph was suboptimal - as mentioned earlier,
> if you throw lots of money at enterprise hardware & SSDs then it's OK,
> but that sort of expenditure was not possible for our SMB. Something not
> mentioned is that it does not do well on small setups. A bare minimum is
> 5 nodes with multiple OSDs.

Samsung PM863(a) disks are not that expensive and perform very well.
Also, a three-node setup with a single OSD per node works perfectly.

> Adding/replacing/removing OSDs can be a nightmare. The potential to
> trash your pools is only one misstep away.

I really don't see how trashing a pool is one misstep away. That would
really take an enormous idiot.

I don't know Lizardfs, so I have no opinion about that. But it seems
your experiences with Ceph are not in sync with any other experience I
hear/read.

--
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | ***@tuxis.nl
Ian Coetzee
2018-04-03 10:01:00 UTC
Hi Mark,

In a way I would agree with you on the total idiot part, speaking from
experience.

https://pve.proxmox.com/pipermail/pve-user/2018-January/169179.html

Where I nuked our whole Ceph cluster with a single command (although a
warning would have been nice).

My experience with Ceph so far is that it is quite enjoyable to work with;
it chows a bit of memory on the OSD servers and chows available disk space
as a trade-off for reliability.

For the moment we are running on LVM storage again, until a later stage
when we can implement Ceph again.

Kind regards
Eneko Lacunza
2018-04-04 07:50:42 UTC
Hi,

On 30/03/18 at 05:05, Lindsay Mathieson wrote:
> Ceph has rather larger overheads, is a much bigger PITA to admin, does not perform as well on whitebox hardware (in fact the Ceph crowd's standard reply to issues is to spend big on enterprise hardware), and is far less flexible.
Nonsense. We use whiteboxes, desktop HDDs, no RAID cards, etc. with
Ceph/Proxmox and it works really well. Effort to maintain it is... near
zero. Effort to deploy: just 15 minutes, thanks to Proxmox integration.

> Can Ceph run multiple replication and EC levels for different files on the same volume? Can you change goal settings on the fly? Near-instantaneous snapshots and restores at any level of the filesystem you choose? Run on commodity hardware with trivial adding of disks as required?
You don't do files on Ceph, you just do volumes (vdisks). Ceph is not a
filesystem (although it has a filesystem - CephFS). We use commodity
hardware and add disks trivially.

I am not saying Ceph is for every use case; but don't talk lightly
about something you don't know.

Cheers
Eneko
>
>
>
> ZFS is not a distributed filesystem, so I don't know why you bring it up. Though I am using ZFS as the underlying filesystem.
>
>
>
> --
> Lindsay Mathieson
>
>
>
> ________________________________
> From: pve-user <pve-user-***@pve.proxmox.com> on behalf of ***@extremeshok.com <***@extremeshok.com>
> Sent: Friday, March 30, 2018 11:53:10 AM
> To: pve-***@pve.proxmox.com
> Subject: Re: [PVE-User] Custom storage in ProxMox 5
>
> Don't waste your time with lizardfs.
>
> Proxmox 5+ has proper Ceph and ZFS support.
>
> Ceph does everything and more, and ZFS is about the de-facto container
> storage medium.
>
>
> On 03/30/2018 03:20 AM, Lindsay Mathieson wrote:
>> I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
>>
>> Thanks,
>>
>> --
>> Lindsay Mathieson
>>


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
Dietmar Maurer
2018-04-04 08:24:48 UTC
Hi all,

I think there are many opinions when it comes to storage technologies, and
that is the reason why there are so many different storage projects out there.

And for that reason, we have a plugin system for different storage types :-)


> On April 4, 2018 at 9:50 AM Eneko Lacunza <***@binovo.es> wrote:
>
>
> Hi,
>
> On 30/03/18 at 05:05, Lindsay Mathieson wrote:
> > Ceph has rather larger overheads, is a much bigger PITA to admin, does not
> > perform as well on whitebox hardware (in fact the Ceph crowd's standard
> > reply to issues is to spend big on enterprise hardware), and is far less
> > flexible.
> Nonsense. We use whiteboxes, desktop HDDs, no RAID cards, etc. with
> Ceph/Proxmox and it works really well. Effort to maintain it is... near
> zero. Effort to deploy: just 15 minutes, thanks to Proxmox integration.
Thomas Lamprecht
2018-03-30 07:10:42 UTC
Hi,

On 03/30/2018 03:20 AM, Lindsay Mathieson wrote:
> I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
>

The base work for per-storage bandwidth limiting was added,
see commit:

commit 9edb99a5a763f03e031ffdce151c739d6ffaca0c
Author: Wolfgang Bumiller <***@proxmox.com>
Date: Tue Jan 30 11:46:19 2018 +0100

add Storage::get_bandwidth_limit helper

The changes should be trivial though (see the commit); only the restore bwlimit
is implemented and exposed in the newest WebUI.

Besides that I did not find or remember anything big, API-wise.

A good heuristic for "were there changes for all/most plugins" is running:
# git log --stat --oneline PVE/Storage/

And look for changes which affect many or all plugin classes; with this
I found the commit mentioned above.
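
To narrow it down to roughly the PVE 5.x timeframe (5.0 was released around
mid-2017), something like:

# git log --stat --oneline --since=2017-07-01 -- PVE/Storage/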

> nb: still no way to integrate them into the WebUI?

No, not really, I'm afraid, but doing so on top should be much less work
now, with the big storage edit refactoring.

cheers,
Thomas
Lindsay Mathieson
2018-03-30 23:20:48 UTC
On 30/03/2018 5:10 PM, Thomas Lamprecht wrote:
> The changes should be trivial though (see the commit); only the restore
> bwlimit is implemented and exposed in the newest WebUI.
>
> Besides that I did not find or remember anything big, API-wise.

Thanks Thomas, good to know.

--
Lindsay
l***@ulrar.net
2018-04-03 10:18:26 UTC
Hey Lindsay,

A bit off topic, but we're still using GlusterFS here,
and for the most part we're still happy with it
(but I still haven't had the courage to update past 3.7.15).

Since I remember you using GlusterFS too, how would you recommend
switching to LizardFS? It does sound good, but the metadata server is
what always discouraged me.
I saw a talk by the developers at last year's FOSDEM and apparently the
enterprise version has automatic failover built in, as I understood it.
Any experience with that + Proxmox? GlusterFS runs nicely on the same
nodes as Proxmox; that's a big advantage, I think.

On Fri, Mar 30, 2018 at 01:20:01AM +0000, Lindsay Mathieson wrote:
> I was working on a custom storage plugin (lizardfs) for VE 4.x, looking to revisit it. Has the API changed much (or at all) for PX 5? Is there any documentation for it?
>
> Thanks,
>
> --
> Lindsay Mathieson
>

--
PGP Fingerprint : 0x624E42C734DAC346