[pve-devel] [PATCH manager] add wipe_disk option when destroying ceph disk
Alwin Antreich
a.antreich at proxmox.com
Tue Oct 23 16:54:54 CEST 2018
Additionally, this is also tracked on Ceph's side:
https://tracker.ceph.com/issues/22354
On Tue, Oct 23, 2018 at 04:49:27PM +0200, Alwin Antreich wrote:
> On Tue, Oct 23, 2018 at 04:19:36PM +0200, Thomas Lamprecht wrote:
> > On 10/23/18 4:02 PM, Alwin Antreich wrote:
> > > Nice, was on my list too. ;) Some comments inline.
> > >
> > > On Tue, Oct 23, 2018 at 03:33:44PM +0200, David Limbeck wrote:
> > >> this allows the disk to be reused as a ceph disk by zeroing the
> > >> first 200M of the destroyed disk
> > >>
> > >> Signed-off-by: David Limbeck <d.limbeck at proxmox.com>
> > >> ---
> > >> PVE/API2/Ceph.pm | 22 ++++++++++++++++++++++
> > >> www/manager6/ceph/OSD.js | 18 +++++++++++++++++-
> > >> 2 files changed, 39 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/PVE/API2/Ceph.pm b/PVE/API2/Ceph.pm
> > >> index 69489a70..6dce2f01 100644
> > >> --- a/PVE/API2/Ceph.pm
> > >> +++ b/PVE/API2/Ceph.pm
> > >> @@ -347,6 +347,12 @@ __PACKAGE__->register_method ({
> > >> optional => 1,
> > >> default => 0,
> > >> },
> > >> + wipe_disk => {
> > >> + description => 'Wipe first 200M of disk to make it reusable as a ceph OSD.',
> > >> + type => 'boolean',
> > >> + optional => 1,
> > >> + default => 0,
> > >> + },
> > > I suggest not exposing this as a separate option, as the 'cleanup'
> > > flag should do this in one go. If I set the cleanup option, I
> > > definitely want the wipe too.
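> > >
> > > E.g., a rough sketch of what I mean (untested; it reuses the
> > > collection loop from further down in this patch, just keyed off the
> > > existing 'cleanup' parameter instead of a new one):
> > >
> > >     my $disks_to_wipe = {};
> > >     if ($param->{cleanup}) { # cleanup implies the wipe
> > >         foreach my $part (@$partitions_to_remove) {
> > >             next if !$part || (! -b $part);
> > >             my $devpath = PVE::Diskmanage::get_blockdev($part);
> > >             $disks_to_wipe->{$devpath} = 1;
> > >         }
> > >     }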
> >
> > Is the data destroyed too, or could I re-add it (somehow) if
> > it wasn't wiped? If that's the case, we may want to keep the option,
> > but yes, normally you're right.
> The command is 'destroyosd'; doesn't that imply that the data is
> destroyed already? And the cleanup flag zaps the device.
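>
> Roughly, the zap amounts to clearing the partition table (a sketch
> only; the device path is illustrative and sgdisk comes from the gdisk
> package):
>
>     use PVE::Tools qw(run_command);
>
>     # destroy the GPT and MBR data structures on the device
>     run_command(['sgdisk', '--zap-all', '/dev/sdX']);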
>
> The OSD has to be set to down & out prior to the destroy. With or
> without a wipe, it is removed from the ceph cluster. It still contains
> "old data", but that is not usable, or only with forensic effort. ;)
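>
> For reference, the down/out step amounts to something like this (a
> sketch, assuming the standard ceph CLI and systemd unit naming; the
> id is illustrative, not taken from the actual destroyosd code):
>
>     use PVE::Tools qw(run_command);
>
>     my $osdid = 0; # illustrative
>     # mark the OSD out so data is rebalanced away from it, then stop
>     # the daemon, which marks it down
>     run_command(['ceph', 'osd', 'out', "osd.$osdid"]);
>     run_command(['systemctl', 'stop', "ceph-osd\@$osdid.service"]);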
>
> >
> > >
> > >> },
> > >> },
> > >> returns => { type => 'string' },
> > >> @@ -434,6 +440,15 @@ __PACKAGE__->register_method ({
> > >> }
> > >> }
> > >>
> > >> + my $disks_to_wipe = {};
> > >> + if ($param->{wipe_disk}) {
> > >> + foreach my $part (@$partitions_to_remove) {
> > >> + next if !$part || (! -b $part );
> > >> + my $devpath = PVE::Diskmanage::get_blockdev($part);
> > >> + $disks_to_wipe->{$devpath} = 1;
> > >> + }
> > >> + }
> > >> +
> > >> print "Unmount OSD $osdsection from $mountpoint\n";
> > >> eval { run_command(['/bin/umount', $mountpoint]); };
> > >> if (my $err = $@) {
> > >> @@ -443,6 +458,13 @@ __PACKAGE__->register_method ({
> > >> foreach my $part (@$partitions_to_remove) {
> > >> $remove_partition->($part);
> > >> }
> > >> + if ($param->{wipe_disk}) {
> > >> + foreach my $devpath (keys %$disks_to_wipe) {
> > >> + print "wipe disk: $devpath\n";
> > >> + eval { run_command(['/bin/dd', 'if=/dev/zero', "of=${devpath}", 'bs=1M', 'count=200']); };
> > > The dd needs the fdatasync conv option, and maybe additionally
> > > /dev/urandom as input, as some disks or NVMe devices will not write
> > > the data out otherwise.
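> > >
> > > Something like this is what I mean (untested sketch of the line
> > > above):
> > >
> > >     eval { run_command(['/bin/dd', 'if=/dev/zero', "of=${devpath}",
> > >         'bs=1M', 'count=200', 'conv=fdatasync']); };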
> >
> > Citation? If I write zeros to a blockdev, I actually want zeros to
> > be written. Whether the blockdev internally optimizes that by not
> > writing all zeros out, but only marking the area I wrote as zeroed,
> > does not matter for the kernel/user. A read after that operation
> > *must* return zero - else it's not a storage device but garbage...
> That is what should have happened, but with 'if=/dev/zero' alone,
> creating a new OSD failed; fdatasync or urandom helped.
>
> >
> > urandom has a (minimal) chance to write something that looks like an
> > OSD (or whatever) again. Also, we do not want to interfere with the
> > blockdev's features (i.e., write optimizations).
> True. Merely a suggestion, as those optimizations seem to not help.
>