[pve-devel] [PATCH manager] add wipe_disk option when destroying ceph disk

Alwin Antreich a.antreich at proxmox.com
Tue Oct 23 16:49:27 CEST 2018


On Tue, Oct 23, 2018 at 04:19:36PM +0200, Thomas Lamprecht wrote:
> On 10/23/18 4:02 PM, Alwin Antreich wrote:
> > Nice, was on my list too. ;) Some comments inline.
> > 
> > On Tue, Oct 23, 2018 at 03:33:44PM +0200, David Limbeck wrote:
> >> this allows the disk to be reused as ceph disk by zeroing the first 200M
> >> of the destroyed disk
> >>
> >> Signed-off-by: David Limbeck <d.limbeck at proxmox.com>
> >> ---
> >>  PVE/API2/Ceph.pm         | 22 ++++++++++++++++++++++
> >>  www/manager6/ceph/OSD.js | 18 +++++++++++++++++-
> >>  2 files changed, 39 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/PVE/API2/Ceph.pm b/PVE/API2/Ceph.pm
> >> index 69489a70..6dce2f01 100644
> >> --- a/PVE/API2/Ceph.pm
> >> +++ b/PVE/API2/Ceph.pm
> >> @@ -347,6 +347,12 @@ __PACKAGE__->register_method ({
> >>  		optional => 1,
> >>  		default => 0,
> >>  	    },
> >> +	    wipe_disk => {
> >> +		description => 'Wipe first 200M of disk to make it reusable as a ceph OSD.',
> >> +		type => 'boolean',
> >> +		optional => 1,
> >> +		default => 0,
> >> +	    },
> > I suggest not exposing this as a separate option; the 'cleanup'
> > option should do this in one go. If I want to set the cleanup option, I
> > definitely want the wipe too.
> 
> Is the data destroyed too, or could I re-add it (somehow) if
> it wasn't wiped? If that's the case we may want to have the option,
> but yes, normally you're right.
The command is 'destroyosd'; doesn't that already imply that the data
is destroyed? And the cleanup flag zaps the device.
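
To illustrate, a minimal sketch of what the zap amounts to (assuming
sgdisk; the actual cleanup code may differ). It destroys the GPT and
MBR structures, but leaves the partition contents on disk:

    # wipe the partition table only, not the data itself;
    # $devpath is the whole-disk device, e.g. /dev/sdb (assumed name)
    run_command(['sgdisk', '--zap-all', $devpath]);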

The OSD has to be set to down & out prior to the destroy. With or
without a wipe, it is removed from the ceph cluster. The disk still
contains "old data", but that data is not usable, or only with
forensic effort. ;)
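
For reference, a hedged sketch of that precondition, assuming the
plain ceph CLI and an OSD id in $osdid (PVE does the equivalent
through its API):

    # mark the OSD out so data is rebalanced away from it
    run_command(['ceph', 'osd', 'out', $osdid]);
    # stop the daemon so the OSD reports down (systemd unit name assumed)
    run_command(['systemctl', 'stop', "ceph-osd\@${osdid}.service"]);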

> 
> > 
> >>  	},
> >>      },
> >>      returns => { type => 'string' },
> >> @@ -434,6 +440,15 @@ __PACKAGE__->register_method ({
> >>  		}
> >>  	    }
> >>  
> >> +	    my $disks_to_wipe = {};
> >> +	    if ($param->{wipe_disk}) {
> >> +		foreach my $part (@$partitions_to_remove) {
> >> +		    next if !$part || (! -b $part );
> >> +		    my $devpath = PVE::Diskmanage::get_blockdev($part);
> >> +		    $disks_to_wipe->{$devpath} = 1;
> >> +		}
> >> +	    }
> >> +
> >>  	    print "Unmount OSD $osdsection from  $mountpoint\n";
> >>  	    eval { run_command(['/bin/umount', $mountpoint]); };
> >>  	    if (my $err = $@) {
> >> @@ -443,6 +458,13 @@ __PACKAGE__->register_method ({
> >>  		foreach my $part (@$partitions_to_remove) {
> >>  		    $remove_partition->($part);
> >>  		}
> >> +		if ($param->{wipe_disk}) {
> >> +		    foreach my $devpath (keys %$disks_to_wipe) {
> >> +			print "wipe disk: $devpath\n";
> >> +			eval { run_command(['/bin/dd', 'if=/dev/zero', "of=${devpath}", 'bs=1M', 'count=200']); };
> > The dd needs the fdatasync option, and maybe additionally /dev/urandom
> > as input, as some disks or NVMe devices will not write the data out
> > otherwise.
> 
> citation? If I write zeros to a blockdev I actually want zeros to be
> written. Whether the blockdev internally optimizes that by not writing
> all zeros out but only marking the area I wrote as zeroed does not
> matter for the kernel/user. A read after that operation *must* return
> zeros - else it's not a storage device but garbage...
That should have happened, but with plain 'if=/dev/zero' creating a
new OSD on the disk failed afterwards; adding fdatasync or using
urandom as input helped.
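
As a sketch, the variant that worked here (conv=fdatasync makes dd
flush the zeros to the device before returning; paths as in the patch):

    run_command(['/bin/dd', 'if=/dev/zero', "of=${devpath}",
		 'bs=1M', 'count=200', 'conv=fdatasync']);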

> 
> urandom has a (minimal) chance to write something that looks like an
> OSD (or whatever) again. Also, we do not want to interfere with the
> blockdev's features (i.e., write optimizations).
True. It was merely a suggestion, as those optimizations do not seem
to help here.



