[pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

Stefan Priebe - Profihost AG s.priebe at profihost.ag
Tue Jan 8 21:59:44 CET 2019


Hi Alexandre,


Am 08.01.19 um 21:55 schrieb Alexandre DERUMIER:
>>> But, file_set_contents - which save_clusterfw_conf uses - does this already[0],
>>> so maybe this is the "high-level fuse rename isn't atomic" bug again...
>>> May need to take a closer look tomorrow.
> 
> mmm, ok.
> 
> In my case, it was with a simple file copy (cp /tmp/cluster.fw /etc/pve/firewall/cluster.fw). (I manage cluster.fw through puppet for multiple cluster).
> can reproduce it too with a simple vi edition.
> 
> I think others users could trigger this too, if they use scripts to automate ipset (blacklist, ....).
> 
> Maybe could it be better to only disable firewall when option is setup with "enabled:0", and if cluster.fw is missing, don't change any rules.
> ²²²


for those cases i use something like (pseudocode - i use salt not puppet):

- manage copy of file
- if file has changed trigger:
   - mv -v $managedfile $realfile


Greets,
Stefan

> 
> 
> 
> ----- Mail original -----
> De: "Thomas Lamprecht" <thomas at lamprecht.org>
> À: "pve-devel" <pve-devel at pve.proxmox.com>, "aderumier" <aderumier at odiso.com>
> Envoyé: Mardi 8 Janvier 2019 20:58:51
> Objet: Re: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?
> 
> Hi, 
> 
> On 1/8/19 7:37 PM, Alexandre DERUMIER wrote: 
>> I'm able to reproduce with: 
>> --------------------------- 
>> on 1 host: 
>>
>> cluster.fw: 
>> [OPTIONS] 
>>
>> enable: 1 
>> policy_in: ACCEPT 
>>
>>
>>
>>
>> #!/usr/bin/perl 
>>
>> use IO::File; 
>> use PVE::Firewall; 
>> use Data::Dumper; 
>> use Time::HiRes qw ( time alarm sleep usleep ); 
>>
>> while(1){ 
>>
>> $filename = "/etc/pve/firewall/cluster.fw"; 
>>
>> if (my $fh = IO::File->new($filename, O_RDONLY)) { 
>>
>> $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh, $verbose); 
>> my $cluster_options = $cluster_conf->{options}; 
>>
>> if (!$cluster_options->{enable}) { 
>> print Dumper($cluster_options); 
>> die "error\n"; 
>> } 
>>
>> } 
>> usleep(100); 
>> }; 
>>
>>
>> the script is running fine. 
>>
>>
>> on another host, edit the file (simple open/write), 
>> then the script on first host, return 
>>
>> $VAR1 = {}; 
>> error 
> 
> that is expected, AFAICT, a modify operation shouldn't be: 
> * read FILE -> modify -> write FILE 
> but rather: 
> * read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE 
> if it's wanted that always a valid content is read. Else yes, you may have a small 
> time window where the file is truncated. 
> 
> But, file_set_contents - which save_clusterfw_conf uses - does this already[0], 
> so maybe this is the "high-level fuse rename isn't atomic" bug again... 
> May need to take a closer look tomorrow. 
> 
> [0]: https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213 
> 
>>
>> ----- Mail original ----- 
>> De: "aderumier" <aderumier at odiso.com> 
>> À: "pve-devel" <pve-devel at pve.proxmox.com> 
>> Envoyé: Mardi 8 Janvier 2019 19:15:06 
>> Objet: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ? 
>>
>> Hi, 
>> I'm currently debugging a possible firewalling problem. 
>> I'm running some cephfs client in vm, firewalled by proxmox. 
>> cephfs client are really sensitive to network problem, and mainly with packets logss or dropped packets. 
>>
>> I'm really not sure, but I have currently puppet updating my cluster.fw, at regular interval, 
>> and sometimes, I have all the vm on a specific host (or multiple hosts), at the same time, have a small disconnect (maybe some second). 
>>
>>
>> I would like to known, if cluster.fw replication is atomic in /etc/pve/ ? 
>> or if they are any chance, that during file replication, the firewall try to read the file, 
>> it could be empty ? 
>>
>>
>> I just wonder (I'm really really not sure) if I could trigger this: 
>>
>>
>> sub update { 
>> my $code = sub { 
>>
>> my $cluster_conf = load_clusterfw_conf(); 
>> my $cluster_options = $cluster_conf->{options}; 
>>
>> if (!$cluster_options->{enable}) { 
>> PVE::Firewall::remove_pvefw_chains(); 
>> return; 
>> } 
>>
>>
>> cluster.conf not readable/absent/.... , and remove_pvefw_chains called. 
>> then after some seconds, rules are applied again. 
>>
>>
>> I'm going to add some log to try to reproduce it. (BTW, it could be great to logs rules changed, maybe an audit log with a diff could be great) 
>> _______________________________________________ 
>> pve-devel mailing list 
>> pve-devel at pve.proxmox.com 
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
>>
>> _______________________________________________ 
>> pve-devel mailing list 
>> pve-devel at pve.proxmox.com 
>> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 



More information about the pve-devel mailing list