[pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?

Thomas Lamprecht thomas at lamprecht.org
Tue Jan 8 20:58:51 CET 2019


Hi,

On 1/8/19 7:37 PM, Alexandre DERUMIER wrote:
> I'm able to reproduce with:
> ---------------------------
> on 1 host:
>
> cluster.fw:
> [OPTIONS]
>
> enable: 1
> policy_in: ACCEPT
>
>
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use IO::File;    # also exports the O_* constants such as O_RDONLY
> use PVE::Firewall;
> use Data::Dumper;
> use Time::HiRes qw(usleep);
>
> my $filename = "/etc/pve/firewall/cluster.fw";
> my $verbose = 0;
>
> while (1) {
>     if (my $fh = IO::File->new($filename, O_RDONLY)) {
>         my $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh, $verbose);
>         my $cluster_options = $cluster_conf->{options};
>
>         # 'enable' missing: the parser saw no usable [OPTIONS] content
>         if (!$cluster_options->{enable}) {
>             print Dumper($cluster_options);
>             die "error\n";
>         }
>     }
>     usleep(100);
> }
>
>
> The script runs fine.
>
>
> On another host, edit the file (simple open/write);
> the script on the first host then returns:
>
> $VAR1 = {};
> error

That is expected, AFAICT. If readers must always see valid content, a modify
operation shouldn't be:
* read FILE -> modify -> write FILE
but rather:
* read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE
Otherwise, yes, there is a small time window in which the file is truncated.
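
For illustration, the second variant could look roughly like the sketch below.
This is not the pve-common code; atomic_write and the ".tmp.$$" suffix are names
made up for the example, and it assumes the temporary file lives in the same
directory as the target so that rename() does not cross filesystems. It also
skips fsync and does not preserve the original file's mode or ownership.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace $path with $data via a temporary file, so a
# concurrent reader sees either the old or the new content, never a
# truncated file (as long as rename() itself is atomic on that filesystem).
sub atomic_write {
    my ($path, $data) = @_;

    # Same directory as the target, so rename() stays on one filesystem.
    my $tmp = "$path.tmp.$$";

    open(my $fh, '>', $tmp) or die "unable to open $tmp: $!\n";
    if (!(print {$fh} $data) || !close($fh)) {
        my $err = $!;
        unlink($tmp);
        die "unable to write $tmp: $err\n";
    }

    # The only step that touches the real file name.
    if (!rename($tmp, $path)) {
        my $err = $!;
        unlink($tmp);
        die "unable to rename $tmp to $path: $err\n";
    }

    return;
}
```

A writer would then call atomic_write($filename, $new_content) instead of
opening $filename for writing directly.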

But file_set_contents, which save_clusterfw_conf uses, does this already[0],
so maybe this is the "high-level fuse rename isn't atomic" bug again...
I may need to take a closer look tomorrow.

[0]: https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213

>
> ----- Original Message -----
> From: "aderumier" <aderumier at odiso.com>
> To: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Tuesday, January 8, 2019 19:15:06
> Subject: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?
>
> Hi,
> I'm currently debugging a possible firewalling problem.
> I'm running some cephfs clients in VMs, firewalled by Proxmox.
> cephfs clients are really sensitive to network problems, mainly packet loss or dropped packets.
>
> I'm really not sure, but I currently have puppet updating my cluster.fw at regular intervals,
> and sometimes all the VMs on a specific host (or on multiple hosts) have a small disconnect at the same time (maybe a few seconds).
>
>
> I would like to know whether cluster.fw replication in /etc/pve/ is atomic,
> or whether there is any chance that, if the firewall tries to read the file during replication,
> it could be empty?
>
>
> I just wonder (I'm really really not sure) if I could trigger this: 
>
>
> sub update { 
> my $code = sub { 
>
> my $cluster_conf = load_clusterfw_conf(); 
> my $cluster_options = $cluster_conf->{options}; 
>
> if (!$cluster_options->{enable}) { 
> PVE::Firewall::remove_pvefw_chains(); 
> return; 
> } 
>
>
> cluster.fw is not readable/absent/..., so remove_pvefw_chains is called;
> then, after some seconds, the rules are applied again.
>
>
> I'm going to add some logging to try to reproduce it. (BTW, it would be great to log rule changes; maybe an audit log with a diff.)
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 





