[pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?
Thomas Lamprecht
thomas at lamprecht.org
Tue Jan 8 20:58:51 CET 2019
Hi,
On 1/8/19 7:37 PM, Alexandre DERUMIER wrote:
> I'm able to reproduce with:
> ---------------------------
> on 1 host:
>
> cluster.fw:
> [OPTIONS]
>
> enable: 1
> policy_in: ACCEPT
>
>
>
>
> #!/usr/bin/perl
>
> use IO::File;
> use PVE::Firewall;
> use Data::Dumper;
> use Time::HiRes qw ( time alarm sleep usleep );
>
> while(1){
>
> $filename = "/etc/pve/firewall/cluster.fw";
>
> if (my $fh = IO::File->new($filename, O_RDONLY)) {
>
> $cluster_conf = PVE::Firewall::parse_clusterfw_config($filename, $fh, $verbose);
> my $cluster_options = $cluster_conf->{options};
>
> if (!$cluster_options->{enable}) {
> print Dumper($cluster_options);
> die "error\n";
> }
>
> }
> usleep(100);
> };
>
>
> the script is running fine.
>
>
> on another host, edit the file (simple open/write),
> then the script on first host, return
>
> $VAR1 = {};
> error
that is expected, AFAICT, a modify operation shouldn't be:
* read FILE -> modify -> write FILE
but rather:
* read FILE -> modify -> write FILE.TMP -> move FILE.TMP to FILE
if it's wanted that always a valid content is read. Else yes, you may have a small
time window where the file is truncated.
But, file_set_contents - which save_clusterfw_conf uses - does this already[0],
so maybe this is the "high-level fuse rename isn't atomic" bug again...
May need to take a closer look tomorrow.
[0]: https://git.proxmox.com/?p=pve-common.git;a=blob;f=src/PVE/Tools.pm;h=accf6539da94d2b5d5b6f4539310fe5c4d526c7e;hb=HEAD#l213
>
> ----- Mail original -----
> De: "aderumier" <aderumier at odiso.com>
> À: "pve-devel" <pve-devel at pve.proxmox.com>
> Envoyé: Mardi 8 Janvier 2019 19:15:06
> Objet: [pve-devel] firewall : possible bug/race when cluster.fw is replicated and rules are updated ?
>
> Hi,
> I'm currently debugging a possible firewalling problem.
> I'm running some cephfs client in vm, firewalled by proxmox.
> cephfs client are really sensitive to network problem, and mainly with packets logss or dropped packets.
>
> I'm really not sure, but I have currently puppet updating my cluster.fw, at regular interval,
> and sometimes, I have all the vm on a specific host (or multiple hosts), at the same time, have a small disconnect (maybe some second).
>
>
> I would like to known, if cluster.fw replication is atomic in /etc/pve/ ?
> or if they are any chance, that during file replication, the firewall try to read the file,
> it could be empty ?
>
>
> I just wonder (I'm really really not sure) if I could trigger this:
>
>
> sub update {
> my $code = sub {
>
> my $cluster_conf = load_clusterfw_conf();
> my $cluster_options = $cluster_conf->{options};
>
> if (!$cluster_options->{enable}) {
> PVE::Firewall::remove_pvefw_chains();
> return;
> }
>
>
> cluster.conf not readable/absent/.... , and remove_pvefw_chains called.
> then after some seconds, rules are applied again.
>
>
> I'm going to add some log to try to reproduce it. (BTW, it could be great to logs rules changed, maybe an audit log with a diff could be great)
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
More information about the pve-devel
mailing list