[pve-devel] [PATCH docs] pveceph: document the change of ceph networks
Aaron Lauterer
a.lauterer at proxmox.com
Fri Jan 2 17:58:53 CET 2026
thanks for the feedback
a v2 is now available
https://lore.proxmox.com/pve-devel/20260102165754.650450-1-a.lauterer@proxmox.com/T/#u
On 2026-01-02 16:03, Maximiliano Sandoval wrote:
> Aaron Lauterer <a.lauterer at proxmox.com> writes:
>
> Some small points below:
>
>> ceph networks (public, cluster) can be changed on the fly in a running
>> cluster. But the procedure, especially for the ceph public network is
>> a bit more involved. By documenting it, we will hopefully reduce the
>> number of issues our users run into when they try to attempt a network
>> change on their own.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
>> ---
>> Before I apply this commit I would like to get at least one T-b where you tested
>> both scenarios to make sure the instructions are clear to follow and that I
>> didn't miss anything.
>>
>> pveceph.adoc | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 186 insertions(+)
>>
>> diff --git a/pveceph.adoc b/pveceph.adoc
>> index 63c5ca9..c4a4f91 100644
>> --- a/pveceph.adoc
>> +++ b/pveceph.adoc
>> @@ -1192,6 +1192,192 @@ ceph osd unset noout
>> You can now start up the guests. Highly available guests will change their state
>> to 'started' when they power on.
>>
>> +
>> +[[pveceph_network_change]]
>> +Network Changes
>> +~~~~~~~~~~~~~~~
>> +
>> +It is possible to change the networks used by Ceph in a HCI setup without any
>> +downtime if *both the old and new networks can be configured at the same time*.
>> +
>> +The procedure differs depending on which network you want to change.
>> +
>> +After the new network has been configured on all hosts, make sure you test it
>> +before proceeding with the changes. One way is to ping all hosts on the new
>> +network. If you use a large MTU, make sure to also test that it works. For
>> +example by sending ping packets that will result in a final packet at the max
>> +MTU size.
>> +
>> +To test an MTU of 9000, you will need the following packet sizes:
>> +
>> +[horizontal]
>> +IPv4:: The overhead of IP and ICMP is '28' bytes; the resulting packet size for
>> +the ping then is '8972' bytes.
>
> I would personally mention that this is "generally" the case, as one
> could be dealing with bigger headers, e.g. when q-in-q is used.
>
>> +IPv6:: The overhead is '48' bytes and the resulting packet size is
>> +'8952' bytes.
>> +
>> +The resulting ping command will look like this for an IPv4:
>> +[source,bash]
>> +----
>> +ping -M do -s 8972 {target IP}
>> +----
>> +
>> +When you are switching between IPv4 and IPv6 networks, you need to make sure
>> +that the following options in the `ceph.conf` file are correctly set to `true`
>> +or `false`. These config options configure if Ceph services should bind to IPv4
>> +or IPv6 addresses.
>> +----
>> +ms_bind_ipv4 = true
>> +ms_bind_ipv6 = false
>> +----
>> +
>> +[[pveceph_network_change_public]]
>> +Change the Ceph Public Network
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The Ceph Public network is the main communication channel in a Ceph cluster
>> +between the different services and clients (for example, a VM). Changing it to
>> +a different network is not as simple as changing the Ceph Cluster network. The
>> +main reason is that besides the configuration in the `ceph.conf` file, the Ceph
>> +MONs (monitors) have an internal configuration where they keep track of all the
>> +other MONs that are part of the cluster, the 'monmap'.
>> +
>> +Therefore, the procedure to change the Ceph Public network is a bit more
>> +involved:
>> +
>> +1. Change `public_network` in the `ceph.conf` file
>
> This is mentioned in the warning below, but maybe more emphasis could be
> made here to only touch this one value.
>
> Additionally, please use the full path here. There are versions at
> /etc/pve and /etc/ceph and this is the first time in this new section
> where one needs to modify one (even if it is mentioned below in the
> expanded version).
>
>> +2. Restart non MON services: OSDs, MGRs and MDS on one host
>> +3. Wait until Ceph is back to 'Health_OK'
>
> Should be HEALTH_OK instead.
>
>> +4. Verify services are using the new network
>> +5. Continue restarting services on the next host
>> +6. Destroy one MON
>> +7. Recreate MON
>> +8. Wait until Ceph is back to 'Health_OK'
>
> Should be HEALTH_OK instead.
>
>> +9. Continue destroying and recreating MONs
>> +
>> +You first need to edit the `/etc/pve/ceph.conf` file. Change the
>> +`public_network` line to match the new subnet.
>> +
>> +----
>> +cluster_network = 10.9.9.30/24
>> +----
>> +
>> +WARNING: Do not change the `mon_host` line or any `[mon.HOSTNAME]` sections.
>> +These will be updated automatically when the MONs are destroyed and recreated.
>> +
>> +NOTE: Don't worry if the host bits (for example, the last octet) are set by
>> +default, the netmask in CIDR notation defines the network part.
>> +
>> +After you have changed the network, you need to restart the non MON services in
>> +the cluster for the changes to take effect. Do so one node at a time! To restart all
>> +non MON services on one node, you can use the following commands on that node.
>> +Ceph has `systemd` targets for each type of service.
>> +
>> +[source,bash]
>> +----
>> +systemctl restart ceph-osd.target
>> +systemctl restart ceph-mgr.target
>> +systemctl restart ceph-mds.target
>> +----
>> +NOTE: You will only have MDS' (Metadata Server) if you use CephFS.
>> +
>> +NOTE: After the first OSD service got restarted, the GUI will complain that
>> +the OSD is not reachable anymore. This is not an issue,; VMs can still reach
>
> Is the double punctuation here intentional?
>
>> +them. The reason for the message is that the MGR service cannot reach the OSD
>> +anymore. The error will vanish after the MGR services get restarted.
>> +
>> +WARNING: Do not restart OSDs on multiple hosts at the same time. Chances are
>> +that for some PGs (placement groups), 2 out of the (default) 3 replicas will
>> +be down. This will result in I/O being halted until the minimum required number
>> +(`min_size`) of replicas is available again.
>> +
>> +To verify that the services are listening on the new network, you can run the
>> +following command on each node:
>> +
>> +[source,bash]
>> +----
>> +ss -tulpn | grep ceph
>> +----
>> +
>> +NOTE: Since OSDs will also listen on the Ceph Cluster network, expect to see that
>> +network too in the output of `ss -tulpn`.
>> +
>> +Once the Ceph cluster is back in a fully healthy state ('Health_OK'), and the
>
> Same here, HEALTH_OK.
>
>> +services are listening on the new network, continue to restart the services on
>> +the host.
>> +
>> +The last services that need to be moved to the new network are the Ceph MONs
>> +themselves. The easiest way is to destroy and recreate each monitor one by
>> +one. This way, any mention of it in the `ceph.conf` and the monitor internal
>> +`monmap` is handled automatically.
>> +
>> +Destroy the first MON and create it again. Wait a few moments before you
>> +continue on to the next MON in the cluster, and make sure the cluster reports
>> +'Health_OK' before proceeding.
>> +
>> +Once all MONs are recreated, you can verify that any mention of MONs in the
>> +`ceph.conf` file references the new network. That means mainly the `mon_host`
>> +line and the `[mon.HOSTNAME]` sections.
>> +
>> +One final `ss -tulpn | grep ceph` should show that the old network is not used
>> +by any Ceph service anymore.
>> +
>> +[[pveceph_network_change_cluster]]
>> +Change the Ceph Cluster Network
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The Ceph Cluster network is used for the replication traffic between the OSDs.
>> +Therefore, it can be beneficial to place it on its own fast physical network.
>> +
>> +The overall procedure is:
>> +
>> +1. Change `cluster_network` in the `ceph.conf` file
>> +2. Restart OSDs on one host
>> +3. Wait until Ceph is back to 'Health_OK'
>> +4. Verify OSDs are using the new network
>> +5. Continue restarting OSDs on the next host
>> +
>> +You first need to edit the `/etc/pve/ceph.conf` file. Change the
>> +`cluster_network` line to match the new subnet.
>> +
>> +----
>> +cluster_network = 10.9.9.30/24
>> +----
>> +
>> +NOTE: Don't worry if the host bits (for example, the last octet) are set by
>> +default; the netmask in CIDR notation defines the network part.
>> +
>> +After you have changed the network, you need to restart the OSDs in the cluster
>> +for the changes to take effect. Do so one node at a time!
>> +To restart all OSDs on one node, you can use the following command on the CLI on
>> +that node:
>> +
>> +[source,bash]
>> +----
>> +systemctl restart ceph-osd.target
>> +----
>> +
>> +WARNING: Do not restart OSDs on multiple hosts at the same time. Chances are
>> +that for some PGs (placement groups), 2 out of the (default) 3 replicas will
>> +be down. This will result in I/O being halted until the minimum required number
>> +(`min_size`) of replicas is available again.
>> +
>> +To verify that the OSD services are listening on the new network, you can either
>> +check the *OSD Details -> Network* tab in the *Ceph -> OSD* panel or by running
>> +the following command on the host:
>> +[source,bash]
>> +----
>> +ss -tulpn | grep ceph-osd
>> +----
>> +
>> +NOTE: Since OSDs will also listen on the Ceph Public network, expect to see that
>> +network too in the output of `ss -tulpn`.
>> +
>> +Once the Ceph cluster is back in a fully healthy state ('Health_OK'), and the
>
> Same, should be HEALTH_OK.
>
>> +OSDs are listening on the new network, continue to restart the OSDs on the next
>> +host.
>> +
>> +
>> [[pve_ceph_mon_and_ts]]
>> Ceph Monitoring and Troubleshooting
>> -----------------------------------
>
More information about the pve-devel
mailing list