[PVE-User] HDD errors in VMs
Emmanuel Kasper
e.kasper at proxmox.com
Mon Jan 4 13:49:07 CET 2016
On 01/04/2016 12:30 PM, Michael Pöllinger wrote:
> Oh sorry.
>
> I missed the links:
> https://forum.proxmox.com/threads/task-xxx-blocked-for-more-than-120-seconds.25167/
> There are numerous discussions about it atm.
>
Hi Michael!
task-xxx-blocked-for-more-than-120-seconds:
this message just means that a process could not do any I/O for 120 s.
It is usually not a kernel bug; most of the time it is *an operational warning*.
It can have many causes, but these are the most frequent ones I know:
* you're trying to do too much I/O on a single drive. Typical case: 8
VMs running on a SATA drive, all running /etc/cron.daily/mlocate at 06:01
solution: stagger your daily jobs so they don't all start at the same time
* a VM or host with a large amount of RAM ( >= 32 GB) doing heavy writes
all the time, while most of that RAM is unused ( like 5 GB used)
The heavy writes will fill the page cache up to 25 GB, then the
kernel will try to flush the cache to the disk with the highest
scheduling priority, causing *all* other processes to wait until this is
done.
solution: reduce the kernel writeback parameters to a lower value
see
https://www.suse.com/documentation/opensuse114/book_tuning/data/cha_tuning_memory_vm.html
for details
you can also force the kernel to drop its page cache, for example by putting this
in a cronjob (though it's not very clean)
echo 3 > /proc/sys/vm/drop_caches
* a process is running berserk and tries to use an insane amount of RAM.
For example Apache in the prefork model, starting hundreds of
processes for PHP after your website has been slashdotted.
To cope with that the kernel will grab extra memory from the hard drive
( your swap space ) for the extra processes. But this is terribly slow,
and if it has to be done hundreds of times, the Apache processes
will for sure quickly be stuck with the same message
solution: put an HTTP cache server in front of your web server, like Varnish or
nginx in micro caching mode
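For the writeback case above, here is a minimal sketch for looking at the current settings and estimating how much dirty data can pile up. It assumes the "writeback parameters" are the vm.dirty_background_ratio and vm.dirty_ratio sysctls; the function names are made up for illustration, and the estimate uses total RAM, whereas the kernel applies the percentage to available memory, so the real threshold is somewhat lower.

```shell
#!/bin/sh
# Rough upper bound (in MB) on dirty page cache for a given RAM size and
# dirty ratio. Assumption: computed from total RAM; the kernel uses
# available (free + reclaimable) memory, so treat this as an estimate.
dirty_limit_mb() {
    total_mb=$1
    ratio_percent=$2
    echo $(( total_mb * ratio_percent / 100 ))
}

# Print the current writeback ratios, if this system exposes them.
show_writeback_settings() {
    for name in dirty_background_ratio dirty_ratio; do
        file="/proc/sys/vm/$name"
        if [ -r "$file" ]; then
            echo "vm.$name = $(cat "$file")"
        fi
    done
}

show_writeback_settings

# Example: a 32 GB machine with vm.dirty_ratio=20
dirty_limit_mb 32768 20    # prints 6553

# To lower the limits (as root; the values are only illustrative):
#   sysctl -w vm.dirty_background_ratio=5 vm.dirty_ratio=10
```

Lower ratios make the kernel start flushing earlier and in smaller batches, so a single huge flush is less likely to stall everything else. Which values are right depends on your workload and storage.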
@gerald:
hopefully the command echo 3 > /proc/sys/vm/drop_caches should disappear
from your process list at some point.
Usually this command should not take more than 5-10s to execute.
If it takes longer, you don't have enough I/O capacity in your storage system for what you're
trying to achieve. Buy some SSDs!
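If you want to check how long the command really takes, something like the following sketch could help. The helper names and the 10 s threshold are assumptions taken from the rule of thumb above, not anything official.

```shell
#!/bin/sh
# Run a command and print how many whole seconds it took (wall clock).
# The command's own output is discarded.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Hypothetical threshold: anything above 10 s counts as slow.
is_slow() {
    [ "$1" -gt 10 ]
}

# Example usage (as root):
#   secs=$(elapsed_seconds sh -c 'echo 3 > /proc/sys/vm/drop_caches')
#   if is_slow "$secs"; then echo "storage looks overloaded (${secs}s)"; fi
```

The real invocation is left as a comment because it needs root and actually drops the cache on the machine where it runs.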
Emmanuel