Very slow IO performance: how to diagnose (lemmy.g97.top)

submitted 1 year ago* (last edited 1 year ago) by gabriele97@lemmy.g97.top to c/selfhosted@lemmy.world


EDIT: this is a full benchmark I ran on my pool: https://gist.github.com/thegabriele97/9d82ddfbf0f4ec00dbcebc4d6cda29b3.

Hi! I have had this issue since I started my homelab adventure a couple of months ago, so I am still very much a noob, sorry for this.

I decided today to figure out what is happening and why, but I need your help to understand it better.

My homelab is a Proxmox setup with three 1 TB HDDs in raidz1 (ZFS) (I know the downsides of this and I made my decision) and 8 GB of RAM, of which 3.5 GB are assigned to a VM. The rest is used by some LXC containers.

During heavy workloads (e.g. copying a file, downloading something via torrent/JDownloader) everything is very slow and other services become unresponsive due to the high IO delay.

I decided to test each of the three drives individually with this command:

fio --ioengine=libaio --filename=/dev/sda --size=4G --time_based --name=fio --group_reporting --runtime=10 --direct=1 --sync=1 --iodepth=1 --rw=randread --bs=4k --numjobs=32

All three (sda, sdb, sdc) give more or less these results:

Jobs: 32 (f=32): [r(32)][100.0%][r=436KiB/s][r=109 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=32): err= 0: pid=3350293: Sat Jun 24 11:07:02 2023
  read: IOPS=119, BW=479KiB/s (490kB/s)(4968KiB/10378msec)
    slat (nsec): min=4410, max=40660, avg=12374.56, stdev=5066.56
    clat (msec): min=17, max=780, avg=260.78, stdev=132.27
     lat (msec): min=17, max=780, avg=260.79, stdev=132.27
    clat percentiles (msec):
     |  1.00th=[   26],  5.00th=[   50], 10.00th=[   80], 20.00th=[  140],
     | 30.00th=[  188], 40.00th=[  230], 50.00th=[  264], 60.00th=[  296],
     | 70.00th=[  326], 80.00th=[  372], 90.00th=[  430], 95.00th=[  477],
     | 99.00th=[  617], 99.50th=[  634], 99.90th=[  768], 99.95th=[  785],
     | 99.99th=[  785]
   bw (  KiB/s): min=  256, max=  904, per=100.00%, avg=484.71, stdev= 6.17, samples=639
   iops        : min=   64, max=  226, avg=121.14, stdev= 1.54, samples=639
  lat (msec)   : 20=0.32%, 50=4.91%, 100=8.13%, 250=32.85%, 500=49.68%
  lat (msec)   : 750=3.86%, 1000=0.24%
  cpu          : usr=0.01%, sys=0.00%, ctx=1246, majf=11, minf=562
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1242,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=479KiB/s (490kB/s), 479KiB/s-479KiB/s (490kB/s-490kB/s), io=4968KiB (5087kB), run=10378-10378msec

Disk stats (read/write):
  sda: ios=1470/89, merge=6/7, ticks=385624/14369, in_queue=405546, util=96.66%

Am I wrong, or are these very bad results? Why? The three identical HDDs are this model: https://smarthdd.com/database/APPLE-HDD-HTS541010A9E662/JA0AB560/

I hope you can help me. Thank you!
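
On the question of whether these numbers are bad, a rough sanity check may help. The sketch below applies Little's law (requests in flight divided by mean latency) to the fio output above and compares it with a back-of-the-envelope estimate for a spinning disk. The 5400 rpm figure is an assumption based on the drive model, and the 12 ms average seek time is a purely illustrative guess, not a measured value.

```python
# Sanity-check the fio numbers from the post with basic arithmetic.

def iops_from_littles_law(outstanding_ios, mean_latency_s):
    """Little's law: throughput = requests in flight / mean time per request."""
    return outstanding_ios / mean_latency_s


def bandwidth_kib_s(iops, block_size_kib):
    """Bandwidth is simply IOPS times the block size."""
    return iops * block_size_kib


def avg_rotational_latency_ms(rpm):
    """On average the platter has to turn half a revolution."""
    return 60_000 / rpm / 2


# Values taken from the fio output in the post.
jobs = 32                # --numjobs=32 with --iodepth=1 -> 32 requests in flight
mean_clat_s = 0.26078    # clat avg=260.78 msec
measured_iops = 119      # read: IOPS=119

predicted_iops = iops_from_littles_law(jobs, mean_clat_s)
predicted_bw = bandwidth_kib_s(measured_iops, 4)  # --bs=4k

# ASSUMPTIONS (not from the post): 5400 rpm spindle, 12 ms average seek.
assumed_rpm = 5400
assumed_seek_ms = 12.0
service_ms = assumed_seek_ms + avg_rotational_latency_ms(assumed_rpm)
single_request_iops = 1000 / service_ms

print(f"IOPS predicted from latency: {predicted_iops:.1f}")
print(f"Bandwidth from IOPS x 4 KiB: {predicted_bw} KiB/s")
print(f"Estimated IOPS, one request at a time: {single_request_iops:.1f}")
```

If those assumptions are roughly right, about a hundred random 4k reads per second is what a single disk of this class can deliver, so the per-drive result looks normal for an HDD rather than a sign of a fault. The high latency comes from 32 jobs queueing on one spindle.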


[-] sun_is_ra@sh.itjust.works 2 points 1 year ago

I don't know about fio, but I normally use the iotop command to identify the process that is doing too much I/O.

[-] theterrasque@infosec.pub 1 points 1 year ago

I have a zfs raid1 with 5 disks, and had some very bad performance. I used atop to figure out that one disk was the problem. I replaced that disk, resynced, and now performance is as expected.
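
For anyone without atop at hand, a similar per-disk comparison can be sketched from two snapshots of /proc/diskstats taken a few seconds apart. The sample snapshots below are made-up numbers for illustration only, and the function names are hypothetical; the field positions follow the kernel's documented diskstats layout (reads completed, ms spent reading, writes completed, ms spent writing).

```python
# Compare the mean per-request wait time of each disk between two
# /proc/diskstats snapshots. A disk whose wait is far above its
# siblings in the same pool is a candidate for closer inspection.

def parse_diskstats(text):
    """Return {device: counters} from the text of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or truncated lines
        stats[fields[2]] = {
            "reads": int(fields[3]),
            "read_ms": int(fields[6]),
            "writes": int(fields[7]),
            "write_ms": int(fields[10]),
        }
    return stats


def mean_wait_ms(before, after):
    """Mean milliseconds per completed I/O between the two snapshots."""
    waits = {}
    for name, b in before.items():
        a = after.get(name)
        if a is None:
            continue
        ios = (a["reads"] - b["reads"]) + (a["writes"] - b["writes"])
        ms = (a["read_ms"] - b["read_ms"]) + (a["write_ms"] - b["write_ms"])
        waits[name] = ms / ios if ios else 0.0
    return waits


# HYPOTHETICAL sample data, not taken from the thread.
SNAPSHOT_1 = """\
   8       0 sda 1000 0 8000 20000 100 0 800 2000 0 15000 22000
   8      16 sdb 1000 0 8000 20000 100 0 800 2000 0 15000 22000
"""
SNAPSHOT_2 = """\
   8       0 sda 1100 0 8800 22000 100 0 800 2000 0 16000 24000
   8      16 sdb 1100 0 8800 46000 100 0 800 2000 0 19000 48000
"""

waits = mean_wait_ms(parse_diskstats(SNAPSHOT_1), parse_diskstats(SNAPSHOT_2))
slowest = max(waits, key=waits.get)
for name, wait in sorted(waits.items()):
    print(f"{name}: {wait:.1f} ms per I/O")
print(f"slowest device: {slowest}")
```

On a real system the two strings would come from reading /proc/diskstats twice with a short sleep in between.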

[-] terribleplan@lemmy.nrd.li 1 points 1 year ago

At one point I was having intermittent performance issues with my pool, and the issue turned out to be scrubs being too aggressive (even though almost all the documentation I read said scrubs should not adversely impact user I/O, they totally did).

[-] qazwsxedcrfv000@lemmy.unknownsys.com 1 points 1 year ago* (last edited 1 year ago)

What record size have you set for your dataset? If you are not doing a lot of small writes, or you can tolerate the fragmentation, it is better to set it to 1M.

Also,

...8 GB of RAM, of which 3.5 are assigned to a VM...

A default ZFS installation reserves half of the total system memory for its ARC. In your case that means 4 GB, and your VM is taking 3.5 GB. Are you running anything else? Also, is the memory assignment to the VM dynamic? ZFS will release part of the reserved RAM when overall memory demand gets tight, and that will have an adverse impact on read performance.
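
The memory arithmetic above can be sketched as follows, together with a small parser for the ARC counters that OpenZFS on Linux exposes in /proc/spl/kstat/zfs/arcstats. The half-of-RAM default is taken from this comment as stated and may differ between versions; the sample arcstats text is made up for illustration, and the function names are hypothetical.

```python
# Rough memory budget for the setup described in the post, plus a
# parser for the "name type data" rows of the arcstats file.

GIB = 2 ** 30


def remaining_gib(total_gib, arc_max_gib, vm_gib):
    """Memory left for the host and containers if the ARC is full."""
    return total_gib - arc_max_gib - vm_gib


def parse_arcstats(text):
    """Return {counter_name: value} from the text of arcstats."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns with a numeric value;
        # this skips the two header lines.
        if len(parts) == 3 and parts[2].isdigit():
            values[parts[0]] = int(parts[2])
    return values


total_gib = 8.0
arc_max_gib = total_gib / 2   # ASSUMPTION: default stated in the comment
vm_gib = 3.5
left_gib = remaining_gib(total_gib, arc_max_gib, vm_gib)
print(f"left for host and LXC containers: {left_gib} GiB")

# HYPOTHETICAL sample; on a real host read /proc/spl/kstat/zfs/arcstats.
SAMPLE_ARCSTATS = """\
name                            type data
size                            4    3221225472
c_max                           4    4294967296
"""
arc = parse_arcstats(SAMPLE_ARCSTATS)
print(f"ARC size: {arc['size'] / GIB:.1f} GiB of max {arc['c_max'] / GIB:.1f} GiB")
```

With these figures only about half a gigabyte is left for the host and the LXC containers once the ARC is full, so it seems plausible that the ARC gets squeezed under load.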

this post was submitted on 24 Jun 2023
