
This question is almost the same as btrfs: HUGE metadata allocated, except for the magnitude of data on the partition and a newer kernel version (Linux 4.4).

I'm trying to make a full-image backup of a btrfs partition. btrfs filesystem usage shows:

Overall:
    Device size:        4.50GiB
    Device allocated:   3.17GiB
    Device unallocated: 1.33GiB
    Device missing:       0.00B
    Used:               1.70GiB
    Free (estimated):   1.58GiB       (min: 936.52MiB)
    Data ratio:            1.00
    Metadata ratio:        2.00
    Global reserve:   512.00MiB       (used: 0.00B)

Data,single: Size:1.85GiB, Used:1.61GiB
    /dev/vda2      1.85GiB

Metadata,DUP: Size:640.00MiB, Used:49.38MiB
    /dev/vda2      1.25GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
    /dev/vda2     64.00MiB

I found that metadata has been allocated 1.25GiB on the device (640MiB × 2 for the DUP profile), while the metadata actually used is less than 50MiB. As you can see, the whole volume is only 4.5GiB; such a waste of space is not acceptable.

I've tried running btrfs balance start -m and btrfs balance start -musage=x with x varying from 0 to 50, but none of them helped.

Is there any way to force btrfs to reduce the space allocated for metadata? Or is there any way to shrink a btrfs partition down to close to the space it actually uses?

Ben

3 Answers


I finally found this in the BTRFS wiki:

A typical size of metadata block group is 256MiB (filesystem smaller than 50GiB) and 1GiB (larger than 50GiB), for data it’s 1GiB. The system block group size is a few megabytes.

So that's why the metadata takes up at least 512MiB (= 256MiB × 2 for the default DUP profile of metadata). However, the metadata chunk size does not seem to be user-configurable. See this mail.

I don't know why the default chunk size is so large (256MiB). It might be to reduce metadata fragmentation, given today's large storage capacities. In any case, there is currently no way to change it, so just forget about btrfs when you need a very small partition.

Some background on how this question came up:

It was long ago, while trying to make a disk dump of a template VM that uses btrfs as its root partition, that I ran into the situation in this question. Although the unallocated part doesn't add to the size after passing through a compressor, it does increase the raw image size, which is sometimes inconvenient. Meanwhile, the unused metadata space is still a significant waste on a very small volume (less than 5GiB).

In the end I had to use ext4 for the rootfs, giving up all the fancy features btrfs provides (as well as its risks, perhaps). I hope that one day btrfs becomes mature enough and exposes control over the chunk size; then I'll try building a dump with it again.

PS: Mixed block groups are an imperfect workaround for small volumes, as is using btrfs-level backups instead of a full dump. The former still wastes free space if one can't adjust the data chunk size, and has other potential problems. The latter requires extra space and an extra step to build a VM from the backup, and won't work in environments without btrfs support.

Ben

First of all, BTRFS allocates metadata (and data) one chunk at a time; a data chunk is typically 1GiB. Even if a metadata chunk is allocated, that does not mean the chunk is fully utilized. Keep in mind that BTRFS also stores small files inline in the metadata, which may contribute to your "high" metadata usage.

By default BTRFS also duplicate metadata to increase the chance that your filesystem can recover in case of a corruption. Data is not duplicated.

You can reduce metadata usage by rebalancing your metadata to the single profile, at the cost of a lower chance of recovery, which may be acceptable depending on your use case. You do that like so:

btrfs balance start -mconvert=single /mountpoint
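To see whether the conversion worked, you can compare the per-type allocation before and after. A minimal sketch, assuming the filesystem is mounted at /mountpoint and you run it as root:

```shell
# Show allocation broken down by type (Data / Metadata / System).
# Before the convert the Metadata line shows "DUP"; after a successful
# convert it should show "single", with roughly half the device usage.
btrfs filesystem df /mountpoint
```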

You can also look up mixed block groups, which make BTRFS store data and metadata in the same chunks instead of allocating separate metadata chunks.
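Note that mixed block groups can only be chosen when the filesystem is created, not enabled afterwards. A hedged sketch, reusing /dev/vda2 from the question (this destroys all existing data on the device):

```shell
# Create a btrfs filesystem with mixed data+metadata block groups.
# --mixed is intended for small filesystems and cannot be toggled after mkfs.
mkfs.btrfs --mixed /dev/vda2
```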

It is also worth mentioning that when you run balance with the usage filter, what you are saying is: only balance chunks whose utilization is less than X percent.

Waxhead

You wrote the solution in your own question; the limit you imposed on yourself is just too low:

I've tried to run btrfs balance start -m and btrfs balance start -musage=x where x varies from 0 to 50 but none of them helped.

A value of 50 means you are allowing chunks with 100 − 50 = 50% unused (wasted) space. If you use 60, you are saying you only want chunks with at most 40% wasted space, so chunks with more free space than that will be merged and freed.

Just use a bigger number. The value indicates what percentage of each chunk's space must be in use; any chunk used less than that percentage will be merged with others into new chunks, freeing chunks.

Just try 55, 60, 65, 70 ... 85, 90, 95, 100 until you get the desired result.
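The incremental sweep above can be scripted. A minimal sketch, assuming the filesystem is mounted at /mnt and the loop runs as root:

```shell
# Rebalance metadata chunks with progressively higher usage thresholds,
# printing the allocation after each pass to see when it stops shrinking.
for pct in 55 60 65 70 75 80 85 90 95 100; do
    btrfs balance start -musage="$pct" /mnt
    btrfs filesystem df /mnt
done
```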

Or, if you have plenty of time, just use 100 directly; that way all chunks will be relocated and btrfs will end up using as few chunks as possible.

Using 100 does not mean every chunk (or even every chunk but one) will be 100% filled; it means all chunks except one will be filled as much as possible, so as many chunks as possible are freed, at the cost of moving around a lot of data and metadata. That is why people recommend trying larger values in small increments: to move as little data and metadata as possible until you are happy with the remaining wasted space.

I hope one day the documentation will be clearer for novice users like me, and not only for advanced people... It took me a while to discover that the chunk size for data was 1GiB. I wrote a small (<1KiB) file on a newly created btrfs RAID 1 (two devices) and, wow, there was 2GiB less free space (1GiB on each device)... I thought my data was getting lost, since I wrote more files and the free space did not change... they were all being written into that one already-allocated chunk (really two chunks, one on each device)... until I understood there is pre-allocation in units of 1GiB.

If a chunk is not filled, it still takes 1GiB of space; so if you have two chunks filled to 75%, you are wasting 25% of each 1GiB chunk, i.e. 256MiB per chunk, 512MiB in total. Since I'm talking about RAID 1, the same happens on both devices, so in total 1GiB of the 4GiB allocated (2 chunks × 1GiB × 2 devices) is wasted: 25% wasted space.
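The arithmetic above can be checked with a small shell computation; the numbers are just the hypothetical example (2 chunks of 1GiB each, filled to 75%, mirrored across 2 RAID 1 devices):

```shell
chunk_mib=1024   # chunk size in MiB
chunks=2         # chunks per device
devices=2        # RAID 1 mirrors
used_pct=75      # fill level of each chunk
# wasted space = total allocated * (100 - fill level) / 100
echo "$(( chunk_mib * chunks * devices * (100 - used_pct) / 100 )) MiB wasted"
```

This prints "1024 MiB wasted", i.e. 1GiB out of the 4GiB allocated: the 25% figure above.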

But by using 50 as the value, you are accepting 100% − 50% = 50% wasted space. If you use 75, then 100% − 75% = 25%, so only 25% wasted space. And so on.

If you want to minimize wasted space, use a high value such as 99 or 100; but be aware that this implies a lot of data movement because of CoW (copy on write). Take extra caution on SSD/NVMe drives, and even more on USB flash drives and memory cards, because of wear.

I hope this helps you and others understand.

Note: if someone knows how to force btrfs not to allocate new chunks until existing chunks are filled (without manually running a balance), I would love to know!

Laura