Consider this code, running on a Linux system (Compiler Explorer link):
#include <filesystem>
#include <cstdint>
#include <cstdio>
int main()
{
    try
    {
        const char8_t bad_path[] = {0xf0, u8'a', 0};  // invalid UTF-8: 0xf0 expects continuation bytes
        std::filesystem::path p(bad_path);
        for (auto c : p.u8string())
        {
            printf("%X ", static_cast<uint8_t>(c));
        }
    }
    catch (const std::exception& e)
    {
        printf("error: %s\n", e.what());
    }
}
It deliberately constructs a std::filesystem::path object from a string with invalid UTF-8 encoding (0xf0 starts a 4-byte sequence, but 'a' is not a continuation byte; more info here).
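As an aside, the continuation-byte rule is easy to check with a bit mask; the helper below is my own illustration, not anything from the standard library:
#include <cstdio>

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
bool is_continuation(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}

int main()
{
    // 0xf0 (11110xxx) announces a 4-byte sequence, so the next three
    // bytes must all be continuation bytes. u8'a' is 0x61 (01100001),
    // which is not, so the sequence {0xf0, u8'a'} is invalid UTF-8.
    printf("%d\n", is_continuation(0x61));  // prints 0
}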
When u8string is called, no exception is thrown. I find this surprising, as the documentation on cppreference states:
- The result encoding in the case of u8string() is always UTF-8.
 
Checking the implementation of LLVM's libc++, I see that there is indeed no validation performed; the string held internally by std::filesystem::path is simply copied into a std::u8string and returned:
_LIBCPP_INLINE_VISIBILITY _VSTD::u8string u8string() const { return _VSTD::u8string(__pn_.begin(), __pn_.end()); }
The GCC implementation (libstdc++) exhibits the same behavior.
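Since neither implementation rejects the bytes, a caller who needs the documented guarantee has to validate the result manually. Below is a minimal sketch of what that could look like; checked_u8string and is_valid_utf8 are hypothetical helpers of my own, and for brevity the validator only checks sequence lengths and continuation bytes (it does not reject overlong forms or surrogate code points):
#include <cstddef>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical helper: checks UTF-8 sequence lengths and continuation
// bytes only; overlong forms and surrogates are deliberately ignored.
bool is_valid_utf8(const std::u8string& s)
{
    for (std::size_t i = 0; i < s.size();)
    {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (b < 0x80)                len = 1;       // ASCII
        else if ((b & 0xe0) == 0xc0) len = 2;       // 110xxxxx
        else if ((b & 0xf0) == 0xe0) len = 3;       // 1110xxxx
        else if ((b & 0xf8) == 0xf0) len = 4;       // 11110xxx
        else                         return false;  // stray/invalid lead byte
        if (i + len > s.size())      return false;  // truncated sequence
        for (std::size_t j = 1; j < len; ++j)       // 10xxxxxx expected
            if ((static_cast<unsigned char>(s[i + j]) & 0xc0) != 0x80)
                return false;
        i += len;
    }
    return true;
}

// Hypothetical wrapper that delivers the guarantee the documentation implies.
std::u8string checked_u8string(const std::filesystem::path& p)
{
    std::u8string s = p.u8string();
    if (!is_valid_utf8(s))
        throw std::runtime_error("path is not valid UTF-8");
    return s;
}
With the path from the example above, checked_u8string(p) throws where p.u8string() silently returns the raw bytes.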
Of course this is a contrived example: I deliberately construct a path from an invalid string to keep things simple. But to my knowledge, the Linux kernel and common filesystems do not enforce that file paths are valid UTF-8, so I could encounter such a path "in the wild", e.g. while iterating over a directory.
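To make the "in the wild" scenario concrete, here is a sketch that produces the same bytes via directory iteration instead of a hand-built path; it assumes a Linux system (where filenames are opaque byte strings) and uses a throwaway directory name of my choosing:
#include <cstdio>
#include <filesystem>
#include <fstream>

int main()
{
    namespace fs = std::filesystem;
    fs::create_directory("demo");
    std::ofstream("demo/\xf0" "a");  // a filename that is not valid UTF-8
    for (const auto& entry : fs::directory_iterator("demo"))
    {
        // u8string() hands back the raw name bytes, 0xf0 included.
        for (auto c : entry.path().u8string())
        {
            printf("%X ", static_cast<unsigned>(c));
        }
        printf("\n");
    }
}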
Am I right to conclude that std::filesystem::path::u8string actually does not guarantee that a valid UTF-8 string will be returned, despite what the documentation says? If so, what is the motivation behind this design?