Joining str as actual str is a red herring and not what Python itself does: Python operates on mutable bytes, not on the str, which also removes the need to know string internals. Specifically, str.join converts its arguments to bytes, then pre-allocates and mutates its result.
This directly corresponds to:
- a wrapper to encode/decode str arguments to/from bytes
- summing the len of elements and separators
- allocating a mutable bytearray to construct the result
- copying each element/separator directly into the result
# helper to convert to/from joinable bytes
def str_join(sep: "str", elements: "list[str]") -> "str":
    joined_bytes = bytes_join(
        sep.encode(),
        [elem.encode() for elem in elements],
    )
    return joined_bytes.decode()
# actual joining at bytes level
def bytes_join(sep: "bytes", elements: "list[bytes]") -> "bytes":
    # create a mutable buffer that is long enough to hold the result
    total_length = sum(len(elem) for elem in elements)
    # an empty list has no separators; avoid a negative buffer length
    total_length += max(len(elements) - 1, 0) * len(sep)
    result = bytearray(total_length)
    # copy all characters from the inputs to the result
    insert_idx = 0
    for elem in elements:
        result[insert_idx:insert_idx+len(elem)] = elem
        insert_idx += len(elem)
        if insert_idx < total_length:
            result[insert_idx:insert_idx+len(sep)] = sep
            insert_idx += len(sep)
    return bytes(result)
print(str_join(" ", ["Hello", "World!"]))
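Notably, although iterating over the elements and copying each element amount to two nested loops, they iterate over separate things: the outer loop visits each element once, and the copy visits each byte of that element once. The algorithm therefore still touches each character only three times in str_join (encode, copy, decode) and each byte only once in bytes_join.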
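The "each byte once" claim can be checked with a small sketch. This is an illustrative variant of bytes_join, not CPython's actual implementation: the name bytes_join_counted and the copied counter are hypothetical and exist only to count how many bytes are written into the buffer.

```python
def bytes_join_counted(sep: "bytes", elements: "list[bytes]") -> "tuple[bytes, int]":
    """Join like bytes_join, but also report how many bytes were copied."""
    # pre-allocate a buffer that is exactly as long as the result
    total_length = sum(len(elem) for elem in elements)
    total_length += max(len(elements) - 1, 0) * len(sep)
    result = bytearray(total_length)
    copied = 0  # hypothetical counter, for illustration only
    insert_idx = 0
    for index, elem in enumerate(elements):
        # every element except the first is preceded by a separator
        if index > 0:
            result[insert_idx:insert_idx + len(sep)] = sep
            insert_idx += len(sep)
            copied += len(sep)
        result[insert_idx:insert_idx + len(elem)] = elem
        insert_idx += len(elem)
        copied += len(elem)
    return bytes(result), copied


words = [b"Hello", b"World!", b"again"]
joined, copied = bytes_join_counted(b", ", words)
print(joined, copied, len(joined))
```

The number of copied bytes equals the length of the result, so the work grows linearly with the output size no matter how the bytes are distributed across elements.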