I am trying to stream UTF-8 characters from a file. This is what I have so far; please excuse the ugly code for now.
use std::fs::File;
use std::io;
use std::io::BufRead;
use std::str;
fn main() -> io::Result<()> {
    let mut reader = io::BufReader::with_capacity(100, File::open("utf8test.txt")?);
    loop {
        let mut consumed = 0;
        {
            let buf = reader.fill_buf()?;
            println!("buf len: {}", buf.len());
            match str::from_utf8(buf) {
                Ok(s) => {
                    println!("====\n{}", s);
                    consumed = s.len();
                }
                Err(err) => {
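                    if err.valid_up_to() == 0 {
                        // No complete character at the start of the buffer: the bytes
                        // are either invalid or an incomplete multibyte sequence.
                        println!("1. utf8 decoding failed!");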
                    } else {
                        match str::from_utf8(&buf[..err.valid_up_to()]) {
                            Ok(s) => {
                                println!("====\n{}", s);
                                consumed = s.len();
                            }
                            _ => println!("2. utf8 decoding failed!"),
                        }
                    }
                }
            }
        }
        if consumed == 0 {
            break;
        }
        reader.consume(consumed);
        println!("consumed {} bytes", consumed);
    }
    Ok(())
}
I have a test file with a multibyte character starting at offset 98, which fails to decode because it does not fit completely into my (arbitrarily sized) 100-byte buffer. That's fine: I just ignore it and decode what is valid up to the start of that character.
The problem is that after calling consume(98) on the BufReader, the next call to fill_buf() returns only 2 bytes. It does not seem to have read any more bytes into the buffer, and I don't understand why. Maybe I have misinterpreted the documentation.
Here is the sample output:
buf len: 100
====
UTF-8 encoded sample plain-text file
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
consumed 98 bytes
buf len: 2
1. utf8 decoding failed!
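The output above is consistent with the documented behaviour of fill_buf(): it returns the contents of the internal buffer and only reads from the underlying reader when that buffer is empty. A minimal sketch that reproduces this without a file is below; the helper name `leftover_demo` and the in-memory `Cursor` input are assumptions made for illustration, not part of my program.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Hypothetical helper: returns the lengths reported by `fill_buf` before and
/// after consuming all but `keep` bytes of the first buffer.
fn leftover_demo(data: &[u8], capacity: usize, keep: usize) -> io::Result<(usize, usize)> {
    let mut reader = BufReader::with_capacity(capacity, Cursor::new(data.to_vec()));
    // First call: the internal buffer is empty, so it is filled from the source.
    let first = reader.fill_buf()?.len();
    // Leave `keep` bytes unconsumed, like the 2 bytes of the split character.
    reader.consume(first.saturating_sub(keep));
    // Second call: the buffer is not empty, so no read is issued and only
    // the leftover bytes are returned.
    let second = reader.fill_buf()?.len();
    Ok((first, second))
}

fn main() -> io::Result<()> {
    let data = vec![b'a'; 250];
    let (first, second) = leftover_demo(&data, 100, 2)?;
    println!("first: {first}, second: {second}"); // prints "first: 100, second: 2"
    Ok(())
}
```

So the 2-byte buffer is expected: BufReader does not shift leftover bytes to the front and top the buffer up, which means a character split across the buffer boundary can never be seen whole through fill_buf() alone.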
It would be nice if from_utf8() returned the partially decoded string along with the position of the decoding error, so that I don't have to call it twice whenever this happens, but as far as I am aware there is no such function in the standard library.
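One possible workaround, sketched under assumptions: manage the buffer by hand with Read::read, and carry the incomplete trailing bytes over to the front of the buffer before the next read. Utf8Error::valid_up_to() gives the length of the valid prefix, and Utf8Error::error_len() returns None when the input merely ended in the middle of a character. The names `split_valid` and `stream_utf8` are hypothetical helpers, and this version stops at the first genuinely invalid byte rather than recovering from it.

```rust
use std::io::{self, Cursor, Read};
use std::str;

/// Hypothetical helper: splits `bytes` into the longest valid UTF-8 prefix and
/// an incomplete trailing sequence. Fails on bytes that can never be valid.
fn split_valid(bytes: &[u8]) -> Result<(&str, &[u8]), str::Utf8Error> {
    match str::from_utf8(bytes) {
        Ok(s) => Ok((s, &bytes[bytes.len()..])),
        // error_len() == None: the input ended in the middle of a character.
        Err(e) if e.error_len().is_none() => {
            let (valid, rest) = bytes.split_at(e.valid_up_to());
            // SAFETY: valid_up_to() guarantees that this prefix is valid UTF-8.
            Ok((unsafe { str::from_utf8_unchecked(valid) }, rest))
        }
        Err(e) => Err(e),
    }
}

/// Hypothetical helper: reads `reader` in chunks and passes each decoded
/// piece to `sink`, keeping split characters for the next round.
fn stream_utf8<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut sink: impl FnMut(&str),
) -> io::Result<()> {
    // An incomplete sequence is at most 3 bytes, so 4 always leaves room to read.
    assert!(chunk_size >= 4, "chunk_size must be at least 4 bytes");
    let mut buf = vec![0u8; chunk_size];
    let mut pending = 0; // leftover bytes stored at the start of `buf`
    loop {
        let n = reader.read(&mut buf[pending..])?;
        if n == 0 {
            // End of input: leftover bytes mean the last character was truncated.
            return if pending == 0 {
                Ok(())
            } else {
                Err(io::Error::new(io::ErrorKind::InvalidData, "truncated UTF-8 at end of input"))
            };
        }
        let filled = pending + n;
        let consumed = {
            let (text, rest) = split_valid(&buf[..filled])
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            sink(text);
            filled - rest.len()
        };
        // Move the incomplete tail to the front so the next read completes it.
        buf.copy_within(consumed..filled, 0);
        pending = filled - consumed;
    }
}

fn main() -> io::Result<()> {
    // 98 ASCII bytes followed by 3-byte characters, so one is split at byte 100.
    let input = format!("{}{}", "a".repeat(98), "‾‾‾");
    let mut out = String::new();
    stream_utf8(Cursor::new(input.clone().into_bytes()), 100, |s| out.push_str(s))?;
    assert_eq!(out, input);
    Ok(())
}
```

The unsafe call avoids validating the prefix a second time; replacing it with a second str::from_utf8() call on the prefix is equally correct, just slower.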
 
    