tl;dr What is the best "Rust way" to create some byte storage, in this case a Vec<u8>, store that Vec<u8> in a struct field that can be accessed with a key value (like a BTreeMap<usize, &Vec<u8>>), and later read those Vec<u8> from some other structs?
Can this be extrapolated to a general good Rust design for similar structs that act as storage and cache for blobs of bytes (Vec<u8>, [u8; 16384], etc.) accessible with a key (a usize offset, a u32 index, a String file path, etc.)?
Goal
I'm trying to create a byte storage struct and impl functions that:
- stores 16384 bytes read from disk on demand into "blocks" of Vec<u8> of capacity 16384
- other structs will analyze the various Vec<u8> and may need to store their own references to those "blocks"
- be efficient: have only one copy of a "block" in memory, avoid unnecessary copying, clones, etc.
Unfortunately, for each implementation attempt, I run into difficult problems with borrowing, lifetime elision, mutability, copying, or other issues.
Reduced Code example
I created a struct BlockReader that
- creates a Vec<u8> (Vec<u8>::with_capacity(16384)) typed as Block
- reads from a file (using File::seek and File::take::read_to_end) and stores 16384 u8 into a Vec<u8>
- stores a reference to the Vec<u8> within a BTreeMap typed as Blocks
use std::io::Seek;
use std::io::SeekFrom;
use std::io::Read;
use std::fs::File;
use std::collections::BTreeMap;
type Block = Vec<u8>;
type Blocks<'a> = BTreeMap<usize, &'a Block>;
pub struct BlockReader<'a> {
    blocks: Blocks<'a>,
    file: File,
}
impl<'a> BlockReader<'a> {
    /// read a "block" of 16384 `u8` at file offset
    /// `offset` which is multiple of 16384
    /// if the "block" at the `offset` is cached in
    /// `self.blocks` then return a reference to that
    /// XXX: assume `self.file` is already `open`ed file
    /// handle
    fn readblock(& mut self, offset: usize) -> Result<&Block, std::io::Error> {
        // the data at this offset is the "cache"
        // return reference to that
        if self.blocks.contains_key(&offset) {
            return Ok(&self.blocks[&offset]);
        }
        // have not read data at this offset so read
        // the "block" of data from the file, store it,
        // return a reference
        let mut buffer = Block::with_capacity(16384);
        self.file.seek(SeekFrom::Start(offset as u64))?;
        self.file.read_to_end(&mut buffer);
        self.blocks.insert(offset, & buffer);
        Ok(&self.blocks[&offset])
    }
}
Example use-case problem
There have been many problems with each implementation. For example, two calls to BlockReader.readblock by a struct BlockAnalyzer1 have caused endless difficulties:
pub struct BlockAnalyzer1<'b> {
    pub blockreader: BlockReader<'b>,
}
impl<'b> BlockAnalyzer1<'b> {
    /// contrived example function
    pub fn doStuff(&mut self) -> Result<bool, std::io::Error> {
        let mut b: &Block;
        match self.blockreader.readblock(3 * 16384) {
            Ok(val) => {
                b = val;
            },
            Err(err) => {
                return Err(err);
            }
        }
        match self.blockreader.readblock(5 * 16384) {
            Ok(val) => {
                b = val;
            },
            Err(err) => {
                return Err(err);
            }
        }
        Ok(true)
    }
}
results in
error[E0597]: `buffer` does not live long enough
  --> src/lib.rs:34:36
   |
15 | impl<'a> BlockReader<'a> {
   |      -- lifetime `'a` defined here
...
34 |         self.blocks.insert(offset, & buffer);
   |         ---------------------------^^^^^^^^-
   |         |                          |
   |         |                          borrowed value does not live long enough
   |         argument requires that `buffer` is borrowed for `'a`
35 |         Ok(&self.blocks[&offset])
36 |     }
   |     - `buffer` dropped here while still borrowed
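For reference, this E0597 error comes from inserting a reference to the local `buffer`, which is dropped when `readblock` returns. A minimal sketch of one way around this specific error, not a complete design, is to let the map own each block so that no lifetime parameter is needed. The names `OwningBlockReader` and `BLOCK_SZ` and the generic reader parameter `R` are assumptions made for illustration; `R` only exists so the sketch can run against an in-memory `Cursor` as well as a `File`.

```rust
use std::collections::BTreeMap;
use std::io::{self, Read, Seek, SeekFrom};

const BLOCK_SZ: usize = 16384; // assumed name for the block size

type Block = Vec<u8>;
// The map owns each Block, so no lifetime parameter is needed.
type Blocks = BTreeMap<usize, Block>;

/// Hypothetical variant of `BlockReader` that stores owned blocks.
pub struct OwningBlockReader<R: Read + Seek> {
    blocks: Blocks,
    source: R,
}

impl<R: Read + Seek> OwningBlockReader<R> {
    pub fn new(source: R) -> Self {
        Self { blocks: Blocks::new(), source }
    }

    /// Return the cached block at `offset`, reading it first if needed.
    pub fn readblock(&mut self, offset: usize) -> io::Result<&Block> {
        if !self.blocks.contains_key(&offset) {
            let mut buffer = Block::with_capacity(BLOCK_SZ);
            self.source.seek(SeekFrom::Start(offset as u64))?;
            // `take` limits the read to at most one block.
            self.source
                .by_ref()
                .take(BLOCK_SZ as u64)
                .read_to_end(&mut buffer)?;
            // Move the buffer into the map instead of storing `&buffer`.
            self.blocks.insert(offset, buffer);
        }
        Ok(&self.blocks[&offset])
    }
}
```

With this shape the returned `&Block` still borrows the reader, so a caller cannot keep it alive across a second `readblock` call.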
However, I ran into many other errors for different permutations of this design. For example, another error I ran into:
error[E0499]: cannot borrow `self.blockreader` as mutable more than once at a time
   --> src/main.rs:543:23
    |
463 | impl<'a> BlockUser1<'a> {
    |      ----------- lifetime `'a` defined here
...
505 |             match self.blockreader.readblock(3 * 16384) {
    |                   ---------------------------------------
    |                   |
    |                   first mutable borrow occurs here
    |                   argument requires that `self.blockreader` is borrowed for `'a`
...
543 |                 match self.blockreader.readblock(5 * 16384) {
    |                       ^^^^^^^^^^^^^^^^ second mutable borrow occurs here
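Because each block is written once and then only read, one direction that may sidestep this E0499 error is shared ownership: the map stores `Rc<Block>` and `readblock` hands back a cloned `Rc`, which does not borrow the reader. The sketch below is an illustration under assumptions, not a definitive answer. `SharedBlockReader`, `BlockAnalyzer`, `BlockP`, `BLOCK_SZ`, `kept`, and the generic reader `R` are hypothetical names, and `Arc` would be the analogous choice if blocks had to cross threads.

```rust
use std::collections::BTreeMap;
use std::io::{self, Read, Seek, SeekFrom};
use std::rc::Rc;

const BLOCK_SZ: usize = 16384; // assumed name for the block size

type Block = Vec<u8>;
type BlockP = Rc<Block>; // hypothetical alias for a shared block handle

/// Hypothetical reader whose cache and callers share each block.
pub struct SharedBlockReader<R: Read + Seek> {
    blocks: BTreeMap<usize, BlockP>,
    source: R,
}

impl<R: Read + Seek> SharedBlockReader<R> {
    pub fn new(source: R) -> Self {
        Self { blocks: BTreeMap::new(), source }
    }

    /// Return a shared handle; cloning an `Rc` copies a pointer, not the bytes.
    pub fn readblock(&mut self, offset: usize) -> io::Result<BlockP> {
        if let Some(block) = self.blocks.get(&offset) {
            return Ok(Rc::clone(block));
        }
        let mut buffer = Block::with_capacity(BLOCK_SZ);
        self.source.seek(SeekFrom::Start(offset as u64))?;
        self.source
            .by_ref()
            .take(BLOCK_SZ as u64)
            .read_to_end(&mut buffer)?;
        let block = Rc::new(buffer);
        self.blocks.insert(offset, Rc::clone(&block));
        Ok(block)
    }
}

/// Hypothetical analyzer that keeps its own handles to blocks.
pub struct BlockAnalyzer<R: Read + Seek> {
    pub reader: SharedBlockReader<R>,
    pub kept: Vec<BlockP>,
}

impl<R: Read + Seek> BlockAnalyzer<R> {
    /// Two `readblock` calls in a row, with both results held at once.
    pub fn do_stuff(&mut self) -> io::Result<bool> {
        let a = self.reader.readblock(3 * BLOCK_SZ)?;
        let b = self.reader.readblock(5 * BLOCK_SZ)?;
        // A repeated request returns the cached allocation, not a new copy.
        let again = self.reader.readblock(3 * BLOCK_SZ)?;
        let cached = Rc::ptr_eq(&a, &again);
        self.kept.push(a);
        self.kept.push(b);
        Ok(cached)
    }
}
```

The trade-off is a reference count per block and no compile-time guarantee about when a block is freed; a block lives until the cache and every analyzer have dropped their handles.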
In BlockReader, I've tried permutations of "Block" storage using Vec<u8>, &Vec<u8>, Box<Vec<u8>>, Box<&Vec<u8>>, &Box<&Vec<u8>>, &Pin<&Box<&Vec<u8>>>, etc. However, each permutation runs into various confounding problems with borrowing, lifetimes, and mutability.
Again, I'm not looking for the specific fix. I'm looking for a generally good Rust-oriented design approach to this general problem: store a blob of bytes managed by some struct, have other structs get references (or pointers, etc.) to a blob of bytes, and read that blob of bytes in loops (while possibly storing new blobs of bytes).
The Question For Rust Experts
How would a Rust expert approach this problem?
How should I store the Vec<u8> (Block) in BlockReader.blocks, and also allow other structs to store their own references (or pointers, references to pointers, pinned Box pointers, etc.) to a Block?
Should the other structs copy or clone a Box<Block> or a Pin<Box<Block>> or something else?
Would a different storage type, like a fixed-size array (type Block = [u8; 16384];), be easier to pass references to?
Should other structs like BlockUser1 be given &Block, or Box<Block>, or &Pin<&Box<&Block>>, or something else?
Again, each Vec<u8> (Block) is written once (during BlockReader.readblock) and may be read many times by other structs, which call BlockReader.readblock and later save their own reference/pointer/etc. to that Block (ideally; maybe that's not ideal?).
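On the Vec<u8> versus [u8; 16384] part of the question, one observation that may help: code that only reads a block can take a `&[u8]` slice, and both storage types can be viewed as one. The sketch below illustrates that under assumed names (`count_newlines`, `count_in_both`, and `BLOCK_SZ` are hypothetical and not part of the code above).

```rust
const BLOCK_SZ: usize = 16384; // assumed name for the block size

/// Hypothetical analysis function: counts newline bytes in one block.
/// Taking `&[u8]` keeps it independent of how the block is stored.
fn count_newlines(block: &[u8]) -> usize {
    block.iter().filter(|&&b| b == b'\n').count()
}

/// Calls the same function with a `Vec<u8>` block and a boxed array block.
fn count_in_both() -> (usize, usize) {
    let as_vec: Vec<u8> = vec![b'\n'; BLOCK_SZ];
    let as_array: Box<[u8; BLOCK_SZ]> = Box::new([b'\n'; BLOCK_SZ]);
    // `&Vec<u8>` coerces to `&[u8]`; `[..]` slices the boxed array.
    (count_newlines(&as_vec), count_newlines(&as_array[..]))
}
```

If the readers are written against `&[u8]`, the choice between `Vec<u8>` and a fixed-size array mostly affects the storage side rather than how references are passed around.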