read and sysread have very similar documentation. What are the differences between the two?
ikegami
grrr :-) that's a good question – G. Cito Mar 30 '16 at 17:01
Credit to @G. Cito for prompting this question. – ikegami Mar 30 '16 at 17:02
1 Answer
About read:
- `read` supports PerlIO layers.
- `read` works with any Perl file handle[1].
- `read` buffers.
- `read` obtains data from the system in fixed-size blocks of 8 KiB[2].
- `read` may block if less data than requested is available[3].
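A minimal sketch of the first two points, assuming a perl built with PerlIO (5.8 or later): `read` works on an in-memory handle, which has no OS descriptor, and an `:encoding` layer decodes for you, so LENGTH counts characters rather than bytes. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# UTF-8 encoded bytes for "café au lait" (sample data).
my $bytes = "caf\xC3\xA9 au lait";

# An in-memory handle: no OS descriptor, so only read() can use it.
# The :encoding layer decodes UTF-8 as data passes through PerlIO.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";

# With a decoding layer, LENGTH counts characters, not bytes.
my $n = read($fh, my $buf, 4);
defined $n or die "read: $!";
close($fh);

print "read $n characters\n";
```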
About sysread:
- `sysread` doesn't support PerlIO layers (meaning it requires a raw, a.k.a. binary, handle).
- `sysread` only works with Perl file handles that map to a system file handle/descriptor[4].
- `sysread` doesn't buffer.
- `sysread` performs a single system call.
- `sysread` returns immediately if data is available to be returned, even if the amount of data is less than the amount requested.
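Because a single `sysread` may return fewer bytes than requested, callers usually loop until they have what they need. This is a sketch under stated assumptions: `sysread_fully` is a hypothetical helper, not a core function, and a pipe stands in for any handle backed by a real descriptor.

```perl
use strict;
use warnings;

# Hypothetical helper (not a core function): call sysread until $want
# bytes have arrived or EOF is reached, since one sysread may return less.
sub sysread_fully {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        # The 4th argument (OFFSET) appends to the end of $buf.
        my $rv = sysread($fh, $buf, $want - length($buf), length($buf));
        die "sysread: $!" unless defined $rv;   # I/O error
        last if $rv == 0;                        # EOF
    }
    return $buf;
}

# A pipe has a real OS descriptor, which sysread requires.
pipe(my $r, my $w) or die "pipe: $!";
binmode($_) for $r, $w;                  # keep both ends byte-oriented
defined(syswrite($w, "hello, world")) or die "syswrite: $!";
close($w);                               # reader sees EOF after the data

my $data = sysread_fully($r, 5);         # 5 bytes are available
my $rest = sysread_fully($r, 100);       # only 7 remain, then EOF
close($r);
```

The second call asks for 100 bytes but gets the 7 that remain before EOF, which is the short-read behaviour described in the last point.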
Summary and conclusions:
- `read` works with any Perl file handle, while `sysread` is limited to Perl file handles mapped to a system file handle/descriptor.
- `read` isn't compatible with `select`[5], while `sysread` is compatible with `select`.
- `read` can perform decoding for you, while `sysread` requires that you do your own decoding.
- `read` should be faster for very small reads, while `sysread` should be faster for very large reads.
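A sketch of the `select` and decoding points together, assuming the core modules IO::Select and Encode are available: wait until the handle is readable, let each `sysread` take whatever is there, and decode the bytes yourself at the end. The one-second timeout and 4096-byte chunk size are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use IO::Select;
use Encode qw(decode);

pipe(my $r, my $w) or die "pipe: $!";
# UTF-8 bytes for "naïve" (sample data).
defined(syswrite($w, "na\xC3\xAFve")) or die "syswrite: $!";
close($w);

my $sel = IO::Select->new($r);
my $raw = '';
# select() reports readiness at the OS level, which is what sysread
# consumes; read() could hold data in its own buffer that select can't see.
while (my @ready = $sel->can_read(1)) {   # wait at most 1 second
    my $rv = sysread($r, $raw, 4096, length $raw);
    die "sysread: $!" unless defined $rv;
    last if $rv == 0;                      # EOF
}
close($r);

# sysread only returns bytes, so decoding is up to us. Decoding once at the
# end avoids splitting a multi-byte character across two chunks.
my $text = decode('UTF-8', $raw);
```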
Notes:
1. These include, for example, tied file handles and those created using `open(my $fh, '<', \$var)`.
2. Before 5.14, Perl read in 4 KiB blocks. Since 5.14, the size of the blocks is configurable when you build `perl`, with a default of 8 KiB.
3. In my experience, `read` will return exactly the amount requested (if possible) when reading from a plain file, but may return less when reading from a pipe. These results are by no means guaranteed.
4. `fileno` returns a non-negative number for these. These include, for example, handles that read from plain files, from pipes and from sockets, but not those mentioned in [1].
5. I'm referring to the 4-argument `select` called by IO::Select.
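Note 4's test can be written as a small check. `has_os_fd` is a hypothetical helper; it treats undef and -1 alike, since perlfunc documents `fileno` returning -1 for in-memory handles, while tied handles answer through their own FILENO method. It uses `>= 0` rather than `> 0` because STDIN is descriptor 0.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $fh maps to an OS-level descriptor.
# fileno gives -1 for in-memory handles and undef for unopened ones.
# STDIN is descriptor 0, hence >= 0 rather than > 0.
sub has_os_fd {
    my ($fh) = @_;
    my $fd = fileno($fh);
    return defined($fd) && $fd >= 0;
}

my $var = "abc";
open(my $mem, '<', \$var) or die "open: $!";   # in-memory: read() only
pipe(my $r, my $w) or die "pipe: $!";           # real descriptor: sysread() ok

my $mem_ok  = has_os_fd($mem);
my $pipe_ok = has_os_fd($r);
```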
ikegami
Great summary; it should be in perlfunc. This: "`read` should be faster for small reads, while `sysread` should be faster for large reads." is exactly what is needed. Of course, given the infinite possibilities of the real world, it may not **always** be true, but a mostly truthy perlish guideline is what I want. – G. Cito Mar 30 '16 at 17:10
In a response to [another question](http://stackoverflow.com/a/36208336/2019415) I used [`Stream::Reader`](https://metacpan.org/pod/Stream::Reader). As an experiment I replaced `read` with `sysread` in `Reader.pm` and gained 9-10% throughput; it seemed too easy. Besides the obvious bits (buffering, encoding), is it just a question of benchmarking and testing? Can you speak to any data-integrity or failover/robustness aspects of this? – G. Cito Mar 30 '16 at 17:19
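One way to approach the benchmarking question is a rough comparison with the core Benchmark module. This is a sketch, not a verdict: the 1 MiB temporary file, the 64-byte and 1 MiB chunk sizes, and the `slurp_with` helper are all made up for illustration, and the numbers will vary with OS, disk cache and perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical test file: 1 MiB of bytes, deleted at exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} 'x' x (1024 * 1024);
close($tmp) or die "close: $!";

# Hypothetical helper: read the whole file in $chunk-sized pieces using
# sysread (if $use_sysread) or read, and return the total bytes read.
sub slurp_with {
    my ($use_sysread, $chunk) = @_;
    open(my $fh, '<:raw', $path) or die "open: $!";
    my ($buf, $total) = ('', 0);
    while (1) {
        my $n = $use_sysread ? sysread($fh, $buf, $chunk)
                             : read($fh, $buf, $chunk);
        die "read: $!" unless defined $n;
        last if $n == 0;
        $total += $n;
    }
    close($fh);
    return $total;
}

# -1 means: run each case for at least 1 CPU second.
cmpthese(-1, {
    read_small    => sub { slurp_with(0, 64) },
    sysread_small => sub { slurp_with(1, 64) },
    read_large    => sub { slurp_with(0, 1 << 20) },
    sysread_large => sub { slurp_with(1, 1 << 20) },
});
```

Checking that both paths return the same byte count is a cheap sanity test before trusting any timing difference.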
@G.Cito, in reusable code such as Stream::Reader, you have to assume filehandles may have layers, so `sysread` is not an option. – ysth Mar 30 '16 at 20:10
@G. Cito, talk of "UTF-8 mode" implies it's not just a possibility that they have layers, but that it's a supported mode of operation. That prevents `sysread` from being a valid option. – ikegami Mar 30 '16 at 20:34
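Building on the layers point, reusable code could inspect a handle before choosing `sysread` and fall back to `read` otherwise. A sketch, assuming `PerlIO::get_layers` (always available in perls with PerlIO) and treating `encoding`, `utf8` and `crlf` as the layers that change data; `sysread_safe` is a hypothetical helper, and the list of layer names to reject is a judgment call rather than an exhaustive rule.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $fh has an OS descriptor and no layer
# that transforms data (decoding or CRLF translation), since sysread
# bypasses PerlIO layers entirely.
sub sysread_safe {
    my ($fh) = @_;
    my $fd = fileno($fh);
    return 0 unless defined($fd) && $fd >= 0;   # needs a real descriptor
    my @layers = PerlIO::get_layers($fh);
    return !grep { /^(?:encoding|utf8|crlf)/ } @layers;
}

pipe(my $r, my $w) or die "pipe: $!";
my $plain = sysread_safe($r);                # typically layers: unix, perlio
binmode($r, ':encoding(UTF-8)');
my $decoding = sysread_safe($r);             # now has an encoding layer
```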
Also, you can `read` from things that aren't actually files (perhaps you opened a filehandle to a scalarref or `TIEHANDLE`d something), but you can only `sysread` something with a positive `fileno()`. – hobbs Mar 30 '16 at 20:51
@cuonglm, they're not even similar. I think you mean `print` and `syswrite`. Of the two, I've only ever used `print` because it's easier to use. I don't know if there are any other differences. – ikegami Apr 04 '16 at 15:52