Let's assume the input is
<LWS>* <first> <LWS>+ <second> <LWS>+ <integer>
where <LWS> is any whitespace character, including newlines; <first> has one to seven non-whitespace characters; <second> has one to five non-whitespace characters; <integer> is an optionally signed integer (in hexadecimal if it begins with 0x or 0X, in octal if it begins with 0, or in decimal otherwise); * indicates zero or more of the preceding element; and + indicates one or more of the preceding element.
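To make the integer rule concrete, here is a minimal sketch using sscanf(). The helper names parse_auto_base() and parse_decimal_only() are hypothetical and exist only for this illustration. The %i conversion picks the base from the prefix exactly as described above, whereas %d only ever reads decimal digits.

```c
#include <stdio.h>

/* Hypothetical helper: parse an optionally signed integer, choosing the
   base from its prefix (0x/0X: hexadecimal, leading 0: octal, else decimal).
   Returns 1 if an integer was parsed, 0 otherwise. */
int parse_auto_base(const char *text, int *value)
{
    return sscanf(text, " %i", value) == 1;
}

/* Hypothetical helper for comparison: %d reads decimal digits only,
   so "0x1F" yields 0 (parsing stops at the 'x') and "017" yields 17. */
int parse_decimal_only(const char *text, int *value)
{
    return sscanf(text, " %d", value) == 1;
}
```

With these, "0x1F" parses as 31 and "017" as 15 through parse_auto_base(), but as 0 and 17 through parse_decimal_only().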
Let's say you have a structure,
struct record {
    char first[8];   /* 7 characters + end-of-string '\0' */
    char second[6];  /* 5 characters + end-of-string '\0' */
    int  number;
};
then you can read the next record from the stream in into a structure supplied by the caller using, for example,
#include <stdlib.h>
#include <stdio.h>
/* Read a record from stream 'in' into '*rec'.
   Returns:  0 if success
            -1 if invalid parameters
            -2 if read error
            -3 if non-conforming format
            -4 if bug in function
            +1 if end of stream (and no data read)
*/
int read_record(FILE *in, struct record *rec)
{
    int rc;

    /* Invalid parameters? */
    if (!in || !rec)
        return -1;

    /* Try scanning the record. %i (rather than %d) accepts the 0x/0X
       hexadecimal and leading-0 octal forms as well as decimal. */
    rc = fscanf(in, " %7s %5s %i", rec->first, rec->second, &(rec->number));

    /* All three fields converted correctly? */
    if (rc == 3)
        return 0; /* Success! */

    /* Only partially converted? */
    if (rc > 0)
        return -3;

    /* Read error? */
    if (ferror(in))
        return -2;

    /* End of input encountered? */
    if (feof(in))
        return +1;

    /* Must be a bug somewhere above. */
    return -4;
}
The conversion specifier %7s converts up to seven non-whitespace characters, and %5s up to five; the array (or char pointer) must have room for an additional end-of-string nul byte, '\0', which the scanf() family of functions adds automatically.
If you omit the length limit and use a plain %s, the input can overrun the buffer you supplied. This is a common cause of buffer overflow bugs.
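The effect of the width limits can be seen in a small sketch with sscanf(). The function name scan_two_tokens() is hypothetical, chosen only for this illustration. A token longer than the limit is not rejected; it is split, and the remainder is picked up by the next conversion.

```c
#include <stdio.h>

/* Hypothetical demo: scan two width-limited tokens from 'text'.
   'first' must have room for 8 chars, 'second' for 6 (limit + '\0').
   Returns the number of tokens converted (0 if none). */
int scan_two_tokens(const char *text, char first[8], char second[6])
{
    int rc;

    first[0] = '\0';
    second[0] = '\0';

    /* %7s stores at most 7 characters plus '\0'; %5s at most 5 plus '\0'. */
    rc = sscanf(text, " %7s %5s", first, second);

    return (rc < 0) ? 0 : rc;   /* Map EOF to "nothing converted" */
}
```

For the input "abcdefghij", first receives "abcdefg" and second receives the leftover "hij", so neither buffer is overrun, but the fields no longer line up with the tokens in the input.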
The return value from the scanf() family of functions is the number of successful conversions (possibly 0), or EOF if a read error or end of input occurs before the first conversion. Above, we need three conversions to fully scan a record. If we scan just 1 or 2, we have a partial record. Otherwise, we check whether a stream error occurred, using ferror(). (Note that you want to check ferror() before feof(), because an error condition may also set feof().) If not, we check whether the scanning function encountered end-of-stream before anything was converted, using feof().
If none of the above cases were met, then the scanning function returned zero or a negative value with neither ferror() nor feof() returning true. Because the scanning pattern starts with (whitespace and) a %s conversion, which matches any non-whitespace input, it should never return zero. With this pattern, the only nonpositive return value possible is EOF, which should be accompanied by ferror() or feof() returning true. So, if none of the above cases were met, there must be a bug in the code, triggered by some odd corner case in the input.
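The return values discussed above can be observed directly by applying the same pattern to strings with sscanf(). This is only a sketch: scan_count() is a hypothetical helper, and it uses %i for the integer so that the hexadecimal and octal forms are accepted.

```c
#include <stdio.h>

/* Hypothetical helper: apply the record pattern to 'text' and return the
   raw sscanf() result, so the different outcomes can be compared. */
int scan_count(const char *text)
{
    char first[8], second[6];
    int  number;

    return sscanf(text, " %7s %5s %i", first, second, &number);
}
```

Here scan_count("alpha beta 12") returns 3 (a full record), scan_count("alpha beta xyz") returns 2 and scan_count("alpha") returns 1 (partial records), and scan_count("") returns EOF because the input ended before the first conversion.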
A program that reads structures from some stream into a dynamically allocated buffer typically implements the following pseudocode:
Set ptr = NULL    # Dynamically allocated array
Set num = 0       # Number of entries in array
Set max = 0       # Number of entries allocated for in array

Loop:
    If (num >= max):
        Calculate new max; num + 1 or larger
        Reallocate ptr
        If reallocation failed:
            Report out of memory
            Abort program
        End if
    End if

    rc = read_record(stream, ptr + num)
    If rc == 1:
        Break out of loop
    Else if rc != 0:
        Report error (based on rc)
        Abort program
    End if

    num = num + 1     # The record was read successfully
End Loop
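One possible C rendering of that pseudocode is sketched below. It is an illustration, not the only way to do it: read_all_records() is a hypothetical name, the growth policy (start at 16 entries, then double) is an arbitrary choice, and instead of aborting the program it returns an error code to the caller, with -5 chosen here to mean out of memory. A compact copy of read_record() is included so the block compiles on its own; it uses %i so that hexadecimal and octal integers are accepted.

```c
#include <stdio.h>
#include <stdlib.h>

struct record {
    char first[8];   /* 7 characters + '\0' */
    char second[6];  /* 5 characters + '\0' */
    int  number;
};

/* Compact copy of read_record(), with the same return codes as above. */
int read_record(FILE *in, struct record *rec)
{
    int rc;

    if (!in || !rec)
        return -1;
    rc = fscanf(in, " %7s %5s %i", rec->first, rec->second, &rec->number);
    if (rc == 3)    return 0;
    if (rc > 0)     return -3;
    if (ferror(in)) return -2;
    if (feof(in))   return +1;
    return -4;
}

/* Hypothetical helper: read all records from 'in' into a dynamically
   allocated array. On success, returns 0 and stores the array in *out and
   the number of records in *count; the caller must free(*out).
   On failure, returns the read_record() error code, or -5 if out of
   memory (a code made up for this sketch), and frees the array. */
int read_all_records(FILE *in, struct record **out, size_t *count)
{
    struct record *ptr = NULL;  /* Dynamically allocated array */
    size_t num = 0;             /* Number of entries in array */
    size_t max = 0;             /* Number of entries allocated */
    int rc;

    if (!in || !out || !count)
        return -1;

    for (;;) {
        if (num >= max) {
            size_t new_max = (max > 0) ? 2 * max : 16;
            struct record *tmp = realloc(ptr, new_max * sizeof *tmp);
            if (!tmp) {
                free(ptr);      /* realloc() left the old block intact */
                return -5;
            }
            ptr = tmp;
            max = new_max;
        }

        rc = read_record(in, ptr + num);
        if (rc == 1)
            break;              /* End of stream, no more records */
        if (rc != 0) {
            free(ptr);
            return rc;
        }
        num++;                  /* The record was read successfully */
    }

    *out = ptr;
    *count = num;
    return 0;
}
```

Assigning the realloc() result to a temporary pointer first keeps the old array reachable if the reallocation fails, so it can still be freed.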