Ok, so let's do this step by step. You already identified the most important information, which is:
- Resolution of 480x320
- 14 frames per second
- 5 second recording time
- 6 different statuses
It's all just a simple multiplication of those numbers. The only catch is that you can't simply plug in the 6 for the number of statuses; you need to convert it first. The text tells you that this 6 applies to every single pixel in every frame of the recording. As such it's (most likely) describing the color depth. The color depth of a picture is usually given as the number of bits used to "encode" the color of one pixel. Writing down the name of the color for each pixel would be quite cumbersome (and would take a lot more space).
So what you need to do is work out how many bits it takes to tell 6 statuses apart. There are various approaches to converting the decimal number 6 to binary, but the end result should always be 110, which is 3 digits long. Strictly speaking, 6 statuses are numbered 0 to 5 and the largest of those, 5, is 101 in binary, but that is 3 digits as well. Either way, the minimum number of bits is 3 (2 bits only cover 4 statuses, 3 bits cover up to 8).
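To double-check the bit count, here is a minimal Python sketch. The function name `bits_for_states` is made up for this illustration; it counts the binary digits of the largest code (statuses numbered from 0), which is one way to get the minimum number of bits.

```python
def bits_for_states(states: int) -> int:
    """Smallest number of bits that can tell `states` different values apart."""
    if states < 1:
        raise ValueError("need at least one state")
    # Counting from 0, the largest code is states - 1.
    # int.bit_length() gives the number of binary digits of that code.
    # A single state is treated as still taking up 1 bit.
    return max(1, (states - 1).bit_length())


print(bin(6))              # 0b110, the binary form of decimal 6
print(bits_for_states(6))  # 3
```

Note that counting the digits of the number itself only works as a shortcut when the number of statuses is not a power of two: 8 is 1000 in binary (4 digits), but 8 statuses still fit into 3 bits.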
If this were a real-world example, you would also have to account for the file format's data structures and definitions. For a bitmap, for example, it looks like you would need at least 4 bits per pixel. As your exercise doesn't give any additional information, the person who came up with the question most likely wants you to use 3 bits as the color depth.
Now that we have all the details it's just:
color depth * resolution * frames per second * seconds
3 bit * (480 pixels * 320 pixels) * 14 * 5 = 32 256 000 bit
32 256 000 bit / 8 (bits per byte) = 4 032 000 bytes
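If you want to verify the arithmetic, the same multiplication can be written as a short Python sketch. The function and parameter names are just picked for this example, and it assumes raw, uncompressed frames with no file headers or padding.

```python
def raw_video_size_bits(width, height, fps, seconds, bits_per_pixel):
    """Size of an uncompressed recording in bits (no headers, no padding)."""
    pixels_per_frame = width * height
    frames = fps * seconds
    return bits_per_pixel * pixels_per_frame * frames


bits = raw_video_size_bits(width=480, height=320, fps=14, seconds=5, bits_per_pixel=3)
size_bytes = bits // 8  # 8 bits per byte

print(bits)        # 32256000
print(size_bytes)  # 4032000
```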
What the exercise is doing is checking whether you understand the technical terms and are able to convert decimal numbers to binary ones.
For fun you could assume you'd need 105 different shades of gray. In that case you would need 7 bits per pixel and end up with 9 408 000 bytes, which is about 9187 kilobytes (at 1024 bytes per kilobyte).