Vectorisation
It is important to understand the vectorised nature of R and that you will almost never need a for loop.
What is a vector? For example, the column hgt in your data is essentially a vector: a variable named hgt containing multiple values.
Let's recreate an example vector (a variable named x containing multiple values):
x <- c(1, 2, 3, 4, 5)
Many operations in R are vectorised. This means they are carried out on every element of the vector at once, so there is no need to go through the elements one at a time.
Here is an example:
x + 1
# 2 3 4 5 6
As a result, we get another vector, where the operation + 1 was carried out on each element of the original vector.
Therefore, you will not need a for loop.
Just replace the + 1 operation with the appropriate operation for your problem.
What you are looking for is:
- to check whether each element in hgt meets a certain condition, for example > 15
The operation "condition check" is done in R via comparison operators such as >, ==, <, <=, >= or !=.
Let's find out which values in x are > 3.
x > 3
# FALSE FALSE FALSE TRUE TRUE
What we get is yet another vector that contains the result of the condition check for each element of x.
Now one other concept is missing: how to extract certain values from a vector.
This is done via the index operator [ ].
For example, if I wanted to extract values that are bigger than 3, I would write x[x > 3]. Read this in your mind as "Give me the values of x where x is bigger than 3".
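Putting the pieces together, here is a minimal sketch. The data frame dat and its values are made up for illustration; only the column name hgt and the threshold 15 come from your question, so swap in your own data.

```r
# Example vector from above
x <- c(1, 2, 3, 4, 5)

# Keep only the elements for which the condition is TRUE
big <- x[x > 3]
big
# 4 5

# Hypothetical data standing in for your own (values are invented)
dat <- data.frame(hgt = c(12, 18, 15, 21, 9))

# Values of hgt that are greater than 15
tall <- dat$hgt[dat$hgt > 15]

# TRUE counts as 1, so sum() counts how many elements meet the condition
n_tall <- sum(dat$hgt > 15)
```

Note that no loop appears anywhere: the comparison, the indexing and the counting all work on the whole vector.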
Sampling Distribution
I want to point out that you are missing an important step that your teacher wants you to do: repeat the sampling process and the calculation of the required statistic for each sample 1000 times, in order to get a sampling distribution. Check this out for a real-life, hands-on example of why this is important.
(Remember that I told you to almost never use a for loop. Maybe it is appropriate to use one to run the same function 1000 times.)
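As a rough sketch of what that repetition could look like: the population vector, the sample size of 30 and the choice of the mean as the statistic are all assumptions made up for this example, so replace them with whatever your assignment asks for.

```r
set.seed(1)  # only to make this sketch reproducible

# Hypothetical population (invented values); use your own data instead
population <- rnorm(10000, mean = 15, sd = 3)

n_reps <- 1000  # number of repeated samples
n_obs  <- 30    # assumed size of each sample

# Preallocate the result, then draw a sample and store its statistic each time
sample_means <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  s <- sample(population, size = n_obs)
  sample_means[i] <- mean(s)
}
```

sample_means now holds 1000 values of the statistic, and hist(sample_means) would show their distribution. If you prefer to avoid the explicit loop, replicate(n_reps, mean(sample(population, n_obs))) does the same job in one line.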