That's a common misconception and one that used to trip me up. You're right that a functor has two parts: one that maps objects to objects and one that maps functions to functions (more generally, arrows to arrows, but that's not really germane here). Now, when I have
instance Functor F where
fmap f x = ...
You've already correctly surmised that fmap takes functions to functions. However, we already have a way to take objects to objects. Categorically, our objects are not values, they're types. So the "objects to objects" part is something that should map a type to another type. And that thing is called F. The name of our functor literally is the mapping from objects to objects. Given a type a, the type F a is some other type lifted by our functor.
Now that raises the fair question: What is return? It takes an a and produces an F a (for a monad F). More specifically, given a fixed monad F, the signature is
return :: forall a. a -> F a
Now carefully read that. That says "given any type a, I can come up with a function from a to F a". That is, it takes an object in our category (a type) and maps it into an arrow in our category (a function). A mapping from an object to an arrow is called a natural transformation, and that's exactly what return is.
Categorically speaking, a monad is a functor (the underlying type F together with fmap), together with two natural transformations:
return, which is a natural transformation 1 -> F (where 1 is the identity functor), and
join, which is a natural transformation F^2 -> F (where F^2 is F composed with itself)
That is, for any type a, return has type a -> F a and join has type F (F a) -> F a[1].
What would a typeclass with return and fmap look like? I'm not sure. I don't know of a Haskell library that implements that typeclass, and I'm not entirely sure what the laws would look like. A good guess might be fmap f . return === return . f, but we actually get that as a free theorem, so that's not our law. If you or someone else knows of this typeclass somewhere in the Hackage ecosystem, feel free to let me know.
[1] Haskell uses an equivalent definition of "monad" in terms of the bind operator (>>=) rather than join. They're equivalent mathematically, and I opted for the simpler definition here.