The problem is more or less classic.  The overload resolution starts by
building a list of possible functions; in this case, functions named
operator*.  To do this, it adds all operator* functions which are in
scope to the list, and it tries to instantiate all function templates by 
applying type deduction; if type deduction succeeds, it adds the
instantiation of the template to the list.  (A function template is
not a function.  An instantiation of the function template is a
function.)
The rules for template type deduction are different than those used in
overload resolution.  In particular, only a very small set of
conversions are considered.  User defined conversion operators are not
considered.  The result is that in m1 * m2, type deduction for
operator* fails (since it would require a conversion which isn't
considered).  So no instantiation of the function template is added to
the list, and there is no other operator*.
More generally: you're operator T2() wouldn't allow type deduction
even if it were allowed; there are a infinite number of conversions
which would match operator*.  I suspect, in fact, that you've made it
too general; that you want an operator Matrix<M, N, T2>().  (Not that
this will help here, but there are contexts where it might eliminate an
ambiguity.)
You might be able to make it work by defining a:
template<size_t P, tyepname OtherT>
Matrix<M, P, T> operator*( Matrix<N, P, T> const& rhs ) const;
, then doing the conversion inside the operator*.  (I haven't tried it,
and am not sure, but I think your existing operator* should be
considered “more specialized”, and thus be chosen when type
deduction succeeds for both.)
Having said this, I think the way you're doing it is the wrong approach.
Do you really want the return types of m1 * m2 and m2 * m1 to be
different.  For starters, I'd require the client code to make the
conversion explicit (which is the case in your current code); if you do
want to support the implicit conversions, I think you need to make the
operator* a global, use some sort of simple meta-programming to
determine the correct return type (i.e. given Matrices of long and
unsigned, you might want to have a return type of unsigned long,
since this is what mixed type arithmetic with these types gives
otherwise), convert both sides to the target type, and do the arithmetic
on it.  A lot of work for what is probably not a very important or
useful feature.  (Just my opinion, of course.  If your clients really
want the mixed type arithmetic, and are willing to pay for it...)