Here's my factorial function in Fortran.
module facmod
  implicit none
contains
  function factorial (n) result (fac)
    use FMZM 
    integer, intent(in)  :: n
    integer              :: i
    type(IM)             :: fac
    fac = 1
    if(n==0) then
       fac = 1
    elseif(n==1) then
       fac = 1
    elseif(n==2) then
       fac = 2
    elseif(n < 0) then
       write(*,*) 'Error in factorial N=', n
       stop 1
    else
       do i = 1, n
          fac = fac * i
       enddo
    endif
  end function factorial
end module facmod
program main
   use FMZM   
   use facmod, only: factorial
   implicit none
   type(IM) :: res
   integer :: n, lenr
   character (len=:), allocatable :: str
   character(len=1024) :: fmat
   print*,'enter the value of n'
   read*, n
   res = factorial(n)
   lenr = log10(TO_FM(res))+2
   allocate(character(len=lenr) :: str)
   write (fmat, "(A5,I0)") "i", lenr
   call im_form(fmat, res, str)   
   print*, trim( adjustl(str))
 end program main
I compile using FMZM:
gfortran -std=f2008 fac.F90 fmlib.a -o fac
echo -e "1000" | .fac computes easy. However, if I give this echo -e "3600" | .fac, I already get an error on my machine:
  Error in FM.  More than       200000  type (FM), (ZM), (IM) numbers
                have been defined.  Variable  SIZE_OF_START  in file
                FMSAVE.f95  defines this value.
                Possible causes of this error and remedies:
                (1) Make sure all subroutines (also functions that do not
                    return type FM, ZM, or IM function values) have
                        CALL FM_ENTER_USER_ROUTINE
                    at the start and 
                        CALL FM_EXIT_USER_ROUTINE
                    at the end and before any other return, and all
                    functions returning an FM, ZM, or IM function value have
                        CALL FM_ENTER_USER_FUNCTION(F)
                    at the start and 
                        CALL FM_EXIT_USER_FUNCTION(F)
                    at the end and before any other return, where the actual
                    function name replaces  F  above.
                    Otherwise that routine could be leaking memory, and
                    worse, could get wrong results because of deleting some
                    FM, ZM, or IM temporary variables too soon.
                (2) Make sure all subroutines and functions declare any
                    local type FM, ZM, or IM variables as saved.  Otherwise
                    some compilers create new instances of those variables
                    with each call, leaking memory.
                    For example:
                        SUBROUTINE SUB(A,B,C,X,Y,RESULT)
                        TYPE (FM) :: A,B,C,X,Y,RESULT,ERR,TOL,H
                    Here A,B,C,X,Y,RESULT are the input variables and
                    ERR,TOL,H are local variables.  The fix is:
                        SUBROUTINE SUB(A,B,C,X,Y,RESULT)
                        TYPE (FM) :: A,B,C,X,Y,RESULT
                        TYPE (FM), SAVE :: ERR,TOL,H
                (3) Since = assignments for multiple precision variables are
                    the trigger for cleaning up temporary multiple precision
                    variables, a loop with subroutine calls that has no =
                    assignments can run out of space to store temporaries.
                    For example:
                        DO J = 1, N
                           CALL SUB(A,B,C,TO_FM(0),TO_FM(1),RESULT)
                        ENDDO
                    Most compilers will create two temporary variables with
                    each call, to hold the TO_FM values.
                    One fix is to put an assignment into the loop:
                        DO J = 1, N
                           ZERO = TO_FM(0)
                           CALL SUB(A,B,C,ZERO,TO_FM(1),RESULT)
                        ENDDO
                (4) If a routine uses allocatable type FM, ZM, or IM arrays
                    and allocates and deallocates with each call, then after
                    many calls this limit on number of variables could be 
                    exceeded, since new FM variable index numbers are
                    generated for each call to the routine.
                    A fix for this is to call FM_DEALLOCATE before actually
                    deallocating each array, so those index numbers can be
                    re-used.  For example:
                        DEALLOCATE(T)
                    becomes:
                        CALL FM_DEALLOCATE(T)
                        DEALLOCATE(T)
                (5) If none of this helps, try running this program again
                    after increasing the value of  SIZE_OF_START  and
                    re-compiling.
What optimizations or Fortran idioms am I missing that is hurting my performance so much?
For example, in python, I can factorial numbers much larger than 3500:
>>> import math
>>> math.factorial(100000)
Or in Haskell:
Prelude> product [1..100000]
Both these compute, not exactly quickly, but without error.
How can I improve my algorithm or better use existing libraries to improve performance of large integer factorials in Fortran? Is there a more appropriate big integer library than FMZM?