cProfile is the classic profiling tool. The basic way to use it is like so:
python -m cProfile myscript.py
Here I've called it on the test routine of a reference implementation of the mersenne twister that I wrote. 
me@mine $ python -m cProfile mersenne.twister.py 
True
True
1000000
         1003236 function calls in 2.163 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.163    2.163 <string>:1(<module>)
        1    0.001    0.001    2.162    2.162 mersenne.twister.py:1(<module>)
        3    0.001    0.000    0.001    0.000 mersenne.twister.py:10(init_gen)
  1000014    1.039    0.000    1.821    0.000 mersenne.twister.py:19(extract_number)
        1    0.000    0.000    0.000    0.000 mersenne.twister.py:3(Twister)
     1603    0.766    0.000    0.782    0.000 mersenne.twister.py:33(generate_numbers)
        1    0.000    0.000    0.000    0.000 mersenne.twister.py:4(__init__)
        1    0.317    0.317    2.161    2.161 mersenne.twister.py:42(_test)
        1    0.001    0.001    2.163    2.163 {execfile}
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1608    0.038    0.000    0.038    0.000 {range}
ncalls is the number of times a function was called. tottime is the total time spent in a function, excluding the time spent in sub-function calls. percall is tottime / ncalls. cumtime is the time spent in the function including the time spent in sub-function calls. And the remaining data is as follows: filename:lineno(func_name).
In most cases, look at ncalls and tottime first. In the above data, you can see that the large majority of the time spent by this program happens in extract_number. Furthermore, we can see that extract_number is called many (1000014) times. So anything I can do to speed up extract_number will significantly speed up the execution of this test code. If it gains me a microsecond, then the gain will be multiplied by 1000014, resulting in a full second gain. 
Then I should work on generate_numbers. Gains there won't matter as much, but they may still be significant, and since that function burns another .7 seconds, there's some benefit to be had. 
That should give you the general idea. Note, however, that the tottime number can sometimes be deceptive, in cases of recursion, for example.