I need to use Python to take N lines at random from a large txt file. These files are basically tab-delimited tables. My task has the following constraints:
- These files may contain headers (some have multi-line headers).
- Headers need to appear in the output in the same order.
- Each line can be taken only once.
- The largest file is currently about 150 GB (about 60 000 000 lines).
- Lines are roughly the same length within a file, but may vary between different files.
- I will usually be taking 5000 random lines (but I may need up to 1 000 000 lines).
 
Currently I have written the following code:
import os
import random

inputSize = os.path.getsize(options.input)
usedPositions = []  # Start positions of the lines already in the output
with open(options.input) as input:
    with open(options.output, 'w') as output:
        # Copy the header lines to the output in order
        for i in range(int(options.header)):
            output.write(input.readline())
            usedPositions.append(input.tell())
        # Find and write the random lines
        for j in range(int(args[0])):
            input.seek(random.randrange(inputSize))  # Seek to a random position in the file (probably mid-line)
            input.readline()  # Read the (probably incomplete) line; the next readline() yields a complete line
            while input.tell() in usedPositions:  # Take a new line if the current one is already taken
                input.seek(random.randrange(inputSize))
                input.readline()
            usedPositions.append(input.tell())  # Record the start position of the chosen line
            randomLine = input.readline()  # Complete line
            if len(randomLine) == 0:  # The seek landed inside the last line; take the first line instead
                input.seek(0)
                for i in range(int(options.header)):  # Skip the headers
                    input.readline()
                randomLine = input.readline()
            output.write(randomLine)
This code seems to be working correctly.
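For reference, options and args come from an optparse setup along these lines (the exact flags here are just an illustration; only options.input, options.output, options.header and args[0] matter to the code above):

from optparse import OptionParser

parser = OptionParser(usage='usage: %prog [options] N')
parser.add_option('-i', '--input', dest='input', help='input table')
parser.add_option('-o', '--output', dest='output', help='output file')
parser.add_option('--header', dest='header', default=0,
                  help='number of header lines to copy')
(options, args) = parser.parse_args()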
I am aware that this code prefers lines that follow the longest lines in input, because seek() is most likely to return a position on the longest line and the next line is written to output. This is irrelevant as lines in the input file are roughly the same length. Also I am aware that this code results in an infinite loop if N is larger than number of lines in input file. I will not implement a check for this, as getting the line count takes a lot of time.
RAM and HDD limitations are irrelevant. I am only concerned about the speed of the program. Is there a way to further optimize this code? Or perhaps there is a better approach?
EDIT: To clarify, the lines in one file have roughly the same length. However, I have multiple files that this script needs to run on, and the average line length differs between these files. For example, file A may have ~100 characters per line and file B ~50000 characters per line. I do not know the average line length of any file beforehand.
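If an average line length would help a suggested approach, I imagine it could be estimated cheaply with the same seek trick (the function name and the sample count below are made up, and the estimate inherits the same long-line bias described above):

import os
import random

def estimate_line_length(path, samples=100):
    # Rough average line length from a handful of random seeks
    size = os.path.getsize(path)
    lengths = []
    with open(path, 'rb') as f:
        while len(lengths) < samples:
            f.seek(random.randrange(size))
            f.readline()         # discard the (probably partial) line
            line = f.readline()  # complete line (empty at EOF)
            if line:
                lengths.append(len(line))
    return sum(lengths) / float(len(lengths))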