I have downloaded a bunch of videos from coursera.org and stored them in one folder. Each folder holds many individual videos, because Coursera breaks a lecture into multiple short videos. I would like a Python script that gives the combined length of all the videos in a given directory. The video files are in .mp4 format.
- http://stackoverflow.com/a/3844467/735204 video length for a file – Emmett Butler Feb 23 '13 at 13:57
- see [get dimensions of a video file](http://stackoverflow.com/q/7348505/309483) – Janus Troelsen Feb 23 '13 at 16:37
- see [mpeg-2 library to extract video duration](http://stackoverflow.com/q/11615384/309483). the answers are not specific to mpeg-2 at all – Janus Troelsen Feb 23 '13 at 18:20
- see also [Python native library to read metadata from videos?](http://stackoverflow.com/q/10075176/309483) – Janus Troelsen Feb 24 '13 at 12:37
6 Answers
First, install the ffprobe command (it's part of FFmpeg) with

    sudo apt install ffmpeg

then use subprocess.run() to run this bash command:

    ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 -- <filename>

(which I got from http://trac.ffmpeg.org/wiki/FFprobeTips#Formatcontainerduration), like this:
    from pathlib import Path
    import subprocess

    def video_length_seconds(filename):
        result = subprocess.run(
            [
                "ffprobe",
                "-v",
                "error",
                "-show_entries",
                "format=duration",
                "-of",
                "default=noprint_wrappers=1:nokey=1",
                "--",
                filename,
            ],
            capture_output=True,
            text=True,
        )
        try:
            return float(result.stdout)
        except ValueError:
            raise ValueError(result.stderr.rstrip("\n"))

    # a single video
    video_length_seconds('your_video.webm')

    # all mp4 files in the current directory (seconds)
    print(sum(video_length_seconds(f) for f in Path(".").glob("*.mp4")))

    # all mp4 files in the current directory and all its subdirectories
    # `rglob` instead of `glob`
    print(sum(video_length_seconds(f) for f in Path(".").rglob("*.mp4")))

    # all files in the current directory
    print(sum(video_length_seconds(f) for f in Path(".").iterdir() if f.is_file()))

This code requires Python 3.7+ because that's when text= and capture_output= were added to subprocess.run. If you're using an older Python version, check the edit history of this answer.
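The last one-liner above raises ValueError as soon as ffprobe meets a file it cannot read, which stops the whole sum. A minimal sketch of a more forgiving total follows. The helper name `total_seconds` is hypothetical (it is not part of the answer); it accepts any duration function, such as `video_length_seconds`, and skips files for which that function raises ValueError.

```python
def total_seconds(paths, probe):
    """Sum probe(path) over paths, skipping files that raise ValueError.

    `probe` is any callable returning a duration in seconds, for example
    video_length_seconds from the answer above. Returns (total, skipped).
    """
    total = 0.0
    skipped = []
    for path in paths:
        try:
            total += probe(path)
        except ValueError:
            # Not a media file, or ffprobe could not read it: note it and move on.
            skipped.append(path)
    return total, skipped
```

It could be called as `total, skipped = total_seconds(Path(".").iterdir(), video_length_seconds)`, after which `skipped` lists the entries that were ignored.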
- Download MediaInfo and install it (don't install the bundled adware)
- Go to the MediaInfo source downloads and in the "Source code, All included" row, choose the link next to "libmediainfo"
- Find MediaInfoDLL3.py in the downloaded archive and extract it anywhere. Example location: libmediainfo_0.7.62_AllInclusive.7z\MediaInfoLib\Source\MediaInfoDLL\MediaInfoDLL3.py
- Now make a script for testing (sources below) in the same directory.
- Execute the script.
MediaInfo works on POSIX too. The only difference is that a shared object (.so) is loaded instead of a DLL.
Test script (Python 3!)
    import os

    os.chdir(os.environ["PROGRAMFILES"] + "\\mediainfo")
    from MediaInfoDLL3 import MediaInfo, Stream

    MI = MediaInfo()

    def get_lengths_in_milliseconds_of_directory(prefix):
        for f in os.listdir(prefix):
            MI.Open(prefix + f)
            duration_string = MI.Get(Stream.Video, 0, "Duration")

            try:
                duration = int(duration_string)
                yield duration
                print("{} is {} milliseconds long".format(f, duration))
            except ValueError:
                print("{} ain't no media file!".format(f))

            MI.Close()

    print(sum(get_lengths_in_milliseconds_of_directory(os.environ["windir"] + "\\Performance\\WinSAT\\"
    )), "milliseconds of content in total")

- Every time I run this routine, `MI.Open(prefix+f)` simply returns an integer `0L`. Am I doing something wrong here? The only thing I change in your code is `os.environ["windir"] + "\\Performance\\WinSAT\\"`, which I replace with a path on my local machine. – hardikudeshi Feb 24 '13 at 09:08
- @hardikudeshi It's very important that the path you specify ends with `\\` (the path separator) since it is concatenated with the filename. Tell me if that is the problem, please. – Janus Troelsen Feb 24 '13 at 12:33
- I took care of that. However, I just realized that I was running this script in Python 2.7, whereas the script is made for Python 3, which could be the source of the trouble. – hardikudeshi Feb 24 '13 at 12:44
- @hardikudeshi: There's a Python 2 version of MediaInfoDLL in the source distribution too. – Janus Troelsen Feb 24 '13 at 14:36
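The trailing-separator pitfall discussed in these comments comes from building paths with `prefix + f`. Here is a small sketch, not taken from the answer, of the same directory listing built with `os.path.join`, which adds the separator only when it is missing. The helper name `list_paths` is hypothetical.

```python
import os


def list_paths(prefix):
    """Return full paths of the entries in prefix, with or without a trailing separator."""
    return [os.path.join(prefix, name) for name in sorted(os.listdir(prefix))]
```

Each returned path could then be passed to `MI.Open(...)` in place of `prefix + f`.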
In addition to Janus Troelsen's answer above, I would like to point out a small problem I encountered when implementing it. I followed his instructions one by one but got different results on Windows (7) and Linux (Ubuntu). His instructions worked perfectly under Linux, but I had to do a small hack to get them to work on Windows. I am using a 32-bit Python 2.7.2 interpreter on Windows, so I used MediaInfoDLL.py. But that was not enough to get it to work; I was receiving this error at this point in the process:
"WindowsError: [Error 193] %1 is not a valid Win32 application".
This meant that I was somehow loading a resource that was not 32-bit, and it had to be the DLL that MediaInfoDLL.py was loading. If you look at the MediaInfo installation directory you will see 3 DLLs: MediaInfo.dll is 64-bit, while MediaInfo_i386.dll is 32-bit. MediaInfo_i386.dll is the one I had to use because of my Python setup. I went to MediaInfoDLL.py (which I had already included in my project) and changed this line:

    MediaInfoDLL_Handler = windll.MediaInfo

to (note the raw string, so the backslashes in the path are not treated as escapes)

    MediaInfoDLL_Handler = WinDLL(r"C:\Program Files (x86)\MediaInfo\MediaInfo_i386.dll")

I didn't have to change anything for it to work on Linux.
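Error 193 here comes down to a mismatch between the bitness of the interpreter and that of the DLL. As a rough sketch, the interpreter's bitness can be read from its pointer size and used to choose between the two file names mentioned above. The function names are hypothetical, and the DLL names are the ones from this answer; other MediaInfo releases may name them differently.

```python
import struct


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of the running Python."""
    return struct.calcsize("P") * 8


def mediainfo_dll_name(bits):
    """Pick the DLL file name described in the answer for the given bitness."""
    # Names taken from the answer: MediaInfo_i386.dll is 32-bit,
    # MediaInfo.dll is 64-bit. This mapping is an assumption for other versions.
    return "MediaInfo_i386.dll" if bits == 32 else "MediaInfo.dll"


print(interpreter_bits(), mediainfo_dll_name(interpreter_bits()))
```

The chosen name could then be joined with the MediaInfo installation directory and passed to `WinDLL(...)`.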
Nowadays pymediainfo is available, so Janus Troelsen's answer could be simplified.
You need to install MediaInfo and run pip install pymediainfo. The following code then prints the total length of all video files (MediaInfo reports durations in milliseconds):
    import os
    from pymediainfo import MediaInfo

    def get_track_len(file_path):
        media_info = MediaInfo.parse(file_path)
        for track in media_info.tracks:
            if track.track_type == "Video":
                return int(track.duration)
        return 0

    # os.listdir() returns bare file names, so join each one with the directory
    video_dir = 'directory with video files'
    print(sum(get_track_len(os.path.join(video_dir, f)) for f in os.listdir(video_dir)))

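`os.listdir` returns every entry in the directory, including subdirectories and non-video files. Since the question is about .mp4 files, one option is to select only those before parsing. A minimal sketch using `pathlib` follows; the helper name `mp4_files` is hypothetical and not part of the answer.

```python
from pathlib import Path


def mp4_files(directory):
    """Return the .mp4 files directly inside directory, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".mp4"
    )
```

The total could then be computed as `sum(get_track_len(str(p)) for p in mp4_files('directory with video files'))`.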
This link shows how to get the length of a video file https://stackoverflow.com/a/3844467/735204
    import subprocess

    def getLength(filename):
        result = subprocess.Popen(["ffprobe", filename],
            stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
        return [x for x in result.stdout.readlines() if "Duration" in x]

If you're using that function, you can then wrap it up with something like

    import os

    for f in os.listdir('.'):
        print "%s: %s" % (f, getLength(f))

- Though this is not a complete answer, it helped me with the crux of the matter. For anybody who looks at this question in the future: I also needed to refer to http://stackoverflow.com/questions/2780897/python-sum-up-time in order to sum the individual times. – hardikudeshi Feb 24 '13 at 10:50
- This will break if the word "Duration" appears anywhere in the metadata of your file, like the title of movie, the filename or the name of one of the streams. – Boris Verkhovskiy Dec 28 '19 at 18:29
- Doesn't work on Python 3: `TypeError: a bytes-like object is required, not 'str'` – Boris Verkhovskiy Dec 28 '19 at 18:30
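As the first comment notes, the `Duration` lines still have to be converted and summed. Below is a minimal sketch of that step, assuming the lines have already been decoded to `str` and look like ffprobe's usual `Duration: HH:MM:SS.ff, start: ...` output. The names `parse_duration` and `total_duration` are hypothetical. Requiring the time pattern right after the word makes a stray "Duration" in a title less likely to match, though it does not rule it out.

```python
import re
from datetime import timedelta

# Matches the "Duration: HH:MM:SS.ff" field that ffprobe prints for an input.
DURATION_RE = re.compile(r"Duration: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration(line):
    """Convert one 'Duration: ...' line to a timedelta, or None if no time is found."""
    match = DURATION_RE.search(line)
    if match is None:
        return None  # e.g. "Duration: N/A"
    hours, minutes, seconds = match.groups()
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def total_duration(lines):
    """Sum the durations found in lines, ignoring lines without one."""
    durations = (parse_duration(line) for line in lines)
    return sum((d for d in durations if d is not None), timedelta())
```

Calling `total_duration(...)` on the collected lines returns a single `timedelta`, which prints as hours, minutes and seconds.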
Here's my take. I did this on Windows. I took the answer from Federico above, and changed the python program a little bit to traverse a tree of folders with video files. So you need to go above to see Federico's answer, to install MediaInfo and to pip install pymediainfo, and then write this program, summarize.py:
    import os
    import sys

    from pymediainfo import MediaInfo

    number_of_video_files = 0

    def get_alternate_len(media_info):
        myJson = media_info.to_data()
        myArray = myJson['tracks']
        for track in myArray:
            if track['track_type'] == 'General' or track['track_type'] == 'Video':
                if 'duration' in track:
                    return int(track['duration'] / 1000)
        return 0

    def get_track_len(file_path):
        global number_of_video_files
        media_info = MediaInfo.parse(file_path)
        for track in media_info.tracks:
            if track.track_type == "Video":
                number_of_video_files += 1
                if type(track.duration) == int:
                    len_in_sec = int(track.duration / 1000)
                elif type(track.duration) == str:
                    len_in_sec = int(float(track.duration) / 1000)
                else:
                    len_in_sec = get_alternate_len(media_info)
                if len_in_sec == 0:
                    print("File path = " + file_path + ", problem in type of track.duration")
                return len_in_sec
        return 0

    sum_in_secs = 0.0
    os.chdir(sys.argv[1])
    for root, dirs, files in os.walk("."):
        for name in files:
            sum_in_secs += get_track_len(os.path.join(root, name))

    hours = int(sum_in_secs / 3600)
    remain = sum_in_secs - hours * 3600
    minutes = int(remain / 60)
    seconds = remain - minutes * 60

    print("Directory: " + sys.argv[1])
    print("Total number of video files is " + str(number_of_video_files))
    print("Length: %d:%02d:%02d" % (hours, minutes, seconds))

Run it: python summarize.py <DirPath>
Have fun. I found I have about 1800 hours of videos waiting for me to have some free time. Yeah sure