
I want to use tree (or similar) to see the directory structure of a given directory as well as whether each subdirectory has files in it. So, how could I use tree but limit the maximum number of files to display in a given subdirectory?

If it can't be done with tree, how could it be done by modifying the Python code from this site?

synaptik

6 Answers


One can use tree --filelimit=N to limit the number of subdirectories/files to display. Unfortunately, tree will then not descend into (or list the contents of) any directory that has more than N subdirectories and files.

For simple cases, when you have multiple directories and most have too many (say > 100) files, you can use tree --filelimit=100.

.
├── A1
│   ├── A2
│   ├── B2
│   ├── C2 [369 entries exceeds filelimit, not opening dir]
│   └── D2 [3976 entries exceeds filelimit, not opening dir]
├── B1
│   └── A2
│       ├── A3.jpeg
│       └── B3.png
└── C1.sh

Note that if A1/C2 has a subdirectory A3, it will not be shown.

P.S. This is not a complete solution, but it will be quicker in some cases.


Here's a working example with the Python code you cited:

Usage: tree.py -f [file limit] <directory>

If a number is specified for -f [file limit], then "... <additional files>" is printed and the remaining files are skipped. Additional directories are not skipped, however. A file limit of 10000 (the default) acts as no limit.

#! /usr/bin/env python
# tree.py
#
# Written by Doug Dahms
# modified by glallen @ StackExchange
#
# Prints the tree structure for the path specified on the command line

from os import listdir, sep
from os.path import abspath, basename, isdir
from sys import argv

def tree(dir, padding, print_files=False, limit=10000):
    print padding[:-1] + '+-' + basename(abspath(dir)) + '/'
    padding = padding + ' '
    limit = int(limit)
    files = []
    if print_files:
        files = listdir(dir)
    else:
        files = [x for x in listdir(dir) if isdir(dir + sep + x)]
    count = 0
    for file in files:
        count += 1
        path = dir + sep + file
        if isdir(path):
            print padding + '|'
            if count == len(files):
                tree(path, padding + ' ', print_files, limit)
            else:
                tree(path, padding + '|', print_files, limit)
        else:
            if limit == 10000:
                print padding + '|'
                print padding + '+-' + file
                continue
            elif limit == 0:
                print padding + '|'
                print padding + '+-' + '... <additional files>'
                limit -= 1
            elif limit <= 0:
                continue
            else:
                print padding + '|'
                print padding + '+-' + file
                limit -= 1

def usage():
    return '''Usage: %s [-f] [file-listing-limit(int)] <PATH>
Print tree structure of path specified.
Options:
-f          Print files as well as directories
-f [limit]  Print files as well as directories up to number limit
PATH        Path to process''' % basename(argv[0])

def main():
    if len(argv) == 1:
        print usage()
    elif len(argv) == 2:
        # print just directories
        path = argv[1]
        if isdir(path):
            tree(path, ' ')
        else:
            print 'ERROR: \'' + path + '\' is not a directory'
    elif len(argv) == 3 and argv[1] == '-f':
        # print directories and files
        path = argv[2]
        if isdir(path):
            tree(path, ' ', True)
        else:
            print 'ERROR: \'' + path + '\' is not a directory'
    elif len(argv) == 4 and argv[1] == '-f':
        # print directories and files up to max
        path = argv[3]
        if isdir(path):
            tree(path, ' ', True, argv[2])
        else:
            print 'ERROR: \'' + path + '\' is not a directory'
    else:
        print usage()

if __name__ == '__main__':
    main()

When run, it should produce output similar to:

user@host /usr/share/doc $ python /tmp/recipe-217212-1.py -f 2 . | head -n 40
+-doc/
  |
  +-libgnuradio-fft3.7.2.1/
  | |
  | +-copyright
  | |
  | +-changelog.Debian.gz
  |
  +-libqt4-script/
  | |
  | +-LGPL_EXCEPTION.txt
  | |
  | +-copyright
  | |
  | +-... <additional files>
  |
  +-xscreensaver-gl/
  | |
  | +-copyright
  | |
  | +-changelog.Debian.gz
  | |
  | +-... <additional files>
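For comparison, here is a minimal Python 3 sketch of the same rule. It is not glallen's code: the function name tree_lines and its "+-" output layout are assumptions for illustration. It returns lines instead of printing them, lists each directory's files before its subdirectories, and uses None rather than the 10000 sentinel for "no limit", so a real limit of 10000 is not mistaken for unlimited. It also gives every directory a fresh limit. In the script above, a subdirectory receives whatever limit is left after the files already listed in its parent, so a subdirectory reached after the cutoff shows no files at all.

```python
import os


def tree_lines(directory, limit=None, padding=""):
    """Return tree-style lines for `directory`, listing at most `limit` files
    per directory (None means no limit). Subdirectories are never skipped."""
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.name)
    # Symlinks to directories are not followed; they are listed as files
    dirs = [e for e in entries if e.is_dir(follow_symlinks=False)]
    files = [e for e in entries if not e.is_dir(follow_symlinks=False)]

    shown = files if limit is None else files[:limit]
    lines = [padding + "+-" + f.name for f in shown]
    if len(files) > len(shown):
        lines.append(f"{padding}+-... <{len(files) - len(shown)} additional files>")

    for d in dirs:
        lines.append(padding + "+-" + d.name + "/")
        # Each subdirectory starts again with the full limit
        lines.extend(tree_lines(d.path, limit, padding + "| "))
    return lines
```

For example, print("\n".join(tree_lines(".", 2))) would print the current directory with at most two files per directory.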
glallen

Another approach may be to filter the JSON output provided by tree. For example, tree -J displays:

[
  {"type":"directory","name":"root_directory/","contents":[
    {"type":"directory","name":"child_directory","contents":[
      {"type":"file","name":"file06.txt"},
      {"type":"file","name":"file07.txt"}
    ]},
    {"type":"file","name":"file00.txt"},
    {"type":"file","name":"file01.txt"},
    {"type":"file","name":"file02.txt"},
    {"type":"file","name":"file03.txt"},
    {"type":"file","name":"file04.txt"},
    {"type":"file","name":"file05.txt"}
  ]},
  {"type":"report","directories":2,"files":8}
]

This JSON can be filtered to truncate long lists of files:

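import json

def truncate_contents(items, max_files):
    # Recursively shorten every directory listing longer than max_files
    for item in items:
        if item.get('type') != 'directory':
            continue
        contents = item.get('contents', [])
        if len(contents) > max_files:
            # Keep the first and last entries and replace the middle with "..."
            head = (max_files + 1) // 2
            tail = max_files // 2
            contents = (contents[:head]
                        + [{"type": "file", "name": "..."}]
                        + (contents[-tail:] if tail else []))
            item['contents'] = contents
        # Descend into the subdirectories that are still listed
        truncate_contents(contents, max_files)

def truncate_directories(json_data, max_files):
    # Parse JSON data
    data = json.loads(json_data)
    truncate_contents(data, max_files)
    # Convert the modified data back to JSON format
    return json.dumps(data, indent=2)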

# Example JSON data
json_data = ''' [ {"type":"directory","name":"root_directory/","contents":[ {"type":"directory","name":"child_directory","contents":[ {"type":"file","name":"file06.txt"}, {"type":"file","name":"file07.txt"} ]}, {"type":"file","name":"file00.txt"}, {"type":"file","name":"file01.txt"}, {"type":"file","name":"file02.txt"}, {"type":"file","name":"file03.txt"}, {"type":"file","name":"file04.txt"}, {"type":"file","name":"file05.txt"} ]}, {"type":"report","directories":2,"files":8} ] '''

# Set the maximum number of files allowed in a directory
max_files = 3

# Truncate directories with too many files
new_json_data = truncate_directories(json_data, max_files)

# Print the modified JSON data
print(new_json_data)

[
  {"type": "directory", "name": "root_directory/", "contents": [
    {"type": "directory", "name": "child_directory", "contents": [
      {"type": "file", "name": "file06.txt"},
      {"type": "file", "name": "file07.txt"}
    ]},
    {"type": "file", "name": "file00.txt"},
    {"type": "file", "name": "..."},  # TRUNCATED FILES
    {"type": "file", "name": "file05.txt"}
  ]},
  {"type": "report", "directories": 2, "files": 8}
]

In the simulated output, "root_directory/" now lists only two of its six files: the middle files "file01.txt", "file02.txt", "file03.txt" and "file04.txt" have been replaced with {"type": "file", "name": "..."} to meet the truncation criterion. The trailing report entry still says "files": 8, because it is not recalculated.

If this JSON is converted into a form tree can read (e.g., a list of paths or a tab-indented file), tree should be able to display the custom truncated tree data:

$ tree --help

------- Input options -------
  --fromfile    Reads paths from files (.=stdin)
  --fromtabfile Reads trees from tab indented files (.=stdin)
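A minimal sketch of the "paths" route, assuming the (possibly truncated) data still has the tree -J shape shown above. The helper json_to_paths is hypothetical: it flattens the entries into one slash-separated path per file or directory, which can then be piped into tree --fromfile . (the "." tells tree to read the listing from stdin, per the help text). How tree draws an empty directory given only as a bare path is not something this sketch tries to control.

```python
import json


def json_to_paths(items, prefix=""):
    """Flatten `tree -J` style entries into slash-separated paths.

    Hypothetical helper: returns one path per file or directory, in the
    order they appear, ready to be printed one per line.
    """
    paths = []
    for item in items:
        kind = item.get("type")
        if kind not in ("directory", "file"):
            continue  # skip the {"type": "report", ...} entry and other types
        # The root may carry a trailing slash, e.g. "root_directory/"
        path = prefix + item["name"].rstrip("/")
        paths.append(path)
        if kind == "directory":
            paths.extend(json_to_paths(item.get("contents", []), path + "/"))
    return paths
```

For example, ending the filtering script with print("\n".join(json_to_paths(json.loads(new_json_data)))) and running it as python3 script.py | tree --fromfile . (script name hypothetical) should draw the truncated tree, with "..." shown as a file.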

1

Made a few improvements to @glallen's code:

  • make the formatting resemble the tree command
  • default to the current working directory
  • ignore files that start with . by default
  • update to Python 3
  • format the code
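The tree-like formatting from the first bullet comes down to two rules: an entry gets "├── " unless it is the last child, which gets "└── ", and the children of an entry inherit "│   " as extra padding when the entry is not last, or four spaces when it is. A minimal sketch of just those rules, using a hypothetical nested-dict input instead of the file system (render is an illustrative name, not part of kym's script):

```python
def render(name, children, padding="", is_last=True):
    """Render {name: subtree-or-None} with tree-style connectors.

    `children` is a dict for a directory or None for a file
    (hypothetical input format, used only to show the drawing rules).
    """
    lines = [padding + ("└── " if is_last else "├── ") + name]
    # Children of a last entry get blank padding; others get a vertical bar
    child_padding = padding + ("    " if is_last else "│   ")
    items = sorted((children or {}).items())
    for i, (child, sub) in enumerate(items):
        lines.extend(render(child, sub, child_padding, i == len(items) - 1))
    return lines
```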

Example output:

tree.py -f 5
 └── bq_labeled_patents
     ├── extracted_data_EN.csv
     ├── phase_0
     │   ├── espacenet_en1.pdf
     │   ├── espacenet_en10.pdf
     │   ├── espacenet_en100.pdf
     │   ├── espacenet_en11.pdf
     │   ├── espacenet_en12.pdf
     │   ├── ... <additional files>
     ├── phase_0.json
     └── phase_1
         ├── eu
         │   ├── espacenet_en1.pdf
         │   ├── espacenet_en10.pdf
         │   ├── espacenet_en100.pdf
         │   ├── espacenet_en11.pdf
         │   ├── espacenet_en12.pdf
         │   ├── ... <additional files>
         ├── eu.json
         ├── us
         │   ├── us_001.pdf
         │   ├── us_002.pdf
         │   ├── us_003.pdf
         │   ├── us_004.pdf
         │   ├── us_005.pdf
         │   ├── ... <additional files>
         └── us.json

Updated code:

#! /usr/bin/env python
# tree.py
#
# Written by Doug Dahms
# modified by glallen @ StackExchange
# modified by kym @ StackExchange
# https://superuser.com/q/840152/992568

import argparse
import os
from os import listdir, sep
from os.path import abspath, basename, isdir
from sys import argv

def is_hidden(file):
    return file.startswith(".") or file == "__pycache__"

def tree(
    dir,
    padding,
    print_files=False,
    limit=10000,
    is_last=True,
    level=0,
    ignore_hidden=True,
):
    basename_dir = basename(abspath(dir))
    connector = "└── " if is_last else "├── "
    print(padding + connector + basename_dir)
    padding = padding + ("    " if is_last else "│   ")

    files = []
    if print_files:
        files = listdir(dir)
    else:
        files = [x for x in listdir(dir) if isdir(dir + sep + x)]

    if ignore_hidden:
        files = [f for f in files if not is_hidden(f)]

    files = sorted(files)
    total_files = len(files)
    file_count = 0

    for i, file in enumerate(files):
        path = dir + sep + file
        if isdir(path):
            tree(
                path,
                padding,
                print_files,
                limit,
                is_last=(i == total_files - 1),
                level=level + 1,
                ignore_hidden=ignore_hidden,
            )
        else:
            file_count += 1
            if file_count <= limit:
                connector = "└── " if i == total_files - 1 else "├── "
                print(padding + connector + file)
            elif file_count == limit + 1:
                connector = "└── " if i == total_files - 1 else "├── "
                print(padding + connector + "... <additional files>")


def main():
    parser = argparse.ArgumentParser(
        description="Print tree structure of path specified."
    )
    parser.add_argument(
        "path",
        nargs="?",
        default=os.getcwd(),
        help="Path to process (default: current directory)",
    )
    parser.add_argument(
        "-f",
        "--files",
        nargs="?",
        const=10000,
        type=int,
        help="Print files as well as directories up to number limit",
    )
    parser.add_argument(
        "--show-hidden",
        action="store_true",
        help="Show hidden files (default: hidden files are ignored)",
    )

    args = parser.parse_args()

    path = args.path
    if isdir(path):
        if args.files is not None:
            tree(path, " ", True, args.files, ignore_hidden=not args.show_hidden)
        else:
            tree(path, " ", ignore_hidden=not args.show_hidden)
    else:
        print("ERROR: '" + path + "' is not a directory")


if __name__ == "__main__":
    main()

kym

Updated Python 3 version of @glallen's tree.py above.

#!/usr/bin/env python3
# tree.py
#
# Written by Doug Dahms
# modified by glallen @ StackExchange
#
# Prints the tree structure for the path specified on the command line

import os
import sys

def tree(directory, padding, print_files=False, limit=10000):
    print(padding[:-1] + '+-' + os.path.basename(os.path.abspath(directory)) + '/')
    padding = padding + ' '
    limit = int(limit)
    files = []
    if print_files:
        files = os.listdir(directory)
    else:
        files = [x for x in os.listdir(directory) if os.path.isdir(os.path.join(directory, x))]
    count = 0
    for file in files:
        count += 1
        path = os.path.join(directory, file)
        if os.path.isdir(path):
            print(padding + '|')
            if count == len(files):
                tree(path, padding + ' ', print_files, limit)
            else:
                tree(path, padding + '|', print_files, limit)
        else:
            if limit == 10000:
                print(padding + '|')
                print(padding + '+-' + file)
                continue
            elif limit == 0:
                print(padding + '|')
                print(padding + '+-' + '... <additional files>')
                limit -= 1
            elif limit <= 0:
                continue
            else:
                print(padding + '|')
                print(padding + '+-' + file)
                limit -= 1

def usage():
    return '''Usage: {} [-f] [file-listing-limit(int)] <PATH>
Print tree structure of path specified.
Options:
-f          Print files as well as directories
-f [limit]  Print files as well as directories up to number limit
PATH        Path to process'''.format(os.path.basename(sys.argv[0]))

def main():
    if len(sys.argv) == 1:
        print(usage())
    elif len(sys.argv) == 2:
        # print just directories
        path = sys.argv[1]
        if os.path.isdir(path):
            tree(path, ' ')
        else:
            print("ERROR: '" + path + "' is not a directory")
    elif len(sys.argv) == 3 and sys.argv[1] == '-f':
        # print directories and files
        path = sys.argv[2]
        if os.path.isdir(path):
            tree(path, ' ', True)
        else:
            print("ERROR: '" + path + "' is not a directory")
    elif len(sys.argv) == 4 and sys.argv[1] == '-f':
        # print directories and files up to max
        path = sys.argv[3]
        if os.path.isdir(path):
            tree(path, ' ', True, sys.argv[2])
        else:
            print("ERROR: '" + path + "' is not a directory")
    else:
        print(usage())

if __name__ == '__main__':
    main()


If anyone comes here from Google like me, you can also just use awk:

tree -L 3 | awk '
BEGIN {
  last_prefix = ""
  count = 1
}
{
  # Extract indentation
  prefix = $0
  sub(/[^│├└─]+$/, "", prefix)

  if (count < 5) { print }

  if (prefix == last_prefix) {
    count++
  } else {
    if (count > 5) { print prefix " " count " more files..." }
    count = 1
  }

  last_prefix = prefix
}'

Example output:

├── venv-metal
│   ├── bin
│   │   ├── Activate.ps1
│   │   ├── activate
│   │   ├── activate.csh
│   │   ├── activate.fish
│   │   ├── estimator_ckpt_converter
│   │   └── 28 more files...              <<<<<<<<<<<
│   ├── include
│   │   └── python3.11
│   ├── lib
│   │   └── python3.11
│   └── pyvenv.cfg

https://gist.github.com/TTy32/4b66351dde364af54b01ca7a4dd7df02
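The awk script works by comparing each line's drawing prefix with the previous line's. A Python sketch of a related idea keeps a sibling count per depth instead, so it can also hide entries nested under a hidden directory. The name collapse_siblings and the "... (N more)" wording are hypothetical, and it assumes tree's default UTF-8 line drawing: each nesting level is four characters wide, followed by a "├── " or "└── " connector.

```python
import re

# Each nesting level: "│" or a space, then three whitespace characters
# (\s also matches non-breaking spaces, which some tree builds emit),
# followed by the "├── " / "└── " connector before the name.
ENTRY = re.compile(r"^((?:[│ ]\s{3})*)[├└]──\s")


def collapse_siblings(lines, keep=5):
    """Show at most `keep` entries per directory in `tree` output lines."""
    out = []
    counts = []   # counts[d]: entries seen at depth d under the current parent
    indents = []  # indents[d]: indentation string used at depth d

    def visible(upto):
        # Visible only if every entry on the path so far is within `keep`
        return all(c <= keep for c in counts[:upto])

    def close_levels(depth):
        # Finish every sibling group at depth >= `depth`, innermost first
        while len(counts) > depth:
            d = len(counts) - 1
            if counts[d] > keep and visible(d):
                out.append(f"{indents[d]}└── ... ({counts[d] - keep} more)")
            counts.pop()
            indents.pop()

    for line in lines:
        m = ENTRY.match(line)
        if m is None:
            close_levels(0)  # e.g. the root "." line or the final report line
            out.append(line)
            continue
        indent = m.group(1)
        depth = len(indent) // 4
        close_levels(depth + 1)  # a shallower entry ends deeper groups
        while len(counts) <= depth:
            counts.append(0)
            indents.append(indent)
        counts[depth] += 1
        if visible(depth + 1):
            out.append(line)
    close_levels(0)
    return out
```

A small script could apply it as print("\n".join(collapse_siblings(sys.stdin.read().splitlines()))) and be fed by tree -L 3. With tree --charset=ascii output (|-- and `--), nothing matches and every line passes through unchanged.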

Qetesh