Get Directories
You are given a list of strings which represent absolute paths. Identify all unique directories.
Similar to linux notation, a name which has a slash to the right of it is a directory:
/foo/bar/
both foo and bar are directories/foo/bar
here foo is a directory but bar is not
You can assume all paths will be absolute and therefore start with a /
Example:
Input:
- a/b/c
- /a/d
- b/c
- /a/b/d
- /a
- /d
- /a/b/c/e
Output (in no particular order):
- /
- /a
- /a/b
- a/b/c
- /b
- b/c
So the idea is going to be to iterate through each string path component and build up a tree then just gather up all nodes in the tree.
First thing is first we want to parse the string into directories. I believe we can do that with regex. Bonus, I believe we can ignore all non-directories in the same step by grabbing only strings that have a slash to the right
import re
def get_directory_fragments(path):
return re.findall(r"([^/]*/)", path)
So we’re going to grab all strings of non-overlapping characters that include a slash to the right
<<get_directory_fragments>>
return get_directory_fragments("/a/bc/de/f")
Perfect, we get the directories and not any files
Ok, so now we can just build up the tree. Just to keep things nice lets create a structure for out tree
from collections import namedtuple
DirectoryNode = namedtuple('DirectoryNode', ['name', 'children'])
And now lets use the above to build out our structure. We are going to pass in an array of nodes which represent the “level” we’re at currently (the children of the parent node/all directories in the current one). If the next directory in the path fragments exists in this list then use that directory’s children to recurse, otherwise create a new node, add it to the parent’s childeren, and recurse on its children (which will be empty so the process should in theory repeat)
<<DirectoryNode>>
def build_up_tree_paths(parent_nodes, path_fragments):
if not path_fragments:
return
[directory, *other_directories] = path_fragments
try:
directory_node = next((n for n in parent_nodes if n.name == directory))
except StopIteration:
directory_node = DirectoryNode(directory, [])
parent_nodes.append(directory_node)
build_up_tree_paths(directory_node.children, other_directories)
<<build_up_tree_paths>>
rootNodes = []
build_up_tree_paths(rootNodes, ['/', '/a', '/b'])
build_up_tree_paths(rootNodes, ['/', '/c'])
return rootNodes
DirectoryNode | (name= / children= (DirectoryNode (name= /a children= (DirectoryNode (name= /b children= nil))) DirectoryNode (name= /c children= nil))) |
Now we just need some way to walk the three and generate the final list
def collect_names(root):
if root:
yield root.name
for c in root.children:
yield from (f'{root.name}{x}' for x in collect_names(c))
<<collect_names>>
<<DirectoryNode>>
return list(collect_names(DirectoryNode('/', [DirectoryNode('a/', []), DirectoryNode('b/', [DirectoryNode('d/', [])])])))
Excellent
So now lets put it all together
from itertools import chain
import json
<<get_directory_fragments>>
<<build_up_tree_paths>>
<<collect_names>>
data = [d for [d,] in data]
root_directories = []
for path in data:
build_up_tree_paths(root_directories, get_directory_fragments(path))
return list(chain.from_iterable((collect_names(paths) for paths in root_directories)))
Cool!