Skip to content

Latest commit

 

History

History

get-directories

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 

Problem

Get Directories

You are given a list of strings which represent absolute paths. Identify all unique directories.

Similar to linux notation, a name which has a slash to the right of it is a directory:

  • /foo/bar/ both foo and bar are directories
  • /foo/bar here foo is a directory but bar is not

You can assume all paths will be absolute and therefore start with a /

Example:

Input:

  • a/b/c
  • /a/d
  • b/c
  • /a/b/d
  • /a
  • /d
  • /a/b/c/e

Output (in no particular order):

  • /
  • /a
  • /a/b
  • a/b/c
  • /b
  • b/c

Solution

So the idea is going to be to iterate through each string path component and build up a tree then just gather up all nodes in the tree.

First thing is first we want to parse the string into directories. I believe we can do that with regex. Bonus, I believe we can ignore all non-directories in the same step by grabbing only strings that have a slash to the right

import re

def get_directory_fragments(path):
    return re.findall(r"([^/]*/)", path)

So we’re going to grab all strings of non-overlapping characters that include a slash to the right

<<get_directory_fragments>>
return get_directory_fragments("/a/bc/de/f")
[’’, ‘a’, ‘bc/’, ‘de/’]

Perfect, we get the directories and not any files

Ok, so now we can just build up the tree. Just to keep things nice lets create a structure for out tree

from collections import namedtuple

DirectoryNode = namedtuple('DirectoryNode', ['name', 'children'])

And now lets use the above to build out our structure. We are going to pass in an array of nodes which represent the “level” we’re at currently (the children of the parent node/all directories in the current one). If the next directory in the path fragments exists in this list then use that directory’s children to recurse, otherwise create a new node, add it to the parent’s childeren, and recurse on its children (which will be empty so the process should in theory repeat)

<<DirectoryNode>>
def build_up_tree_paths(parent_nodes, path_fragments):
    if not path_fragments:
        return
    [directory, *other_directories] = path_fragments

    try:
        directory_node = next((n for n in parent_nodes if n.name == directory))
    except StopIteration:
        directory_node = DirectoryNode(directory, [])
        parent_nodes.append(directory_node)

    build_up_tree_paths(directory_node.children, other_directories)
<<build_up_tree_paths>>
rootNodes = []
build_up_tree_paths(rootNodes, ['/', '/a', '/b'])
build_up_tree_paths(rootNodes, ['/', '/c'])
return rootNodes
DirectoryNode(name= / children= (DirectoryNode (name= /a children= (DirectoryNode (name= /b children= nil))) DirectoryNode (name= /c children= nil)))

Now we just need some way to walk the three and generate the final list

def collect_names(root):
    if root:
        yield root.name
        for c in root.children:
            yield from (f'{root.name}{x}' for x in collect_names(c))
<<collect_names>>
<<DirectoryNode>>
return list(collect_names(DirectoryNode('/', [DirectoryNode('a/', []), DirectoryNode('b/', [DirectoryNode('d/', [])])])))
[’’, ‘/a’, ’b’, ’b/d’]

Excellent

So now lets put it all together

from itertools import chain
import json
<<get_directory_fragments>>
<<build_up_tree_paths>>
<<collect_names>>

data = [d for [d,] in data]
root_directories = []
for path in data:
    build_up_tree_paths(root_directories, get_directory_fragments(path))

return list(chain.from_iterable((collect_names(paths) for paths in root_directories)))
[’’, ‘/a’, ’a/b’, ’a/b/c’, ’b’, ’b/c’]

Cool!