Lazy recursion with generators

In this article, I plan to study Python’s generator, and use it to reconcile and recursive code memory usage;

When code calls itself¶

When the code calls itself

Do you know what recursion is;
is the function calling itself,
To review, let’s look at an example:

 def factorial ( n ): # base case if n == 1 : return 1 # recurse return n * factorial ( n - 1 )

To understand how calling itself works,
Here is the factorial(5) process breakdown:

 factorial ( 5 ) = 5 * factorial ( 4 ) = 5 * 4 * factorial ( 3 ) = 5 * 4 * 3 * factorial ( 2 ) = 5 * 4 * 3 * 2 * factorial ( 1 ) = 5 * 4 * 3 * 2 * 1 = 120

The value of 1 comes from factorial(1),
Then continue to multiply by 2, 3 and so on when returning to the call stack;

Why bother? ¶

Why bother?

The above recursion can of course be reduced to a for loop:

 def factorial ( n ): product = 1 for i in range ( 2 , n + 1 ): product = product * i return product

So why bother writing it recursively?

In fact, some logic is recursive in nature;
A good example is to print out all paths in a folder,
Just like the find command does; here is the relevant code from one of my projects:

 $ find ./src ./src ./src/pylox ./src/pylox/tokens.py ./src/pylox/utils ./src/pylox/utils/__init__.py ./src/pylox/utils/visitor.py ./src/pylox/utils/ast_printer.py ./src/pylox/__init__.py ./src/pylox/lexer.py ./src/pylox/__main__.py ./src/pylox/expr.py ./src/pylox/py.typed ./src/pylox/parser.py

The way find works is fairly simple; here’s how to get all the contents of a folder instantly:

print folder path
get everything in the folder
For each item in the folder:
- If it is a file, print out the path of the file
- Where I come from is the folder, print out all the contents of this subfolder

Note that the last directive (print everything in the subfolder)
only a fraction of the original task; recursion is well suited for such tasks;

Programming time¶

Time to code

Let’s translate these instructions into code; a hypothetical file structure will be used for testing;

Simulates the following tree structure:

 $ tree / / ├── etc │ ├── passwd │ └── shadow └── usr ├── bin │ ├── cat │ └── ls └── lib ├── my_lib └── gcc └── x86_64-linux-gnu

For Python code:

 file_tree = [ '' , [ [ 'etc' , [ 'passwd' , 'shadow' ]], [ 'usr' , [ [ 'bin' , [ 'cat' , 'ls' ]], [ 'lib' , [ 'my_lib' , [ 'gcc' , [ 'x86_64-linux-gnu' ]] ]] ]] ]] def print_paths_recursive ( folder , path = ()): name , contents = folder path = ( * path , name ) print ( '/' . join ( path )) for item in contents : if isinstance ( item , str ): # This is a file, print out its path print ( '/' . join (( * path , item ))) else : # This is a folder, recurse print_paths_recursive ( item , path ) print_paths_recursive ( file_tree )

and output:

 $ python find.py /etc /etc/passwd /etc/shadow /usr /usr/bin /usr/bin/cat /usr/bin/ls /usr/lib /usr/lib/my_lib /usr/lib/gcc /usr/lib/gcc/x86_64-linux-gnu Fairly straightforward.

Now comes the twist: how would you make this function return all paths?

Recursion and collections¶

Recursion and collection

It’s not uncommon to write recursive code to collect some data; having to collect all the files that are large but not printing them out is an obvious instance;

To do this, we make a rather small change:

The function will now return a list of paths;
This is an important difference with recursion,
Because, the function didn’t return anything before;
We will append to the path list instead of printing out the contents;
This will receive the subpath as the return value,
Instead of just making recursive calls and appending all returns to the final answer dataset;

Here is the modified code:

 def get_paths_recursive ( folder , path = ()): paths = [] name , contents = folder path = ( * path , name ) paths . append ( '/' . join ( path )) for item in contents : if isinstance ( item , str ): # This is a file, append its path paths . append ( '/' . join (( * path , item ))) else : # This is a folder, recurse and append all subpaths for subpath in get_paths_recursive ( item , path ): paths . append ( subpath ) return paths paths = get_paths_recursive ( file_tree ) print ( paths )

output:

 $ python find.py [ '' , '/etc' , '/etc/passwd' , '/etc/shadow' , '/usr' , '/usr/bin' , '/usr/bin/cat' , '/usr/bin/ls' , '/usr/lib' , '/usr/lib/my_lib' , '/usr/lib/gcc' , '/usr/lib/gcc/x86_64-linux-gnu' ]

Here comes the problem¶

The problem

The problem arises if a lot of folders are collected;
If you have thousands or millions of files and folders in your directory,
store all files and folders in a list,
It may be troublesome for two reasons:

Your memory usage will randomly go up; there’s no limit to how big the list can grow, so you can technically even run out of memory;
If you only care about a few items in the file, you’re out of luck – the algorithm will find each subfolder, and then you can do something else with the resulting data;

Essentially, this is an algorithm for acute assessment;
The only way to avoid storing all the data is to perform tasks directly inside the function,
Just like we did when printing directly;
However, this strongly couples our code and business;

Solution¶

The solution

So, to summarize our problem: we want to execute any code that needs to be executed against any file path;

The task might be to print it out, or store it in a list, or whatever:

 def get_paths_recursive ( folder , path = ()): name , contents = folder path = ( * path , name ) ## Do something with the `path` here, ## Example: print(path), or paths.append(path) for item in contents : if isinstance ( item , str ): ## Do something with the `path + item` here... else : for subpath in get_paths_recursive ( item , path ): ## Do something with the `subpath` here... return paths

Python has provided us with a very powerful structure to solve this problem, which is the generator;

You may have heard generators in some other contexts, for example:

 def gen (): yield 10 yield 20 yield 10 for item in gen (): print ( 'Got:' , item ) # Got: 10 # Got: 20 # Got: 10

However, there is a little-known fact about generators:
People can move your evaluation between two points in the code;

I mean something like this:

 def gen (): print ( "Start!" ) yield 1 print ( "Now we're calculating stuff in gen()" ) value = sum ( range ( 10 )) yield value print ( "Last value!" ) yield 42 print ( "Done." ) for item in gen (): print ( f "Doing things with { item } ..." )

You can see how execution goes back and forth between gen() and the for loop:

 $ py a.py Start! Doing things with 1 ... Now we ' re calculating stuff in gen () Doing things with 45 ... Last value! Doing things with 42 ... Done.

This is exactly what we need in this example:
Whenever we have a new path, we need the execution context to return to the main code;
So, we can pass control of the generator to the loop:

 def get_paths_generator ( folder , path = ()): name , contents = folder path = ( * path , name ) yield '/' . join ( path ) for item in contents : if isinstance ( item , str ): yield '/' . join (( * path , item )) else : for subpath in get_paths_generator ( item , path ): yield subpath

Now, here comes the best part,
We can create primitive use cases, print and store lists easily:

 $ python -i find.py >>> for path in get_paths_generator ( file_tree ) : ... print ( path ) /etc /etc/passwd /etc/shadow /usr /usr/bin /usr/bin/cat /usr/bin/ls /usr/lib /usr/lib/my_lib /usr/lib/gcc /usr/lib/gcc/x86_64-linux-gnu >>> list ( get_paths_generator ( file_tree )) [ '' , '/etc' , '/etc/passwd' , '/etc/shadow' , '/usr' , '/usr/bin' , '/usr/bin/cat' , '/usr/bin/ls' , '/usr/lib' , '/usr/lib/my_lib' , '/usr/lib/gcc' , '/usr/lib/gcc/x86_64-linux-gnu' ]

This solution is much more flexible,
Also, there will never be eager evaluation problems like the original solution;

Bonus: yield from ¶

Bonus: yield from

The code that originally used append to store the path can be improved:
Instead of writing a for loop to append each subpath one by one, you can use list.extend:

 # ... for item in contents : if isinstance ( item , str ): paths . append ( '/' . join (( * path , item ))) else : ## REPLACING THIS LOOP: # for subpath in get_paths_recursive(item, path): # paths.append(subpath) paths . extend ( get_paths_recursive ( item , path ))

Two things can be done in our generator solution,
Use yield from gen():

 def get_paths_generator ( folder , path = ()): name , contents = folder path = ( * path , name ) yield '/' . join ( path ) for item in contents : if isinstance ( item , str ): yield '/' . join (( * path , item )) else : yield from get_paths_generator ( item , path )

yield from will yield all the values in another generator, one by one;

Footnotes¶

footer

Above, hopefully you discovered using generators to improve new (and old) recursive code in Python;
Also, James Powell gave a great talk:
Python Generators || James Powell – YouTube

For more ideas on how generators can be extended,
Worth a look if you’re interested;

refer.¶

Every serious technical blogger has a lot of accumulation worth intensive reading, such as this Tushar Sadhwani ‘s:

– Understanding all of Python, through its builtins¶

This article is transferred from: https://blog.zoomquiet.io/lazy_recursion_with_generators.html
This site is only for collection, and the copyright belongs to the original author.