Original link: https://blog.zoomquiet.io/lazy_recursion_with_generators.html
background¶
Original: Lazy recursion, with generators
From Weekly PyCoder 569 ~Weekly~Collection of things from around the world recommendation
Quick translation¶
In this article, I plan to study Python’s generator, and use it to reconcile and recursive code memory usage;
When code calls itself¶
When the code calls itself
Do you know what recursion is;
is the function calling itself,
To review, let’s look at an example:
def factorial ( n ): # base case if n == 1 : return 1 # recurse return n * factorial ( n - 1 )
To understand how calling itself works,
Here is the factorial(5) process breakdown:
factorial ( 5 ) = 5 * factorial ( 4 ) = 5 * 4 * factorial ( 3 ) = 5 * 4 * 3 * factorial ( 2 ) = 5 * 4 * 3 * 2 * factorial ( 1 ) = 5 * 4 * 3 * 2 * 1 = 120
The value of 1 comes from factorial(1),
Then continue to multiply by 2, 3 and so on when returning to the call stack;
Why bother? ¶
Why bother?
The above recursion can of course be reduced to a for loop:
def factorial ( n ): product = 1 for i in range ( 2 , n + 1 ): product = product * i return product
So why bother writing it recursively?
In fact, some logic is recursive in nature;
A good example is to print out all paths in a folder,
Just like the find command does; here is the relevant code from one of my projects:
$ find ./src ./src ./src/pylox ./src/pylox/tokens.py ./src/pylox/utils ./src/pylox/utils/__init__.py ./src/pylox/utils/visitor.py ./src/pylox/utils/ast_printer.py ./src/pylox/__init__.py ./src/pylox/lexer.py ./src/pylox/__main__.py ./src/pylox/expr.py ./src/pylox/py.typed ./src/pylox/parser.py
The way find works is fairly simple; here’s how to get all the contents of a folder instantly:
- print folder path
- get everything in the folder
- For each item in the folder:
- If it is a file, print out the path of the file
- Where I come from is the folder, print out all the contents of this subfolder
Note that the last directive (print everything in the subfolder)
only a fraction of the original task; recursion is well suited for such tasks;
Programming time¶
Time to code
Let’s translate these instructions into code; a hypothetical file structure will be used for testing;
Simulates the following tree structure:
$ tree / / ├── etc │ ├── passwd │ └── shadow └── usr ├── bin │ ├── cat │ └── ls └── lib ├── my_lib └── gcc └── x86_64-linux-gnu
For Python code:
file_tree = [ '' , [ [ 'etc' , [ 'passwd' , 'shadow' ]], [ 'usr' , [ [ 'bin' , [ 'cat' , 'ls' ]], [ 'lib' , [ 'my_lib' , [ 'gcc' , [ 'x86_64-linux-gnu' ]] ]] ]] ]] def print_paths_recursive ( folder , path = ()): name , contents = folder path = ( * path , name ) print ( '/' . join ( path )) for item in contents : if isinstance ( item , str ): # This is a file, print out its path print ( '/' . join (( * path , item ))) else : # This is a folder, recurse print_paths_recursive ( item , path ) print_paths_recursive ( file_tree )
and output:
$ python find.py /etc /etc/passwd /etc/shadow /usr /usr/bin /usr/bin/cat /usr/bin/ls /usr/lib /usr/lib/my_lib /usr/lib/gcc /usr/lib/gcc/x86_64-linux-gnu Fairly straightforward.
Now comes the twist: how would you make this function return all paths?
Recursion and collections¶
Recursion and collection
It’s not uncommon to write recursive code to collect some data; having to collect all the files that are large but not printing them out is an obvious instance;
To do this, we make a rather small change:
The function will now return a list of paths;
This is an important difference with recursion,
Because, the function didn’t return anything before;
We will append to the path list instead of printing out the contents;
This will receive the subpath as the return value,
Instead of just making recursive calls and appending all returns to the final answer dataset;
Here is the modified code:
def get_paths_recursive ( folder , path = ()): paths = [] name , contents = folder path = ( * path , name ) paths . append ( '/' . join ( path )) for item in contents : if isinstance ( item , str ): # This is a file, append its path paths . append ( '/' . join (( * path , item ))) else : # This is a folder, recurse and append all subpaths for subpath in get_paths_recursive ( item , path ): paths . append ( subpath ) return paths paths = get_paths_recursive ( file_tree ) print ( paths )
output:
$ python find.py [ '' , '/etc' , '/etc/passwd' , '/etc/shadow' , '/usr' , '/usr/bin' , '/usr/bin/cat' , '/usr/bin/ls' , '/usr/lib' , '/usr/lib/my_lib' , '/usr/lib/gcc' , '/usr/lib/gcc/x86_64-linux-gnu' ]
Here comes the problem¶
The problem
The problem arises if a lot of folders are collected;
If you have thousands or millions of files and folders in your directory,
store all files and folders in a list,
It may be troublesome for two reasons:
- Your memory usage will randomly go up; there’s no limit to how big the list can grow, so you can technically even run out of memory;
- If you only care about a few items in the file, you’re out of luck – the algorithm will find each subfolder, and then you can do something else with the resulting data;
Essentially, this is an algorithm for acute assessment;
The only way to avoid storing all the data is to perform tasks directly inside the function,
Just like we did when printing directly;
However, this strongly couples our code and business;
Solution¶
The solution
So, to summarize our problem: we want to execute any code that needs to be executed against any file path;
The task might be to print it out, or store it in a list, or whatever:
def get_paths_recursive ( folder , path = ()): name , contents = folder path = ( * path , name ) ## Do something with the `path` here, ## Example: print(path), or paths.append(path) for item in contents : if isinstance ( item , str ): ## Do something with the `path + item` here... else : for subpath in get_paths_recursive ( item , path ): ## Do something with the `subpath` here... return paths
Python has provided us with a very powerful structure to solve this problem, which is the generator;
You may have heard generators in some other contexts, for example:
def gen (): yield 10 yield 20 yield 10 for item in gen (): print ( 'Got:' , item ) # Got: 10 # Got: 20 # Got: 10
However, there is a little-known fact about generators:
People can move your evaluation between two points in the code;
I mean something like this:
def gen (): print ( "Start!" ) yield 1 print ( "Now we're calculating stuff in gen()" ) value = sum ( range ( 10 )) yield value print ( "Last value!" ) yield 42 print ( "Done." ) for item in gen (): print ( f "Doing things with { item } ..." )
You can see how execution goes back and forth between gen() and the for loop:
$ py a.py Start! Doing things with 1 ... Now we ' re calculating stuff in gen () Doing things with 45 ... Last value! Doing things with 42 ... Done.
This is exactly what we need in this example:
Whenever we have a new path, we need the execution context to return to the main code;
So, we can pass control of the generator to the loop:
def get_paths_generator ( folder , path = ()): name , contents = folder path = ( * path , name ) yield '/' . join ( path ) for item in contents : if isinstance ( item , str ): yield '/' . join (( * path , item )) else : for subpath in get_paths_generator ( item , path ): yield subpath
Now, here comes the best part,
We can create primitive use cases, print and store lists easily:
$ python -i find.py >>> for path in get_paths_generator ( file_tree ) : ... print ( path ) /etc /etc/passwd /etc/shadow /usr /usr/bin /usr/bin/cat /usr/bin/ls /usr/lib /usr/lib/my_lib /usr/lib/gcc /usr/lib/gcc/x86_64-linux-gnu >>> list ( get_paths_generator ( file_tree )) [ '' , '/etc' , '/etc/passwd' , '/etc/shadow' , '/usr' , '/usr/bin' , '/usr/bin/cat' , '/usr/bin/ls' , '/usr/lib' , '/usr/lib/my_lib' , '/usr/lib/gcc' , '/usr/lib/gcc/x86_64-linux-gnu' ]
This solution is much more flexible,
Also, there will never be eager evaluation problems like the original solution;
Bonus: yield from ¶
Bonus: yield from
The code that originally used append to store the path can be improved:
Instead of writing a for loop to append each subpath one by one, you can use list.extend:
# ... for item in contents : if isinstance ( item , str ): paths . append ( '/' . join (( * path , item ))) else : ## REPLACING THIS LOOP: # for subpath in get_paths_recursive(item, path): # paths.append(subpath) paths . extend ( get_paths_recursive ( item , path ))
Two things can be done in our generator solution,
Use yield from gen():
def get_paths_generator ( folder , path = ()): name , contents = folder path = ( * path , name ) yield '/' . join ( path ) for item in contents : if isinstance ( item , str ): yield '/' . join (( * path , item )) else : yield from get_paths_generator ( item , path )
yield from will yield all the values in another generator, one by one;
Footnotes¶
footer
Above, hopefully you discovered using generators to improve new (and old) recursive code in Python;
Also, James Powell gave a great talk:
Python Generators || James Powell – YouTube
For more ideas on how generators can be extended,
Worth a look if you’re interested;
refer.¶
Every serious technical blogger has a lot of accumulation worth intensive reading, such as this Tushar Sadhwani ‘s:
– Understanding all of Python, through its builtins¶
This article is transferred from: https://blog.zoomquiet.io/lazy_recursion_with_generators.html
This site is only for collection, and the copyright belongs to the original author.