Python For Loop Scope and Dynamic/Recursive Json Parsing

Posted at 2017-10-28

I had trouble dynamically and recursively parsing and updating JSON, and it took much more time than I expected. In the end, it turned out that what confused me was the difference in the concept of "scope" between Python 2.7 and Java/C: I was writing code in Python 2.7 but thinking in the Java/C way. In this post, I describe the trouble I had and which difference between Python 2.7 and Java/C led me into that struggle.
Most of the code in this post has been reformatted and modified for the purpose of explanation, so it may not work as-is or may contain bugs. Also, for certain reasons I had to write this code in Python 2.7 instead of Python 3, which I am more used to. I researched this issue mostly for Python 2.7 and did not compare it with Python 3, so I cannot say for certain whether the behaviors described here are the same in Python 3.

1. Purpose

First of all, it is hard to understand the problem without knowing what I was trying to do.
In short, I wanted to parse and update JSON input dynamically and recursively.

Let's say there is a json input like this:

{ 
  "key1": 1, 
  "key2": {
      "key3": [3, 4], 
      "key4": { 
      ...omitted....
      }
  }
}

One of the simplest ways to parse the JSON and get the value of the key key3 is this:

key3_value = json["key2"]["key3"]

Likewise, updating the value of the key key3 is just as simple:

json["key2"]["key3"] = key3_value 

Code written this way, however, is not general at all. If you want to get the value of key4, you need to add a new line to your code.

key4_value = json["key2"]["key4"]  

This is inefficient. Worse, you can only write this kind of code when you know the exact path to the key, and the path to a given key may not be static and may change often. In that case, you need to rewrite the code every time the format of the JSON changes. There are many APIs we can use, and they return JSON in various formats. Of course, you still need to know the tree structure of each JSON to some extent. But code that is general enough to parse various kinds of JSON would be very useful.
For that purpose, I was trying to write code that can walk through a JSON tree recursively/dynamically and update a key-value pair at a certain point in the tree.
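For reference, here is a minimal runnable sketch of the direct access described in this section. It assumes the standard json module and a variable named data (not json, so the module name is not shadowed). The contents of key4 are omitted in the sample above, so an empty object is used here purely as a placeholder.

```python
import json

# Placeholder input: "key4" is left empty because its contents are omitted above.
raw = '{"key1": 1, "key2": {"key3": [3, 4], "key4": {}}}'
data = json.loads(raw)                  # parse the JSON text into nested dicts/lists

key3_value = data["key2"]["key3"]       # read: hard-coded path to key3
data["key2"]["key3"] = [5, 6]           # update: the same hard-coded path

updated_raw = json.dumps(data, sort_keys=True)  # back to JSON text
```

Every key on the path appears literally in the code, which is exactly the limitation described above.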

2. Issue

In fact, if the purpose is only parsing (not updating), it is not so confusing. Just write simple code like this:

def recursively_parse_json(input_json, target_key):
    if type(input_json) is dict and input_json:
        for key in input_json:
            # Compare inside the loop, where key is defined
            if key == target_key:
                print(input_json[key])
            recursively_parse_json(input_json[key], target_key)

    elif type(input_json) is list and input_json:
        for entity in input_json:
            recursively_parse_json(entity, target_key)

Okay, this code prints the value of target_key. (But note that target_key should be a unique key in the JSON tree.)
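If you want to use the value instead of only printing it, one possible variation is to collect the matches into a list. This is only a sketch: find_values and sample are names I made up for illustration, and the nested key1 in sample is added just to show what happens when the key is not unique.

```python
def find_values(node, target_key):
    """Return a list of every value stored under target_key anywhere in node."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_key:
                found.append(value)
            # Keep digging, because the value may itself contain the key
            found.extend(find_values(value, target_key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, target_key))
    return found


# Hypothetical sample input, loosely based on the JSON shown in section 1
sample = {"key1": 1, "key2": {"key3": [3, 4], "key4": {"key1": "nested"}}}
values = find_values(sample, "key3")
```

With a unique key the list has exactly one element; with a repeated key such as key1 it has one element per occurrence.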
Then, let's say you want to update the value of target_key. What kind of line should be added?
In a very simple case, you could add something like the line below. (It is so hard-coded that you cannot write it without knowing how many keys lie on the path to target_key. The point here is that, to update a value at a certain point in a JSON tree, you need to know at least the path, that is, the keys leading to the target.)

input_json["path"]["to"]["the"]["target_key"] = update_value  

To implement this, I was thinking in the Java/C way at that time, so my reasoning went as follows.
First, you need to know the path to target_key, so it is necessary to store the keys in a list while digging through the JSON input. You then get a list like this: list=["path", "to", "the", "target_key"]. Now you can use this list to specify the location of target_key in the JSON tree.
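To make the idea concrete, here is a sketch of how such a list of keys could be used to reach and update a value. The helper names get_by_path and set_by_path and the doc structure are hypothetical and written only for this illustration.

```python
def get_by_path(root, path):
    """Follow each key (or list index) in path, starting from root."""
    node = root
    for step in path:
        node = node[step]
    return node


def set_by_path(root, path, new_value):
    """Replace the value at the end of path with new_value, in place."""
    parent = get_by_path(root, path[:-1])   # container holding the last key
    parent[path[-1]] = new_value


# Hypothetical document matching the example path used in the text
doc = {"path": {"to": {"the": {"target_key": "old"}}}}
keys = ["path", "to", "the", "target_key"]
set_by_path(doc, keys, "new")
```

The same helpers also work when a step is a list index, because lists and dicts both support node[step].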

(Sorry for the confusing statement, but I would like to make it clear: this was wrong. In Python 2.7 (and 3 as well?), you do not actually need this kind of handling to update a value at a certain point in a JSON tree. You can easily update the value while inside the nested "for loop". I will explain this later. For now, I just want you to know that I thought I needed a list storing the keys (the path) to target_key, built while digging through the JSON tree, in order to update the value properly.)

So I modified the code above to store the keys (the path) in a list while parsing the JSON dynamically and recursively until reaching target_key.

def track_path_to_key(input_json, target_key, update_value, path_tracker = None):
    if path_tracker is None:
        path_tracker = []

    if type(input_json) is dict and input_json:
        for key in input_json:
            path_tracker = path_tracker + [key]
            if key == target_key:
                return path_tracker

            if type(input_json[key]) is dict or type(input_json[key]) is list:
                path_tracker = track_path_to_key(input_json[key], target_key, update_value, path_tracker)

    elif type(input_json) is list and input_json:
        for index, entity in enumerate(input_json):
            # Store the position of the entity in the list as its "key"
            path_tracker = path_tracker + [index]

            if type(entity) is dict or type(entity) is list:
                path_tracker = track_path_to_key(entity, target_key, update_value, path_tracker)

    return path_tracker                

At this point, I encountered the scope issue. After running the code above, the list contained keys that I did not expect.

3. Variable Scope: Python 2.7 vs Java/C

(As I was thinking in the Java/C way,) what I expected was this:
At the place the red arrow is pointing to, the path list should look like list=["path", "to", "KeyA"]. So far this is not confusing.
(Because of my limited writing skills, I would like to use figures to explain. It probably goes without saying, but each leaf has a value.)

image

Now you move to the next branch. What I expected here was list=["path", "to", "the"], but the actual value was list=["path", "to", "KeyA", "KeyB", "the"]. Why did this happen?

image

Under the to node, the process visits each child node in a "for loop". I thought that processes A and B each ran in their own loop iteration and had their own variable scope. That is how I understood variable scope in for loops to work in Java and C. As my first programming language was C, I was stuck on that idea and could not understand why the list stored keys that should not be there.
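The behavior can be reproduced without any recursion. The following sketch uses made-up names (children, path, parent_path) and the node names from the figures. It contrasts rebinding one name on every iteration with building a separate list per child.

```python
children = ["KeyA", "KeyB", "the"]

# Rebinding the same name inside the loop: each iteration starts from the
# result of the previous one, so the keys pile up.
path = ["path", "to"]
accumulated = []
for child in children:
    path = path + [child]
    accumulated.append(path)

# Using a separate name for the child's path leaves the parent path unchanged,
# so every child starts from ["path", "to"].
parent_path = ["path", "to"]
per_child = []
for child in children:
    child_path = parent_path + [child]
    per_child.append(child_path)
```

After the first loop, the last entry of accumulated is ["path", "to", "KeyA", "KeyB", "the"], the unexpected list from above. After the second loop, the last entry of per_child is ["path", "to", "the"], which is what I had expected.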

image

After a while, I found a thread on Stack Overflow discussing block/scope in loops in Python: https://stackoverflow.com/questions/3611760/scoping-in-python-for-loops

According to the thread, in short, a local block/scope is created for each loop in Java/C, but not in Python. (I wanted to find a document that describes how far scope extends in a for loop.) After reading other sources, I concluded that this scope issue was the most likely cause of the strange behavior. Then the next question is: how can you define a local scope for each loop iteration? (Or maybe you could write a good algorithm using only the global scope and global variables.)
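A small sketch of what "no scope per loop" means in practice: names assigned inside a for loop, including the loop variable itself, are still bound after the loop ends. The function name loop_names is made up for this example.

```python
def loop_names():
    for i in range(3):
        last_seen = i * 10
    # Both names are still bound here, after the loop has ended,
    # because the loop body shares the scope of the enclosing function.
    return i, last_seen


result = loop_names()
```

Here result is (2, 20), the values from the final iteration.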

Finally, I found a document that says:

Variable scope and lifetime
A variable which is defined inside a function is local to that function. It is accessible from the point at which it is defined until the end of the function, and exists for as long as the function is executing.
http://python-textbok.readthedocs.io/en/1.0/Variables_and_Scope.html
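As a sketch of what this means for the path list: a name assigned inside a function is local to that call, so rebinding it there does not change the caller's list. The names extend_path, outer and inner are hypothetical.

```python
def extend_path(path, key):
    # path + [key] builds a new list, and the assignment rebinds only the
    # local name "path" belonging to this call.
    path = path + [key]
    return path


outer = ["path", "to"]
inner = extend_path(outer, "the")
```

After the call, inner is ["path", "to", "the"] while outer is still ["path", "to"]. Note that this relies on building a new list with +. An in-place operation such as path.append(key) would change the caller's list as well.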

Okay, I got it. I modified the code above and got the following.

def track_path_to_key(input_json, target_key, update_value, path_tracker=None, last_key=None):
    # Returns the list of keys (and list indexes) leading to target_key, or None if it is not found.
    # update_value is not used here; it is kept so the signature matches the earlier version.
    if path_tracker is None:
        path_tracker = []

    if last_key is not None:
        # Each function call has its own local scope, so this new list belongs
        # to this call only and the caller's path_tracker is left untouched.
        path_tracker = path_tracker + [last_key]

    if type(input_json) is dict and input_json:
        for key in input_json:
            if key == target_key:
                return path_tracker + [key]

            if type(input_json[key]) is dict or type(input_json[key]) is list:
                found = track_path_to_key(input_json[key], target_key, update_value, path_tracker, key)
                if found is not None:
                    return found

    elif type(input_json) is list and input_json:
        for index, entity in enumerate(input_json):
            found = track_path_to_key(entity, target_key, update_value, path_tracker, index)
            if found is not None:
                return found

    # target_key was not found under this node
    return None

I moved the line where keys are stored into path_tracker to the beginning of the function instead of inside the for loop. Now it returns the expected list ["path", "to", "the", "target_key"] when it finds target_key while parsing the JSON tree recursively and dynamically.
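As a cross-check, here is an alternative sketch of the same idea that carries the path as an immutable tuple, so no call can alter the path seen by another call. This is not the code I actually used. The function name find_path and the doc structure (modeled on the figures) are assumptions for illustration.

```python
def find_path(node, target_key, path=()):
    """Return the keys/indexes leading to target_key as a list, or None."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_key:
                return list(path) + [key]
            found = find_path(value, target_key, path + (key,))
            if found is not None:
                return found
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found = find_path(item, target_key, path + (index,))
            if found is not None:
                return found
    return None  # not found under this node


# Hypothetical tree modeled on the figures: KeyA and KeyB are siblings of "the"
doc = {"path": {"to": {"KeyA": 1, "KeyB": [2, 3], "the": {"target_key": "value"}}}}
result = find_path(doc, "target_key")
```

For this tree, result is ["path", "to", "the", "target_key"], with no trace of KeyA or KeyB.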

4. Update Json

Now you have a list containing the keys that form the path to a certain key in a JSON tree. Is it time to start writing the code that updates a value deep in nested JSON? No, it is not. In fact, it turned out that storing the keys as a path to specify a location in the JSON tree was completely unnecessary.
All you have to do is update the value of target_key at the moment you find it while recursively/dynamically parsing the JSON.

def update_panel_json(input_json, target_key, update_value):
    if type(input_json) is dict and input_json:
        for key in input_json:
            if key == target_key:
                input_json[key] = update_value
            update_panel_json(input_json[key], target_key, update_value)

    elif type(input_json) is list and input_json:
        for entity in input_json:
            update_panel_json(entity, target_key, update_value)

To tell the truth, I am still not sure why it works.
While parsing through the JSON tree, at the place the red arrow is pointing to, it finds target_key and updates its value. I understand how it finds the key and updates the value; this is just a normal JSON operation. What I cannot understand is why the update affects the global input_json instead of a local one. Is this related to the fact that a "for loop" in Python does not have a local scope? Or does the JSON object also store information such as the level at which it is currently being parsed?
image
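A likely explanation, offered as a sketch rather than a definitive answer: in Python, passing a dict to a function passes a reference to the same object, not a copy. The local name input_json in each recursive call therefore refers to a dict that lives inside the original tree, and assigning to one of its keys changes that object in place. The names doc, inner, mutate and rebind below are made up for this illustration.

```python
doc = {"path": {"to": {"target_key": "old"}}}

inner = doc["path"]["to"]                 # another name for the same dict, not a copy
same_object = inner is doc["path"]["to"]  # identity check on the two references


def mutate(node):
    node["target_key"] = "new"            # changes the shared object in place


def rebind(node):
    node = {"target_key": "ignored"}      # only rebinds the local name


mutate(inner)   # visible through doc, because doc contains this very object
rebind(inner)   # has no effect outside the function
```

After both calls, doc["path"]["to"]["target_key"] is "new". If this reading is right, the update reaches the original tree because of shared object references, and it is not related to the scope of the for loop.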

Anyway, although this scope issue was confusing to me, it was also very interesting. I did not expect such a big difference in the concept of scope among programming languages, because it is such a basic concept.
