How to Convert Nested JSON (Dictionary) to CSV or DataFrame in Python

Table of Contents
Weather Web Service API to obtain JSON, and using a recursive function to extract all values from the nested JSON, then write code to convert it to Excel, CSV, Pandas DataFrame, etc. I will explain it simply.
Code
import time
import requests as req
import pandas as pd
def conv_to_2d(objct, parent=None, num=None):
for key, vals in objct.items():
# keyの設定
if parent is not None and num is not None:
abs_key = ”{}.{}.{}“.format(parent, key, num)
elif parent is not None:
abs_key = ”{}.{}“.format(parent, key)
else:
abs_key = key
<span class="hljs-comment"># valsのタイプごとに処理を分岐</span>
<span class="hljs-keyword">if</span> <span class="hljs-built_in">type</span>(vals) <span class="hljs-keyword">is</span> <span class="hljs-built_in">dict</span>:
<span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> conv_to_2d(objct=vals, parent=key)
<span class="hljs-keyword">elif</span> <span class="hljs-built_in">type</span>(vals) <span class="hljs-keyword">is</span> <span class="hljs-built_in">list</span>:
val_list = []
<span class="hljs-keyword">for</span> n, val <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(vals):
is_target = [<span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">int</span>, <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">float</span>, <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">bool</span>]
<span class="hljs-keyword">if</span> <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">str</span>:
<span class="hljs-keyword">if</span> val:
val_list += [val]
<span class="hljs-keyword">elif</span> <span class="hljs-built_in">any</span>(is_target):
num_str = <span class="hljs-built_in">str</span>(val)
<span class="hljs-keyword">if</span> num_str:
val_list += [num_str]
<span class="hljs-keyword">elif</span> <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">dict</span>:
<span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> conv_to_2d(objct=val, parent=abs_key, num=n)
<span class="hljs-keyword">if</span> val_list:
<span class="hljs-keyword">yield</span> abs_key, <span class="hljs-string">","</span>.join(val_list)
<span class="hljs-keyword">else</span>:
<span class="hljs-keyword">yield</span> abs_key, vals
def get_json(url):
r = req.get(url)
return r.json()
def main():
base_url = “http://weather.livedoor.com/forecast/webservice/json/v1?city={}”
city_id = [“400040”, “290010”, “270000”, “190010”, “130010”, “015010”, “473000”, “350010”]
url_list = [base_url.format(id_) for id_ in city_id]
weather_table = []
<span class="hljs-keyword">for</span> url <span class="hljs-keyword">in</span> url_list:
res = get_json(url)
record = {key: val <span class="hljs-keyword">for</span> key, val <span class="hljs-keyword">in</span> conv_to_2d(res)}
weather_table.append(record)
time.sleep(<span class="hljs-number">1</span>)
df = pd.DataFrame(weather_table)
df.to_excel(<span class="hljs-string">"sample.xlsx"</span>, index=<span class="hljs-literal">False</span>)
if name == “main”:
main()
Processing Content
STEP1: Setting the URL
To run the entire program, execute the main function and follow the processes in order. By inputting city_id after the ’=’ in http://weather.livedoor.com/forecast/webservice/json/v1?city=
, you can obtain weather information for the city corresponding to city_id. We create a list with city_id inputs, and using list comprehension in the url_list variable, we generate a list of URLs with city_id embedded.
STEP2: Obtaining JSON
Outside the for loop, we create an empty list weather_table
to store the responses. We loop through url_list and use the get_json function to obtain JSON. The get_json function takes a URL as an argument, uses requests to send an HTTP GET request to the API to get the response, reads it as JSON, and returns the value to the main function.
STEP3: Converting JSON to Two-Dimensional
This is the main theme of this post. We use a recursive function to obtain key-value pairs from the acquired JSON. conv_to_2d
is the function we wrote this time. Since conv_to_2d
acts as a generator function, it can obtain key-value pairs sequentially, so we generate a dictionary record using dictionary comprehension. A recursive function refers to a process where function A calls itself (function A).
Reasons for Using Recursive Function
The reason for using a recursive function this time is that, considering the use of other JSON data, the amount of data is unknown, and the level of nesting is unknown, we want to assume that it can be converted as long as it is in the correct JSON format. In that case, it is necessary to handle N-level nesting, and the recursive function is the optimal method to achieve that processing.
Stopping Condition for Recursive Function
Since the conv_to_2d function calls itself, if executed unconditionally, conv_to_2d will be called infinitely and the process will not end (resulting in an error). Therefore, we need some condition to stop the function, and in this case, the stopping condition is until there are no more JSON values to read. I wrote it without using return, waiting for all values to be returned while yielding them, as it seems to make the process flow simpler.
Arguments and Usage of the Function
Argument | Description |
---|---|
objct | Dictionary-type object |
parent=None | Value of the parent key |
num=None | Index of the list |
The arguments consist of objct to read the dictionary-type object, parent to read the value of the parent key, and num to indicate the index number of the list. When calling the function for the first time, only specify objct as the argument.
Operation
When calling the function, since we are reading the top layer of the dictionary, there is no parent key or list index. Therefore, in the main function, we execute it by specifying only objct. After execution, we use a for loop and items() to break down the dictionary into keys and values and check them individually.
Key
If there is a parent key or index number, we combine the parent information and the child key read in the for loop, separated by ’.’, to express the nested structure.
Value
We branch the processing depending on whether the value is a dictionary, list, string, number, or boolean. If it is a dictionary, we call conv_to_2d
with the key to check the contents of the deeper dictionary layer. This realizes N-level for loop operation. If it is a list, we obtain the list index number using enumerate while checking again whether it is a dictionary or string, etc. We do not consider lists within lists. If there is a dictionary within the list, we recursively call conv_to_2d
with the parent key and index number. When you want to obtain the value of the recursive function as a generator function, add yield from
before the recursive call.
STEP4: Registering the Record
We store the read dictionary record in weather_table
. Although it is a lightweight and small amount of processing, we wait for 1 second to avoid putting a load on the server.
STEP5: Generating and Converting Data Frame
After the for loop finishes, we load weather_table
into Pandas and convert it to a DataFrame. It is structured as a two-dimensional list > dictionary type, and even if the key configurations are different, it can be nicely converted to a table. Once converted to a DataFrame, you can perform filtering or conversion as usual with Pandas, and convert it to any format such as Excel, CSV, TSV, Pickle, or HTML.
Output Image
※Due to data volume constraints, columns are significantly abbreviated.
pinpointLocations.name.0 | pinpointLocations.link.1 | pinpointLocations.name.1 |
---|---|---|
Ōmuta City | Kurume City | |
Nara City | Yamatotakada City | |
Osaka City | Sakai City | |
Kōfu City | Yamanashi City | |
Chiyoda Ward | Chūō Ward | |
Muroran City | Tomakomai City | |
Miyakojima City | Tarama Village | |
Shimonoseki City | Ube City |
Summary
Nested JSON and recursive functions are no longer scary. I hope this helps with analysis and understanding of recursive functions!