Scriptone

How to Convert Nested JSON (Dictionary) to CSV or DataFrame in Python

Table of Contents

Weather Web Service API to obtain JSON, and using a recursive function to extract all values from the nested JSON, then write code to convert it to Excel, CSV, Pandas DataFrame, etc. I will explain it simply.

Code

import time

import requests as req import pandas as pd

def conv_to_2d(objct, parent=None, num=None): for key, vals in objct.items(): # keyの設定 if parent is not None and num is not None: abs_key = ”{}.{}.{}“.format(parent, key, num) elif parent is not None: abs_key = ”{}.{}“.format(parent, key) else: abs_key = key

    <span class="hljs-comment"># valsのタイプごとに処理を分岐</span>
    <span class="hljs-keyword">if</span> <span class="hljs-built_in">type</span>(vals) <span class="hljs-keyword">is</span> <span class="hljs-built_in">dict</span>:
        <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> conv_to_2d(objct=vals, parent=key)
    <span class="hljs-keyword">elif</span> <span class="hljs-built_in">type</span>(vals) <span class="hljs-keyword">is</span> <span class="hljs-built_in">list</span>:
        val_list = []
        <span class="hljs-keyword">for</span> n, val <span class="hljs-keyword">in</span> <span class="hljs-built_in">enumerate</span>(vals):
            is_target = [<span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">int</span>, <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">float</span>, <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">bool</span>]
            <span class="hljs-keyword">if</span> <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">str</span>:
                <span class="hljs-keyword">if</span> val:
                    val_list += [val]
            <span class="hljs-keyword">elif</span> <span class="hljs-built_in">any</span>(is_target):
                num_str = <span class="hljs-built_in">str</span>(val)
                <span class="hljs-keyword">if</span> num_str:
                    val_list += [num_str]
            <span class="hljs-keyword">elif</span> <span class="hljs-built_in">type</span>(val) <span class="hljs-keyword">is</span> <span class="hljs-built_in">dict</span>:
                <span class="hljs-keyword">yield</span> <span class="hljs-keyword">from</span> conv_to_2d(objct=val, parent=abs_key, num=n)
        <span class="hljs-keyword">if</span> val_list:
            <span class="hljs-keyword">yield</span> abs_key, <span class="hljs-string">","</span>.join(val_list)
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">yield</span> abs_key, vals

def get_json(url): r = req.get(url) return r.json()

def main(): base_url = http://weather.livedoor.com/forecast/webservice/json/v1?city={} city_id = [“400040”, “290010”, “270000”, “190010”, “130010”, “015010”, “473000”, “350010”] url_list = [base_url.format(id_) for id_ in city_id]

weather_table = []
<span class="hljs-keyword">for</span> url <span class="hljs-keyword">in</span> url_list:
    res = get_json(url)
    record = {key: val <span class="hljs-keyword">for</span> key, val <span class="hljs-keyword">in</span> conv_to_2d(res)}
    weather_table.append(record)
    time.sleep(<span class="hljs-number">1</span>)
df = pd.DataFrame(weather_table)
df.to_excel(<span class="hljs-string">"sample.xlsx"</span>, index=<span class="hljs-literal">False</span>)

if name == main: main()


Processing Content

STEP1: Setting the URL

To run the entire program, execute the main function and follow the processes in order. By inputting city_id after the ’=’ in http://weather.livedoor.com/forecast/webservice/json/v1?city=, you can obtain weather information for the city corresponding to city_id. We create a list with city_id inputs, and using list comprehension in the url_list variable, we generate a list of URLs with city_id embedded.

STEP2: Obtaining JSON

Outside the for loop, we create an empty list weather_table to store the responses. We loop through url_list and use the get_json function to obtain JSON. The get_json function takes a URL as an argument, uses requests to send an HTTP GET request to the API to get the response, reads it as JSON, and returns the value to the main function.

STEP3: Converting JSON to Two-Dimensional

This is the main theme of this post. We use a recursive function to obtain key-value pairs from the acquired JSON. conv_to_2d is the function we wrote this time. Since conv_to_2d acts as a generator function, it can obtain key-value pairs sequentially, so we generate a dictionary record using dictionary comprehension. A recursive function refers to a process where function A calls itself (function A).

Reasons for Using Recursive Function

The reason for using a recursive function this time is that, considering the use of other JSON data, the amount of data is unknown, and the level of nesting is unknown, we want to assume that it can be converted as long as it is in the correct JSON format. In that case, it is necessary to handle N-level nesting, and the recursive function is the optimal method to achieve that processing.

Stopping Condition for Recursive Function

Since the conv_to_2d function calls itself, if executed unconditionally, conv_to_2d will be called infinitely and the process will not end (resulting in an error). Therefore, we need some condition to stop the function, and in this case, the stopping condition is until there are no more JSON values to read. I wrote it without using return, waiting for all values to be returned while yielding them, as it seems to make the process flow simpler.

Arguments and Usage of the Function

Argument

Description

objct

Dictionary-type object

parent=None

Value of the parent key

num=None

Index of the list


The arguments consist of objct to read the dictionary-type object, parent to read the value of the parent key, and num to indicate the index number of the list. When calling the function for the first time, only specify objct as the argument.

Operation

When calling the function, since we are reading the top layer of the dictionary, there is no parent key or list index. Therefore, in the main function, we execute it by specifying only objct. After execution, we use a for loop and items() to break down the dictionary into keys and values and check them individually.

Key

If there is a parent key or index number, we combine the parent information and the child key read in the for loop, separated by ’.’, to express the nested structure.

Value

We branch the processing depending on whether the value is a dictionary, list, string, number, or boolean. If it is a dictionary, we call conv_to_2d with the key to check the contents of the deeper dictionary layer. This realizes N-level for loop operation. If it is a list, we obtain the list index number using enumerate while checking again whether it is a dictionary or string, etc. We do not consider lists within lists. If there is a dictionary within the list, we recursively call conv_to_2d with the parent key and index number. When you want to obtain the value of the recursive function as a generator function, add yield from before the recursive call.

STEP4: Registering the Record


We store the read dictionary record in weather_table. Although it is a lightweight and small amount of processing, we wait for 1 second to avoid putting a load on the server.

STEP5: Generating and Converting Data Frame

After the for loop finishes, we load weather_table into Pandas and convert it to a DataFrame. It is structured as a two-dimensional list > dictionary type, and even if the key configurations are different, it can be nicely converted to a table. Once converted to a DataFrame, you can perform filtering or conversion as usual with Pandas, and convert it to any format such as Excel, CSV, TSV, Pickle, or HTML.

Output Image

※Due to data volume constraints, columns are significantly abbreviated.

pinpointLocations.name.0

pinpointLocations.link.1

pinpointLocations.name.1

Ōmuta City

http://weather.livedoor.com/area/forecast/4020300

Kurume City

Nara City

http://weather.livedoor.com/area/forecast/2920200

Yamatotakada City

Osaka City

http://weather.livedoor.com/area/forecast/2714000

Sakai City

Kōfu City

http://weather.livedoor.com/area/forecast/1920500

Yamanashi City

Chiyoda Ward

http://weather.livedoor.com/area/forecast/1310200

Chūō Ward

Muroran City

http://weather.livedoor.com/area/forecast/0121300

Tomakomai City

Miyakojima City

http://weather.livedoor.com/area/forecast/4737500

Tarama Village

Shimonoseki City

http://weather.livedoor.com/area/forecast/3520200

Ube City


Summary

Nested JSON and recursive functions are no longer scary. I hope this helps with analysis and understanding of recursive functions!