Duplicate Keys When Generating JSON from a Dictionary in Python

TLDR: A JSON object stores all keys as strings, while a Python dict distinguishes keys not only by their content but also by their datatype (see stackoverflow). When saving a dictionary to JSON and reloading it, be careful: the original numeric keys are implicitly converted to strings.

Observation

In an experiment I process data in multiple runs, calculate the appropriate metrics and save the results in a dictionary. I then save this dictionary to a JSON file, with the run numbers as keys and the metrics as values. By loading the saved JSON back into a dictionary, I can continue my experiment at any time. However, when I loaded the data as a dictionary, ran my experiment and saved the dictionary as JSON again, the file ended up with duplicate JSON keys.

Resolution

This behavior occurs because the keys of a Python dictionary are distinguished not only by their “content” but also by their datatype, while keys in JSON are always stored as strings. A small example illustrates the subtle dilemma.

import json

##########
# Create a simple dictionary with a numeric key
##########
d = {}
d[1] = "hungsblog.com"
# output: d = {1: 'hungsblog.com'}

##########
# Dump the dictionary into a JSON string (the int key becomes the string "1")
##########
j = json.dumps(d, indent=4)
# output: j =
#{
#    "1": "hungsblog.com"
#}

##########
# Load the JSON into a dictionary and assign the same numeric key
##########
d = json.loads(j)
d[1] = "hungsblog.com"
# output: d = {'1': 'hungsblog.com', 1: 'hungsblog.com'}

We first map a value to a numeric key in the Python dictionary. Then we serialize the dictionary to JSON and load it back into a dictionary. The key is now a string instead of an integer. When we insert what looks like the same key as before, we end up with two entries whose keys differ only in datatype: the string '1' and the integer 1.
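One way to avoid the mixed keys is to convert them back to integers right after loading. The following is only a minimal sketch: it assumes that every top-level key was originally an integer, and the helper name load_int_keys is made up for this illustration.

```python
import json


def load_int_keys(text):
    """Parse a JSON object and convert its top-level keys back to int.

    Hypothetical helper; assumes every top-level key is an integer string.
    """
    raw = json.loads(text)
    return {int(key): value for key, value in raw.items()}


j = json.dumps({1: "hungsblog.com"})

d = load_int_keys(j)
d[1] = "hungsblog.com"  # overwrites the existing entry instead of adding a second one
print(d)  # {1: 'hungsblog.com'}
```

If a key cannot be parsed as an integer, int() raises a ValueError, so dictionaries with mixed or nested keys would need a more careful conversion.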

Furthermore, if we save the mixed-key dictionary as JSON again, the JSON will contain both key-value pairs, and both keys are written as the string "1". Loading that JSON back into a dictionary then implicitly keeps only the last of the duplicate entries (stackoverflow).
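This second round trip can be reproduced in a few lines. The values below are placeholders chosen for this sketch so that the two entries can be told apart.

```python
import json

# A dictionary with a string key (as loaded from JSON) and an int key (newly added)
mixed = {"1": "loaded from file", 1: "added in this run"}

# Both keys are serialized as the string "1", so the JSON text contains a duplicate key
j = json.dumps(mixed)
print(j)  # {"1": "loaded from file", "1": "added in this run"}

# When parsing, the later entry overwrites the earlier one
reloaded = json.loads(j)
print(reloaded)  # {'1': 'added in this run'}
```

Which value survives therefore depends on the insertion order of the dictionary, which makes the loss easy to overlook.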

hungsblog | Nguyen Hung Manh | Dresden