Inside a loop, I was accessing an object stored in a dictionary multiple times, transforming it, and visualizing it. The intention was to keep all transformations isolated from each other. What actually happened, though, was that the transformations accumulated because of Python’s pass-by-object-reference behavior. Let’s look into this:
import pandas as pd

# Create a DataFrame and put it in a dictionary
d = pd.DataFrame({'blog': [0, 1, 2, 3, 4]})
frames = {'hung': d}  # named frames to avoid shadowing the built-in dict
print(f"ObjectID:{id(d)} : {d['blog'].values}")
# Loop two times and negate the column on each pass
for i in range(2):
    df = frames['hung']  # binds df to the same object as d; no copy is made
    df['blog'] = -df['blog']
    print(f"ObjectID:{id(df)} : {d['blog'].values}")
# Output
# ObjectID:136417234566688 : [0 1 2 3 4]
# ObjectID:136417234566688 : [ 0 -1 -2 -3 -4]
# ObjectID:136417234566688 : [0 1 2 3 4]
I would have expected the DataFrame contained in the dictionary to be assigned to the variable df in such a way that modifications to df would not affect the underlying DataFrame. However, as the output reveals, every iteration works on the same object ID, which means that the underlying DataFrame is in fact modified in each iteration: the second negation undoes the first instead of starting from the original values.
That is because in Python an assignment never hands over an isolated copy of the DataFrame, only a reference to it. More precisely, the “pointer” to the object d and its memory location has been assigned to the variable df. Both names now refer to the same object, so any change made through df also affects the underlying DataFrame d.
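To get the isolation I originally wanted, one option is to request an explicit copy on every iteration. The following is a minimal sketch, assuming pandas is installed; the names original, frames and negated are only illustrative. DataFrame.copy() makes a deep copy by default, so each pass starts from the untouched data.

```python
import pandas as pd

original = pd.DataFrame({'blog': [0, 1, 2, 3, 4]})
frames = {'hung': original}  # illustrative name, avoids shadowing the built-in dict

negated = []
for i in range(2):
    # .copy() returns a new DataFrame object with its own data
    df = frames['hung'].copy()
    df['blog'] = -df['blog']
    negated.append(df['blog'].tolist())

print(original['blog'].tolist())  # the DataFrame in the dictionary is unchanged
print(negated)                    # both passes produce the same negated values
```

The trade-off is extra memory and time for each copy, which may matter for large DataFrames.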
To elaborate on this idea: call by value passes a copy of the variable's value, while call by reference passes the variable's memory location. In Python, variables are passed by value, with the distinction that the values are references to objects. So when a variable is assigned to another variable, or passed to a function, the object it refers to is not copied. Instead, a new reference to the same underlying object is created. If the object is modified in place through the new variable, the change is visible through every other reference as well, which can give the impression of “call by reference”. Rebinding the new variable to a different object, on the other hand, leaves the original object untouched.
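The difference between changing an object in place and pointing a name at a new object is easiest to see with function arguments. This is a small sketch using plain lists; the function names mutate and rebind are made up for illustration.

```python
def mutate(items):
    # In-place change: items refers to the caller's list, so the caller sees it
    items.append(99)


def rebind(items):
    # Builds a new list and rebinds only the local name; the caller's list is untouched
    items = items + [99]
    return items


first = [1, 2, 3]
mutate(first)
print(first)   # [1, 2, 3, 99]

second = [1, 2, 3]
result = rebind(second)
print(second)  # [1, 2, 3]
print(result)  # [1, 2, 3, 99]
```

If Python were truly call by reference, rebind would also have changed second. If it were call by value in the copying sense, mutate would not have changed first.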
However, this effect is only visible with mutable objects. Immutable objects such as int or str cannot be modified in place, and any operation that seems to modify them actually creates a new object. This behavior can make it appear as though the object was passed “by value”.
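A quick way to check this is to compare identities with the is operator before and after an augmented assignment. The sketch below uses arbitrary variable names and contrasts an int with a list.

```python
a = 1000
b = a
print(a is b)  # True: both names refer to the same int object
b += 1         # ints are immutable, so this creates a new int and rebinds b
print(a, b)    # 1000 1001
print(a is b)  # False: a still refers to the original object

x = [1, 2]
y = x
y += [3]       # lists are mutable, so += extends the existing list in place
print(x)       # [1, 2, 3]
print(x is y)  # True: still one shared object
```

The same += syntax therefore either rebinds the name or mutates the shared object, depending on whether the type is immutable or mutable.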
References:
https://plainenglish.io/blog/pass-by-object-reference-in-python-79a8d92dc493
https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/