When trying to convert a DataFrame into a YAML format, I encountered the following output:
? !!python/object/apply:sqlalchemy.sql.elements.quoted_name
- read_connection_id
- null
: oracle_orbis
Whereas read_connection_id
is the columnname and oracle_orbis
is the value
Context
I was working with a PostgreSQL table and needed to transform its content into YAML files. The general process looked like this:
data:image/s3,"s3://crabby-images/4dd4a/4dd4a1ab72d010c2a2639fdb240ffaac4118dce0" alt=""
However, during the YAML conversion step, an error appeared related to quoted_name
. The issue originated from SQLAlchemy, where column names were not recognized as regular strings but rather as sqlalchemy.sql.elements.quoted_name
objects.
Reproducing the Issue
Reading a PostgreSQL table and converting it to YAML:
from sqlalchemy import create_engine
import pandas as pd
import yaml
engine = create_engine(db_uri)
df = pd.read_sql_table(table_name, con=engine, schema=schema_name)
for index, row in df.iterrows():
row_dict = row.to_dict()
yaml_data = yaml.dump(row_dict, default_flow_style=False, sort_keys=False)
This resulted in to yaml dumps with the above mentioned output leading with python/object/apply:sqlalchemy.sql.elements.quoted_name
.
Why Does This Happen?
SQLAlchemy represents column names using quoted_name
objects instead of plain strings (docs), to prevent modification during ‘name normalization’ process. You can convert column names to strings using df.columns = df.columns.astype(str)
. However, when iterating through each row with df.iterrows()
and converting each row to a dictionary using .to_dict()
, the quoted_name
object seems to persist as the column name. When YAML serialization occurs, these objects do not convert automatically to standard strings, leading to serialization issues.
Solution
To fix this, we need to ensure that all dictionary keys are converted into plain strings. Here’s a helper function:
from sqlalchemy.sql.elements import quoted_name
def convert_keys_to_strings(d):
"""Recursively convert dictionary keys to plain strings, fixing quoted_name issues."""
if isinstance(d, dict):
return {str(k) if isinstance(k, quoted_name) else k: convert_keys_to_strings(v) for k, v in d.items()}
return d
Final Working Code
Using the helper function before YAML serialization ensures a clean transformation:
for index, row in df.iterrows():
row_dict = row.to_dict()
filtered_dict = remove_none_values(convert_keys_to_strings(row_dict))
yaml_data = yaml.dump(row_dict, default_flow_style=False, sort_keys=False)
Conclusion
The error occurs because SQLAlchemy uses quoted_name
objects for column names instead of regular strings. The solution is to recursively convert dictionary keys to plain strings before YAML serialization.