Data Engineering

Subtle Difference in Dockerfile and Dockercompose – Variables in Entrypoints

Hung Manh | May 31, 2024 | 3 min read

TLDR: Variables in entrypoints should be escaped. This can be done by using a second $. Background: While setting up a Spark Thrift Server I encountered an oversight that was obvious in retrospect. I would always get the following error,…

Spark – Error with UTF8 encoding in Docker Image

Hung Manh | Mar 24, 2024 | 3 min read

In German, we encounter special characters known as Umlaute, including ä, ü, ö. If the configuration is not correctly set, encoding these symbols may result in information loss. Let’s explore a practical example where such a misconfiguration led to a…

Spark – java.nio.channels.UnresolvedAddressException

Hung Manh | Feb 1, 2024 | 4 min read

A very short write-up of the following error, which apparently this user also encountered and documented (github). Be aware that this error might appear in several scenarios. It just so happened that in my specific situation, it was an easy…

Exasol – object XXX not found

Hung Manh | Jan 20, 2024 | 2 min read

TLDR: Identifiers in Exasol are stored in upper case internally. Selections should also be quoted. Observation: In Exasol I created a Python User Defined Function like this: CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "SCHEMA"."PARSE_XML" ("xml" VARCHAR(2000000) UTF8) EMITS ("parsed_column" VARCHAR(2000000)…

Python – Pass by object: Practical pitfall

Hung Manh | Sep 19, 2023 | 3 min read

Inside a loop I was accessing an object within a dictionary multiple times to transform and visualize it. The intention was to keep all transformations isolated from each other. What actually happened, though, was that those transformations accumulated because of Python’s…
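
The teaser above can be illustrated with a minimal sketch. It is not the code from the post: the dictionary `frames`, the helper `double_in_place`, and the sample values are all hypothetical, and the sketch assumes the pitfall is that a dictionary lookup hands out a reference to the same mutable object rather than a copy.

```python
import copy

# Hypothetical data standing in for the object stored in the dictionary.
frames = {"sales": [1, 2, 3]}


def double_in_place(values):
    """Transform the list it receives by mutating it directly."""
    for i, v in enumerate(values):
        values[i] = v * 2
    return values


# Each iteration receives the SAME list object, so the changes accumulate.
for _ in range(2):
    shared_result = double_in_place(frames["sales"])

# Passing a deep copy keeps every iteration isolated from the stored object.
isolated = {"sales": [1, 2, 3]}
for _ in range(2):
    isolated_result = double_in_place(copy.deepcopy(isolated["sales"]))
```

After the first loop the list stored in `frames` has been doubled twice, while the list in `isolated` is untouched and only the returned copy is transformed.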

Duplicate Keys when Generating a Json from a Dictionary in Python

Hung Manh | Jan 25, 2023 | 2 min read

TLDR: A dictionary in JSON treats all keys as strings, while a Python dict distinguishes keys not only by their content but also by their datatype (see stackoverflow). When saving a dictionary into a JSON and reloading the dictionary from it, you…
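
A short sketch of the behaviour described in the TLDR, using only the standard-library `json` module. The dictionary contents are made up for illustration and are not taken from the post.

```python
import json

# Hypothetical data: the int key 1 and the str key "1" are distinct dict keys.
original = {1: "int key", "1": "str key"}

# json.dumps converts every key to a string, so the key "1" is written twice.
encoded = json.dumps(original)

# On reload, the later duplicate silently overwrites the earlier one.
restored = json.loads(encoded)
```

Here `original` holds two entries, while `restored` holds one entry whose key is a string, so a round trip through JSON can both change key types and drop data.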

How To Create A Superset Guest Token With Python To Embed Dashboards

Hung Manh | Dec 30, 2022 | 4 min read

The goal is to embed a Superset dashboard into e.g. a React application. To achieve this, one step is the creation of guest tokens (service accounts). This process is (in my opinion) not sufficiently well documented, which is why…

Airflow – Fill Dagbag takes too long

Hung Manh | Dec 14, 2022 | 3 min read

TLDR: It is possible to dynamically create DAGs with only one DAG script. However, at task execution the original DAG script is parsed once again. This results in unnecessary parsing of DAGs which are not the parent dag…

Migrating existing OCI Kubernetes to VCN-Native Cluster with Terraform

Hung Manh | Dec 2, 2022 | 2 min read

Your OCI Kubernetes cluster might show a little tooltip which states “migration required”. This is because “in earlier releases (before March 16, 2021), Container Engine for Kubernetes provisioned clusters with Kubernetes API endpoints that were not integrated into your…

Using pushdataset in PowerBI to create near real time logging dashboard

Hung Manh | Mar 1, 2022 | 4 min read

Recently I participated in a hackathon in which the goal was to create a near real time monitoring dashboard using Microsoft PowerBI. The data was already generated and persisted in SQLServer and needed to be queried efficiently. Since I am…

hungsblog | Nguyen Hung Manh | Dresden