A very short write-up of the following error, which apparently another user also encountered and documented (GitHub). Be aware that this error might appear in several scenarios. It just happened that, in my specific situation, it was an easy fix.
Setup
I run a Spark image (spark:3.5.0-scala2.12-java11-python3-ubuntu) in a Docker Compose setup:
spark-master:
  build:
    context: build/spark
    target: master
  environment:
    SPARK_MODE: master
    SPARK_MASTER_HOST: spark-master # SHOULD BE: 0.0.0.0
    SPARK_MASTER_PORT: ${SPARK_MASTER_PORT}
    SPARK_MASTER_WEBUI_PORT: ${SPARK_MASTER_WEBUI_PORT}
spark-worker:
  build:
    context: build/spark
    target: worker
  environment:
    SPARK_MODE: worker
    SPARK_MASTER_URL: spark://spark-master:${SPARK_MASTER_PORT}
To start a master server, the following command is executed:
command: "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --ip ${SPARK_MASTER_HOST} --port 7077 --webui-port ${SPARK_MASTER_WEBUI_PORT}"
However, the container shuts down almost immediately with the following error:
24/05/07 13:54:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.nio.channels.UnresolvedAddressException
    at java.base/sun.nio.ch.Net.checkAddress(Unknown Source)
    at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(Unknown Source)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:600)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:579)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:260)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
24/05/07 13:54:03 INFO ShutdownHookManager: Shutdown hook called
Solution
ChatGPT actually gave me the right hint. It said:
The error message you’re encountering, java.nio.channels.UnresolvedAddressException, typically occurs when Spark (or an underlying networking library like Netty) is trying to bind to a network address that cannot be resolved. This error generally indicates that the application is attempting to use an incorrect or non-existent hostname or IP address.
Possible Causes and Solutions
- Invalid Hostname/IP Address:
  - Ensure that the hostname or IP address you’re using is valid and resolvable. Check the network settings and any environment variables used to specify the addresses.
  - In a Docker environment, confirm that all services and containers can correctly resolve the specified hostnames.
- Misconfigured Bind Address:
  - The bind address is the network interface Spark attempts to bind to. Verify the values of environment variables like SPARK_MASTER_HOST, SPARK_LOCAL_IP, or any other bind address parameters.
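To check the first point, it can help to test name resolution from inside the container itself, since Compose service names such as spark-master are only resolvable from within the Compose network. The following is a minimal sketch using the JDK's InetAddress; the class name ResolveCheck and the default host list are hypothetical and only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    // Returns true if the JVM can resolve the hostname to at least one IP address.
    // getAllByName throws UnknownHostException instead of returning an empty array.
    static boolean resolves(String host) {
        try {
            return InetAddress.getAllByName(host).length > 0;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass your own hostnames as arguments.
        // "spark-master" is assumed to be resolvable only inside the Compose network.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost", "spark-master"};
        for (String host : hosts) {
            System.out.println(host + " -> " + (resolves(host) ? "resolves" : "does NOT resolve"));
        }
    }
}
```

Because the image ships with Java 11, a single source file like this can be run directly, e.g. `java ResolveCheck.java spark-master`, from a shell inside the container.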
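That was the hint I needed: I had set SPARK_MASTER_HOST incorrectly and should have set it to 0.0.0.0, since the application runs inside a Docker container. 0.0.0.0 is the wildcard address, so the master binds to all of the container's network interfaces instead of having to resolve a hostname at startup.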
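The failure and the fix can be reproduced outside Spark with plain Java NIO, which is what Netty calls into (ServerSocketChannelImpl.bind in the stack trace above). This is a minimal sketch, not Spark's own code: it assumes the failing case corresponds to binding to a hostname that could not be resolved, which it simulates with InetSocketAddress.createUnresolved (no DNS lookup is performed). The class BindDemo and method tryBind are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class BindDemo {
    // Tries to bind a server socket to the given address and describes the outcome.
    static String tryBind(InetSocketAddress addr) {
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            ch.bind(addr);
            return "bound to " + ch.getLocalAddress();
        } catch (UnresolvedAddressException e) {
            // Same exception as in the Spark log: bind() rejects unresolved addresses.
            return "UnresolvedAddressException";
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Simulates a hostname that could not be resolved (assumption, see lead-in).
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved("spark-master", 0);
        System.out.println(tryBind(unresolved));

        // Wildcard address: listen on all interfaces. Port 0 lets the OS pick a free port.
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 0);
        System.out.println(tryBind(wildcard));
    }
}
```

The first call fails before any network traffic happens, while the second succeeds without any name lookup at all, which is why 0.0.0.0 sidesteps the problem.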
Thanks, this helped me out! In my situation, I have a local cluster set up using docker-compose. Docker Compose gives each container a unique hostname based on its service name in docker-compose.yml, so I had SPARK_MASTER_HOST set to spark-master in my Dockerfile. I want to use the same Dockerfile when deploying to a k8s cluster, so I updated it to use 0.0.0.0 for SPARK_MASTER_HOST so that it works there as well. That immediately fixed my issue!