[ Spark ] The "no space left on device" error I hit while running a Spark job

2019. 6. 12. 09:05

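While running a Spark job, I ran into the following error. (The Korean text at the end of the first log line, "장치에 남은 공간이 없음", is the localized OS message for "No space left on device".)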

19/06/11 17:44:14 INFO scheduler.DAGScheduler: ResultStage 9 (saveAsTextFile at MainHanhwa.scala:51) failed in 31.631 s due to Job aborted due to stage failure: Task 108 in stage 9.0 failed 1 times, most recent failure: Lost task 108.0 in stage 9.0 (TID 529, localhost, executor driver): java.lang.OutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@33e9c68d : 장치에 남은 공간이 없음
    at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:183)
    at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:249)
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:112)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:332)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:347)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)

The Spark settings at the time were as follows.

spark-submit \
    --name "hanhwa-job" \
    --master yarn \
    --driver-memory 2g \
    --executor-memory 4g \
    --class spark.datatech.MainHanhwa \
    /home1/irteam/dbkim/hanhwa/spark-gradle-1.6.x-0.1.jar

Looking for the cause, I found the following Stack Overflow posts.

Question:

When performing a shuffle my Spark job fails and says "no space left on device", but when I run df -h it says I have free space left! Why does this happen, and how can I fix it?

Answer 1:

By default Spark uses the /tmp directory to store intermediate data. If you actually do have space left on some device, you can alter this by creating the file SPARK_HOME/conf/spark-defaults.conf and adding the line below. Here SPARK_HOME is wherever your root directory for the Spark install is.

spark.local.dir SOME/DIR/WHERE/YOU/HAVE/SPACE

Source: https://stackoverflow.com/questions/25707784/why-does-a-job-fail-with-no-space-left-on-device-but-df-says-otherwise
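Before changing spark.local.dir it helps to check how much space each candidate directory really has, because df reports per filesystem and /tmp may sit on a much smaller one than the disk you were looking at. The following is a minimal sketch, not part of the original answer: the function names free_kb and pick_local_dir and the candidate paths are my own assumptions. Note also that, according to the Spark configuration documentation, on YARN the cluster manager's LOCAL_DIRS setting overrides spark.local.dir, so for a --master yarn job like this one the NodeManager's local directories are what would need checking.

```shell
#!/bin/sh
# Print the available space (in KiB) of the filesystem holding directory $1.
# -P selects the portable POSIX output format, -k reports 1024-byte blocks.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print whichever of the given directories has the most available space.
pick_local_dir() {
    best_dir=""
    best_kb=-1
    for dir in "$@"; do
        [ -d "$dir" ] || continue      # skip candidates that do not exist
        kb=$(free_kb "$dir")
        if [ "$kb" -gt "$best_kb" ]; then
            best_dir=$dir
            best_kb=$kb
        fi
    done
    printf '%s\n' "$best_dir"
}

# Hypothetical candidates; replace them with the real mount points on your nodes.
chosen=$(pick_local_dir /tmp /var/tmp)
echo "spark.local.dir $chosen"
```

The printed line has the same form as the spark-defaults.conf entry in the answer above; the same value could also be passed on the command line as --conf spark.local.dir=... when calling spark-submit.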

Answer 2:

This is because Spark creates some temp shuffle files under the /tmp directory of your local system. You can avoid this issue by setting the properties below in your Spark conf files.

Set this property in spark-env.sh.

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"
export SPARK_JAVA_OPTS

Source: https://stackoverflow.com/questions/30162845/spark-java-io-ioexception-no-space-left-on-device
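A note on the second answer: SPARK_JAVA_OPTS has been deprecated since Spark 1.0, and the Spark documentation describes the SPARK_LOCAL_DIRS environment variable as the way to set scratch directories in standalone mode. The sketch below is my own illustration of what a spark-env.sh fragment along those lines might look like. The helper usable_dirs is a hypothetical name, and the two paths are simply the ones quoted in the answer.

```shell
#!/bin/sh
# Join the directories that exist and are writable into a comma-separated list,
# which is the format Spark expects for multiple local directories.
usable_dirs() {
    list=""
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            list="${list:+$list,}$dir"   # add a comma only between entries
        fi
    done
    printf '%s\n' "$list"
}

# /mnt/spark and /mnt2/spark are the example paths from the answer above.
SPARK_LOCAL_DIRS=$(usable_dirs /mnt/spark /mnt2/spark)
export SPARK_LOCAL_DIRS
echo "SPARK_LOCAL_DIRS=$SPARK_LOCAL_DIRS"
```

Filtering out missing or read-only directories first means a mistyped mount point shows up as an empty or shortened list here rather than as a failure in the middle of a shuffle.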

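In short: shuffle data piles up as intermediate output while a Spark job runs, the error appears when the directory holding that data runs out of space, and the suggested fix is to point Spark's local directory setting somewhere else.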

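I decided, however, to change only the spark-submit settings. Looking at them again, I noticed I had not set executor-cores or num-executors, and wondered whether the defaults were simply too small. I added both options and ran spark-submit again, and this time the job completed normally.

The spark-submit options were as follows, with num-executors and executor-cores added.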

spark-submit \
    --name "hanhwa-job" \
    --master yarn \
    --driver-memory 2g \
    --num-executors 10 \
    --executor-cores 12 \
    --executor-memory 4g \
    --class spark.datatech.MainHanhwa \
    /home1/irteam/dbkim/hanhwa/spark-gradle-1.6.x-0.1.jar
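To get a feel for what this command asks of the cluster, the request can be added up by hand. When these options are omitted in YARN mode, spark-submit falls back to 2 executors with 1 core each, so the new settings are a large jump. The sketch below is only rough arithmetic: it assumes the Spark 1.6 default for spark.yarn.executor.memoryOverhead, max(384 MB, 10% of executor memory), and ignores the rounding YARN applies to container sizes. It does not by itself explain why the disk-space error went away.

```shell
#!/bin/sh
# Values taken from the spark-submit command above.
num_executors=10
executor_cores=12
executor_mem_mb=4096        # --executor-memory 4g

# Assumed Spark 1.6 default for spark.yarn.executor.memoryOverhead:
# max(384 MB, 10% of executor memory).
overhead_mb=$(( executor_mem_mb / 10 ))
if [ "$overhead_mb" -lt 384 ]; then
    overhead_mb=384
fi

container_mb=$(( executor_mem_mb + overhead_mb ))
total_cores=$(( num_executors * executor_cores ))
total_mb=$(( num_executors * container_mb ))

echo "per-executor container: ${container_mb} MB"   # 4505 MB
echo "total cores requested:  ${total_cores}"       # 120
echo "total executor memory:  ${total_mb} MB"       # 45050 MB
```

Comparing these totals with what the YARN queue can actually grant is a quick sanity check before settling on values for num-executors and executor-cores.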

The problem went away this way, but I need to study the default values of Spark's settings more closely so that I can choose settings that fit the cluster and the service.


