Is using function in transformation causing Not Serializable exceptions?
Is using function in transformation causing Not Serializable exceptions?
I have a Breeze DenseMatrix, i find mean per row and mean of squares per row and put them in another DenseMatrix, one per column. But i get Task Not Serializable exception. I know that sc is not Serializable but i think that the exception is because i call functions in a transformation in Safe Zones.
Breeze DenseMatrix
mean
mean
DenseMatrix
Task Not Serializable
sc
Serializable
Am i right? And how could be a possible way to be done without any functions? Any help would be great!
Code:
object MotitorDetection {
case class MonDetect() extends Serializable {
var sc: SparkContext = _
var machines: Int=0
var counters: Int=0
var GlobalVec= BDM.zeros[Double](counters, 2)
def findMean(a: BDM[Double]): BDV[Double] = {
var c = mean(a(*, ::))
c}
def toMatrix(x: BDV[Double], y: BDV[Double], C: Int): BDM[Double]={
val m = BDM.zeros[Double](C,2)
m(::, 0) := x
m(::, 1) := y
m}
def SafeZones(stream: DStream[(Int, BDM[Double])]){
stream.foreachRDD { (rdd: RDD[(Int, BDM[Double])], _) =>
if (isEmpty(rdd) == false) {
val InputVec = rdd.map(x=> (x._1, toMatrix(findMean(x._2), findMean(pow(x._2, 2)), counters)))
GlobalMeanVector(InputVec)
}}}
Exception:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.map(RDD.scala:369)
at ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1.apply(MotitorDetection.scala:85)
at ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1.apply(MotitorDetection.scala:82)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext Serialization stack:
- object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@6eee7027)
- field (class: ScalaApps.MotitorDetection$MonDetect, name: sc, type: class org.apache.spark.SparkContext)
- object (class ScalaApps.MotitorDetection$MonDetect, MonDetect())
- field (class: ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1, name: $outer, type: class ScalaApps.MotitorDetection$MonDetect)
- object (class ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1, <function2>)
- field (class: ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1$$anonfun$2, name: $outer, type: class ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1)
- object (class ScalaApps.MotitorDetection$MonDetect$$anonfun$SafeZones$1$$anonfun$2, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 28 more
1 Answer
1
The findMean method is a method of the object MotitorDetection. The object MotitorDetection has a SparkContext on-board, which is not serializable. Thus, the task used in rdd.map is not serializable.
findMean
MotitorDetection
MotitorDetection
SparkContext
rdd.map
Move all the matrix-related functions into a separate serializable object, MatrixUtils, say:
MatrixUtils
object MatrixUtils {
def findMean(a: BDM[Double]): BDV[Double] = {
var c = mean(a(*, ::))
c
}
def toMatrix(x: BDV[Double], y: BDV[Double], C: Int): BDM[Double]={
val m = BDM.zeros[Double](C,2)
m(::, 0) := x
m(::, 1) := y
m
}
...
}
and then use only those methods from rdd.map(...):
rdd.map(...)
object MotitorDetection {
val sc = ...
def SafeZones(stream: DStream[(Int, BDM[Double])]){
import MatrixUtils._
... = rdd.map( ... )
}
}
@mkey You also got rid of
counters, and you made sure that all those objects are not enclosed in some (synthetically generated) object (repl?)?.– Andrey Tyukin
Jun 29 at 23:15
counters
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

It didn't work but i am wondering if doing the same computations with just transformations would result in an exception too?
– mkey
Jun 29 at 22:18