apache spark - How to create an empty DataFrame? Why "ValueError: RDD is empty"?
I am trying to create an empty DataFrame in Spark (PySpark).
I am using an approach similar to the one discussed here, but it is not working.
This is the code:

    df = sqlContext.createDataFrame(sc.emptyRDD(), schema)

and this is the error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 404, in createDataFrame
        rdd, schema = self._createFromRDD(data, schema, samplingRatio)
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 285, in _createFromRDD
        struct = self._inferSchema(rdd, samplingRatio)
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 229, in _inferSchema
        first = rdd.first()
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1320, in first
        raise ValueError("RDD is empty")
    ValueError: RDD is empty
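The traceback shows why this fails: createDataFrame went down the schema-inference path, and _inferSchema calls rdd.first() to sample a row, which an empty RDD cannot provide. In Spark 1.5 that path is only taken when schema is None or a plain list of column names, so the likely culprit is that schema was not an actual StructType. A minimal sketch of the distinction, assuming the usual sqlContext and sc from a Spark 1.5 shell (the column names below are hypothetical):

    from pyspark.sql.types import StructType

    # Fails with "ValueError: RDD is empty": a bare list of column names
    # forces PySpark to infer the types by sampling the RDD's first row.
    # df = sqlContext.createDataFrame(sc.emptyRDD(), ["col1", "col2"])

    # Works: an explicit StructType leaves nothing to infer.
    df = sqlContext.createDataFrame(sc.emptyRDD(), StructType([]))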
Extending Joe Widen's answer, you can create a schema with no fields like so:

    from pyspark.sql.types import StructType

    schema = StructType([])

So when you create the DataFrame using that schema, you'll end up with a DataFrame[].
    >>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
    >>> empty
    DataFrame[]
    >>> empty.schema
    StructType(List())

In Scala, if you choose to use sqlContext.emptyDataFrame and check out its schema, it will return StructType().
    scala> val empty = sqlContext.emptyDataFrame
    empty: org.apache.spark.sql.DataFrame = []

    scala> empty.schema
    res2: org.apache.spark.sql.types.StructType = StructType()
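If you want the empty DataFrame to carry named columns rather than no fields at all, spell the fields out in the StructType. A minimal PySpark sketch, assuming the same Spark 1.5 sqlContext and sc as above, with hypothetical columns id and name:

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    # Hypothetical columns, purely for illustration.
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
    ])

    empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
    empty.printSchema()
    # root
    #  |-- id: integer (nullable = true)
    #  |-- name: string (nullable = true)

Because the field types are given explicitly, PySpark never needs to sample the RDD, so passing an empty RDD is fine.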