apache spark - How to create an empty DataFrame? Why "ValueError: RDD is empty"?
I am trying to create an empty DataFrame in Spark (PySpark).
I am using an approach similar to the one discussed here, but it is not working.
This is the code:

    df = sqlContext.createDataFrame(sc.emptyRDD(), schema)

and this is the error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 404, in createDataFrame
        rdd, schema = self._createFromRDD(data, schema, samplingRatio)
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 285, in _createFromRDD
        struct = self._inferSchema(rdd, samplingRatio)
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 229, in _inferSchema
        first = rdd.first()
      File "/Users/me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1320, in first
        raise ValueError("RDD is empty")
    ValueError: RDD is empty
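The traceback shows why this fails: createDataFrame went down the schema-inference path, and _inferSchema calls rdd.first() to sample a row, which an empty RDD cannot provide. In Spark 1.5 that path is only taken when schema is None or a plain list of column names, so the likely culprit is that schema was not an actual StructType. A minimal sketch of the distinction, assuming the usual sqlContext and sc from a Spark 1.5 shell (the column names below are hypothetical):

    from pyspark.sql.types import StructType

    # Fails with "ValueError: RDD is empty": a bare list of column names
    # forces PySpark to infer the types by sampling the RDD's first row.
    # df = sqlContext.createDataFrame(sc.emptyRDD(), ["col1", "col2"])

    # Works: an explicit StructType leaves nothing to infer.
    df = sqlContext.createDataFrame(sc.emptyRDD(), StructType([]))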
Extending Joe Widen's answer, you can create a schema with no fields like so:

    from pyspark.sql.types import StructType

    schema = StructType([])

So when you create the DataFrame using that schema, you'll end up with a DataFrame[].
    >>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
    >>> empty
    DataFrame[]
    >>> empty.schema
    StructType(List())

In Scala, if you choose to use sqlContext.emptyDataFrame and check out its schema, it will return StructType().
    scala> val empty = sqlContext.emptyDataFrame
    empty: org.apache.spark.sql.DataFrame = []

    scala> empty.schema
    res2: org.apache.spark.sql.types.StructType = StructType()
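If you want the empty DataFrame to carry named columns rather than no fields at all, spell the fields out in the StructType. A minimal PySpark sketch, assuming the same Spark 1.5 sqlContext and sc as above, with hypothetical columns id and name:

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    # Hypothetical columns, purely for illustration.
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
    ])

    empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
    empty.printSchema()
    # root
    #  |-- id: integer (nullable = true)
    #  |-- name: string (nullable = true)

Because the field types are given explicitly, PySpark never needs to sample the RDD, so passing an empty RDD is fine.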