apache spark - How to create an empty DataFrame? Why "ValueError: RDD is empty"?


I'm trying to create an empty DataFrame in Spark (PySpark).

I'm using an approach similar to the one discussed here, but it's not working.

This is the code:

df = sqlContext.createDataFrame(sc.emptyRDD(), schema)

And this is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 404, in createDataFrame
    rdd, schema = self._createFromRDD(data, schema, samplingRatio)
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 285, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 229, in _inferSchema
    first = rdd.first()
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1320, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty

Extending Joe Widen's answer, you can actually create a schema with no fields like so:

schema = StructType([])

So when you create the DataFrame using that schema, you'll end up with a DataFrame[].

>>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
DataFrame[]
>>> empty.schema
StructType(List())

In Scala, if you choose to use sqlContext.emptyDataFrame and check out its schema, it will return StructType().

scala> val empty = sqlContext.emptyDataFrame
empty: org.apache.spark.sql.DataFrame = []

scala> empty.schema
res2: org.apache.spark.sql.types.StructType = StructType()
