
PySpark: turn off scientific notation when writing CSV


When I wrote a Spark DataFrame to a CSV file in Databricks, I found that a number column came out like 2.6220427383E10, in scientific notation.
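Here is a minimal sketch that reproduces the symptom; the column name, value, and output path are placeholders, not the real job:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with a double column holding the value from this article.
df = spark.createDataFrame([(26220427383.0,)], ["TOTAL_PRICE"])

# The CSV cell comes out as "2.6220427383E10", because Spark renders
# large double values in scientific notation when writing CSV.
df.write.mode("overwrite").csv("/tmp/sale_csv", header=True)
```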

I searched on Stack Overflow and found a lot of solutions, such as casting the column to DecimalType(18, 0).
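For reference, a rough sketch of that kind of workaround (DataFrame, column name, and path are placeholders): cast the double column to a decimal type before writing, so the value is printed as plain digits.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import DecimalType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(26220427383.0,)], ["TOTAL_PRICE"])

# Casting to DecimalType(18, 0) makes the CSV writer emit 26220427383
# instead of 2.6220427383E10.
df_decimal = df.withColumn("TOTAL_PRICE", df["TOTAL_PRICE"].cast(DecimalType(18, 0)))
df_decimal.write.mode("overwrite").csv("/tmp/sale_csv_decimal", header=True)
```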

But in my case, the cause turned out to be that the column had been divided by 100, like this:

(df_sale['TOTAL_PRICE'].cast('integer')/100).alias('TOTAL_PRICE')

Although display(df_sale) shows the value like an integer, when I write it to a CSV file it becomes 2.6220427383E10.
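The underlying reason is that dividing an integer column by 100 promotes it to a double in Spark SQL, and the CSV writer prints large doubles in scientific notation regardless of how display() renders them. A quick schema check makes this visible (df_sale below is a toy stand-in for the article's DataFrame):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for df_sale with an integer price column.
df_sale = spark.createDataFrame([(123456,)], ["TOTAL_PRICE"])

df_check = df_sale.select(
    (df_sale["TOTAL_PRICE"].cast("integer") / 100).alias("TOTAL_PRICE")
)

# Division returns a double, even though both operands look integer-like.
df_check.printSchema()
# root
#  |-- TOTAL_PRICE: double (nullable = true)
```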

So the solution is to apply cast('integer') one more time after the division, like this:

(df_sale['TOTAL_PRICE'].cast('integer')/100).cast('integer').alias('TOTAL_PRICE')

After that, everything works fine.
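Putting it together, an end-to-end sketch of the fix (toy DataFrame and output path are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for df_sale; TOTAL_PRICE is stored as an integer times 100.
df_sale = spark.createDataFrame([(1234500,)], ["TOTAL_PRICE"])

df_fixed = df_sale.select(
    (df_sale["TOTAL_PRICE"].cast("integer") / 100)
    .cast("integer")
    .alias("TOTAL_PRICE")
)

# The CSV now contains 12345 instead of a scientific-notation double.
df_fixed.write.mode("overwrite").csv("/tmp/sale_csv_fixed", header=True)
```

Note that the final cast('integer') truncates any fractional part left over from the division, so this only fits if TOTAL_PRICE is expected to be a whole number after dividing by 100.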
