Piotr Starczynski shows us how we can read data from a table, transform it, and write it back to the same file:
Recently I have reached interesting problem in Databricks Non delta. I tried to read data from the the table (table on the top of file) slightly transform it and write it back to the same location that i have been reading from. Attempt to execute code like that would manifest with exception:“org.apache.spark.sql.AnalysisException: Cannot insert overwrite into table that is also being read from”.
The lets try to answer the question How to write into a table(dataframe) that we are reading from as this might be a common use case?
This problem is trivial but it is very confusing, if we do not understand how queries are processed in spark.
Click through for the answer. I’m a little squeamish about doing this because my expectation is for data to flow from one source to another source; feeding the data back to the initial source feels strange, like running a load of clothes through the washer and dryer and then dumping them back into the hamper with the remainder of the dirty clothes.