Overwriting Data In Use With Databricks

Piotr Starczynski shows us how we can read data from a table, transform it, and write it back to the same file:

Recently I have reached interesting problem in Databricks Non delta. I tried to read data from the the table (table on the top of file) slightly transform it and write it back to the same location that i have been reading from. Attempt to execute code like that would manifest with exception:“org.apache.spark.sql.AnalysisException: Cannot insert overwrite into table that is also being read from”

The lets try to answer the question How to write into a table(dataframe) that we are reading from as this might be a common use case?

This problem is trivial but it is very confusing, if we do not understand how queries are processed in spark.

Click through for the answer. I’m a little squeamish about doing this because my expectation is for data to flow from one source to another source; feeding the data back to the initial source feels strange, like running a load of clothes through the washer and dryer and then dumping them back into the hamper with the remainder of the dirty clothes.

Related Posts

Flink: Batch as a Special Case of Streaming

Fabian Hueske and Aljoscha Krettek describe streaming versus batch processing in Apache Flink: The Apache Flink project has followed the philosophy of taking a unified approach to batch and stream data processing, building on the core paradigm of “continuous processing of unbounded data streams” for a long time. If you think about it, carrying out […]

Read More

Multi-Tenant Security in Kudu + Impala

Grant Henke shows how you can combine Apache Impala’s fine-grained authorization with Apache Kudu’s coarse-grained authentication for multi-tenant scenarios: Kudu supports coarse-grained authorization of client requests based on the authenticated client Kerberos principal. The two levels of access which can be configured are:1. Superuser – principals authorized as a superuser are able to perform certain administrative […]

Read More

Categories

February 2019
MTWTFSS
« Jan Mar »
 123
45678910
11121314151617
18192021222324
25262728