Burak Yavuz, et al., explain how the transaction log works with Delta Tables in Apache Spark:
When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 000000.json. Additional changes to the table generate subsequent JSON files in ascending numerical order, so that the next commit is written out as 000001.json, the following as 000002.json, and so on.
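The layout described above can be sketched in plain Python. This is a simplified illustration, not Delta Lake's actual implementation: the directory, file names, and commit contents below are made up for the example (real commit files carry richer action records such as add, remove, and metaData entries, and use longer zero-padded version numbers), but it shows the core idea of replaying numbered JSON commits in ascending order.

```python
import json
import tempfile
from pathlib import Path

# Simulate a table directory with a _delta_log subdirectory,
# as described in the quoted passage.
log_dir = Path(tempfile.mkdtemp()) / "_delta_log"
log_dir.mkdir()

# Each atomic commit is written as its own JSON file in ascending
# numerical order: 000000.json, 000001.json, and so on.
commits = [
    {"commitInfo": {"operation": "CREATE TABLE"}},  # hypothetical contents
    {"commitInfo": {"operation": "WRITE"}},
]
for version, actions in enumerate(commits):
    (log_dir / f"{version:06d}.json").write_text(json.dumps(actions))

def replay(log_dir: Path) -> list:
    """Read commit files in ascending numeric order to rebuild history."""
    return [
        json.loads(commit_file.read_text())
        for commit_file in sorted(log_dir.glob("*.json"))
    ]

history = replay(log_dir)
print([c["commitInfo"]["operation"] for c in history])
```

Because each commit is a separate, append-only file, a reader can reconstruct the table's state at any version simply by replaying the log up to that point.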
It’s interesting that they chose JSON instead of a binary transaction log like relational databases use.