Azure Data Factory v2 And Decompression

Kevin Feasel

2018-04-23

Bugs, Cloud, ETL

Ben Jarvis notes a file naming bug with Azure Data Factory v2 when decompressing files:

ADF V2 natively supports decompression of files as documented at https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#compression-support. With this functionality ADF should change the extension of the file when it is decompressed so 1234_567.csv.gz would become 1234_567.csv however, I’ve noticed that this doesn’t happen in all cases.

In our particular case the file names and extensions of the source files are all uppercase and when ADF uploads them it doesn’t alter the file extension e.g. if I upload 1234_567.CSV.GZ I get 1234_567.CSV.GZ in blob storage rather than 1234_567.CSV.

Click through for more details and be sure to vote on his Azure Feedback bug if this affects you.

Related Posts

The Zen Of Airflow

Bas Harenslak shows how you can think of The Zen of Python as it applies to Apache Airflow: Apache Airflow is a Python framework for programmatically creating workflows in DAGs, e.g. ETL processes, generating reports, and retraining models on a daily basis. This allows for concise and flexible scripts but can also be the downside of […]

Read More

Azure Data Lake Store Gen2

James Serra gives us the low-down on Azure Data Lake Store Gen2 now that it is generally available: When to use Blob vs ADLS Gen2New analytics projects should use ADLS Gen2, and current Blob storage should be converted to ADLS Gen2, unless these are non-analytical use cases that only need object storage rather than hierarchical storage […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30