Tomaz Kastrun has been busy. On day 9, we build a custom environment:
Microsoft Fabric provides you with the capability to create a new environment, where you can select different Spark runtimes, configure your compute resources, and create a list of Python libraries (public or custom; from Conda or PyPI) to be installed. Custom environments behave the same way as any other environment and can be used and attached to your notebook or used on a workspace. Custom environments can also be attached to Spark job definitions.
An Apache Spark job definition is a single computational action, that is normally scheduled and triggered. In Microsoft Fabric (same as in Synapse), you could submit batch/streaming jobs to Spark clusters.
By uploading a binary file, or libraries in any of the languages (Java / Scala, R, Python), you can run any kind of logic (transformation, cleaning, ingest, ingress, …) to the data that is hosted and server to your lakehouse.
We have looked into creating the lakehouse, checked the delta lake and delta tables, got some data into the lakehouse, and created a custom environment and Spark job definition. And now we need to see, how to start working with the data.
We have started working with the data and now, we would like to create and submit the experiment. In this case, MLFlow will be used here.
Create a new experiment and give it a name. I have named my “Advent2023_Experiment_v3”.
Click through to catch up with Tomaz.