Press "Enter" to skip to content

Category: Notebooks

Loading Data from SharePoint Lists into Microsoft Fabric

Stepan Resl loads some data:

In the age of Fabric, it’s worth pointing out our three options for data ingestion:

  • Data Pipelines with Copy Activity
  • Dataflows Gen 2
  • Notebooks

We must compare them to understand what each can offer us from different perspectives. To compare them thoroughly, we need to set some guardrails so that everything runs the same way.

My biggest takeaway from this: don’t load important business data into SharePoint Lists to begin with.
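
If you do end up going the notebook route, the general shape is to call the Microsoft Graph API and land the results as a Delta table. Here is a minimal sketch, not Stepan’s exact approach; the tenant, client, site, and list IDs are placeholders and the output table name is made up:

    # Minimal sketch: pull a SharePoint list via Microsoft Graph and land it in a Lakehouse table.
    # Assumes a Fabric Spark notebook (the spark session is provided) and an Entra ID app
    # registration with Sites.Read.All application permission. All IDs below are placeholders.
    import requests
    import msal

    TENANT_ID = "<tenant-id>"
    CLIENT_ID = "<client-id>"
    CLIENT_SECRET = "<client-secret>"   # better: pull this from Azure Key Vault
    SITE_ID = "<site-id>"
    LIST_ID = "<list-id>"

    app = msal.ConfidentialClientApplication(
        CLIENT_ID,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
        client_credential=CLIENT_SECRET,
    )
    token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])

    url = f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}/lists/{LIST_ID}/items?expand=fields"
    headers = {"Authorization": f"Bearer {token['access_token']}"}

    rows = []
    while url:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(item["fields"] for item in payload["value"])
        url = payload.get("@odata.nextLink")   # follow paging until the list is exhausted

    # Land the list items as a Delta table in the Lakehouse.
    spark.createDataFrame(rows).write.mode("overwrite").saveAsTable("sharepoint_list_items")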


Enabling Python and R Support for VS Code Polyglot Notebooks

Joy George Kunjikkur enables a preview option:

Obviously, we should have Polyglot Notebooks up and running. The first step to enable the Python preview is to install Jupyter on the machine and make sure the Python kernel spec is available. Run the command below to make sure it is there.

It looks like what the preview is doing is shelling out to Jupyter notebooks, so I’d imagine variables won’t cross over between languages.
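
For reference, a standard way to do that check (not necessarily the exact command from the article) is:

    pip install jupyter
    jupyter kernelspec list   # should list an entry such as "python3"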


Cache Management and Semantic Link in Fabric Notebooks

Marc Lelijveld warms up the cache:

In the previous blog, I wrote about data temperature as part of Fabric when you’re using Direct Lake storage mode. In that blog, I explained how you can get insights into the temperature of a column, what that temperature means, and what the impact of the temperature is on columns that are queried more often.

In this blog, I will continue this story by elaborating on a process called framing and how you can influence data eviction to drop data from memory. Finally, this blog goes into more detail on how you could use Semantic Link in Fabric Notebooks to warm up the data for optimal end-user performance.

The SQL Server analog here is having some automated queries which keep specific pages in the buffer pool, like a warm-up script for an instance with plenty of memory but slow disks.
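
If you want to try the warm-up yourself, it boils down to issuing a query that touches the columns you care about. A rough sketch using the semantic-link (SemPy) library in a Fabric notebook, where the model, table, and column names are placeholders and the exact DAX is up to you:

    # Rough sketch: warm specific columns of a Direct Lake model by querying them.
    # Requires the semantic-link package in a Fabric notebook; names below are placeholders.
    import sempy.fabric as fabric

    dataset = "Sales Model"   # the Direct Lake semantic model to warm up
    warmup_dax = """
    EVALUATE
    SUMMARIZECOLUMNS(
        'Sales'[ProductKey],
        "Total Amount", SUM('Sales'[Amount])
    )
    """

    # Running the query forces the touched columns to be paged into memory,
    # so the first real user does not pay the cold-start cost.
    df = fabric.evaluate_dax(dataset, warmup_dax)
    print(f"Warm-up query returned {len(df)} rows")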


Parameterizing Databricks Notebooks with Widgets

Meagan Longoria adds some widgets:

Widgets provide a way to parameterize notebooks in Databricks. If you need to call the same process for different values, you can create widgets to allow you to pass the variable values into the notebook, making your notebook code more reusable. You can then refer to those values throughout the notebook.

Click through to learn more about the four types of widgets and how they work.
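
As a quick taste, the text and dropdown flavors look roughly like this in a Databricks notebook cell (the widget names and values here are made up):

    # dbutils is available automatically inside a Databricks notebook.
    # Create a free-form text widget and a dropdown widget with a fixed set of choices.
    dbutils.widgets.text("start_date", "2024-01-01", "Start Date")
    dbutils.widgets.dropdown("environment", "dev", ["dev", "test", "prod"], "Environment")

    # Read the current values anywhere later in the notebook.
    start_date = dbutils.widgets.get("start_date")
    environment = dbutils.widgets.get("environment")
    print(f"Loading data from {start_date} in {environment}")

    # Clean up when finished (or use dbutils.widgets.removeAll()).
    dbutils.widgets.remove("start_date")
    dbutils.widgets.remove("environment")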


ggplot2 in Python Notebooks

John Mount runs R in Python with rpy2:

For an article on A/B testing that I am preparing, I asked my partner Dr. Nina Zumel if she could do me a favor and write some code to produce the diagrams. She prepared an excellent parameterized diagram generator. However being the author of the book Practical Data Science with R, she built it in R using ggplot2. This would be great, except the A/B testing article is being developed in Python, as it targets programmers familiar with Python.

As the production of the diagrams is not part of the proposed article, I decided to use the rpy2 package to integrate the R diagrams directly into the new worksheet. Alternatively, I could translate her code into Python using one of Seaborn objects, plotnine, ggpy, or others. The large number of options is evidence of how influential Leland Wilkinson’s grammar of graphics (gg) is.

Click through to see how you can execute R code within the context of Python, similar to how you can use the reticulate package to execute Python code in the context of R.
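
In a Jupyter notebook, the general pattern looks something like the cells below (a generic rpy2 sketch, not necessarily John’s exact setup; rpy2 and ggplot2 both need to be installed):

    # Cell 1: load the rpy2 notebook extension (pip install rpy2; ggplot2 installed in R).
    %load_ext rpy2.ipython

    # Cell 2: build some data on the Python side.
    import pandas as pd
    df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})

    # Cell 3: the %%R cell magic runs R; -i df copies the pandas DataFrame into R.
    %%R -i df
    library(ggplot2)
    ggplot(df, aes(x, y)) + geom_point() + geom_line()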


Generating Reproducible Reports with Jupyter and Quarto

Parisa Gregg and Myles Mitchell don’t need to copy and paste for their TPS reports:

Quarto is free-to-use, open-source software based on Pandoc that enables users to convert plain text files into a range of formats, including PDF, HTML, and PowerPoint presentations. These documents can contain a mixture of narrative text, Python code, and figures that are dynamically generated by the embedded code.

This has many use-cases:

  • Your company may have a weekly board meeting to go over the latest sales figures. By having a Quarto presentation that pulls in the latest company sales data, you can regenerate the presentation slides each week at the click of a button.
  • As a researcher you may be preparing a report for publication. By having the code that generates data tables and figures embedded within the report, regenerating the draft as the experimental data floods in is a breeze!

Read on for a fun example of how you could automate a research-driven report.
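
For a flavor of what that looks like, here is a bare-bones .qmd with an embedded Python chunk (the data file and its columns are invented for the example):

    ---
    title: "Weekly Sales Summary"
    format: html
    ---

    The latest figures, regenerated every time the document is rendered:

    ```{python}
    # Quarto executes this chunk on render (Python chunks run through Jupyter).
    import pandas as pd

    sales = pd.read_csv("sales.csv")
    sales.groupby("region")["amount"].sum()
    ```

Rendering it is then just quarto render weekly-sales.qmd (or whatever the file is named).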


An Introduction to R Markdown

Adrian Tam continues a series on R:

One reason people like to use RStudio for their work is R Markdown. It makes RStudio not only an IDE for programming in R but also a notepad in which they can put down their thoughts alongside R code and results. In this post, you will learn how to use R Markdown. Specifically, you will learn:

  • What is Markdown
  • How to use Markdown to create a technical document in RStudio

Click through to learn more. I’d also suggest diving into the docs for knitr.


Microsoft Fabric Notebooks and Compute Limits

Reitse Eskens hits a wall:

In this case, my notebook threw an error at me, but the command seemed to finish without any issue. Sounds vague? It did to me. The notebook cell I tried to run had a lot of stuff happening at the same time.

As you can see in the above screenshot, the status shows green checkmarks but there’s an error as well. The error message was not really clear to me, but that can really be my lack of deep-level experience. So, I logged a call with Microsoft Support to see what they could come up with.

I’ve had enough experience with Spark to see the issue and figure out the response, but click through for the screenshot and what Reitse did to resolve the issue.


Logging Notebook Runs in Microsoft Fabric

Reitse Eskens checks the logs:

I reported an issue yesterday with Microsoft Support and during the following call today (they’re really quick to set up an initial meeting), the support engineer showed me where I can find a lot of logging information.

Suppose you’ve got a notebook that has been run a few times. The front-end will only retain the information from the last run. If you see an error, for example this one

Click through to learn where you can find these execution logs.


Parameterizing Jupyter Notebooks

John Mount shows off a feature:

I’d like to share a great new feature in the wvpy package (available on PyPI).

This package is useful for converting Jupyter notebooks to/from Python, and also for rendering many parameterized notebooks. The idea is to make Jupyter notebooks easier to use in production.

The latest feature is an extension of notebook parameterization. In addition to the init_code and output_suffix features, which allow adding arbitrary code to notebooks and saving multiple renders of the same notebook under different (non-colliding!) names, the new sheet_vars feature allows the insertion of arbitrary data into notebook renders (in addition to the earlier code insertion facility).

Click through for an example on how to use this. Several years ago, I would have considered this to be outstanding. Today, I think it’s cool, but I’ve also gravitated toward using notebooks as an intermediary step rather than a final product, so it’s less critical for me these days.
