
Author: Kevin Feasel

Creating Jupyter Books in Azure Data Studio

Drew Skwiers-Koballa takes us through creating and deploying Jupyter Books:

The notebook experience in Azure Data Studio allows users to create and share documents containing live code, execution results, and narrative text. Potential usage includes data cleaning and transformation, statistical modeling, troubleshooting guides, data visualization, and machine learning. Jupyter books compile a collection of notebooks into a richer experience with more structure and a table of contents.  In Azure Data Studio we are able not only to use Jupyter books but also create and share them. Learn the basics of notebooks in Azure Data Studio from the documentation and read on to learn how to leverage a GitHub Action to publish and share remote Jupyter books.

Click through for the process of creating, opening, and distributing Jupyter Books.
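
If you have not worked with the format before, here is a minimal sketch (mine, not from Drew's post) of what a single notebook in one of these books boils down to: narrative text plus live code in one document. It uses the nbformat Python package; the file name and cell contents are placeholders.

# A minimal sketch (not from the linked post): programmatically build a notebook
# containing narrative text and live code, then write it to disk. Requires the
# nbformat package; the file name and cell contents are illustrative placeholders.
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

nb = new_notebook()
nb.cells = [
    new_markdown_cell("# Data cleaning walkthrough\nNarrative text explaining each step."),
    new_code_cell("import pandas as pd\ndf = pd.read_csv('measurements.csv')\ndf.describe()"),
]

# A Jupyter Book is essentially a set of these .ipynb files plus a table of
# contents that gives them structure.
nbformat.write(nb, "data-cleaning.ipynb")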

Comments closed

Visualizing Analysis Services Tasks with the Job Graph

Chris Webb is excited:

More details about it, and how it can be used, are in the samples here:

https://github.com/microsoft/Analysis-Services/tree/master/ASJobGraphEvents

The data returned by the Job Graph event isn’t intelligible if you look at the text it returns in Profiler. However if you save a .trc file with Job Graph event data to XML you can use the Python scripts in the GitHub repo to generate DGML diagrams that can be viewed in Visual Studio, plus Gantt charts embedded in HTML. Of course to do this you’ll need to have Python installed; you’ll also need to have Visual Studio and its DGML editor installed (see here for details).

Read on to see how it looks and Chris’s thoughts on the matter.
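
If you want to poke at the raw event data yourself before reaching for the repo's scripts, here is a rough Python sketch of pulling the Job Graph payloads out of a trace saved as XML. The element and attribute names are my assumptions about the exported trace shape, so check your file and adjust; the actual DGML and Gantt generation lives in the ASJobGraphEvents scripts linked above.

# Rough sketch (not the repo's script): extract raw Job Graph event payloads from
# a Profiler trace saved as XML. Element and attribute names below are assumptions
# about the exported trace format -- inspect your file and adjust as needed.
import xml.etree.ElementTree as ET

def extract_job_graph_payloads(trace_xml_path):
    tree = ET.parse(trace_xml_path)
    payloads = []
    for event in tree.iter("Event"):                       # assumed element name
        if "Job Graph" not in (event.get("name") or ""):
            continue
        for column in event.iter("Column"):                # assumed element name
            if column.get("name") == "TextData" and column.text:
                payloads.append(column.text)
    return payloads

if __name__ == "__main__":
    for payload in extract_job_graph_payloads("job_graph_trace.xml"):
        print(payload[:200])  # the raw text is not meant to be human-readable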

Comments closed

Error Handling in R

Adi Sarid compares a few methods for error handling in R:

Error catching can be hard to catch at times (no pun intended). If you’re not used to error handling, this short post might help you do it elegantly.

There are many posts about error handling in R (and in fact the examples in the purrr package documentation are not bad either). In this sense, this post is not original.

However, I do demonstrate two approaches: both the base-R approach (tryCatch) and the purrr approach (safely and possibly). The post contains a concise summary of the two methods, with a very simple example.

Read the whole thing. H/T R-Bloggers.
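
The post's examples are in R, naturally. For anyone mapping the ideas to another language, here is a loose Python analogue (mine, not Adi's code): try/except plays the role of base R's tryCatch, and a small wrapper that always returns a result/error pair mimics the spirit of purrr's safely.

# A loose Python analogue of the two R patterns in the post: try/except stands in
# for base R's tryCatch, and the safely() wrapper below mimics purrr::safely by
# returning a (result, error) pair instead of raising.
def risky_parse(value):
    return float(value)  # raises ValueError on bad input

# tryCatch-style: handle the error at the call site.
try:
    parsed = risky_parse("not a number")
except ValueError as err:
    parsed = float("nan")
    print(f"caught: {err}")

# safely-style: wrap the function once, then call it without try/except.
def safely(fn):
    def wrapped(*args, **kwargs):
        try:
            return fn(*args, **kwargs), None
        except Exception as err:
            return None, err
    return wrapped

safe_parse = safely(risky_parse)
result, error = safe_parse("3.14")          # (3.14, None)
result, error = safe_parse("not a number")  # (None, ValueError(...))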

Comments closed

Import Files from SharePoint into Power Query

Imke Feldmann solves a problem:

When you use the UI to import files from SharePoint, you’ll end up with the Sharepoint.Files function. This function can become fairly or super slow when you use it on large SharePoint sites. This is due to the fact that it will retrieve metadata for ALL files that lie on the site. Meaning: the root site whose URL you have to enter as the function argument. So I’ve developed a better way for file import from SharePoint.

Click through for the solution and how to use it. Imke reports roughly a 2X improvement in query performance when reading SharePoint data, so it's worth checking out.
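
Imke's solution itself is Power Query M, so click through for that. Purely to illustrate why scoping matters, here is a hypothetical Python sketch that asks SharePoint's REST API for the files in one folder rather than enumerating metadata for every file on the site; the site URL, folder path, and token are placeholders.

# Illustration only (the actual solution in the post is Power Query M): request a
# single folder's files via SharePoint's REST API instead of pulling metadata for
# every file on the site. Site URL, folder path, and access token are placeholders.
import requests

SITE_URL = "https://contoso.sharepoint.com/sites/Finance"            # placeholder
FOLDER = "/sites/Finance/Shared Documents/Reports"                   # placeholder
TOKEN = "..."                                                        # placeholder access token

response = requests.get(
    f"{SITE_URL}/_api/web/GetFolderByServerRelativeUrl('{FOLDER}')/Files",
    headers={
        "Accept": "application/json;odata=nometadata",
        "Authorization": f"Bearer {TOKEN}",
    },
    timeout=30,
)
response.raise_for_status()

for file_info in response.json().get("value", []):
    print(file_info["Name"], file_info["Length"])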

Comments closed

Using the READPAST Query Hint

Rajendra Gupta looks at a lesser-used query hint:

If we specify the READPAST hint in the SQL queries, the database engine ignores the rows locked by other transactions while reading data. Suppose you have a transaction that blocked a few rows in a table for updating the information in those rows. Now, if another user starts a transaction and specifies the READPAST query hint, the query engine ignores these rows and returns the remaining rows satisfying the data requirement of the query. It might return incorrect data as well.

There are some very limited uses for this hint, though they are out there.
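
To make the scenario concrete, here is a small sketch of it in Python with pyodbc (the hint itself is just T-SQL in the query text); the connection string and table are placeholders.

# Sketch of the scenario above: session 1 updates (and therefore locks) some rows
# inside an open transaction; session 2 reads WITH (READPAST) and silently skips
# those locked rows. Connection string and table/column names are placeholders.
import pyodbc

CONN_STR = ("Driver={ODBC Driver 17 for SQL Server};"
            "Server=.;Database=TestDb;Trusted_Connection=yes;")

blocker = pyodbc.connect(CONN_STR, autocommit=False)
blocker.cursor().execute(
    "UPDATE dbo.Orders SET Status = 'Processing' WHERE OrderId <= 10;"
)  # transaction stays open, so these rows remain locked

reader = pyodbc.connect(CONN_STR, autocommit=True)
rows = reader.cursor().execute(
    "SELECT OrderId, Status FROM dbo.Orders WITH (READPAST);"
).fetchall()
print(len(rows), "rows returned; the locked rows were skipped, not waited on")

blocker.rollback()  # release the locks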

Comments closed

Linked Server Data Does Not Match Expected Data Length

Jack Vamvas looks at a linked server error:

I’ve found an issue we have with BI Connector over MongoDB. I’m extracting data from MongoDB to SQL Server using the MongoDB ODBC Data Source Configuration and a linked server. Basically, the BI Connector fails when the string in the column gets too long. When running the job to bring the production workflow data into a data warehouse application, we get the following error.

Msg 7347, Level 16, State 1, Line 1

OLE DB provider ‘MSDASQL’ for linked server ‘MongoDB_PROD_mydb’ returned data that does not match expected data length for column ‘[MSDASQL].mywork.mytasks.mytaskInfo’. The (maximum) expected data length is 5332, while the returned data length is 5970.

Completion time: 2020-11-05T17:14:10.2206504+00:00

Read on for one solution.
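
Jack's fix is in the post. As a purely diagnostic step, and assuming you can reach the source collection directly, a quick pymongo check of how long that field actually gets can confirm the mismatch the error reports; the connection string, database, collection, and field names below are placeholders inferred from the error message.

# Diagnostic sketch only (the actual fix is in the linked post): measure the real
# maximum length of the offending field in MongoDB and compare it with the 5332
# characters the linked server expected. Names below are inferred placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")   # placeholder connection string
collection = client["mywork"]["mytasks"]             # inferred from the error message

longest = 0
for doc in collection.find({}, {"mytaskInfo": 1}):
    value = doc.get("mytaskInfo") or ""
    longest = max(longest, len(str(value)))

print("Longest mytaskInfo value:", longest, "characters (linked server expected 5332)")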

Comments closed

Target Groups in Elastic Jobs

Reitse Eskens shares some more information about elastic jobs in Azure:

In one of my previous blogs, I wrote about how to create an elastic job agent when you need the SQL Agent functionality on Azure. You can read that one here.

This morning, I needed a job to update the stats on a database, but on just one database within the "instance" on Azure. But my first group contained all the databases, the Ola Hallengren script isn't available on all of them, and the credential I'm using to execute the jobs doesn't have access to all of the databases.

Read on to learn how Reitse solved the problem.
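
For a flavor of what the fix looks like, the job database exposes stored procedures for defining target groups, and a group can contain a single database. Here is a hedged sketch calling them from Python with pyodbc; the server, database, group, and credential names are placeholders, and the parameter names are worth verifying against the elastic jobs documentation.

# Sketch of scoping an elastic job to one database: create a target group and add
# a single SqlDatabase member via the jobs.* stored procedures in the job database.
# All names and the connection string are placeholders; verify parameter names
# against the elastic jobs documentation.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=myjobserver.database.windows.net;Database=jobdatabase;"   # placeholders
    "UID=jobuser;PWD=<password>;",
    autocommit=True,
)
cursor = conn.cursor()

cursor.execute("EXEC jobs.sp_add_target_group @target_group_name = N'SingleDbGroup';")
cursor.execute("""
    EXEC jobs.sp_add_target_group_member
        @target_group_name = N'SingleDbGroup',
        @target_type       = N'SqlDatabase',
        @server_name       = N'mytargetserver.database.windows.net',
        @database_name     = N'StatsDb';
""")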

Comments closed

Query Folding in Direct Query Plus Import Scenarios

Marc Lelijveld violates Betteridge’s Law of Headlines:

Recently, I ran into a case where we wanted to combine Direct Query sources with imported data. Well, this functionality is actually called composite models nowadays.

Having said that, let me start by managing some expectations for this post. The title says composite models, but I have to admit it is not about the soon-to-be-released composite models feature where you can combine multiple Power BI datasets. Instead, it is about combining Direct Query and Import mode in one single dataset.

In this blog post I will describe my use case, why I considered using some sources in Direct Query, and how this combines with Row Level Security and Query Folding.

Read on for the test.

Comments closed

The Evolving Lakehouse

Simon Whiteley looks at the current status of the Lakehouse model:

We have discussed in the past this idea of the lakehouse, the aspirational target of many analytics platforms these days of combining the huge power and potential of data lakes with the rigour, reliability and concurrency of a data warehouse. It’s an interesting concept but has, in the past, been firmly an aspiration.

In the world without lakehouses, we often see the “Modern Data Warehouse”, this two-phased approach to providing a holistic platform – we load our early data into a lake where we shape it and massage it into an understandable state. It is here we perform data science, exploratory data analysis, early sight analytics prototyping and various other functions that don’t quite fit into a data warehouse… but then we load our data into a relational store for serving to the business. This is where we can meet their demands for a rich SQL environment, auditable data models and rigorous change procedures. Essentially, we store data twice so that we can achieve the best of both worlds.

Definitely read Simon’s take on it. My take is that the Lakehouse concept will start to be useful to specific companies in about 2-3 years, as I don’t think the performance is there today.

Comments closed