Press "Enter" to skip to content

Curated SQL Posts

Import Files From SharePoint Into Power Query

Imke Feldmann solves a problem:

When you use the UI to import files from SharePoint, you’ll end up with the Sharepoint.Files function. This function can become fairly or super slow when you use it on large SharePoint sites. This is due to the fact that it will retrieve metadata for ALL files that lie on the site, meaning the root site whose URL you have to enter as the function argument. So I’ve developed a better way to import files from SharePoint.

Click through for the solution and how to use it. Imke reports a 2X improvement in query performance when reading SharePoint data, so it’s worth checking out.


Using the READPAST Query Hint

Rajendra Gupta looks at a lesser-used query hint:

If we specify the READPAST hint in a SQL query, the database engine ignores the rows locked by other transactions while reading data. Suppose you have a transaction that has locked a few rows in a table in order to update the information in those rows. Now, if another user starts a transaction and specifies the READPAST query hint, the query engine skips those locked rows and returns the remaining rows satisfying the data requirement of the query. As a result, it might return incorrect data as well.

There are some very limited uses for this hint, though they are out there.
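To see what that means in practice, here’s a minimal T-SQL sketch; the table, column, and values are hypothetical:

    -- Session 1: lock a row by updating it and leaving the transaction open.
    BEGIN TRANSACTION;

    UPDATE dbo.Orders              -- hypothetical table
    SET    Status = 'Shipped'
    WHERE  OrderID = 42;

    -- Session 2 (separate connection): READPAST skips the locked row
    -- rather than blocking until session 1 commits or rolls back.
    SELECT OrderID, Status
    FROM   dbo.Orders WITH (READPAST)
    WHERE  Status IN ('Pending', 'Shipped');
    -- OrderID 42 is simply absent from the results, which is why the
    -- excerpt warns that the query might return incorrect data.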


Linked Server Data Does Not Match Expected Data Length

Jack Vamvas looks at a linked server error:

I’ve found an issue we have with the BI Connector over MongoDB. I’m extracting data from MongoDB to SQL Server using the MongoDB ODBC Data Source Configuration and a linked server. Basically, the BI Connector fails when the string in the column gets too long. When running the job that brings the production workflow data into a data warehouse application, we get the following error.

Msg 7347, Level 16, State 1, Line 1

OLE DB provider ‘MSDASQL’ for linked server ‘MongoDB_PROD_mydb’ returned data that does not match expected data length for column ‘[MSDASQL].mywork.mytasks.mytaskInfo’. The (maximum) expected data length is 5332, while the returned data length is 5970.

Completion time: 2020-11-05T17:14:10.2206504+00:00

Read on for one solution.
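Not necessarily the solution Jack lands on, but one way to hunt down the offending rows is to push a length check through to the source with OPENQUERY. The linked server name, table, column, and threshold below come straight from the error message, and the pass-through query assumes the BI Connector’s usual MySQL-flavored dialect:

    -- Push the length check to the source to find documents whose string
    -- exceeds the length the linked server metadata expects (5332).
    SELECT *
    FROM OPENQUERY(MongoDB_PROD_mydb,
        'SELECT mytaskInfo, CHAR_LENGTH(mytaskInfo) AS task_info_length
         FROM   mywork.mytasks
         WHERE  CHAR_LENGTH(mytaskInfo) > 5332');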


Target Groups in Elastic Jobs

Reitse Eskens shares some more information about elastic jobs in Azure:

In one of my previous blogs, I wrote about how to create an elastic job agent when you need the SQL Agent functionality on Azure. You can read that one here.

This morning, I needed a job to update the stats on just one database within the “instance” on Azure. But my first target group contained all the databases, the Ola Hallengren script isn’t available in all of them, and the credential I’m using to execute the jobs doesn’t have access to all of the databases.

Read on to learn how Reitse solved the problem.
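For reference, and not necessarily exactly how Reitse does it, narrowing an elastic job’s scope comes down to the target group procedures in the job database. Here is a rough sketch with hypothetical server, database, and credential names:

    -- Run these in the job database. All names below are hypothetical.
    EXEC jobs.sp_add_target_group
         @target_group_name = N'StatsUpdateTargets';

    -- Option 1: target only the single database that needs the work.
    EXEC jobs.sp_add_target_group_member
         @target_group_name = N'StatsUpdateTargets',
         @target_type       = N'SqlDatabase',
         @server_name       = N'myserver.database.windows.net',
         @database_name     = N'SalesDb';

    -- Option 2: target the whole logical server, then exclude databases
    -- the job credential cannot reach.
    EXEC jobs.sp_add_target_group_member
         @target_group_name       = N'StatsUpdateTargets',
         @target_type             = N'SqlServer',
         @refresh_credential_name = N'RefreshCredential',
         @server_name             = N'myserver.database.windows.net';

    EXEC jobs.sp_add_target_group_member
         @target_group_name = N'StatsUpdateTargets',
         @membership_type   = N'Exclude',
         @target_type       = N'SqlDatabase',
         @server_name       = N'myserver.database.windows.net',
         @database_name     = N'LegacyDb';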


Query Folding in Direct Query Plus Import Scenarios

Marc Lelijveld violates Betteridge’s Law of Headlines:

Recently, I ran into a case where we wanted to combine Direct Query sources with imported data. Well, this functionality is actually called composite models nowadays.

Having said that, let me start by managing some expectations for this post. The title says composite models, but I have to admit, it is not about the soon-to-be-released composite models feature where you can combine multiple Power BI datasets. Instead, it is about combining Direct Query and Import mode in one single dataset.

In this blog post, I will describe more about my use case: why I considered using some sources in Direct Query, and how this combines with Row-Level Security and query folding.

Read on for the test.


The Evolving Lakehouse

Simon Whiteley looks at the current status of the Lakehouse model:

We have discussed in the past this idea of the lakehouse: the aspirational target of many analytics platforms these days, combining the huge power and potential of data lakes with the rigour, reliability, and concurrency of a data warehouse. It’s an interesting concept but has, in the past, been firmly an aspiration.

In the world without lakehouses, we often see the “Modern Data Warehouse”, this two-phased approach to providing a holistic platform – we load our early data into a lake where we shape it and massage it into an understandable state. It is here we perform data science, exploratory data analysis, early sight analytics prototyping and various other functions that don’t quite fit into a data warehouse… but then we load our data into a relational store for serving to the business. This is where we can meet their demands for a rich SQL environment, auditable data models and rigorous change procedures. Essentially, we store data twice so that we can achieve the best of both worlds.

Definitely read Simon’s take on it. My take is that the Lakehouse concept will start to be useful to specific companies in about 2-3 years, as I don’t think the performance is there today.


Using Linked Servers to Scan SQL Server Instances

Kenneth Fisher has a plan:

I recently did a post on testing a linked server where I said I would explain why I wanted to make the test. Basically, I needed to scan a few hundred instance names and do the following:

– Check if the instance is one we have access to, or even exists. If not, make a note of the error so we can tell the difference.
– Collect information like instance size (total size of all databases), CPU count, memory, etc.
– Collect a list of database names on the instance, their status, size, etc.

I have mixed feelings about this. On the one hand, I appreciate that it’s pretty simple and does what is promised: a quick and dirty method for reaching a SQL Server instance (or something which might be one); there’s a rough sketch of that after the list below. On the other hand, it feels like this is trying to be the combination of two or three things which do the job better:

  • Central Management Server, for managing the SQL Server instances available to you. This is where you’d put confirmed instances and start running checks like Kenneth wants.
  • Nmap, for determining whether there are servers which are not part of your CMS listing. Nmap can check for availability on port 1433 and see whether a SQL Server instance is listening there, as well as hitting the SQL Browser service for instances not on 1433 (assuming you leave the Browser on).
  • A PowerShell script to tie these together, or at least a script in some language which does a decent job with looping. Even better if it offers parallelism.
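For the curious, the quick and dirty flavor looks roughly like this: create a throwaway linked server, test it inside TRY/CATCH, and record whatever error comes back. This is only a sketch; the instance, linked server, and provider names are assumptions, and Kenneth’s posts have the real version.

    -- Hypothetical sketch of probing one instance name from a list.
    DECLARE @instance sysname = N'SQL01\PROD';

    BEGIN TRY
        EXEC sp_addlinkedserver
             @server     = N'ProbeTarget',
             @srvproduct = N'',
             @provider   = N'MSOLEDBSQL',
             @datasrc    = @instance;

        -- Fails if the instance is unreachable or we lack access; running it
        -- through dynamic SQL keeps the error catchable by the outer CATCH.
        EXEC sys.sp_executesql N'EXEC sys.sp_testlinkedserver N''ProbeTarget'';';

        PRINT N'Reachable: ' + @instance;
    END TRY
    BEGIN CATCH
        PRINT N'Failed: ' + @instance + N' - ' + ERROR_MESSAGE();
    END CATCH;

    -- Clean up the throwaway linked server if it was created.
    IF EXISTS (SELECT 1 FROM sys.servers WHERE name = N'ProbeTarget')
        EXEC sp_dropserver @server = N'ProbeTarget', @droplogins = 'droplogins';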

Check if an ADF Pipeline is Already Running

Paul Andrew has a scenario for us:

Scenario: I want to trigger a Data Factory pipeline, but when I do I want the pipeline to know if it’s already running. If it is already running, stop the new run.

Sounds simple enough, right?

Wrong!

But now it’s simple for you, because I’ve done it for you, yay! 🙂

I thought it was simple, but it wasn’t simple, but now it’s simple, but is it really simple? Click through to find out.


Azure Data Studio November 2020 Release

Alan Yu announces the November 2020 release of Azure Data Studio:

Another feature request was to provide support for parameters in a notebook. Parameterization is the ability to execute the same notebook with different parameters.

With this release of Azure Data Studio, users will now be able to utilize Papermill’s ability to parameterize, execute, and store notebooks. Making the parameters cell the first code cell in your notebook ensures that the injected parameters in the output parameterized notebook will be placed directly after the original parameters cell. That way, the parameterized notebook will utilize the newly injected parameters instead of the original parameters cell.

Users can utilize the Papermill CLI as well as the Python API to pass in a new set of parameters quickly and efficiently.

That does look interesting.
