Press "Enter" to skip to content

Author: Kevin Feasel

Power BI: Comparing Web.Contents and File.Contents Performance

Chris Webb compares the performance of File.Contents and Web.Contents:

In my last post I mentioned the Power Query engine’s persistent cache, which in some scenarios caches the data read from a data source when a query is refreshed. Another important nugget of information that Ehren von Lehe of the Power Query dev team mentioned in a post on the Power Query MSDN forum recently is the fact that if you use File.Contents to get data from a file then the persistent cache is not used, but if you use Web.Contents to get data from the same file then the persistent cache is used. I guess the thinking here is that there is no point creating an on-disk cache containing the contents of a file that is already on disk.

Chris takes us through a couple of unexpected twists, so check it out.


Getting the Last Actual Execution Plan for a Query

Pedro Lopes takes us through a new dynamic management function in SQL Server 2019 CTP 2.4:

In CTP 2.4, not all actual execution plans will be available; you can see more details on that here.

For an upcoming CTP version, all queries will be available with the equivalent of the actual execution plan, at least those where the plan was cached in the first place and has not been evicted from cache.

That caveat aside, I’m happy with this.
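
For reference, the new DMF is sys.dm_exec_query_plan_stats. A minimal sketch of calling it, assuming SQL Server 2019 CTP 2.4 or later with the feature enabled (trace flag 2451 in the CTPs):

    -- Join the plan cache to the new DMF to pull the last known
    -- actual execution plan for each cached query.
    SELECT qs.plan_handle,
           st.text AS query_text,
           ps.query_plan AS last_actual_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan_stats(qs.plan_handle) AS ps;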


Finding Dependency Clusters

Michael J. Swart performs cluster analysis with tables:

That ball of mush in the middle is hard to look at, but the smaller disconnected bits aren’t! Just like Ben, I want to work on those smaller pieces too! And just like the lonely tables we looked at last week, these small isolated components are also good candidates for extracting from SQL Server.

The script looks at joins in execution plans, which is a rather clever way of doing this when you don’t have a comprehensive set of foreign key constraints.
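
A minimal sketch of the general idea (not Michael's actual script): shred the cached plans for table references, then self-join the results on plan_handle so that each pair of tables sharing a plan becomes an edge in the dependency graph.

    -- List every table referenced by each cached plan.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT DISTINCT
        cp.plan_handle,
        obj.value('(@Schema)[1]', 'nvarchar(128)') + N'.'
            + obj.value('(@Table)[1]', 'nvarchar(128)') AS TableName
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY qp.query_plan.nodes('//Object[@Table]') AS t(obj)
    WHERE obj.value('(@Database)[1]', 'nvarchar(128)') = QUOTENAME(DB_NAME());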


Deleting In Batches

Andy Mallon explains why deleting in batches is a good idea:

Yesterday, I wrote about the importance of planning your data purges. Today, let’s look at the simplest requirement for purging data: Keep data for X days after it’s created. There are tons of cases where this is the rule. Log data, telemetry data, even transactional data is often retained based on X days since it is created.

Given how common this scenario is, let’s talk in a bit more detail about how I like to go about deleting it. Deleting data is easy, right?

    DELETE t
    FROM dbo.Transactions AS t
    WHERE CreateDate <= DATEADD(dd, -90, GETDATE());

This part of the series covers the why; the next part will cover the how.
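
As a preview of the how, a common batching pattern looks something like this (a sketch, not necessarily Andy's exact approach; the batch size of 1,000 is arbitrary):

    -- Delete in small chunks so each statement holds locks only briefly
    -- and the transaction log can truncate between batches.
    DECLARE @rows int = 1;
    WHILE @rows > 0
    BEGIN
        DELETE TOP (1000) t
        FROM dbo.Transactions AS t
        WHERE CreateDate <= DATEADD(dd, -90, GETDATE());
        SET @rows = @@ROWCOUNT;
    END;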


Python Natural Language Processing Tools

Sandeep Aspari takes us through some of the tooling available in Python around Natural Language Processing:

TextBlob
TextBlob is a Python library that extends NLTK. It provides a simple API for its methods, wraps a large number of NLTK functions, and also includes the pattern library's functionality. If you are just getting started, it can be an excellent learning tool, and it can be used in production applications that don't require heavy performance. TextBlob objects are similar to Python strings, so we can quickly transform and manipulate them much as we would in plain Python. Finally, TextBlob is used everywhere and is best suited for smaller projects.

There are several tools from which you can choose. Sandeep also covers some Node- and Java-based tools.


Developing Big Data Cluster Spark Jobs with IntelliJ

Jenny Jiang shows how we can use IntelliJ IDEA to develop Spark jobs against SQL Server Big Data Clusters:

We’re delighted to release the Azure Toolkit for IntelliJ support for SQL Server Big Data Cluster Spark job development and submission. For first-time Spark developers, it can often be hard to get started and build their first application, with long and tedious development cycles in the integrated development environment (IDE). This toolkit empowers new users to get started with Spark in just a few minutes. Experienced Spark developers also find it faster and easier to iterate their development cycle.

The toolkit extends IntelliJ support for the Spark job life cycle starting from creation, authoring, and debugging, through submission of jobs to SQL Server Big Data Clusters. It enables you to enjoy a native Scala and Java Spark application development experience and quickly start a project using built-in templates and sample code. The integration with SQL Server Big Data Cluster empowers you to quickly submit a job to the big data cluster as well as monitor its progress. The Spark console allows you to check schemas, preview data, and validate your code logic in a shell-like environment while you can develop Spark batch jobs within the same toolkit.

It looks pretty good from my vantage point.


Quick Data Migration With PowerShell

Emanuele Meazzo shows how you can use dbatools to perform a quick table-by-table data migration using PowerShell:

I’m using the sqlserver and dbatools PowerShell modules to accomplish such a tedious task in the fastest way possible.

The Write-DbaDbTableData cmdlet is pretty neat because it can automatically create the destination table if it doesn’t exist, truncate the table if it does (or append, your choice), and keep identity values and nulls if necessary. Everything is done via a bulk insert with a configurable batch size.

Click through for the script. It’s not a replacement for a real ETL process, but if you just need something fast, it will do the job.


Using Temp Stored Procedures

Bert Wagner shows how you can perform testing with temporary stored procedures:

A while back I learned that it’s possible to create temporary stored procedures in SQL Server.

I never put that knowledge into practice however because I struggled to think of a good use case for when a temporary stored procedure would be preferable to a permanent stored procedure.

Not long ago I encountered a scenario where using a temporary stored procedure was the perfect solution to my problem.

Those scenarios are rare, but Bert did hit one of them.
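
A minimal sketch of the mechanics (hypothetical procedure and table names): the # prefix scopes the procedure to the current session, and SQL Server drops it automatically when the session ends.

    CREATE PROCEDURE #CheckRowCount
    AS
    BEGIN
        -- Any logic you want to reuse during the session goes here.
        SELECT COUNT(*) AS TransactionCount
        FROM dbo.Transactions;
    END;
    GO

    EXEC #CheckRowCount;  -- callable like a normal procedure, but session-scoped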


SQL Server Isolation Levels

Pamela M. takes us through a humorous journey covering transaction isolation levels in SQL Server:

READ COMMITTED: One step up (and the default for SQL Server). A query in the current transaction can’t read data modified by someone else that hasn’t yet committed. No dirty reads. BUT… data could be changed by others between statements in the current transaction, so the data may not look the same twice. READ COMMITTED uses shared locks to prevent dirty reads, but that’s about all you get. You still get non-repeatable reads and phantom reads here (more on phantom reads below).

Click through for the full list.
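
A quick sketch of the non-repeatable read that READ COMMITTED permits (dbo.Accounts is a hypothetical table):

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    BEGIN TRANSACTION;
        -- Reads only committed data: no dirty reads.
        SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;
        -- Another session can UPDATE and COMMIT this row right here...
        SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;
        -- ...so the second read may return a different value.
    COMMIT;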
