Press "Enter" to skip to content

Day: August 7, 2019

Options with stats::density() in R

Evgeni Chasnovski takes us through what the parameters in the stats::density() R function do:

Argument bw is responsible for computing bandwith of kernel density estimation: one of the main parameters that greatly affect the output. It can be specified as either algorithm of computation or directly as number. Because actual bandwidth is computed as adjust*bw(adjust is another density() argument, which is explored in the next section), here we will see how different algorithms compute bandwidths, and the effect of changing numeric value of bandwidth will be shown in section about adjust.

There are 5 available algorithms: “nrd0”, “nrd”, “ucv”, “bcv”, “SJ”. 

Evgeni has also created animations for each of these, so it’s easy to see what they do compared to the default output.

Comments closed

SQL Server CTP 3.2 and Java Extensibility

Niels Berglund walks us through what has changed with Java support in ML Services in SQL Server 2019 CTP 3.2:

One of the announcements of what is new in CTP 3.2 was that SQL Server now includes Azul System’sZulu Embedded right out of the box for all scenarios where we use Java in SQL Server, including Java extensibility.

So, in this post, we look at the impact, (if any), this has to how we use the Java extensibility framework in SQL Server 2019.

This also affects PolyBase.

Comments closed

Parsing Rows Manually with Spark .NET

Ed Elliott shows how we can solve a challenging problem when newlines are in the wrong place:

So the first thing we need to do is to read in the whole file in one chunk, if we just do a standard read the file will get broken into rows based on the newline character:

var file = spark.Read().Option("wholeFile", true).Text(@"C:\git\files\newline-as-data.txt");

This solution is a bit complex. As Ed points out, you’re better off reshaping the file before you try to process it. If it’s a structured file like the example Ed has, a regular expression can do the trick.

Comments closed

Clustered Columnstore and Azure SQL DB

Arun Sirpal takes us through online clustered columnstore index creation in Azure SQL Database:

What tier do you need to create one of these things? Let’s see.

CREATE CLUSTERED  COLUMNSTORE INDEX cciSales ON [SalesLT].[ProductModelProductDescription] WITH ( ONLINE = ON )

But I get this message, Msg 40536, Level 16, State 32, Line 1
‘COLUMNSTORE’ is not supported in this service tier of the database. See Books Online for more details on feature support in different service tiers of Windows Azure SQL Database.

Read on to see the minimum tier which allows online creation of clustered columnstore indexes.

Comments closed

Drawing SSIS Packages

Bartosz Ratajczyk continues a quest to draw SSIS packages as SVGs:

To get the Value and Expression properties I need to find the precedence constraint in the .dtsx file during the XSL transformations. It requires three changes in the package2svg.xsl:

– I have to pass the name of the .dtsx file
– I have to read the XML from the .dtsx file
– I have to use the DTS namespace because it’s the namespace of the .dtsx file

Read on for more. Bartosz to this point has covered the control flow.

Comments closed

Tips for Reading Execution Plans

Bert Wagner gives us some tips for reading execution plans in SQL Server:

Execution plans show the steps SQL Server takes to execute your query. Each icon in the graphical execution plan is known as an operator, and the most common way to read a plan is by starting with the top right most operator and following the arrows to the left.

When you reach a join or concatenation operator where multiple branches merge into one operator, you can proceed to the right-most operator of one of the lower branches and start the process of reading right to left again. In general, this can be summed up as reading a plan right to left, top to bottom.

Right-to-left, top-to-bottom gives you the flow of information and that’s quite important. To understand how the engine works, though, you also need to read left-to-right, as Brad Schulz’s outstanding one-act play demonstrates.

Comments closed

Another Look at Cosmos DB Indexing

Hasan Savran revises some indexing recommendations based on changes to Cosmos DB:

Lazy indexing used to be an option. It’s not in any CosmosDB documentation anymore. By using Lazy indexing, you could save 20 to 30 percent for Request Units. Just like anything else in life, you get what you pay for when it comes to Lazy indexing. By selecting Lazy indexing, you are saying that eventually Indexes will be updated. If Indexes are not updated, that means your queries might not return all the data since all data might not be indexed yet. Lazy indexing is still an option, nobody talks about it for a good reason. In my opinion, it should be listed as obsolete feature or it should have a better documentation about how it works or why it might not be a good option for your solutions.

     If you use Lazy Indexing to reduce Request Units in your solution, change it to consistent now unless you have a really good reason!

Read on for more advice in this vein.

Comments closed

Version Control and Power BI Desktop

Gilbert Quevauvilliers takes us through version control with PBIX files:

In the second part of my blog post I am going to detail how to use the version control with Power BI Desktop files.

This will include adding files, checking files in and out, viewing previous versions and reverting to previous versions.

If this is the first time you are reading this blog post, I would highly suggest reading Setting up Version Control for my Power BI Desktop Files (PBIX) with no additional Cost * | Part 1

In short, Gilbert treats PBIX file as any other data file. These can get kind of beefy, though, so I’ve also saved them as templates—that way, you get the structure without pulling in all of the data.

Comments closed