Day: March 16, 2021

k-gram Language Models in R

Valerio Gherardi takes us through the concept of k-grams:

The post is structured as follows: we start by giving a succinct theoretical introduction to k-gram models. Subsequently, we illustrate how to train a k-gram model in R using kgrams, and explain how to use the standard perplexity metric for model evaluation or tuning. Finally, we use our trained model to generate some random text at different temperatures.

This goes into some depth on the topic and is worth giving a careful read.
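To make the three pieces (training, perplexity, temperature sampling) concrete, here is a minimal Python sketch. It is not the kgrams API. It assumes whitespace-tokenized sentences, `<s>`/`</s>` padding tokens, and add-one (Laplace) smoothing, which was picked only for brevity and is not necessarily the smoother the post uses. Perplexity is computed as exp of the mean negative log-probability per predicted token. Temperature follows one common convention: each probability is raised to 1/T before sampling, so T < 1 sharpens the distribution and T > 1 flattens it. All function names are hypothetical.

```python
import math
import random
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence padding tokens (an assumption of this sketch)

def pad(words, k):
    # k-1 start tokens give the first word a full context; EOS is predicted too.
    return [BOS] * (k - 1) + list(words) + [EOS]

def train_kgram(sentences, k):
    """Count k-grams and their (k-1)-word contexts in tokenized sentences."""
    kgrams, contexts, vocab = Counter(), Counter(), {EOS}
    for words in sentences:
        vocab.update(words)
        padded = pad(words, k)
        for i in range(k - 1, len(padded)):
            context = tuple(padded[i - k + 1:i])
            kgrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return kgrams, contexts, vocab

def prob(model, context, word):
    """Add-one (Laplace) smoothed estimate of P(word | context)."""
    kgrams, contexts, vocab = model
    context = tuple(context)
    return (kgrams[context + (word,)] + 1) / (contexts[context] + len(vocab))

def perplexity(model, sentences, k):
    """exp of the mean negative log-probability per predicted token."""
    total_log_prob, n_tokens = 0.0, 0
    for words in sentences:
        padded = pad(words, k)
        for i in range(k - 1, len(padded)):
            total_log_prob += math.log(prob(model, padded[i - k + 1:i], padded[i]))
            n_tokens += 1
    return math.exp(-total_log_prob / n_tokens)

def generate(model, k, temperature=1.0, max_len=20, rng=random):
    """Sample one sentence; each probability is raised to 1/temperature before sampling."""
    words = sorted(model[2])
    history = [BOS] * (k - 1)
    out = []
    for _ in range(max_len):
        context = history[len(history) - (k - 1):]  # the last k-1 tokens
        weights = [prob(model, context, w) ** (1 / temperature) for w in words]
        w = rng.choices(words, weights=weights)[0]
        if w == EOS:
            break
        out.append(w)
        history.append(w)
    return out
```

For example, a bigram model trained on `[["the", "cat", "sat"], ["the", "dog", "sat"]]` with `k=2` gives the unseen word order `[["sat", "the"]]` a perplexity of about 7, noticeably higher than the perplexity of its own training sentences, which is the behavior you would tune against on held-out text.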

Fuzzy Joins in SQL Server using R

Rajendra Gupta shows how you can use R in SQL Server Machine Learning Services to perform fuzzy joins:

Suppose you have a web page where users write comments in a text box. You are performing data analysis; however, there are a few spelling mistakes, and you want to perform an approximate match, or fuzzy lookup, against another dataset. Similarly, suppose you have a product catalog database. Your users search for a product, but they might not type the exact keyword for the product name. Using fuzzy joins, we can return products whose names approximately match the user's search.

SQL Server Machine Learning Services lets you execute R scripts inside SQL Server. In previous articles, we explored several use cases for R scripts in SQL Server.

It’s R, so there’s already a package in CRAN for that.
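The idea behind a fuzzy join is simple enough to sketch outside R. The following Python version is an illustration only, not what the article runs: it swaps in the standard library's `difflib.SequenceMatcher` ratio as the similarity measure (R packages may use different string distances), lowercases both sides, and keeps every pair whose score meets an arbitrary threshold of 0.8. The function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_join(left, right, threshold=0.8):
    """Return (left_item, right_item, score) for every pair at or above the threshold."""
    matches = []
    for l in left:           # conceptually a cross join...
        for r in right:
            score = similarity(l, r)
            if score >= threshold:   # ...filtered on similarity instead of equality
                matches.append((l, r, round(score, 3)))
    return matches

if __name__ == "__main__":
    products = ["Keyboard", "Monitor", "Mouse"]
    searches = ["keybord", "monitr"]
    print(fuzzy_join(searches, products))
```

With the sample data, the misspelled searches `keybord` and `monitr` match `Keyboard` and `Monitor`, while neither matches `Mouse`. The nested loop compares every pair, so for large tables you would want some blocking step (for example, matching only within the same first letter) before scoring.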

Why Have Multiple Edge Constraints in SQL Graph?

Louis Davidson has an explanation for us:

Edge constraints were added in SQL Server 2019 to make the node to edge relationship stricter/enforced, and more like typical foreign key constraints. When used, they define what node types can be used in the from and to position of the edge. What makes edges different than a many-to-many relationship in a relational table is that an edge can implement more than one many-to-many relationship in a single table. To constrain the types of data that can be put into the edge, you can use an edge constraint.

Edge constraints are very similar to implementing foreign key constraints, but there are a few key differences. Foreign keys are between two tables. Edges are between one edge table and multiple pairs of node tables. In both cases, you can have multiple constraints, even from the same table to the same related table on the same column. However, with edge constraints, because you can have multiple pairs of expressions, and even multiple constraints, it bears discussion. If you have more than one constraint, it has one big negative, but it is allowed to implement one big positive!

Click through for the explanation, as well as an example.
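As I understand SQL Server's documented behavior, the clauses inside a single edge constraint are alternatives (an edge needs to match any one of them), while separate edge constraints on the same edge table must all be satisfied. The small Python model below illustrates only that rule; it is not T-SQL and not how the engine implements it, and the constraint and table names (`EC_Buys`, `Customer`, `Supplier`, `Product`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeConstraint:
    name: str
    clauses: frozenset  # set of (from_node_table, to_node_table) pairs

    def allows(self, from_table, to_table):
        # Clauses within one constraint are ORed: any matching pair is enough.
        return (from_table, to_table) in self.clauses

def edge_allowed(constraints, from_table, to_table):
    # Separate constraints on the same edge table are ANDed: all must allow the edge.
    return all(c.allows(from_table, to_table) for c in constraints)

if __name__ == "__main__":
    buys = EdgeConstraint("EC_Buys", frozenset({("Customer", "Product")}))
    buys_wide = EdgeConstraint(
        "EC_BuysWide", frozenset({("Customer", "Product"), ("Supplier", "Product")})
    )
    print(edge_allowed([buys_wide], "Supplier", "Product"))        # one wide constraint
    print(edge_allowed([buys, buys_wide], "Supplier", "Product"))  # both constraints present
```

Under this model, adding a second, wider constraint does not open the edge to `Supplier` rows while the narrower one still exists, because every constraint has to pass.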

Thinking About Power BI Premium Per User

Martin Schoombee has an analysis for us:

When Power BI Premium was released in June 2017, many of us felt left out as the jump from Pro to Premium was just too high…unless you were an Enterprise-level customer that could afford the ~$5,000/month price tag. The economics just didn’t make sense for most people, and the size limitations of Power BI Pro were still a concern for companies who wanted to jump on board.

At Microsoft Ignite this year, we finally got the answer we were waiting for. The pricing for Power BI PPU (Premium Per User) was announced and will be generally available in April 2021. With the price set at $20/user per month, which is better than I expected, I think the landscape is going to change in a significant way. Let me explain…

I’m tied in reasonably well to the Power BI blogging space and I’m a bit shocked that Power BI Premium Per User hasn’t been the recipient of a lot more excitement. I typically live on the budget side of things and disliked the $5K entry point for Premium. At $20 per user, I’m quite happy. Martin does point out a third scenario, however: using Azure Analysis Services plus Power BI Pro. But even in his chart, I think I’m still picking Premium Per User over AAS + Pro due to the additional functionality at approximately the same price.
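The pricing gap is easy to quantify with the two figures quoted above. The following is a rough Python sketch, assuming a flat ~$5,000/month capacity price and $20/user/month for Premium Per User, and ignoring everything else (for instance, Pro licenses that report authors still need under capacity licensing, or the Azure Analysis Services option). The function names are hypothetical.

```python
import math

CAPACITY_MONTHLY = 5000.0  # approximate Premium capacity price quoted in the post (USD/month)
PPU_PER_USER = 20.0        # Premium Per User price quoted in the post (USD/user/month)

def ppu_monthly_cost(users, price_per_user=PPU_PER_USER):
    """Total monthly cost of licensing `users` people with Premium Per User."""
    return users * price_per_user

def break_even_users(capacity_monthly=CAPACITY_MONTHLY, price_per_user=PPU_PER_USER):
    """Smallest user count at which PPU costs at least as much as a capacity."""
    return math.ceil(capacity_monthly / price_per_user)

if __name__ == "__main__":
    for users in (10, 100, break_even_users()):
        print(users, "users:", ppu_monthly_cost(users), "USD/month vs", CAPACITY_MONTHLY)
```

Under these assumptions the two options cost the same at 250 users, so an organization well below that headcount pays less with Premium Per User.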

Using the tree Command

Denis Gobo learns a new trick:

I was watching a Pluralsight course and the person typed in the tree command... and I was like whoaaaa... How do I not know this? Perhaps because I don’t use the command window all that much? Anyway, I thought that this was pretty cool.

As you can see, tree lists all the directories and subdirectories in a tree-like structure. This is great for quickly seeing all the directories in one shot.

It’s a useful command. And if you’re on Linux, there are a lot of useful switches. If you’re on Windows, there are fewer: mainly /F to list the files in each folder and /A to draw the tree with ASCII instead of extended characters.
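What tree does with directories boils down to a sorted recursive walk that draws a connector per entry. Here is a minimal Python sketch of that idea; `tree_lines` is a hypothetical name, it lists directories only (like running tree without /F), and its output approximates the Linux tool's style rather than matching either implementation byte for byte. Note that `Path.is_dir()` follows symlinks, so a symlink loop would recurse indefinitely in this simple version.

```python
import tempfile
from pathlib import Path

def tree_lines(root, prefix=""):
    """Return a tree-style listing of the directories below root (files are skipped)."""
    lines = []
    subdirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for i, d in enumerate(subdirs):
        is_last = i == len(subdirs) - 1
        # The last entry at each level gets a corner; earlier ones get a tee.
        lines.append(prefix + ("└── " if is_last else "├── ") + d.name)
        # Children of a non-last entry keep a vertical bar so the branch continues.
        lines.extend(tree_lines(d, prefix + ("    " if is_last else "│   ")))
    return lines

if __name__ == "__main__":
    # Build a small throwaway directory structure and print it.
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "a" / "b").mkdir(parents=True)
        (Path(root) / "c").mkdir()
        print(root)
        print("\n".join(tree_lines(root)))
```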

Deploying Azure Data Services via Terraform

Chris Adkin has two additional parts of a series. Part 3 shows us how to deploy a virtual machine on VMware:

To do this, you need an Ubuntu virtual machine. I’ve tested this with Ubuntu 18.04 LTS and will get around to testing it with Ubuntu 20.10 at some stage. If, for example, the virtual machine was created with a user called azuser, the deployment server should also have an azuser account under which all Terraform commands are executed.

Part 4 takes those VMs and sets up a Kubernetes cluster across them:

Whatever you do when deploying a Kubernetes cluster, somewhere along the line you have to use kubeadm. There is a wealth of material available in blog posts and on the internet in general in which people roll their own scripts using kubeadm. I often suspect that many of these efforts are the result of Kelsey Hightower’s Kubernetes the Hard Way. In this post we are emphatically going to do things the easy way, […]

And now we’re caught up on the series…for the moment, at least.

Creating Parquet Files from SQL Server Data

Andy Leonard answers a challenge:

I searched and found some promising Parquet SSIS components available from CData Software and passed that information along. I shared my inexperience in exporting to Parquet format and asked a few friends how they’d done it.

I thought: How many times have I demonstrated Azure Data Factory and clicked right past file format selection without giving Parquet a second thought? Too many times. It was time to change that.

Another route is to use PolyBase. If you’re okay with writing the results to Azure Blob Storage, you can insert the results of a SQL query directly into Parquet files. If that sounds interesting, here are posts on connecting to Azure Blob Storage via PolyBase and inserting into Azure Blob Storage. I insert in CSV format to make it easier for people to follow, but swap the file format for Parquet and it works all the same.

T-SQL Tuesday 136 Wrap-Up

Brent Ozar rounds up the usual suspects, plus several more:

For this month’s T-SQL Tuesday, I asked you to blog about your most-loved and least-loved data types.

Crazy, right? How could people possibly love or hate data types? Well, if you’ve been working with them for a while, I figured you’d have built up an array of tips or pain points, and y’all delivered with 29 interesting blog posts.

Click through for a lengthy list of interesting posts.