Day: September 9, 2019

When to Use Different ML Algorithms

Stefan Franczuk explains the different categories of machine learning algorithms available in Talend:

Clustering is the task of grouping together a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Clustering is really useful for identifying separate groups and is therefore used to solve use cases such as “who are my premium customers?”.

Understanding when to use which algorithm is important. You don’t want to build out the world’s best regression if your benefactors are asking for a classifier.
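The excerpt doesn't show Talend's own clustering components, so as a stand-in, here is a minimal sketch of Lloyd's k-means, one common clustering algorithm, in plain Python (3.8+ for `math.dist`). The `kmeans` and `assign` functions and the customer data (monthly spend, orders per month) are hypothetical illustrations, not Talend's API.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[List[Point]]:
    """Put each point in the cluster of its nearest centroid (Euclidean distance)."""
    clusters: List[List[Point]] = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0):
    """Lloyd's k-means: alternate assignment and centroid updates until stable."""
    rng = random.Random(seed)
    centroids = list(rng.sample(list(points), k))  # k distinct starting points
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it in place if empty.
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Recompute membership so the returned clusters match the returned centroids.
    return centroids, assign(points, centroids)


# Hypothetical customers as (monthly spend, orders per month).
customers = [(20, 1), (25, 2), (22, 1), (480, 12), (510, 15), (495, 11)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(f"centroid={centroid} members={members}")
```

With data this well separated, the two clusters come out as the three low-spend and the three high-spend customers. On real data, k-means is sensitive to the choice of k and to feature scaling, so treat this as an illustration of the idea rather than a recipe.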

Databricks versus Mapping Data Flows

Helge Rege Gardsvoll contrasts Azure Databricks, Azure Data Factory Mapping Data Flows, and SQL Server Integration Services:

Mapping Data Flows
Mapping Data Flows is one of the many data flow tools from Microsoft these days, and it provides, for the first time, data transformation capabilities within Data Factory. It is not a U-SQL script or Databricks notebook orchestrated from Data Factory, but an integrated tool. This means that you can reuse (many of) the datasets you have defined in Data Factory, which you cannot do in Databricks.

Mapping Data Flows runs on top of Databricks, but the cluster is handled for you and you don’t have to write any of that Scala code yourself.

Read on for the full comparison.

Develop BDC PySpark Jobs in Visual Studio Code

Jenny Jiang announces a new capability in Visual Studio Code:

With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, and language services. You can run the current line, run selected lines of code, or run the entire .py file. You can import and export a .ipynb notebook and perform notebook-like queries, including Run Cell, Run Above, and Run Below. You can also enjoy a notebook-like interactive experience that includes your source code and markdown comments along with the running results and output. You can remove unneeded sections, enter comments, or type additional code in the interactive results window. Moreover, you can visualize your results graphically with matplotlib, as in a Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters lets you quickly submit a PySpark batch job to the big data cluster and monitor job progress.

This is rather useful for developers, though I greatly prefer the Azure Data Studio notebook interface.

Determining Instant File Initialization Status

Dave Mason gives us a couple of methods for determining whether we turned Instant File Initialization on:

Here’s a little tidbit I wanted to share regarding the Perform Volume Maintenance Tasks security setting. In the SQL Server world, this is often referred to as IFI. On more recent versions of SQL (SQL 2012 SP4 or later, I believe), you can verify if IFI is enabled or not for the database engine logon account by checking the error log.

That’s one, but click through for the technique you can easily script out.
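As an illustration of the error log check, here is a minimal Python sketch that scans error log lines, however you obtain them (for example, the output of sp_readerrorlog), for the startup message about Instant File Initialization. The exact message wording is an assumption, so check it against your own log. The function name and the sample log lines are hypothetical, and this is not the scripted technique from Dave's post.

```python
import re
from typing import Iterable, Optional

# Assumed wording of the startup message; check it against your own error log.
IFI_PATTERN = re.compile(
    r"Database Instant File Initialization:\s*(enabled|disabled)", re.IGNORECASE
)


def ifi_status_from_log(lines: Iterable[str]) -> Optional[bool]:
    """Return True/False from the last IFI message read, or None if none is found."""
    status: Optional[bool] = None
    for line in lines:
        match = IFI_PATTERN.search(line)
        if match:
            # If several messages appear, the last one read wins.
            status = match.group(1).lower() == "enabled"
    return status


# Hypothetical error log excerpt (timestamps and surrounding lines are made up).
sample_log = [
    "2019-09-09 08:00:01.10 Server      Server process ID is 4242.",
    "2019-09-09 08:00:01.20 Server      Database Instant File Initialization: enabled. "
    "For security and performance considerations see the topic "
    "'Database Instant File Initialization' in SQL Server Books Online. "
    "This is an informational message only. No user action is required.",
]
print(ifi_status_from_log(sample_log))
```

Returning None rather than False when no message is found keeps "not enabled" distinct from "unknown", for example when the log has been cycled since the last restart.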

Goodbye, PowerShell 5.1 Ad

Chrissy LeMaire has a PowerShell ad blocker:

I really abhor the new ad in the PowerShell 5.1 console and it seems there’s no hope of Microsoft making it go away.

After a long, involved Twitter conversation with the community and the PowerShell team that confirmed it’s impossible for the advertisement (?!) to be easily removed, it looks like the only solution is to bypass it. Przemysław Kłys has a great suggestion to emulate the old prompt that totally works!

Click through for that solution.

Reporting Services and SPNs

Greg Dodd shares a couple tips on creating SPNs for SQL Server Reporting Services:

Reporting Services often requires an SPN assigned to the account running the Reporting Services service. You’ll know that you need to set this up when you try connecting to your Reporting Services instance from within the same domain and are prompted for credentials. If SPNs are set up correctly, your browser will handle the authentication for you and your users won’t need to log in again.

Read on for an example, but also a pitfall and how to avoid it.
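To make the SPN format concrete, here is a minimal Python sketch that only builds `setspn` command strings for HTTP SPNs under the Reporting Services service account; it does not run them. The host, domain, and account names are hypothetical, the helper name is made up, and it is not Greg's example and does not cover the pitfall he describes. Registering SPNs also requires the appropriate domain permissions.

```python
from typing import List


def rs_spn_commands(host: str, domain: str, service_account: str) -> List[str]:
    """Build setspn commands that register HTTP SPNs for a Reporting Services host."""
    fqdn = f"{host}.{domain}"
    # Users may browse to either the short name or the FQDN, so register both.
    # "setspn -S" checks for duplicate SPNs before adding the new one.
    return [f"setspn -S HTTP/{name} {service_account}" for name in (host, fqdn)]


# Hypothetical server, domain, and service account names.
for command in rs_spn_commands("reports01", "corp.example.com", r"CORP\svc_ssrs"):
    print(command)
```

Using `-S` instead of `-A` is the safer choice here, because a duplicate SPN on another account is a common reason Kerberos authentication silently falls back to a credentials prompt.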

Intellisense and the DAC

Slava Murygin doesn’t like severity 20 errors just popping up for no good reason:

Yesterday I needed to use the Dedicated Administrator Connection (DAC), as I do once in a while, and because I have all kinds of notifications in my system, I immediately got a “Severity 20” alert.

As you probably know, Severity 20 errors “indicate system problems and are fatal errors” (see Books Online: https://docs.microsoft.com/en-us/sql/relational-databases/errors-events/database-engine-error-severities?view=sql-server-2017).

Even though this “Severity 20” error does not indicate any problem with data and belongs only to a user process, it is still worth investigating.

Read on to see the cause of Slava’s problem and how there’s no way to fix it in SSMS.
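For reference, the severity bands on the Books Online page Slava links can be written as a small lookup. This Python sketch is a hypothetical helper, not part of any SQL Server tooling. It treats 25 as the upper bound and 20 through 25 as fatal, following the RAISERROR documentation, while the severity table itself lists 20 through 24.

```python
def severity_category(severity: int) -> str:
    """Map a SQL Server error severity (0-25) to its documented band."""
    if not 0 <= severity <= 25:
        raise ValueError(f"severity must be between 0 and 25, got {severity}")
    if severity <= 10:
        return "informational"           # 0-10: status information, not severe
    if severity <= 16:
        return "user-correctable error"  # 11-16: the user can fix the problem
    if severity <= 19:
        return "software error"          # 17-19: cannot be corrected by the user
    return "fatal system error"          # 20-25: the task executing the batch stops running


def is_fatal(severity: int) -> bool:
    """True for the severities that would trip a "Severity 20 and above" alert."""
    return severity_category(severity) == "fatal system error"


for sev in (10, 14, 17, 20):
    print(sev, severity_category(sev))
```

A band check like this explains why the alert fires, but not whether the error matters: as the post shows, a Severity 20 error can come from a single user session rather than from a problem with the server or its data.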
