Press "Enter" to skip to content

Author: Kevin Feasel

Linux SMO In PowerShell Core

Max Trinidad wins the technology mix-in competition of the day, using PowerShell Core to access SQL Server SMO on a Linux instance:

In my case, I have various systems set up: Windows and Ubuntu 16.04. So, I make sure I download the correct *.zip or *.tar.gz file.

As a prerequisite, you will need to have already installed “.NET Core 2.0 Preview 1” for the SQL Server tools to work, and remember this needs to be installed on all systems.

Just in case, here’s the link to download “.NET Core 2.0 Preview 1”: https://www.microsoft.com/net/core/preview#windowscmd
https://www.microsoft.com/net/core/preview#linuxubuntu

Now, because we are working with PowerShell Core, don’t forget to install the latest build found at:
https://github.com/PowerShell/PowerShell/releases

Read the whole thing.


Installing Linux And Then SQL Server On Linux

David Alcock has a couple of posts covering the installation of SQL Server on a brand new Ubuntu VM. First, David installs Ubuntu:

The system requirements for running SQL Server on Ubuntu 16.04.2 contain the following:

Note

You need at least 3.25GB of memory to run SQL Server on Linux. For other system requirements, see System requirements for SQL Server on Linux.
On the Create VM window, the memory is currently set to 1024 MB, so by clicking the Customize Hardware button I can change the allocated memory to 4 GB (4096 MB), as in the screenshot below.

Then, he explains the process of installing SQL Server:

Let’s break it down a little bit. First is sudo, which gives root permissions to a particular command. This is as opposed to sudo su, which I had to do later on in the install to switch to superuser mode for the session.

Next is apt. Apt is a command-line tool which works with the Advanced Packaging Tool and enables you to perform installs, updates, and removals of software packages. In this case, we’re installing curl, so we use the install command.

I think Microsoft did a good job of simplifying the installation process on Linux and making it “Linux-y,” with an easy installation and then post-installation configuration.


Rebuilding Full-Text Catalogs

Thomas Rushton ran into an issue with full-text indexing component versions:

Restoring 27 databases; they all restored properly, but 15 of them gave a warning along these lines:

Warning: Wordbreaker, filter, or protocol handler used by catalog ‘FOOBARBAZ’ does not exist on this instance. Use sp_help_fulltext_catalog_components and sp_help_fulltext_system_components to check for mismatching components. Rebuild catalog is recommended.

Read on for the solution.
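If you end up needing to rebuild a pile of catalogs after a restore like this, the rebuild itself is a one-liner of T-SQL per catalog. Here’s a rough Python sketch with pyodbc that automates it across a database; the driver, server, and database names are placeholders, and the only SQL Server pieces it relies on are sys.fulltext_catalogs and the documented ALTER FULLTEXT CATALOG … REBUILD command:

```python
# A minimal sketch, assuming pyodbc, a reachable instance, and a restored
# database named RestoredDb; adjust the connection string to your environment.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=RestoredDb;Trusted_Connection=yes;",
    autocommit=True,  # ALTER FULLTEXT CATALOG cannot run inside a user transaction
)
cursor = conn.cursor()

# Find every full-text catalog in the database...
cursor.execute("SELECT name FROM sys.fulltext_catalogs;")
catalogs = [row.name for row in cursor.fetchall()]

# ...and kick off a rebuild of each, as the warning message recommends.
for catalog in catalogs:
    cursor.execute(f"ALTER FULLTEXT CATALOG [{catalog}] REBUILD;")
    print(f"Rebuild started for catalog {catalog}")
```

Note that the rebuild runs asynchronously, so checking crawl status afterward is worthwhile.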


Lambda Architecture Basics

Michael Walker walks through the basics of the Lambda architecture:

Lambda architecture – developed by Nathan Marz – provides a clear set of architecture principles that allows both batch and real-time or stream data processing to work together while building immutability and recomputation into the system. Batch processing handles high volumes of data, where a group of transactions is collected over a period of time: data is collected, entered, processed, and then batch results are produced. Batch processing requires separate programs for input, process, and output; payroll and billing systems are examples. In contrast, real-time data processing involves continual input, processing, and output of data, and the data must be processed in a small time period (in near real-time). Customer service systems and bank ATMs are examples.

Lambda architecture has three layers:

  • Batch Layer

  • Serving Layer

  • Speed Layer

I haven’t heard much about the Lambda and Kappa architectures lately, so when I saw this, I figured it was time for a refresher.
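To make the three layers concrete, here’s a deliberately toy Python sketch of the idea — not anyone’s production implementation. The batch layer recomputes a view from scratch over the immutable master dataset, the speed layer keeps an incremental view over events that arrived since the last batch run, and the serving layer merges the two at query time:

```python
# Toy illustration of the three Lambda layers; a real system would use
# something like Hadoop/Spark for batch, Kafka/Storm for speed, and a
# database for serving.
from collections import Counter

master_dataset = []   # immutable, append-only log of raw events
recent_events = []    # events not yet covered by a batch run

def batch_layer():
    """Recompute the batch view from scratch over the whole master dataset."""
    return Counter(event["user"] for event in master_dataset)

def speed_layer():
    """Incrementally compute a real-time view over recent events only."""
    return Counter(event["user"] for event in recent_events)

def serving_layer(batch_view, realtime_view, user):
    """Answer queries by merging the batch and real-time views."""
    return batch_view[user] + realtime_view[user]

# New events land in both the master dataset and the real-time path.
for event in [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]:
    master_dataset.append(event)
    recent_events.append(event)

batch_view = batch_layer()   # in practice, run on a schedule
recent_events.clear()        # these events are now in the batch view

recent_events.append({"user": "alice"})  # arrives after the batch run
print(serving_layer(batch_view, speed_layer(), "alice"))  # prints 3
```

The key property is that the batch view can always be recomputed from the immutable log, so a bug in the speed layer never corrupts anything permanently.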


Running DoAzureParallel On The Cheap

David Smith reports an update on the doAzureParallel R package:

At the EARL conference in San Francisco this week, JS Tan from Microsoft gave an update (PDF slides here) on the doAzureParallel package. As we’ve noted here before, this package allows you to easily distribute parallel R computations to an Azure cluster. The package was recently updated to support using automatically-scaling Azure Batch clusters with low-priority nodes, which can be used at a discount of up to 80% compared to the price of regular high-availability VMs.

That lowers the barrier to usage significantly, so it’s a very welcome update.


Permissions On XML Schema Collections

Shane O’Neill diagnoses a permissions issue with XML Schema Collections…or is it?

In my head I’m thinking of all the things that I can do to try and troubleshoot this problem.

  1. Extended Events my session,
  2. Ask my Senior DBA,
  3. Cry

Then I realize that I’m jumping the gun again, so I slow down and check the first error message again — this time without the developers shouting in my ear about permissions.

This is a great example of why it’s important to troubleshoot using a methodical, logical process.  If you get it stuck in your head that the answer is quite obviously something, you lose a bunch of time if it turns out that it isn’t quite as obvious.


Microsoft ML For Spark

Xiaoyong Zhu announces that the Microsoft Machine Learning library is now available for Spark:

We’ve learned a lot by working with customers using SparkML, both internal and external to Microsoft. Customers have found Spark to be a powerful platform for building scalable ML models. However, they struggle with low-level APIs, for example to index strings, assemble feature vectors and coerce data into a layout expected by machine learning algorithms. Microsoft Machine Learning for Apache Spark (MMLSpark) simplifies many of these common tasks for building models in PySpark, making you more productive and letting you focus on the data science.

The library provides simplified consistent APIs for handling different types of data such as text or categoricals. Consider, for example, a DataFrame that contains strings and numeric values from the Adult Census Income dataset, where “income” is the prediction target.
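The announcement shows how much boilerplate this removes. The gist looks something like the sketch below — hedged heavily: it assumes MMLSpark is installed on your cluster, the CSV path is a placeholder for the Adult Census Income data, and exact module paths may differ by version:

```python
# Sketch in the spirit of the MMLSpark announcement; assumes MMLSpark is
# available on the cluster and the CSV path points at the Adult Census
# Income data, with "income" as the label column.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from mmlspark import TrainClassifier, ComputeModelStatistics

spark = SparkSession.builder.appName("mmlspark-demo").getOrCreate()
dataset = spark.read.csv("adult_census.csv", header=True, inferSchema=True)

train, test = dataset.randomSplit([0.75, 0.25], seed=123)

# TrainClassifier takes care of string indexing and feature-vector assembly,
# the low-level plumbing the post says you would otherwise write by hand.
model = TrainClassifier(model=LogisticRegression(), labelCol="income").fit(train)

# Score the holdout set and compute standard evaluation metrics.
metrics = ComputeModelStatistics().transform(model.transform(test))
metrics.show()
```

Compare that to the equivalent raw SparkML pipeline of StringIndexer, OneHotEncoder, and VectorAssembler stages and the appeal is obvious.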

It’s an open source project as well, so that barrier to entry is lowered significantly.


Quantile Regression With Python

Gopi Subramanian discusses one of my favorite regression concepts, heteroskedasticity:

With a variance score of 0.43, linear regression did not do a good job overall. When the x values are close to 0, linear regression gives a good estimate of y, but near the end of the x values the predicted y is far away from the actual values and hence becomes completely meaningless.

Here is where Quantile Regression comes to the rescue. I have used the Python package statsmodels 0.8.0 for Quantile Regression.

Let us begin with finding the regression coefficients for the conditioned median, 0.5 quantile.

The article doesn’t render the code very well at all, but Gopi does have the example code on GitHub, so you can follow along that way.
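In the meantime, here’s a minimal sketch of the statsmodels quantile regression API against synthetic heteroskedastic data — the data generation is mine, not Gopi’s:

```python
# Minimal quantile-regression sketch with statsmodels; the single-feature
# DataFrame here is synthetic stand-in data, not the article's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.RandomState(42)
x = rng.uniform(0, 10, 200)
# Heteroskedastic noise: the spread of y grows with x, which is exactly
# the situation where a single OLS line becomes misleading.
y = 2 * x + rng.normal(scale=0.5 + 0.5 * x)
df = pd.DataFrame({"x": x, "y": y})

# Fit the conditional median (the 0.5 quantile) first...
median_fit = smf.quantreg("y ~ x", df).fit(q=0.5)
print(median_fit.params)

# ...then the outer quantiles, whose diverging slopes reveal the
# heteroskedasticity that a single OLS fit papers over.
for q in (0.05, 0.95):
    fit = smf.quantreg("y ~ x", df).fit(q=q)
    print(q, fit.params["x"])
```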


Kafka For Pythonistas

Matt Howlett has an introduction to Apache Kafka, designed for Python developers:

In the call to the produce method, both the key and value parameters need to be either a byte-like object (in Python 2.x this includes strings), a Unicode object, or None. In Python 3.x, strings are Unicode and will be converted to a sequence of bytes using the UTF-8 encoding. In Python 2.x, objects of type unicode will be encoded using the default encoding. Often, you will want to serialize objects of a particular type before writing them to Kafka. A common pattern for doing this is to subclass Producer and override the produce method with one that performs the required serialization.

The produce method returns immediately without waiting for confirmation that the message has been successfully produced to Kafka (or otherwise). The flush method blocks until all outstanding produce commands have completed, or the optional timeout (specified as a number of seconds) has been exceeded. You can test to see whether all produce commands have completed by checking the value returned by the flush method: if it is greater than zero, there are still produce commands that have yet to complete. Note that you should typically call flush only at application teardown, not during normal flow of execution, as it will prevent requests from being streamlined in a performant manner.
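Here’s a quick sketch of that produce-then-flush pattern with the confluent-kafka client the article describes; the broker address and topic name are placeholders:

```python
# Asynchronous produce with a delivery callback, then a blocking flush at
# teardown; broker and topic names are placeholders.
from confluent_kafka import Producer

p = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    """Called from poll()/flush() once the broker confirms (or fails) the write."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

# produce() returns immediately; delivery is confirmed asynchronously.
for i in range(3):
    p.produce("test-topic", key=str(i), value=f"message {i}", callback=on_delivery)
    p.poll(0)  # serve any pending delivery callbacks without blocking

# Block until all outstanding messages are delivered (or 10 seconds elapse);
# per the article, do this at application teardown, not after every produce().
remaining = p.flush(10)
if remaining > 0:
    print(f"{remaining} messages still undelivered")
```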

This is a fairly gentle introduction to the topic if you’re already familiar with Python and have some familiarity with message broker systems.


Decomposing Power BI Desktop Files

Reza Rad wants to see exactly where the M scripts in a Power BI Desktop file are stored:

Talking about Power Query, the DataMashup file is all you need. It includes everything from the structure of queries, tables, parameters, and lists to the actual M scripts behind the scenes. You can fetch all of this information from this single file. Let’s look at the structure of this file. If you open this file with a text editor, you will see some binary content first (which is related to the zipped nature of this file) and also some XML information. Yes, this is a zipped file. Let’s start with unzipping it into a folder. I’ve done that with the 7-Zip application.
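If you’d rather script the extraction than click through 7-Zip, here’s a minimal Python sketch of the same idea — a .pbix file is itself a zip archive, with DataMashup as one of the entries; the report file name is a placeholder:

```python
# A .pbix file is a zip archive; pull out the DataMashup entry, which holds
# the Power Query artifacts Reza describes. The file name is a placeholder.
import zipfile

with zipfile.ZipFile("MyReport.pbix") as pbix:
    print(pbix.namelist())  # DataMashup, Report/Layout, DataModel, ...
    data_mashup = pbix.read("DataMashup")

# The DataMashup blob embeds another zip plus XML metadata, which is why a
# text editor shows binary content followed by XML; writing it out lets a
# tool such as 7-Zip unpack it further to reach the M scripts.
with open("DataMashup", "wb") as f:
    f.write(data_mashup)
```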

This is an interesting peek under the covers of a PBIX file.
