Press "Enter" to skip to content

Author: Kevin Feasel

Learning R Versus Python

Andy Kirk shares the results of a rather informal Twitter poll:

Yesterday I ran a simple Twitter poll about the relative ease of learning R vs. Python. Although a correct answer to this query will ALWAYS have to be based on nuances like pre-existing skills and the scope of need, this originates from people telling me they encounter job or career profiles that list a need for R and/or Python. If they don’t have either, if they prioritised the pursuit of just one, which would be possible to develop a degree of competency more easily, more quickly and more efficiently?

Andy has also created a Twitter moment from the responses.

My thought, based only on the question itself, is that R would be better than Python because the hypothetical person has no additional programming skills. For someone with additional programming skills, the breakdown for me starts with, if your background is statistics, database development, or functional programming, you probably want R; if your background is object-oriented development or imperative programming, you probably want Python. And then it gets nuanced.

Comments closed

Using the ML.NET Model Builder

I have a post looking at the ML.NET Model Builder:

You have four options from which to choose: two-class classification, multi-class classification, regression, or Choose Your Own Adventure. Today, we’re going to create a two-class classification model. Incidentally, they’re not kidding about things changing in preview—last time I looked at this, they didn’t have multi-class classifiers available.

Once you select Sentiment Analysis (that is, two-class classification of text), you can figure out how to feed data to this trainer.

I think this is fine for developers who are looking to add a machine learning component as a small part of a bigger product. I don’t think it will beat a trained human using R or Python, but it’s an interesting avenue.

Comments closed

SQL Server Management Studio 18.1 Now Available

Dinakar Nethi takes us through some changes in SQL Server Management Studio version 18.1:

We’re excited to announce the release of SQL Server Management Studio (SSMS) 18.1. It’s been just over a month since we released SSMS 18.0. While we brought in many fantastic capabilities, we also regressed some functionality for some of our users. We are happy to share that we’ve fixed those and are also bringing in some new features along with bug fixes.

The big thing for a lot of people is that database diagrams have returned. I was never the biggest fan of those, but there was enough of an uproar to bring them back.

Comments closed

Alerting When SQL Agent Jobs Fail

John Shaulis has a script to send out e-mail alerts on SQL Agent job failures:

Here we have a T-SQL statement that we can either create as a stored procedure or just run as T-SQL in a job step. It looks back over the last 10 minutes and looks for job failures. I would recommend scheduling this T-SQL to run every 5 minutes, you will get duplicate entries for a short period of time, but ideally, you shouldn’t get any failures anyway right? Plus once you’re notified, you can turn this off while you work on it or you can specify in the where clause to remove this job until fixed.

Click through for the script and do read the instructions.

1 Comment

DBAs in the Cloud

Brent Ozar argues that production DBAs will still be important even at cloud-only companies:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.

At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.

I completely agree with this post. The exact tools DBAs use will change, but the role will still be around decades from now. And that’s at the companies which move quickly.

Comments closed

A Sizing Problem with Heaps

Slava Murygin shows an example where adding a clustered index to an existing table can reduce its size:

Tables were very narrow with just a few columns and my expectations for data growth were very modest. However, after just a little while I was very surprised when my database showed huge unexpected growth and size of the data became multiple times higher than I’ve expected.

After very little research I’ve found and fixed the problem. In this post I’ll describe how I’ve done it.

Read on to learn how Slava figured this out and how a clustered index fixed the problem.

Comments closed

Managing Database Changes with Scripts

Ed Elliott continues a series on automated database deployments with a quick change process:

In my blog post here https://the.agilesql.club/blogs/ed-elliott/2019-06-10/steps-to-automated-database-deployments I described the steps you need to go through so you can build up your confidence that you are capable of deploying databases using automation. I mean, afterall, knowing that it is possible to automate your deployments and having confidence that they will succeed are two very different things.

Even with the best tooling in the world, automated database deployments are still a struggle and there is one key thing that you can do, no matter what tools you choose and that is to make the deployments re-runnable. (Insert discussion here on the word idempotent and how it means re-runnable but sounds far cooler and intellectual). If you make your deployments re-runnable then you can, by their very definiton, re-run them.

Click through for two options. I definitely prefer option number 1 as well.

Comments closed

Comparing On-Prem To Managed Instance Performance

Jovan Popovic has an article explaining how you can compare your current on-premises SQL Server’s performance to an Azure SQL Managed Instance’s performance:

In this post you will see some recommended tools and best practices that you should apply while doing performance comparison. The recommended performance comparison process has three stages:

1. Compare the environment settings on SQL Server and Managed Instance. 
2. Create performance baseline on source SQL Server
3. Compare performance on Managed Instance with the baseline

In the following sections will be described the best practices and the recommended approaches 

This is a good bit more involved than installing some product, clicking a few buttons, and comparing numbers.

Comments closed

Quantum: Serverless Database

Ashish Thusoo announces a new serverless SQL engine:

Today, we are launching Quantum, a high-performance serverless SQL engine, available on Qubole Data Platform, that simplifies SQL access by offering a true serverless deployment option to enable data analysts to query petabyte-scale volumes of data using ANSI-SQL.

Quantum allows teams to realize value from their data much more quickly, and because of its serverless nature, users pay only for queries they run. Data analysts can query object stores on AWS, Azure, Google Cloud, and Oracle Cloud in seconds to achieve faster time to market with far less IT management overhead.

Ankit Dixit, et al, provide a technical overview as well:

Existing serverless SQL service offerings do not provide users with the ability to use a  metastore of their choice. With Quantum, data teams can use their own custom metastore and start using Quantum without recreating schemas or table metadata.

Most existing Qubole customers already use a custom metastore in the cloud. So there’s virtually no ramp up time to reap the benefits of Quantum.

The technical overview is a bit too much marketing for my tastes, but this is a move worth watching.

Comments closed

Explaining Hadoop Struggles

Alex Woodie has an editorial taking us through why the major Hadoop distributors (Cloudera and MapR) are struggling so much lately:

The future is not clear for either Cloudera and MapR. While there are similarities in the two companies’ positions, there are big differences too.

Cloudera does not have a permanent CEO at the moment, and it still hasn’t shipped the new converged Hadoop distribution, dubbed Cloudera Data Platform (CDP), that will replace the old Cloudera and Hortonworks distributions. During its first quarter ended April 30, Cloudera said customers are holding off investing in the old Hadoop products since they know the new CDP is due by the end of the year. That fact led Cloudera to dramatically lower its revenue expectations for the year, which upset stockholders, who pushed Cloudera’s stock (NYSE: CLDR) down 40% the following day.

The way I’m phrasing it is that the Hadoop ecosystem is strong (with the successes of companies like Databricks and Confluent), but core Hadoop companies are struggling.

Comments closed