Press "Enter" to skip to content

Day: March 21, 2017

Introduction To Probability

Mala Mahadevan covers some basics of probability:

Probability is an important statistical and mathematical concept to understand. In simple terms – probability refers to the chances of possible outcome of an event occurring within the domain of multiple outcomes. Probability is indicated by a whole number – with 0 meaning that the outcome has no chance of occurring and 1 meaning that the outcome is certain to happen. So it is mathematically represented as P(event) = (# of outcomes in event / total # of outcomes). In addition to understanding this simple thing, we will also look at a basic example of conditional probability and independent events.

It’s a good intro to a critical topic in statistics.  If I would add one thing to this, it would be to state that probability is always conditional upon something.  It’s fair to write something as P(Event) understanding that it’s a shortcut, but in reality, it’s always P(Event | Conditions), where Conditions is the set of assumptions we made in collecting this sample.

Comments closed

R On Athena

Gopal Wunnava shows how to run R scripts using Amazon Athena as a data source:

Next, you’ll practice interactively querying Athena from R for analytics and visualization. For this purpose, you’ll use GDELT, a publicly available dataset hosted on S3.

Create a table in Athena from R using the GDELT dataset. This step can also be performed from the AWS management console as illustrated in the blog post “Amazon Athena – Interactive SQL Queries for Data in Amazon S3.”

This is an interesting use case for Athena.

Comments closed

Poor Man’s More

Chris Koester has a quick-and-easy file reader in a few lines of C#:

This post describes one way that you can read the top N rows from large text files with C#. This is very useful when working with giant files that are too big to open, but you need to view a portion of them to determine the schema, data types, etc.

I’ve used PowerShell many times to do this with large csv files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles. The 3017-03-01 dump is very large and comes in at 59.5 GB.

I’ve had to write something similar before on Windows machines where I didn’t have access to more/less.  It’s really helpful for perusing the first few lines of gigantic log files.

Comments closed

What Will The DBAs Do?

Kevin Hill predicts that database administration isn’t going anywhere anytime soon:

There have been a lot of questions, posts, answers, guesses and such floating around the SQL blogs lately…most of which seem to suggest that the DBA is going away.

Hogwash.

The DBA position is not going away.  Ever.  Or at least not before I retire to Utah to spend my days mountain biking 😉

That said, Kevin does point out that you shouldn’t rest on your laurels.

One fun anecdote I have about database administration:  I recall some marketing for some NoSQL product about how, by adopting their software, you can get rid of those stodgy database administrators.  Within a couple of years, said product’s parent company was offering developer training on “advanced” techniques, which included taking backups, tuning queries, implementing disaster recovery, and creating good indexes to help with performance.  But hey, at least they don’t have DBAs!

Comments closed

Cores Visible Offline

John Morehouse enlists some assistance to figure out why his SQL Server instance is ignoring 24 cores:

Hidden schedulers are used to process requests that are internal to the engine itself.  Visible schedulers are used to handle end-user requests.  When you run the casual SELECT * query, it will utilize a visible scheduler to process the query.  With this information, if I have a 64 core server and all is well, I should have 64 visible schedulers online to process requests.

However, I discovered that some of the schedulers were set to “VISIBLE OFFLINE”.  This essentially means that those particular schedulers are unavailable to SQL Server for some reason.   How many offline schedulers do I have? A quick query resulted in 24 schedulers currently offline.  24 logical cores means that 12 physical cores are offline.

But why would a scheduler be set to “VISIBLE OFFLINE”?

Read on for the answer, and check the comments for a helpful plug for sp_Blitz.

Comments closed

RevoScaleR With Power BI

Tomaz Kastrun looks at several methods for using RevoScaleR packages in a Power BI dashboard:

I was invited to deliver a session for Belgium User Group on SQL Server and R integration. After the session – which we did online using web based Citrix  – I got an interesting question: “Is it possible to use RevoScaleR performance computational functions within Power BI?“. My first answer was,  a sceptical yes. But I said, that I haven’t used it in this manner yet and that there might be some limitations.

The idea of having the scalable environment and the parallel computational package with all the predictive analytical functions in Power BI is absolutely great. But something tells me, that it will not be that straight forward.

Read on for the rest of the story.

Comments closed

What’s In A Name?

Kenneth Fisher explains the different parts of an object name in SQL Server:

One Part Name
vIndividualCustomer
This is just the name of the object. This is probably the most common usage and yet the only one I would recommend never using. (I’ll freely admit I’m not great at this myself btw.) When you only use a single name the schema is implied by the default schema of the user running the statement. That means that the same statement run by two different users could have two different meanings. Since most users have a default schema of dbo this would probably hit the object dbo.vIndividualCustomer. Assuming one even existed.

Click through for more.

Comments closed