
Day: April 10, 2018

A Basic Explanation Of Associative Rule Learning

Akshansh Jain has some notes on associative rules:

Support tells us how frequent an item, or an itemset, is in all of the data. It basically tells us how popular an itemset is in the given dataset. For example, in the above-given dataset, if we look at Learning Spark, we can calculate its support by taking the number of transactions in which it has occurred and dividing it by the total number of transactions.

Support{Learning Spark} = 4/5
Support{Programming in Scala} = 2/5
Support{Learning Spark, Programming in Scala} = 1/5

Support tells us how important or interesting an itemset is, based on its number of occurrences. This is an important measure because real data can contain millions or billions of records, and working on every itemset is pointless: if, among millions of purchases, a single user buys Programming in Scala and a cooking book, that is of no interest to us.
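
The support calculation quoted above can be sketched in a few lines of Python. The five transactions below are hypothetical: the original dataset is not reproduced in this excerpt, so they are simply one arrangement that yields the same three support values. The `support` helper is likewise an illustrative name, not something from the linked post.

```python
from fractions import Fraction

# Hypothetical transactions, chosen only so that the supports match
# the 4/5, 2/5 and 1/5 figures quoted in the excerpt.
transactions = [
    {"Learning Spark", "Programming in Scala"},
    {"Learning Spark", "Cooking Basics"},
    {"Learning Spark"},
    {"Learning Spark", "Cooking Basics"},
    {"Programming in Scala"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return Fraction(hits, len(transactions))


if __name__ == "__main__":
    for items in (
        {"Learning Spark"},
        {"Programming in Scala"},
        {"Learning Spark", "Programming in Scala"},
    ):
        print(sorted(items), support(items, transactions))
```

Using `Fraction` keeps the results in the same "hits over total" form as the excerpt rather than as rounded decimals.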

Read the whole thing.


Building Forest Plots With ggplot2

Faisal Atakora shows how to build a forest plot using ggplot2:

To build a Forest Plot often the forestplot package is used in R. However, I find ggplot2 to have more advantages in making Forest Plots, such as enabling the inclusion of several variables with many categories in a lattice form. You can also use any scale of your choice, such as a log scale. In this post, I will introduce how to plot Risk Ratios and their Confidence Intervals for several conditions.

Click through for the script. You might also want to compare it to the forestplot package to see how these differ.


Working With Dates And Times In Logstash

Mike Hillwig continues his Logstash series:

So far, I’ve done a decent job getting the data into shape. My biggest challenge, though, was the dates and times. Dates are in one field, and the times are in another. Dates look like 2014-02-26 and times look like 0852. Using a traditional datetime datatype would be nice to have, so I’ll have to do it myself. In order to turn a date and time into a datetime, I need to abut the two fields and then convert the result.

I accomplished this by using a mutate filter, employing several add_field commands. Notice how I simply abut the two times.
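
The same "abut, then convert" idea can be illustrated outside Logstash. The Python sketch below is not Mike's Logstash configuration; it only mirrors the two steps he describes, using the sample values from the excerpt. The function name `combine_date_and_time` is made up for this illustration.

```python
from datetime import datetime


def combine_date_and_time(date_text, time_text):
    """Join a 'YYYY-MM-DD' date and an 'HHMM' time, then parse the result."""
    # Step 1: abut the two fields into a single string.
    combined = f"{date_text} {time_text}"
    # Step 2: convert the combined string into a real datetime.
    return datetime.strptime(combined, "%Y-%m-%d %H%M")


if __name__ == "__main__":
    print(combine_date_and_time("2014-02-26", "0852"))
```

Parsing with an explicit format string means a malformed time such as "85" or "2560" raises a `ValueError` instead of silently producing a wrong timestamp.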

Read on to see how Mike does it.


Accessing BigQuery Data From Python And R

Eleni Markou shows how to connect to Google’s BigQuery service using Python and then R:

Some time ago we discussed how you can access data that are stored in Amazon Redshift and PostgreSQL with Python and R. Let’s say you did find an easy way to store a pile of data in your BigQuery data warehouse and keep them in sync. Now you want to start messing with it using statistical techniques, maybe build a model of your customers’ behavior, or try to predict your churn rate.

To do that, you will need to extract your data from BigQuery and use a framework or language that is best suited for data analysis; the most popular so far are Python and R. In this small tutorial we will see how we can extract data that is stored in Google BigQuery to load it with Python or R, and then use the numerous analytic libraries and algorithms that exist for these two languages.
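
For a rough idea of what the Python side can look like, here is a minimal sketch. It is not taken from Eleni's tutorial and may differ from her approach. It assumes the `google-cloud-bigquery` client library and pandas are installed and that application default credentials are configured; the table name is a placeholder, and `build_query` and `load_table` are hypothetical helper names.

```python
def build_query(table, limit):
    """Build a simple standard-SQL query against a fully qualified table."""
    return f"SELECT * FROM `{table}` LIMIT {int(limit)}"


def load_table(table, limit=1000):
    """Run the query in BigQuery and return the rows as a pandas DataFrame."""
    # Imported here so the module still loads where the client library
    # is not installed (assumption: google-cloud-bigquery and pandas).
    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials
    return client.query(build_query(table, limit)).to_dataframe()


if __name__ == "__main__":
    # Placeholder table name; replace with a real project.dataset.table.
    df = load_table("my-project.my_dataset.customers", limit=100)
    print(df.head())
```

Once the data is in a DataFrame, it can be passed to whichever analysis or modeling library suits the task.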

Read on to see how easy it is for either language.


The Value Of Live Query Stats In SSMS

Rob Farley exclaims his appreciation of Live Query Stats in SQL Server Management Studio:

I wrote about Live Query Statistics within SSMS a while back – and even presented at conferences about how useful it is for understanding how queries run…

…but what I love is that at customers where I have long-running queries to deal with, I can keep an eye on the queries as they execute. I can see how the Actuals are forming, and quickly notice whether the iterations in a Nested Loop are going to be unreasonable, or whether I’m happy enough with things. I don’t always want to spend time tuning a once-off query, but if I run it with LQS turned on, I can easily notice if it’s going to be ages before I see any rows back, see any blocking operators that are going to frustrate me, and so on.

I don’t use it often, but when I do, I typically learn something interesting about the query I’m running.


Database Backups And GDPR

Grant Fritchey digs into one of the more contentious areas of GDPR:

Nothing within Article 17 talks about backups, offsite storage, readable secondaries, log shipping, or any of that stuff. In fact, there’s nothing technical there at all. No help to tell you what to do about this question.

Now, each article has expansions, called recitals, that further detail the information within the article. In the case of the right to be forgotten, there are two, Recital 55 and Recital 66. Recital 55 has nothing for us, at all. Recital 66 does talk about the fact that, because we’re dealing in an online world, the best available technical means should be used to deal with the fact that a person’s data may be in more than one location and we’ll need to clean that up.

And that’s it.

In fact, you can search the GDPR and not find the word, backup.

Read on for Grant’s thoughts, including what he argues is a defensible position (though we won’t know for sure until the bureaucracy runs its course).


Things To Think About Before Detaching And Attaching A Database

Jana Sattainathan has a few considerations before running a detach-attach operation:

On the detached database files, the file permissions change when detaching. That is, the AD account (your AD account) performing the detach operation becomes the owner and the only account with permissions to the file. The file does not inherit the permissions from the folder. Oddly, if you just detach and reattach, it would attach fine even though the SQL Server service account does not have any explicit permissions on the files. However, others cannot attach a file that you detached, even if they are in the Administrators group, until they are explicitly granted permissions on the files themselves. This is well explained in this MSSQLTips article. Quoting the article, if you want to retain file permissions on detach, set the trace flag 1802.

“SQL Server 2005 introduced trace flag 1802 which retains the database files permission after the detach operation. The trace flag is tested and still applicable with SQL Server 2016.”

Click through for several tips of similar ilk.


Dynamically Showing Or Hiding Columns In SSRS With Parameters

Sander Stad shows how to show or hide columns at runtime in SQL Server Reporting Services reports using parameters:

Regularly I have reports that have an extensive number of columns. Because of the number of columns, reports tend to become inefficient and carry information we don’t always need. The users may want to select certain columns to make the report easier to read.

Hiding and showing columns in SSRS reports using parameters is a solution to make reports more dynamic and easier to use.

At the time of writing of this article, SQL Server Reporting Services did not yet have the possibility to use checkbox parameters. Instead we will be using a multi-value text parameter to show or hide our columns.

Click through to see how to do this.
