Press "Enter" to skip to content

Curated SQL Posts

Reticulate: Python-R Interop

Adnan Fiaz walks us through an example of using the reticulate library to call Python from R:

So what exactly does reticulate do? It’s goal is to facilitate interoperability between Python and R. It does this by embedding a Python session within the R session which enables you to call Python functionality from within R. I’m not going to go into the nitty gritty of how the package works here; RStudio have done a great job in providing some excellent documentation and a webinar. Instead I’ll show a few examples of the main functionality.

Just like R, the House of Python was built upon packages. Except in Python you don’t load functionality from a package through a call to librarybut instead you import a module. reticulate mimics this behaviour and opens up all the goodness from the module that is imported.

This is a good intro to a package which is already useful but I think will be even better over time as R & Python interoperability becomes the norm.  H/T R-Bloggers

Comments closed

Working With Data Frames In R

Dave Mason has a couple of blog posts on data frames.  First, the basics:

Conceptually, a dataset is a grid or table of data elements. It consists of rows, which we specifically call “observations”, and of columns , which are called “variables”. (Observations may also be referred to as “instances”. Variables may also be referred to as “properties”.) The data frame in R is designed for data sets. As the R documentation tells us, data frames are “used as the fundamental data structure by most of R’s modeling software”.

The function we’ll be working with primarily in this post is the data.frame() function. I have read that in R programming, creating data frames with this function is rather uncommon. Most of the time, data frames are created by invoking other functions that read data from an external data source (like a file or a database table) with a data frame as the return type. But for simplicity, data.frame() will serve our purposes.

Then, subsetting data frames:

Adding columns to a data frame is easy–easy compared to adding rows. We’ll get to that. To add a column, first create a vector. The class doesn’t matter. But the number of elements does–it has to match the number of observations in the data frame. Now that we have our vector, here are some options to add it as a new column to a data frame: use the $ shortcut, use double brackets with the new column name, bind the vector to the dataframe with cbind().

The data frame (or tibble, if using the tidyverse version) is probably the single most important data type in R for getting work done.

Comments closed

Writing Audit Logs To Azure Event Hubs

Ronit Reger announces that Azure SQL Database auditing logs can now go to Azure Log Analytics or Azure Event Hubs:

Azure Log Analytics plays a central role in monitoring and management of your Azure environment. It enables collecting telemetry and other data from a variety of sources across Azure, and provides a query language and analytics engine for deep analysis and insights on the operation of applications and resources. For more information on the Log Analytics platform, see What is Azure Log Analytics.

With native support for saving SQL audit logs directly to Log Analytics, log data from all of your database resources can be gathered and stored in a single central location. The logs can now be analyzed using the rich analysis tools provided by the platform, which can provide deeper visibility and advanced cross-resource analytics.

In addition, SQL Server audit logs (from on-premises SQL Servers or SQL Servers on a VM) can also be collected in Log Analytics via OMS agent integration, as described in this article. Thus, you can manage and analyze all of your database audit logs, whether from the cloud or on-premises, in a single central location using the power of Azure Log Analytics.

This looks useful.

Comments closed

Keeping Headers Visible When Scrolling In SSRS

Ginger Keys shows us how to keep tablix headers visible when going through a SQL Server Reporting Services report:

When scrolling through the pages of a SQL Server Reporting Services (SSRS) report, it is very useful to be able to see the column headers throughout the report.  So let’s say you have successfully created an SSRS report using Visual Studio, and everything looks wonderful…except the headers on your columns disappear when you scroll down the page.  You have even set the properties of your Tablix to “Keep Headers Visible While Scrolling”, but it still doesn’t work!  Trying to keep the column headings visible while you scroll down the page of your SSRS report can be a frustrating endeavor.  The following steps will demonstrate how to make it work.

I always thought “Keep Headers Visible While Scrolling” should have been renamed to “Don’t Do Anything About Headers But Let Me Think You Did Something So I Can Look Like I Don’t Know What I’m Talking About When I Tell Customers That The Report Headers Should Stay Visible While Scrolling” but I guess that might have been too long of a property description.

Comments closed

Switching Object Schemas

Steve Jones show you a quick way of switching a database object’s schema:

I haven’t had the need to move an object from one schema to another in years. Really since SQL Server 2000. I wrote about deleting a user that owns a schema recently, but that’s often a first step. The next thing I might need to do is actually move objects from that schema to a new one.

I actually ran across this command when I was looking how to move the schema to a new user. There’s actually a parameter for ALTER SCHEMA that will move objects.

This doesn’t pop up too often for me at least, but it’s good to remember if you’re using schemas as a method of categorizing data.

Comments closed

The Basics Of Kubernetes

Chris Adkin shares some thoughts on what Kubernetes is and why it might be interesting to data platform professionals:

I strongly urge anyone with an interest in learning Kubernetes to watch this presentation, as it makes a great job of explaining Kubernetes from the ground up.

“Out of the box”, Kubernetes will look after scheduling. If there is a requirement to ensure that pods only run on a specific set of nodes, there is a means of doing this via label selectors, as documented here. A label selector is a directive of influencing resource utilization related decisions made by the cluster.

Read the whole thing.

Comments closed

Thoughts On Max Memory Settings

Monica Rathbun shares some thoughts on SQL Server’s max memory settings:

Quite often I see database administrators set SQL Server max server memory thinking everything related to SQL Server uses this shared memory pool. This is a mistake. There are many things that rely on memory that are not part of SQL Server. Best practices state that you should leave memory allotted for the operating system. However, did you know that if you are running services like SSIS, SSAS or SSRS on the same server as the database engine that it does not use the same memory you have allocated for SQL Server? If the Max Memory setting is not configured correctly, these other serves could incur memory pressure.  While the memory consumed by SSAS and SSRS can be configured, SSIS can be a little bit more challenging. Beyond this, there are even scenarios where SQL Server max memory consumed can exceed the setting, like with CLR in versions earlier than 2012 and some other bugs in SQL Server.

As a consultant, I have seen memory pressure and memory exhausted too many times to count because the DBA was unaware of this. I applaud those that take the time to properly configure this setting according to what the database engine requires. Let’s take it a step further and take the time to look at what additional services you are using and allot memory accordingly.

Read on for more.

Comments closed

Undeleting A Deleted Azure SQL Database

Arun Sirpal shows us the “undoing a big mistake” button:

Okay honestly I have done this once. I have deleted Azure SQL Databases and then try and find the quickest way to recover. The Azure portal is actually pretty good when it comes to deleting resources, for example it will usually ask you to re-type the name of the resource to confirm deletion, so you can tell what a bad mistake I made.

Let’s look at how to delete a database then recover it.

I’m curious how long it stays there before dropping off into the abyss.

Comments closed

Calculating Lifetime Value With R

Sergey Bryl shows how to calculate the lifetime value of a subscription service:

Predicting LTV is a common issue for a new, recently launched product/service/application when we don’t have a lot of historical data but want to calculate LTV as soon as possible. Even though we may have a lot of historical data on customer payments for a product that is active for years, we can’t really trust earlier stats since the churn curve and LTV can differ significantly between new customers and the current ones due to a variety of reasons.

Therefore, regardless of whether our product is new or “old”, we attract new subscribers and want to estimate what revenue they will generate during their lifetimes for business decision-making.

This topic is closely connected to the Cohort Analysis and if you are not familiar with the concept, I recommend that you read about it and look at other articles I wrote earlier on this blog.

Read the whole thing.

Comments closed

Writing SQL Against Elasticsearch

Guy Shilo shows how you can write SQL to query Elasticsearch:

The mappings Elastic SQL uses are:

Index = Table

Document = Row

Field = Column

This mapping is quite intuitive. Types are left out because they are obsolete in Elastic 6.0 on.

So let’s give it a try. I used the latest Elastic 6.4 for this demonstration and ran the queries from Kibana, although they can be run with curl or just a browser as well. First we will need some data. I found this article in Elastic documentation that suggests several data files ready to be loaded. I did not need all of the data so I only used the json file that contains all the works of William Shakespeare that can be downloaded here.

Feasel’s Law continues.

Comments closed