Curated SQL – Page 1415 – A Fine Slice Of SQL Server

Data Wrangling At Scale

Published 2017-11-21 by Kevin Feasel

John Mount has a short article showing off the cdata package:

Suppose we needed to un-pivot this data into a row oriented representation. Often big data transform steps can achieve a much higher degree of parallelization with “tall data”. With the cdata package this transform is easy and performant, as we show below.
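To make the "tall data" idea concrete, here is a minimal Python sketch of an un-pivot. It is not the cdata API (cdata is an R package); the `unpivot` helper and the column names `id`, `q1`, `q2`, `measure`, and `value` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


def unpivot(
    rows: List[Dict[str, Any]],
    id_cols: List[str],
    value_cols: List[str],
    key_name: str = "measure",
    value_name: str = "value",
) -> List[Dict[str, Any]]:
    """Turn each wide row into one tall row per value column."""
    tall = []
    for row in rows:
        # Identifier columns are repeated on every output row.
        ids = {col: row[col] for col in id_cols}
        for col in value_cols:
            tall.append({**ids, key_name: col, value_name: row[col]})
    return tall


# Hypothetical wide data: one row per id, one column per measurement.
wide = [
    {"id": 1, "q1": 10, "q2": 12},
    {"id": 2, "q1": 7, "q2": 9},
]

tall = unpivot(wide, id_cols=["id"], value_cols=["q1", "q2"])
for record in tall:
    print(record)
```

Each wide row becomes one row per measurement column, so the tall rows can be handed to separate workers independently, which is the parallelization benefit the quote mentions.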

Read the whole thing.


Docker And R

Published 2017-11-21 by Kevin Feasel

Mara Averick has some resources to help you get started with running R in a Docker container:

liftr 📦 by Nan Xiao

🐳📜 Docker your docs: “liftr: Containerize R Markdown Documents” by @road2stat https://t.co/nSN1ylZXsy #rstats #docker #rmarkdown pic.twitter.com/9xJQ4rANjy

— Mara Averick (@dataandme) October 15, 2017

liftr aims to solve the problem of persistent reproducible reporting. To achieve this goal, it extends the R Markdown metadata format, and uses Docker to containerize and render R Markdown documents.

Click through for those resources as well as an addictive 8-bit animated GIF.


Binder: Hosting Jupyter Notebooks

Published 2017-11-21 by Kevin Feasel

Julia Evans points out a really interesting service:

Binder lets you easily host interactive Jupyter notebooks and let anyone on the internet use them interactively immediately! It uses JupyterHub under the hood.

If you want to try it out, you can do that right now:

Go to https://mybinder.org/v2/gh/jvns/pandas-cookbook/master (which will launch the github.com/jvns/pandas-cookbook repository)

Wait for it to build and click ‘launch’

Click ‘cookbook’, click a notebook, and play around! There’s an “A quick tour of the IPython Notebook” notebook which shows off some of the basic features.
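The launch link in the first step appears to follow an owner/repository/branch pattern. As a small sketch, assuming that pattern holds for other GitHub repositories (it is inferred here from the single URL above, and `binder_launch_url` is a hypothetical helper name), the link could be built like this:

```python
def binder_launch_url(owner: str, repo: str, ref: str = "master") -> str:
    """Build a mybinder.org launch link for a GitHub repository.

    Assumption: the path layout /v2/gh/<owner>/<repo>/<ref> is generalized
    from the one example URL quoted in the post.
    """
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"


url = binder_launch_url("jvns", "pandas-cookbook")
print(url)
```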

It apparently uses Kubernetes + Docker under the hood which is interesting! It must be much much more expensive to run than the read-only services, but it’s such a useful and cool thing! I hope it continues to exist.

Definitely worth checking out. I’m going to have to see the steps for getting an R runtime so I can post some of my own notebook repos.


Three Sessions And A Funeral

Published 2017-11-21 by Kevin Feasel

Solomon Rutzky explains what happens to sessions after they see the light at the end of the tunnel:

Sessions, in SQL Server, are born when a Connection is made from a client library to SQL Server. Temporary objects – Tables and/or Stored Procedures (yes, these are a thing) – may be created during a Session’s lifetime. The question is: for those temporary objects that are not explicitly dropped, what exactly happens to them? It is commonly known that they magically (ok fine, “automagically” — ok, ok, FINE, “automatically”) get dropped. But when do they get dropped? When the Session ends, right? And the Session ends when the Connection is closed, right? Well, that is certainly the common / conventional wisdom, at least. But is that understanding of the nature of Sessions and temporary objects correct?

It’s a more complicated topic than it first appears.


When Join Order Matters

Published 2017-11-21 by Kevin Feasel

Bert Wagner takes a look at one of the lesser appreciated tricks in performance tuning:

I had a great question submitted to me (thank you Brandman!) that I thought would make for a good blog post:

…I’ve been wondering if it really matters from a performance standpoint where I start my queries. For example, if I join from A-B-C, would I be better off starting at table B and then going to A & C?

The short answer: Yes. And no.

One of my favorite query tuning books is SQL Tuning by Dan Tow. Parts of it are rather dated at this point (like pretty much anything involving a rule-based optimizer), but the gist still works well. What it comes down to is finding the best single table from which to drive your query (based on table size, filters, etc.) and selecting the appropriate join order afterward. It’s a fairly time-consuming effort, but for the 0.5-1% of queries which really need it, it can be the difference between an awful plan and a good one.
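To illustrate why the driving table matters, here is a toy Python model of an index nested-loops join. It is a sketch under stated assumptions, not how SQL Server executes queries: the `customers` and `orders` tables, their sizes, and the "rows touched" cost measure are all made up for the example, and a real optimizer usually chooses the join order itself.

```python
from collections import defaultdict

# Hypothetical data: 100 customers with 10 orders each; only customer 7 is "west".
customers = [{"cust_id": c, "region": "west" if c == 7 else "east"} for c in range(100)]
orders = [{"order_id": c * 10 + i, "cust_id": c} for c in range(100) for i in range(10)]


def build_index(rows, key):
    """Group rows by key so lookups stand in for an index seek."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


customers_by_id = build_index(customers, "cust_id")
orders_by_cust = build_index(orders, "cust_id")


def drive_from_orders():
    """Start at orders, look up each order's customer, then filter on region."""
    touched, result = 0, []
    for order in orders:
        touched += 1  # read the order row
        for cust in customers_by_id[order["cust_id"]]:
            touched += 1  # read the matching customer row
            if cust["region"] == "west":
                result.append(order["order_id"])
    return sorted(result), touched


def drive_from_customers():
    """Start at customers, filter on region first, then look up orders."""
    touched, result = 0, []
    for cust in customers:
        touched += 1  # read the customer row
        if cust["region"] != "west":
            continue
        for order in orders_by_cust[cust["cust_id"]]:
            touched += 1  # read the matching order row
            result.append(order["order_id"])
    return sorted(result), touched


from_orders = drive_from_orders()
from_customers = drive_from_customers()
print("same rows returned:", from_orders[0] == from_customers[0])
print("rows touched driving from orders:   ", from_orders[1])
print("rows touched driving from customers:", from_customers[1])
```

Both functions return the same ten orders, but starting from the table with the selective filter touches far fewer rows in this model. That is the intuition behind picking the driving table first and the join order second.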


Myths With Page Life Expectancy

Published 2017-11-21 by Kevin Feasel

Kevin Hill has a public service announcement:

I’ve been hearing throughout my entire DBA career that 300 seconds is a good counter for PLE (Page Life Expectancy) to be above.

Paul Randal calls this “utter nonsense”. If anyone has the right to say that, it’s Paul.

It is good for the number to be above 300, but that’s like saying it’s good to have your brand new car last a year and a half.
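A rough way to see why a fixed 300-second threshold says little on modern servers is to work out the disk read rate it implies. The sketch below assumes a simplified reading of PLE, namely that the whole buffer pool is replaced once per PLE interval; the buffer pool sizes are illustrative and do not come from the linked post.

```python
def implied_read_rate_mb_per_sec(buffer_pool_gb: float, ple_seconds: float) -> float:
    """Approximate sustained read rate if the buffer pool turns over once per PLE.

    Assumption: PLE is treated as the time to replace the entire buffer pool,
    which is a simplification used only for this back-of-the-envelope estimate.
    """
    return buffer_pool_gb * 1024 / ple_seconds


# Illustrative buffer pool sizes, all held at a PLE of exactly 300 seconds.
for size_gb in (4, 64, 512):
    rate = implied_read_rate_mb_per_sec(size_gb, 300)
    print(f"{size_gb:>4} GB buffer pool at PLE 300 -> {rate:,.1f} MB/s of reads")
```

Under this model the same PLE value corresponds to very different I/O loads, so the number that counts as healthy scales with the amount of memory rather than sitting at one fixed value.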


T-SQL Tuesday Roundup

Published 2017-11-21 by Kevin Feasel

Ewald Cress has what might have been the largest T-SQL Tuesday ever:

Firstly, I want to thank every person who took part. SIXTY TWO blog posts got generated, including a few first-time #tsql2sday contributors as well as first-time bloggers. I am fairly glowing to have been a part of it, and I hope the other contributors are too.

Secondly, from my own experience in writing a post, I know it feels terrible when you start worrying about who to pick. There are many people I could have included, but I hope I have made my appreciation for them clear elsewhere. Not that I want to speak on your behalf, but I’ll assume that the same applies to many other contributors.

That’s a lot of reading.


Importing SSMS Registered Servers Into SQL Operations Studio

Published 2017-11-21 by Kevin Feasel

Drew Furgiuele has a hankering for SQL Operations Studio and wants to invite a few servers to the party:

One barrier to entry is that the initial setup can be a little daunting, especially if you use local connection groups or central management servers to keep track of registered connections in SQL Server Management Studio. You’d be in for a lot of manual clicking and typing if you have a lot of saved connections. But there’s a better way: you can import all that saved information right into SQL Operations Studio, and it’s pretty painless, too. Buckle up, because this involves a little knowledge of how settings are saved in Operations Studio, and how we can quickly get saved connection information out of SSMS and into your new application. Spoiler alert: we’re going to use PowerShell.

I’d love to see CMS support in SQL Operations Studio. In the meantime, this is a more or less reasonable alternative, depending upon how many servers you have and how frequently they change.


Clusterless Availability Groups For Scaling Out Reads

Published 2017-11-21 by Kevin Feasel

Sean Gallardy shows a good use case for Availability Groups in scaling out reads:

Read-Scale availability groups are ones where we don’t want the availability group for high-availability or disaster recovery; instead, we want to use it to create multiple copies of our databases that span across multiple servers, allowing for the spreading of a large read-only workload. There are various scenarios where this might be extremely valuable, and in previous versions of SQL Server it was possible, though there was a requirement of using Windows Server Failover Clustering (WSFC). Read-Scale availability groups do not require the WSFC component and do not give high-availability or disaster recovery; they only act as a mechanism (availability groups) to facilitate the synchronization of the databases across multiple servers.

To reiterate, this is not used for high-availability or disaster recovery but instead to scale your databases across multiple servers for read workloads.

The remainder of the post shows how to set up an Availability Group without the corresponding Windows Server Failover Clustering components.


Picking A Python IDE

Published 2017-11-20 by Kevin Feasel

Kevin Jacobs reviews a few Python IDEs from the perspective of a data scientist:

Ladies and gentlemen, this is one of the most perfect IDEs for editing your Python code! At least in my opinion. Jupyter notebook is a web-based code editor and can quickly generate visualizations. You can mix code with text containing no, simple, or complex mathematics. One thing I am missing here is support for code completion, but there are tons of plugins available, so this should be no problem. It is also easy to turn your notebook into a presentation. For collaboration with non-technical teams, this is a great tool.

Conclusion: perfect Python IDE for data science! Less support for code inspection.

Click through for reviews of three IDEs.

