Press "Enter" to skip to content

Day: January 29, 2019

UTF-8 And R

Sebastian Sauer gives us a brief overview of UTF-8 support in R and other relevant tools (like Excel):

That seems to work easily. Maybe that’s the easiest way at the end of the day (?).

One problem that may arise – besides building on proprietary code that may change without notice – is that Excel may have problems reading a UTF8 csv, as explained here.

Read on for more info on what has become the de facto web standard for text.

Comments closed

Analytical Pipelines In R With H2O And AWS

Hanjo Oden wraps up a series on training models on AWS using H2O in R:

To generate these, you can log into your AWS dashboard, go to the IAM (Identity and Access Management) dashboard and select the Users tab. On the Userstab, add a user and also the administration rights that you want the user to have.Remember to restart R once you have filled in the access key information in the .Renviron file for it to take effect.

At this point, those familiar with cloudyr suite is probably asking – “This is exactly the same as library(aws.ec2), so why use boto3?“. Well, to be honest, I was using aws.ec2 for a while, but I find spot-instances, which the current version of aws.ec2 does not support. In addition I found that boto3 has some other functionalitue – which I prefer. For a full list of boto3 functions to interact with an EC2 instance, have a look at the reference manual.

It’s pretty good stuff; check it out.

Comments closed

Snowflake DB Aliasing

Koen Verbeeck notes that Snowflake DB aliasing is a bit more robust than SQL Server’s:

That’s right. I defined an expression – a concatenation of two fields – and used the alias of that expression in another expression. Furthermore, I used the alias of the second expression in the WHERE clause. Gasp.

My workaround is to use CROSS APPLY and define calculations in blocks there. This doesn’t work for aggregation operations, but in cases like Koen’s example, it does simplify the SELECT and WHERE clauses. This is a nicer solution, though.

Comments closed

Finding Current Session Options In SQL Server

Max Vernon has a helpful script to tell you if, for example, ANSI_NULLS is on:

SQL Server can be configured to provide certain behavior to client sessions, via the SET OPTIONS= command, or via the sys.sp_configure 'user options' system stored procedure. The SET OPTIONS= command only affects the current session, whereas the sys.sp_configure 'user options' system stored procedure configures the default values for all future user sessions.

Click through for the script, which even contains a quick description of each option.

Comments closed

On Disabling Indexes

Kenneth Fisher has a few notes on disabling indexes:

Indexes are probably the number one tool we have to improve performance. That said, there are times when we want to put that index on hold. While indexes dramatically improve read performance they do cause a slight dip in write performance. This isn’t significant most of the time but when doing a large load it can frequently be faster to get rid of the existing indexes and then put them back when you are done.

I don’t think that I’ve ever regularly disabled indexes, even during bulk loading. It’s good to know that the option exists, however.

Comments closed

The Importance Of Index Column Order

Erik Darling shows us how arranging columns in an index can make a huge difference in query performance:

while back I promised I’d write about what allows SQL Server to perform two seeks rather than a seek with a residual predicate.

More recently, a post touched a bit on predicate selectivity in index design, and how missing index requests don’t factor that in when requesting indexes.

This post should tie the two together a bit. Maybe. Hopefully. We’ll see where it goes, eh?

Also apropos: missing index hints return results in alphabetical order, not in selectivity order or what would be best for queries. In other words, just because the green text in SSMS says it’s the index you want doesn’t mean it’s the index you need.

Comments closed

The Value Of Central Management Server

Jeff Iannucci explains the benefits of Central Management Servers:

If you have more than a handful of instances, you really owe it to yourself to take a few minutes and set up your own CMS. The hard part is getting a list of all the instances, but you need to do that anyways, right?

Once you have that list pick ONE SQL Server instance to create your CMS. You want this to be something with solid up-time that is accessible to other team members, but not an instance that’s already spending CPU cycles on something critical. 

The lack of CMS compatibility is one of my larger pain points with Azure Data Studio and an issue which hopefully gets fixed this year.

Comments closed

Authentication With Azure Managed Instances

Hamish Watson explains that Windows authentication is not available with Azure Managed Instances:

Azure SQL Managed Instances do not utilise windows authentication – so your two methods of authenticating applications and users are:
SQL Authentication:This authentication method uses a username and password.
Azure Active Directory Authentication:This authentication method uses identities managed by Azure Active Directory and is supported for managed and integrated domains. Use Active Directory authentication (integrated security) whenever possible.

Hamish also elaborates on some of the trickier bits about Azure Active Directory for someone used to on-prem AD solutions.

Comments closed