One of the useful features of EMR Notebooks is the separation of the notebook environment from your underlying cluster infrastructure. The separation makes it easy for you to execute notebook code against transient clusters without worrying about deploying or configuring your notebook infrastructure every time you bring up a new cluster. You can create multiple serverless notebooks from the AWS Management Console for EMR and access the notebook UI without spending time setting up SSH access or configuring your browser for port-forwarding. Each notebook you create is launched instantly with its own Spark context. This capability enables you to attach multiple notebooks to a single shared cluster and submit parallel jobs without fear of job conflicts in a multi-tenant environment. This way you make efficient use of your clusters.
You can also connect EMR Notebooks to an EMR cluster as small as a one node. This gives you a budget-friendly sandbox environment to develop your Spark application.
Notebooks are everywhere. And for good reason.
Whenever I need to install RStudio on a new machine, I have to think a bit about the configuration options I’ve tweaked. Invariably, I miss a checkbox that leaves me with slightly different RStudio behavior on each system. This post includes screenshots of my RStudio configuration and custom keyboard shortcuts for RStudio 1.3, MacOS, so that I have a reference.
I like these kinds of posts because they can help you find interesting settings you might not otherwise know about. Also, I second the FiraCode recommendation for R as well as F#. The only reason I don’t use it more is because I don’t want to confuse people during presentations. H/T R-Bloggers
Next, let’s add a check constraint our computed column. For this example, we are just going to make sure that the value in the table is a palindrome (because this is something that every data architect has come across at least one in their life, right?). So Value = REVERSE(Value);
Read on for more fun and sometimes-useful things you can do.
Deadlock detection and analysis in SQL Server is important for the overall health of affected applications. This post shows how to setup an Extended Events session to capture deadlock details. It also provides a stored procedure to capture details from the Extended Events session. The stored procedure enables simplified permanent storage of those deadlock detection events. Next, we’ll set up a SQL Server Agent Job to run the stored procedure on an ongoing basis. Finally, we’ll see several examples of how to query the captured events. These queries support making the necessary changes to both the application and database design.
Click through for a description of what a deadlock is as well as scripts to help find and fix them.
Sales.SalesOrderDetail looks like a good choice. Let’s generate a script for that table, all associated objects, and its data.
The safest way to create structure including all indexes, keys, defaults, constraints, dependencies, triggers, etc. is to use SSMS Generate Scripts.
The first one comes from a post on Database Administrators Stack Exchange: STATISTICS IO for parallel index scan
To summarize the situation, the OP had a query that was scanning a clustered index. They were seeing significantly higher numbers reported in the
logical readsportion of the
STATISTICS IOoutput when the query ran in parallel vs. serially (with a
MAXDOP 1query hint). There is a demo of this behavior in the post, so I won’t reproduce it here.
There are several interesting cases in here, so check them out.
I knew from reading the documentation that
dbcreatorgrants permissions to create, alter, drop, and restore databases. My question was does it give permission to backup databases?
It seems to give everything else so is backup databases just missing there? Or is it intentionally left out?
To find out whether it does, click on through.
In addition to default value, the minimum number of query executions in the AUTOquery_capture_mode for storing the query, its plan(s) and runtime statistics in SQL Server 2019 has been increased from 3 to 30. That means, Query Store does not store anything for first 29 query executions. It reserves query_ids, but it starts storing execution plan and runtime stats from 30thexecution in a single day.
These look like reasonable changes to me.