To make the above possible, we provide a Bring Your Own VNET (also called VNET Injection) feature, which allows customers to deploy the Azure Databricks clusters (data plane) in their own-managed VNETs. Such workspaces could be deployed using Azure Portal, or in an automated fashion using ARM Templates, which could be run using Azure CLI, Azure Powershell, Azure Python SDK, etc.
With this capability, the Databricks workspace NSG is also managed by the customer. We manage a set of inbound and outbound NSG rules using a Network Intent Policy, as those are required for secure, bidirectional communication with the control/management plane.
This is a good article if the defaults won’t get past corporate security.
I was troubleshooting a deadlock the other day and it got me thinking…. I know the number 1205 by heart and know it is associated to a deadlock. What other numbers are there that you can associate to an event or object or limitation. For example 32767 will be known by a lot of people as the database id of the ResourceDb, master is 1, msdb is 4 etc etc.
So below is a list of numbers I thought of
Leave me a comment with any numbers that you know by heart
BTW I didn’t do the limits for int, smallint etc etc, those are the same in all programming languages…so not unique to SQL Server
Read on for Denis’s list.
In this first option, Power BI handles everything. We use the web-based Power Query Online tool for structuring the data. Power BI handles scheduling the data refresh.
The underlying data behind the dataflow is stored in a data lake. However, since it’s fully managed, this data lake is not directly accessible or visible to the customer. As with most cloud-based implementations, the infrastructure is hidden under the covers. This is what is happening if your users are utilizing dataflows currently but you haven’t specified a data lake account in the Power BI admin center.
Melissa gives us a great summary of the three patterns, so read the whole thing.
There is a constant rumble among Oracle DBAs- either all-in for Oracle Real Application Cluster, (RAC) or a desire to use it for the tool it was technically intended for. Oracle RAC can be very enticing- complex and feature rich, its the standard for engineered systems, such as Oracle Exadata and even the Oracle Data Appliance, (ODA). Newer implementation features, such as Oracle RAC One-Node offered even greater flexibility in the design of Oracle environments, but we need to also discuss what it isn’t- Oracle RAC is not a Disaster Recovery solution.
Click through for a good high-level contrast, as these are quite different products.
As we get closer to the General Availability of SQL Server Management Studio (SSMS) 18, we have decided to have a quick release of the Release Candidate (RC) build.
It’s been in preview for a while but things are now getting real. I’ll have to check later on if it fixes the bug I found with PolyBase on SQL Server.
Back in August of 2017, we wrote two posts #9: Compating your Share Libraries and #10: Compacting your Shared Libraries, After The Buildabout “stripping” shared libraries. This involves removing auxiliary information (such as debug symbols and more) from the shared libraries which can greatly reduce the installed size (on suitable platforms – it mostly matters where I work, i.e. on Linux).
There’s a pretty good space savings in the
tidyverse package. H/T R-Bloggers.
1. Cluster Size (N): Number of nodes/brokers in the Kafka cluster, we should have 2x+1, i.e. at least 3 nodes or more in an odd number.
2. Partitions: We write/publish data/event into a topic which is divided into partitions (by default 1), but we should have M times N, where M can be any integer number, i.e. M >= 1, to achieve more parallelism and partitioning of data over the cluster.
3.Replication Factor: determines the number of copies (including the original/Leader) of each partition in the cluster. All replicas of a partition exist on separate node/broker, and we should never have R.F. > N, but at least 3.
We recommend having 3 RF with 3 or 5 nodes cluster. This helps in having both availabilities as well as consistency.
Click through for several more tradeoff points.
JiraPS is a wonderful module that is great at a lot of things. There several great people that have put a lot of time into it for it to have such good feature coverage. It was designed to be approachable and do a lot of validation for you. JiraPS does not have a good user story around bulk operations with its issue commands and thats what I need from it the most.
It also has a large user base with hundreds of thousands of downloads and is used in a lot of organizations. Every change made to that module at this point needs to pay very close attention to backwards compatibility.
Now there are two modules depending on your use case.
There is a Jira ticket for the Apache Spark project, SPARK-27006. The gist of this ticket is to bring .NET support to Spark, specifically by supporting DataFrames in C# (and hopefully F#). No support for Datasets or RDDs is included in here, but giving .NET developers DataFrame access would make it easy for us to write code which interacts with Spark SQL and a good chunk of the SparkSession object.
You an click through and read everything I have to say, but do go to the Spark ticket and vote for .NET support.
There is really no point in adding another column
IDfor an individual row in this table, even if a lot of ORMs and non-ORM-defined schemas will do this, simply for “consistency” reasons (and in a few cases, because they cannot handle compound keys).
Now, the presence or absence of such a surrogate key is usually not too relevant in every day work with this table. If you’re using an ORM, it will likely make no difference to client code. If you’re using SQL, it definitely doesn’t. You just never use that additional column.
But in terms of performance, it might make a huge difference!
Lukas makes a good argument here.