If you want to determine the optimal number of clusters in your analysis, you’re faced with an overwhelming number of (mostly subjective) choices. Note that there’s no “best” method, no “correct” k, and there isn’t even a consensus as to the definition of what a “cluster” is. With that said, this picture focuses on three popular methods that should fit almost every need: Silhouette, Elbow, and Gap Statistic.
Click through for the picture and references.
The complete machine for the biggest result (48 Cassandra nodes) has 574 cores in total. This is a lot of cores! Managing the provisioning and monitoring of this sized system by hand would be an enormous effort. With the combination of the Instaclustr managed Cassandra and Kafka clusters (automated provisioning and monitoring), and the Kubernetes (AWS EKS) managed cluster for the application deployment it was straightforward to spin up clusters on demand, run the application for a few hours, and delete the resources when finished for significant cost savings. Monitoring over 100 Pods running the application using the Prometheus Kubernetes operator worked smoothly and gave enhanced visibility into the application and the necessary access to the benchmark metrics for tuning and reporting of results.
The system (irrespective of size) was delivering an approximately constant 400 anomaly checks per second per core.
This is a good summary of what was an interesting series.
I’ve personally used SQL Notebook in my day-to-day work for Data Analysis, as the possibility to tweak the code and run it in the notebook greatly enhances the presentation of the data as oppose to a commented SQL Script ,as you cannot see all the query results in the same page too as opposed to a notebook; Moreover, a notebook (with or without results) can be exported in a read-only format like html or pdf to share the info with third parties, i.e. you can automate an analysis process that include code to be shared, cool stuff.
I think there are still a few (dozen) things to iron out before it’s a great experience, but they’re on the right path with it. If you haven’t checked out Azure Data Studio and its SQL Notebooks, give it a try sometime.
Utilizing the equation for a line, instead of solving for y we will solve for x, where:
– x corresponds to the day we will hit capacity based on current growth rate
– y corresponds to drive capacity in GB
– m is the slope of our regression line, provided by the model via lm.coef_
– b is the intercept of the regression line, also provided by the model via lm.intercept_
Click through for an example. This is one of the areas where DBAs can gain a lot by learning a bit of data science.
What about your SQL Agent jobs? Have you got jobs that perform actions on your data? If you have, do those jobs exist on the new primary? If they don’t then I’m happy to bet that whatever function that they were playing probably isn’t happening anymore.
One thing that I always want to make sure before I failover is, do I have all the relevant jobs ready to roll on the secondary server?
But what about the jobs that you’ve got on the old primary? There’s a fair chance that, if they’re doing any sort of data manipulation, they’re going to be failing.
Click through for additional considerations.
Biml column methods return code fragments. These code fragments can be used as building blocks to generate custom T-SQL statements. For example, the GetColumnList method returns a list of columns, separated by commas, that you can use in a SELECT statement. You can filter the columns and customize the output by passing parameters.
Biml is for more than just SSIS packages.
Most scripts (even ones I’ve worked on!) that look at the plan cache, have had the ability to sort it by different metrics: CPU, reads, writes, duration, etc.
A lot of people are very interested in long running queries — and I am too!
Heck, they’re how I make money. Blogging pays like crap.
But there’s a slight problem with only looking at query duration.
Read on for an example. Erik loses money on every blog post but he makes up for it in volume.
There are two ways to access Azure Cloud Shell, the first being directly through the Azure Portal itself. Once authenticated, look to the top right of the Portal and you should see a grouping of icons and in particular, one that looks very much like a DOS prompt (have no fear, DOS is nowhere to be seen).
The second method to access Azure Cloud Shell is by jumping directly to it via shell.azure.com which will require you to authenticate to your subscription before launching. There is an ever so slight difference between each method. Accessing the Shell via the Azure Portal will not require you to specify your Azure directory context (assuming you have several) since your Portal will have already defaulted to one, whereas with the direct URL method that obviously doesn’t happen.
Read the whole thing.
This may seem like a lot of steps, but it is only because we are taking the problems very slowly. The important point is that we want to minimize additional problem solving when applying the
cdatamethodology. Usually when you need to transform data you are in the middle of some other more important task, so you want to delegate the details of how the layout transform is implemented. With
cdatathe user is not asked to perform additional puzzle solving to guess a sequence of operators that may implement the desired data layout transform. The
cdatasolution pattern is always the same, which can help in mastering it.
cdata, record layout transforms are simple
Robjects with detailed
print()methods- so they are convenient to alter, save, and re-use later. The record layout transform also documents the expected columns and constants of the incoming data.
Check it out.
Boxplots for each quantitative variables are shown. We take advantage of the quantitative variable names (quantitative_vars) determined before to apply a ggplot2 package based boxplot function. The Y axis labeling and title are determined by the variable to be plot. Further, legend is not displayed and we adopt the coordinate flip option for improved readability.
Check it out to get an idea of how to do exploratory data analysis.