Kristian Larsen has a couple of posts on Monte Carlo style simulation in Python. First up is a post which covers how to generate data from different distributions:
One method that is very useful for data scientist/data analysts in order to validate methods or data is Monte Carlo simulation. In this article, you learn how to do a Monte Carlo simulation in Python. Furthermore, you learn how to make different Statistical probability distributions in Python.
A useful method for data scientists/data analysts in order to validate methods or data is Bootstrap with Monte Carlo simulation In this article, you learn how to do a Bootstrap with Monte Carlo simulation in Python.
Both posts are worth the read.
Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools
TrainValidationSplit. These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info.
Databricks Runtime 5.3 and 5.3 ML and above support automatic MLflow tracking for MLlib tuning in Python.
With this feature, PySpark
TrainValidationSplitwill automatically log to MLflow, organizing runs in a hierarchy and logging hyperparameters and the evaluation metric. For example, calling
CrossValidator.fit()will log one parent run. Under this run,
CrossValidatorwill log one child run for each hyperparameter setting, and each of those child runs will include the hyperparameter setting and the evaluation metric. Comparing these runs in the MLflow UI helps with visualizing the effect of tuning each hyperparameter.
Hyperparameter tuning is critical for some of the more complex algorithms like random forests, gradient boosting, and neural networks.
Default schemas in SQL Server can be a blessing, since they reduce the need to specify the schema when creating DDL statements in T-SQL. However, relying on the default schema when creating DML statements can be problematic. A recent question on dba.stackexchange.com asked “Does T-SQL have a Schema search path?”, similar to PostgreSQL implements the
search_pathparameter. This post shows how schemas are implemented in SQL Server. We’ll also see why it’s important to always specify the schema when using SQL Server.
A lot of this behavior goes back to the pre-2005 era. 2005 introduced schemas as logical separators, whereas pre-2005 they were more of a security measure (and
dbo was the database owner’s schema). I completely agree that you should specify two-part names in-database. It’s a tiny bit faster (which adds up when you’re doing thousands of transactions per second) and reduces ambiguity.
Recently during a demo at a SQL Saturday the query to pull the Extended Event session data, didn’t return the expected results. The session I used for the demo was the create database statement.
Prior to the session, I deleted the Create Database session, however did not delete the target files because they are part of the demo. Then I recreated the session, just as I had done before. However, this time was there was a difference when I attempted to read the target data. The entry for the newly created database was not showing up when I used the GUI, however was showing up when I read the XML. During the session, I was not able to figure out why that was the case.
Click through to see the root cause and how Dave fixed the problem.
OK so if we break this down into what a DBA should be doing as part of a code review:
– Ensure formatting is correct and any standards followed
– Have they introduces a SQL injection vulnerability?
– Consider any side effects of the actual change, for instance altering a clustered key on a 1 billion row table will take time – is this possible on a live system?
– Consider any performance effects – is this more prone to tempdb spills? How about deadlocks? Is the plan going to be terrible?
– Is the code going to do what the developer wants? Do they have the update statement correct in the merge statement?
That’s a lot, how can we help developers understand enough so that they can review their own code and cause fewer issues in production?
I believe this is a bit aspirational. Nevertheless, if you do get there, life gets easier.
What was the wait_type? Well, the obscure wait_type was called PREEMPTIVE_OS_PIPEOPS. What causes this wait? As it turns out, this is a generic wait that is caused by SQL pipe related activities such as xp_cmdshell.
Knowing this much information however does not get us to the root cause of this particular problem for this client. How do we get there? This is another case for Extended Events (XEvents).
Read on for two ways to approach this, both using Extended Events.