In this video, I demonstrate how much information we can gain from one sample of a distribution.
Some aspect of this is “that’s a neat parlor trick,” but it does speak to the marginal information gain of a small amount of data.
Josephine Bush has an after-action report:
I did a couple of posts previously on dumping/restoring Azure PostgreSQL databases and also using the Azure migration tool. I had to ultimately do a combo of those because of the limitations of the Azure migration tool.
Read on for the process that Josephine went through to complete the migration.
Ben Johnston takes us through a scenario:
I recently had an interesting production SQL Server issue that turned out to be very easy to fix. The fix doesn’t fit every workload or server, but for the limited use cases described below, it’s a simple configuration change. The general best practice for server level configurations is to leave things at default, unless there is a specific reason for changing the settings. This will outline a use case for using memory-optimized tempdb metadata.
This covers a very specific scenario and workload. You won’t use this on a typical server, which is why it isn’t a default setting. You will see this for very specific server workloads, those with many transactions and high temp table usage. Most systems can better use the memory for the regular workload instead of optimizing tempdb metadata, so don’t use this as a default setting for your servers.
Click through for the scenario.
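For reference, this is the setting in question. It is server-scoped and requires a service restart to take effect (database name aside, these commands are straight from the feature's documentation):

```sql
-- Enable memory-optimized tempdb metadata (SQL Server 2019 and later).
-- A service restart is required before the change takes effect.
ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

-- Check whether it is currently enabled (1 = enabled).
SELECT SERVERPROPERTY('IsTempdbMetadataMemoryOptimized');
```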
Recently, I was asked to compare data from a table in Snowflake with the same table’s data from a few hours before. As always, no one had thought about copying the data into another table before they started working with it. They just remembered an approximate time they started working with the table — 10 AM — and the table’s name — Public.WorkingTable. So, let’s see what we can do, shall we?
Read on for the process, as well as circumstances in which cloning might fail.
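As a rough sketch of what that looks like with Snowflake time travel (the timestamp literal below is a made-up stand-in for “around 10 AM,” and the backup table name is mine):

```sql
-- Clone the table as it existed at a point in the past, via Snowflake time travel.
-- The date below is hypothetical; substitute the approximate working time.
CREATE TABLE Public.WorkingTable_Before CLONE Public.WorkingTable
  AT (TIMESTAMP => '2025-06-02 10:00:00'::TIMESTAMP_LTZ);

-- Then compare, e.g., rows that changed or disappeared since that point.
SELECT * FROM Public.WorkingTable_Before
MINUS
SELECT * FROM Public.WorkingTable;
```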
Bertrand Drouvot looks at some new statistics that will be available in Postgres 18:
It means that, thanks to the new pg_stat_get_backend_io() function, we can see the I/O activity in each backend (in addition to the overall aggregate still available through the pg_stat_io view).
Click through for some examples of what you can see and how it works. This seems like it’d be pretty helpful in a clustering scenario to make sure that you’re getting the load balancing that you expect.
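For instance, something along these lines (a sketch; I’m assuming the pg_stat_io-style column names carry over to the per-backend function):

```sql
-- Per-backend I/O statistics in PostgreSQL 18, joined to pg_stat_activity
-- so we can see which session is generating the reads and writes.
SELECT a.pid,
       a.usename,
       io.object,
       io.context,
       io.reads,
       io.writes
FROM pg_stat_activity AS a
CROSS JOIN LATERAL pg_stat_get_backend_io(a.pid) AS io
WHERE a.backend_type = 'client backend';
```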
Tomaz Kastrun hides the words:
Let’s play with some words. In particular, with vanishing words.
Using two packages, ggplot2 and gganimate, we will construct a looped animation where sentences vanish, word by word. A nice way to practice ggplot animations.
Click through for the code.
Sebastiao Pereira hunts for outliers:
Outliers can significantly distort statistical analysis and lead to incorrect conclusions when interpreting data. In this article, we will look at how to find outliers in SQL Server using various T-SQL queries. Understanding how to find outliers in SQL is crucial for accurate data analysis.
Sebastiao uses PERCENTILE_CONT() in this demonstration. That works fine for relatively small tables, though it does not scale well at all. Once you’re in the millions of records, it gets slow. From there, my joke is that, if you have 100 million or more records, you can start a query with PERCENTILE_CONT() on one instance. Meanwhile, on a separate instance, as soon as you kick off that query, go install SQL Server ML Services, configure it, check out a tutorial on R or Python, figure out how you can calculate the inter-quartile range in that language, learn how ML Services works, and you’ll still get the answer before your first query finishes.
If you’re using SQL Server 2022, there is a new APPROX_PERCENTILE_CONT() that is orders of magnitude faster on increasingly large datasets. It’s also accurate to within 1.33% (on either side of the correct answer) with 99% confidence. The query works a bit differently, though, because the approximation is a nested set function using a WITHIN GROUP clause, whereas PERCENTILE_CONT() is a window function that uses an OVER() clause. That means it’s not quite as easy as slapping “APPROX_” onto the start of the query, but because Sebastiao uses WITHIN GROUP in the T-SQL, it’s pretty close: PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) OVER() AS Q1 becomes APPROX_PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) AS Q1, or something like that; I’m compiling in the browser here.
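Putting the two side by side (the table name here is hypothetical; [ObsValue] comes from the article):

```sql
-- Exact: PERCENTILE_CONT() is a window function, so it needs an OVER() clause
-- and repeats the result on every row; DISTINCT collapses that to a single row.
SELECT DISTINCT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) OVER () AS Q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY [ObsValue]) OVER () AS Q3
FROM dbo.Observations;

-- Approximate (SQL Server 2022+): a true aggregate, so no OVER() and no DISTINCT.
SELECT
    APPROX_PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [ObsValue]) AS Q1,
    APPROX_PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY [ObsValue]) AS Q3
FROM dbo.Observations;
```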
Michael J. Swart checks the typewriter ribbon:
It’s hard to destroy data. Even when a column is dropped, the data is still physically there on the data page. We can use the undocumented/unsupported command DBCC PAGE to look at it.
This ties in with how we can drop a column in SQL Server without it taking a very long time: when we drop the column, we’re just modifying the metadata associated with the table and telling SQL Server to ignore that bit there. Do read the whole thing, and also check out a fun comment from Paul White.
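If you want to poke at a page yourself, the incantation looks like this (database name and page ID are placeholders):

```sql
-- Route DBCC output to the client session instead of the error log.
DBCC TRACEON (3604);

-- DBCC PAGE (database, file ID, page ID, print option); option 3 dumps
-- per-row detail, where remnants of a dropped column can still show up.
DBCC PAGE ('MyDatabase', 1, 328, 3);
```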
Haripriya Naidu makes a change:
I was working on modifying the isolation level of a database from Read Committed to Read Committed Snapshot (RCSI) and had to get exclusive access to the database. After letting the application team know about it and having stopped their processes, I tried to set the database to SINGLE_USER, but it errored out.
It turns out that you cannot set a database to single user mode if it is in an availability group or part of database mirroring. Nonetheless, there is still a way to make this change. Read on to learn more.
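I won’t spoil the exact route Haripriya took, but one common alternative to SINGLE_USER for this particular change is to let the ALTER itself kick everyone out:

```sql
-- WITH ROLLBACK IMMEDIATE terminates open transactions so the ALTER can get
-- the exclusive access it needs, without flipping the database to SINGLE_USER.
ALTER DATABASE [MyDatabase]
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```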
Eitan Blumin investigates an email subscription failure:
After configuring an Email subscription, the subscription result shows: “Failure sending mail: One or more errors occurred.” In this blog post I will share how I investigated and resolved one such failure.
My first step in troubleshooting this error was to query the ExecutionLog3 view inside the ReportServer database. I normally do this to check if and why a report subscription has failed its run. However, in this case all the log records showed success (rsSuccess status):
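The sort of query in question looks something like this (a sketch; the TOP and the filter are my own):

```sql
-- Recent subscription runs from the SSRS execution log, newest first.
SELECT TOP (50)
       ItemPath,
       UserName,
       TimeStart,
       TimeEnd,
       Status
FROM ReportServer.dbo.ExecutionLog3
WHERE RequestType = 'Subscription'
ORDER BY TimeStart DESC;
```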
Read on to see what this indicates, how Eitan was able to troubleshoot the problem, and the ultimate fix.