Press "Enter" to skip to content

Curated SQL Posts

Data Science And Data Engineering In HDP 3.0

Saumitra Buragohain, et al., show off some of the things added to the Hortonworks Data Platform for data scientists and data engineers:

We leverage the power of HDP 3.0, from efficient storage (erasure coding) and GPU pooling to containerized TensorFlow and Zeppelin, to enable this use case. We will save the details for a different blog (please see the video). To summarize: as we trained the car on a track, we collected about 30K images with corresponding steering angle data. The training data was stored in an HDP 3.0 cluster, the TensorFlow model was trained using 6 GPU cards, and then the model was deployed back on the car. The deep learning use case highlights the combined power of HDP 3.0.

Click through for more additions and demos.


Thinking About Font Sizes

Stephanie Evergreen shares some good information on font sizes:

Did you know that you regularly read type set in size 8, or even smaller? In printed materials, captions and less important information (think: photograph credits, newsletter headline subtext, magazine staff listings) are usually reduced to something between 7.5 and 9 points. We generally read that size type without much issue. The reason why we can comfortably read those small sizes is because the designers chose an effective font that keeps its clarity and legibility when shrunk.

Designers don’t make the font that tiny to give you a headache. They do it to establish a font hierarchy. Our brains interpret the biggest size as the most important and the littlest size as the least important. So we can create a hierarchy of font sizes to structure our work and communicate even more clearly.

The font hierarchy is important, but so is picking a font which is clear at the sizes you want to use.


New DMF: dm_db_page_info

Pam Lahoud shows off a new Dynamic Management Function in SQL Server 2019:

The primary use case we had in mind when developing this function was troubleshooting page-related waits. Some of the key performance scenarios that require page details to diagnose are tempdb contention, last page insert contention (also see this blog article for code samples) and page-level blocking. All of these scenarios present as either PAGELATCH or lock wait types in sys.dm_exec_requests with a page identifier (db_id:file_id:page_id) in the wait_resource column. In order to gain any insight into the problem, you have to know more details about what that page is. Is it a data page, an index page or something else? If it’s a data or index page, which object does it belong to? Prior to SQL Server 2019 CTP 2.0, the only way to gain any of this insight was to use the lightly documented command DBCC PAGE. Unfortunately, if you’re using some sort of monitoring tool or script to gather data about waits and blocking, there’s no good way to include this DBCC PAGE call in those scripts. Not to mention it requires sysadmin privileges. Enter sys.dm_db_page_info.

The sys.dm_db_page_info function takes 4 parameters: database ID, file ID, page ID and mode (‘LIMITED’ or ‘DETAILED’). It returns a table with a single row that contains the header information from the page, including things like object_id, index_id and partition_id. As a standalone function, this provides a documented and fully supported way to retrieve this information without the need for a DBCC command. The real benefit, however, is the ability to JOIN this function with other DMVs such as sys.dm_exec_requests or sys.sysprocesses to gather this information automatically when there is any sort of page-related contention. If you’re familiar with these views you know that the wait_resource column can contain all sorts of different resources, so it’s not in the right format to facilitate a direct join to sys.dm_db_page_info. To support this use case, we had to add another column to these views called page_resource. Whenever the wait_resource column contains a page resource, the page_resource column will be populated with an 8-byte hexadecimal value. The page ID is the first four bytes, followed by 2 bytes for the file ID and then 2 bytes for the database ID.

There are also a couple of supporting functions, so check those out.
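One of those helpers, sys.fn_PageResCracker, splits the 8-byte page_resource value into its database, file and page IDs. As a minimal sketch of the join pattern Pam describes (assuming SQL Server 2019 CTP 2.0 or later):

-- Resolve page-related waits to the object that owns the page.
-- fn_PageResCracker splits page_resource into db_id/file_id/page_id,
-- which feed straight into dm_db_page_info.
SELECT
    er.session_id,
    er.wait_type,
    er.wait_resource,
    pi.object_id,
    pi.index_id,
    pi.page_type_desc
FROM sys.dm_exec_requests AS er
CROSS APPLY sys.fn_PageResCracker(er.page_resource) AS prc
CROSS APPLY sys.dm_db_page_info(prc.db_id, prc.file_id, prc.page_id, 'LIMITED') AS pi
WHERE er.page_resource IS NOT NULL;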


When Query Store Skips An Interval Period

Rich Brenner explains why you might see gaps in your Query Store run times:

When messing about with (developing) things that are based on the query store, I came across an unusual phenomenon. We were missing query store intervals. It’s not that data had been deleted, but rather that it was never created in the first place. Take a look at this:

I’d set the query store to 5-minute intervals for testing purposes and left it. Notice that there is no data between 15:20 and 15:25. It’s not that the data has been deleted; we know that because the runtime_stats_interval_id is still incrementing correctly.
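If you want to see this for yourself, here is a quick sketch (the database name is hypothetical) that shortens the interval and then lists what Query Store has recorded:

-- Shorten the stats interval for testing (hypothetical database name).
ALTER DATABASE [QueryStoreTest]
SET QUERY_STORE (INTERVAL_LENGTH_MINUTES = 5);

-- List the recorded intervals; a skipped interval shows up as a jump
-- in start_time while runtime_stats_interval_id increments without holes.
SELECT runtime_stats_interval_id, start_time, end_time
FROM sys.query_store_runtime_stats_interval
ORDER BY start_time;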

Click through for the answer.


Death Marches Roundup

Jeff Mlakar takes us on a tour of bad project planning:

This month we had 15 post submissions about this daunting topic. Two of the posts were from people who had never posted before. To them I say welcome and I hope you enjoyed the experience.

I think, in general, you were all brave to write about this sensitive topic. I know a lot of you are consultants (which is a great way to expose yourself to a death march project) and must be careful about telling stories that could be misconstrued by clients. Nonetheless, with enough obfuscation and redaction you have brought forth some truly horrifying posts! I am going to group the submissions by the most terrible themes that many posts shared.

It is best to read this post in Vincent Price’s voice.


Removing A Database From An AG

Erin Stellato walks us through the process of removing a database from an Availability Group and then re-adding it later:

I recently had a scenario in a two-node Availability Group where multiple large-batch modification queries were executed and created a large redo queue on the replica. The storage on the replica is slower than that on the primary (not a desired scenario, but it is what it is) and the secondary has fallen behind before, but this time it was to the point where it made more sense to remove the database from the replica and re-initialize, rather than wait several hours for it to catch up. What I’m about to detail is not an ideal solution. In fact, your solution should be architected to avoid this scenario entirely (storage of equal capability for all involved nodes is essential). But, stuff happens (e.g., a secondary database unexpectedly pausing), and the goal was to get the replica synchronized again with no downtime.
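The broad strokes look something like the sketch below (database, AG and backup names are all hypothetical; Erin’s post has the real sequence and the caveats):

-- On the secondary replica: remove the local copy from the availability
-- group. The database drops into RESTORING, so it can be restored over.
ALTER DATABASE [SalesDB] SET HADR OFF;

-- Re-initialize from fresh backups taken on the primary (hypothetical paths).
RESTORE DATABASE [SalesDB]
    FROM DISK = N'\\backupserver\sql\SalesDB_full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG [SalesDB]
    FROM DISK = N'\\backupserver\sql\SalesDB_log.trn' WITH NORECOVERY;

-- Rejoin the database to the availability group on the secondary.
ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [AG1];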

Click through for the demo.


Capturing Implicit Conversions With Extended Events

Grant Fritchey shows how easy it is to build an extended event which captures implicit conversions:

Built right into the Extended Events is an event that captures conversions that would affect execution plans, plan_affecting_convert. This event will show both CONVERT and CONVERT_IMPLICIT warnings that you would normally only see within an execution plan. You can capture this event with others. Capturing events together along with causality tracking makes it very easy to track queries that have the issue. Here’s one example of how you might capture implicit conversions along with the query:
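Grant’s exact session definition is in the post; as a stand-in, here is a minimal sketch (the session and file names are mine) pairing plan_affecting_convert with the batch that triggered it:

-- Capture plan-affecting conversions alongside the offending batch.
-- TRACK_CAUSALITY ties each convert warning to the query that caused it.
CREATE EVENT SESSION [ImplicitConversions] ON SERVER
ADD EVENT sqlserver.plan_affecting_convert
    (ACTION (sqlserver.session_id, sqlserver.sql_text)),
ADD EVENT sqlserver.sql_batch_completed
ADD TARGET package0.event_file (SET filename = N'ImplicitConversions')
WITH (TRACK_CAUSALITY = ON);

ALTER EVENT SESSION [ImplicitConversions] ON SERVER STATE = START;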

This kind of event can make system-level performance tuning much easier.


Using R To Hit Azure ML From Power BI

Leila Etaati shows how you can use R to hit an Azure ML endpoint to populate a data set in Power BI:

You need to create a model in Azure ML Studio and create a web service for it.

The traditional example is predicting whether a passenger on the Titanic is going to survive or not.

We have a dataset about passengers, including their age, gender, and passenger class, and we are going to predict whether they are going to survive or not.

Open Azure ML Studio and follow the steps to create a model for predicting this.

Then download the Titanic dataset from here.

Click through for the step-by-step instructions.
