Press "Enter" to skip to content

Month: June 2017

Kafka For Pythonistas

Matt Howlett has an introduction to Apache Kafka, designed for Python developers:

In the call to the produce method, the key and value parameters each need to be either a bytes-like object (in Python 2.x this includes strings), a Unicode object, or None. In Python 3.x, strings are Unicode and will be converted to a sequence of bytes using the UTF-8 encoding. In Python 2.x, objects of type unicode will be encoded using the default encoding. Often, you will want to serialize objects of a particular type before writing them to Kafka. A common pattern for doing this is to subclass Producer and override the produce method with one that performs the required serialization.
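
A minimal sketch of that subclassing pattern, assuming the confluent-kafka-python client (the topic name and the choice of JSON serialization here are hypothetical):

import json

from confluent_kafka import Producer


class JsonProducer(Producer):
    """Producer subclass that JSON-serializes values and keys to UTF-8 bytes."""

    def produce(self, topic, value=None, key=None, **kwargs):
        # Serialize before handing off to the base produce method.
        if value is not None:
            value = json.dumps(value).encode('utf-8')
        if key is not None:
            key = json.dumps(key).encode('utf-8')
        super().produce(topic, value=value, key=key, **kwargs)


p = JsonProducer({'bootstrap.servers': 'localhost:9092'})
p.produce('mytopic', value={'id': 1, 'name': 'example'})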

The produce method returns immediately without waiting for confirmation that the message has been successfully produced to Kafka (or otherwise). The flush method blocks until all outstanding produce commands have completed, or the optional timeout (specified as a number of seconds) has been exceeded. You can test to see whether all produce commands have completed by checking the value returned by the flush method: if it is greater than zero, there are still produce commands that have yet to complete. Note that you should typically call flush only at application teardown, not during normal flow of execution, as it will prevent requests from being streamlined in a performant manner.
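
And a sketch of that produce-then-flush flow (same client assumption; the 10-second timeout is an arbitrary choice):

from confluent_kafka import Producer

p = Producer({'bootstrap.servers': 'localhost:9092'})

for i in range(100):
    # produce() only enqueues the message and returns immediately.
    p.produce('mytopic', value='message #{}'.format(i).encode('utf-8'))

# At application teardown, block up to 10 seconds for outstanding deliveries.
remaining = p.flush(10)
if remaining > 0:
    print('{} message(s) still undelivered when the timeout expired'.format(remaining))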

This is a fairly gentle introduction to the topic if you already know Python and have some familiarity with message broker systems.

Decomposing Power BI Desktop Files

Reza Rad wants to see exactly where the M scripts in a Power BI Desktop file are stored:

Talking about Power Query, the DataMashup file is all you need. It includes everything from the structure of queries, tables, parameters, and lists, to the actual M scripts behind the scenes. You can fetch all of this information from this single file. Let’s look at the structure of this file. If you open this file with a text editor, you will see some binary things first (which are related to the zipped nature of this file), and also some XML information. Yes, this is a zipped file. Let’s start with unzipping it into a folder. I’ve done that with the 7-Zip application.
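
If you'd rather script the extraction than use 7-Zip, here's a hedged Python sketch; the report.pbix filename is hypothetical, and the header layout assumed here (a 4-byte version, a 4-byte length, then an inner zip holding Formulas/Section1.m) follows the documented MS-QDEFF data mashup format:

import io
import struct
import zipfile

# A .pbix file is itself a zip archive; DataMashup is one entry inside it.
with zipfile.ZipFile('report.pbix') as pbix:
    datamashup = pbix.read('DataMashup')

# After the 4-byte version field comes a 4-byte little-endian length for
# the "package parts" stream, which is itself a zip archive.
parts_len = struct.unpack_from('<I', datamashup, 4)[0]
inner = zipfile.ZipFile(io.BytesIO(datamashup[8:8 + parts_len]))

# The M scripts live in Formulas/Section1.m inside that inner zip.
print(inner.read('Formulas/Section1.m').decode('utf-8-sig'))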

This is an interesting peek under the covers of a PBIX file.

Physical Versus Read-Ahead Reads

Kendra Little explains which SQL Server diagnostic tools include read-ahead reads versus “regular” physical reads:

SQL Server has more than one way to pull pages in from disk for your queries. It can do a physical read of a single 8KB page, or of an extent of eight of those 8KB pages (64KB).

SQL Server can also use the “read-ahead” mechanism to pull even larger chunks of data in from disk when you have a query that wants to read a lot of data, because just plucking one 8KB page or even 64KB of pages from disk isn’t super fast when you need lotsa pages.

But these terms get a little confusing when you’re changing between different diagnostic tools in SQL Server, because some of these tools include read-ahead reads in physical reads, and some don’t!
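
As one concrete example, SET STATISTICS IO reports the two counters separately. Here's a rough sketch of capturing that output from Python with pyodbc; the DSN and table name are hypothetical, and cursor.messages assumes a reasonably recent pyodbc version:

import pyodbc

conn = pyodbc.connect('DSN=MySqlServer;Trusted_Connection=yes')
cursor = conn.cursor()

# STATISTICS IO emits "physical reads" and "read-ahead reads" as separate
# counters for every table the query touches.
cursor.execute('SET STATISTICS IO ON')
cursor.execute('SELECT COUNT(*) FROM dbo.BigTable')
cursor.fetchall()

# The counters arrive as informational messages, not as a result set.
for message in cursor.messages:
    print(message)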

There is some good information here, so read the whole thing.

Incorrect PFS Free Space

Dave Mason walks through troubleshooting one database corruption scenario:

I’ve been lucky with database corruption during my career. I could probably count on one hand the number of times I’ve had to deal with it. A couple of times, it was in a customer’s environment; they managed it themselves, but called me in to help. The other incidents were ones I inherited from a backup I had to restore into a production environment. The first time it happened to me, I didn’t realize it until days later, when DBCC CHECKDB ran during a weekend maintenance window. After that, I added a new “rule” to my list: always run DBCC CHECKDB after restoring a database from someone else. That rule paid dividends today.

Here’s the output I saw:

Msg 8914, Level 16, State 1, Line 50
Incorrect PFS free space information for page (1:2564368) in object ID 457768688, index ID 1, partition ID 72057619124060160, alloc unit ID 72057594116767744 (type LOB data). Expected value 0_PCT_FULL, actual value 100_PCT_FULL.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'tbl_Redacted' (object ID 457768688).
CHECKDB found 0 allocation errors and 1 consistency errors in database 'db_redacted'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (db_redacted).
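
Dave's rule is easy to automate. Here's a minimal, hedged sketch using pyodbc; the DSN, database name, and the assumption that corruption errors surface as driver exceptions are all mine, not Dave's:

import pyodbc

def check_database(conn_str, database):
    """Run DBCC CHECKDB after a restore and report whether it came back clean."""
    # autocommit is needed because DBCC CHECKDB can't run inside an
    # implicit user transaction.
    conn = pyodbc.connect(conn_str, autocommit=True)
    cursor = conn.cursor()
    try:
        # NO_INFOMSGS suppresses informational chatter, so anything that
        # does come back indicates a real allocation/consistency error.
        cursor.execute("DBCC CHECKDB (N'{}') WITH NO_INFOMSGS".format(
            database.replace("'", "''")))
        return True
    except pyodbc.Error as err:
        # Errors such as Msg 8914 above are raised to the client here.
        print('CHECKDB reported problems in {}: {}'.format(database, err))
        return False

if check_database('DSN=MySqlServer;Trusted_Connection=yes', 'db_redacted'):
    print('Restored database came back clean.')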

Read on to see how Dave solved this issue.

Synchronizing Logins And Jobs

Ryan Adams enumerates several methods for synchronizing logins and SQL Agent jobs across mirrored instances or Availability Group replicas:

There is an awesome set of PowerShell cmdlets out there written by MVP Chrissy LeMaire. This method is my personal choice. It works great and is easy to automate. You can run it with SQL Agent or you can just use Scheduled Tasks in the OS. The Scheduled Tasks method is a little cleaner, but you don’t get to see it in SQL Server. Also, if you are on a cluster running Windows 2012, you can cluster the Task Scheduler as an added benefit.

Chrissy wrote this with the intent of making migrations easier, and she succeeded. In fact, I made it a point to thank her at MVP Summit last year because it made my life insanely easier. The advantage here is that you can automate a lot more than just logins. In fact, you can migrate and automate pretty much anything at the server level. Here is the link that I guarantee you are going to bookmark, followed by a video demo where I show how to install and automate the syncing of logins using both the SQL Agent method and the Scheduled Tasks method.

https://dbatools.io/

DBATools would be my preference in this situation as well, but click through to see four other methods, as well as code.

Tuning Apache Solr

Michael Sun explains how to optimize Apache Solr’s memory usage:

For Oracle JDK 8, both CMS and G1 GC are supported. As a rule of thumb, if the heap size is less than 28GB, CMS works well. Otherwise, G1 is a better choice. If you choose G1, there are more details about G1 configuration in part 2 of this blog. You can also find helpful guidance in Oracle’s G1 tuning guide.

Meanwhile, it’s always a good idea to enable GC logging. The overhead of GC logging is trivial, but it gives us a better understanding of how the JVM uses memory under the hood. This information is essential in GC troubleshooting. Here is an example of GC logging settings.
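
The excerpt stops before the example itself; for reference, JDK 8 GC logging settings typically look something like the following (an illustrative set of flags and a hypothetical log path, not necessarily the post's exact values):

-Xloggc:/var/log/solr/solr_gc.log
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9
-XX:GCLogFileSize=20M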

There’s some good administrative assistance, but also tips on more efficient querying.

Azure Private Virtual Networks

The Tech Junkie shows how to create a private virtual network in Azure:

In the previous blog post, we created an Azure cloud service. Now we are going to create a private virtual Azure network. This is important because when you create a virtual machine in Azure, you will use this virtual network to connect to it.

This is a screenshot-driven, step-by-step post that makes setting these up easy.

Dot-Density Maps In R

Paul Campbell shows how to build a dot density map in R:

To get me started I invested in the expert guidance of data-visualiser-extraordinaire Nathan Yau, aka Flowing Data. Nathan has a whole host of tutorials on how to make really great visualisations in R (including a brand new course focused on mapping) and thankfully one of them deals with how to plot dot density using base R.

Now with a better understanding of the task at hand, I needed to find the required ethnicity data and shapefiles. I recently saw a video of Amelia McNamara’s great talk at the OpenVis Conference titled ‘How spatial polygons shape our world’. The .shp file really is a glorious thing and it seems that the spatial polygon makers are the unsung heroes of the datavis world, so a big round of applause for all those guys is in order.

Anyway, I digress. Luckily for me, the good folks over at the London DataStore have a vast array of Shapefiles that go from Borough level all the way down to Super Output Area level. I’m going to use the Output Areas as the boundaries for the dots and the much broader Borough boundaries for plotting area labels and borders.

The ethnic group data itself was sourced from the Nomis website, which has a handy 2011 Census table finder tool where you can easily download an Ethnic Group CSV file for London output areas. Let’s go.
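
Paul builds his map in base R, but the underlying idea translates directly: for each polygon, draw one dot per N people by rejection sampling. Here's a sketch in Python with geopandas and shapely, with hypothetical file and column names:

import random

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

PEOPLE_PER_DOT = 100  # arbitrary scaling choice

areas = gpd.read_file('london_output_areas.shp')

dots = []
for _, area in areas.iterrows():
    polygon = area.geometry
    minx, miny, maxx, maxy = polygon.bounds
    # Rejection sampling: draw random points in the bounding box and keep
    # only those that actually fall inside the polygon.
    for _ in range(int(area['ethnic_count'] / PEOPLE_PER_DOT)):
        while True:
            candidate = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
            if polygon.contains(candidate):
                dots.append(candidate)
                break

gpd.GeoSeries(dots).plot(markersize=0.5)
plt.show()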

I’m going to give this a second reading; it’s a great example of how to go from functional to beautiful.  H/T David Smith

Basic Parsing On Invoke-WebRequest

Andy Levy shows how to handle complex redirects with Invoke-WebRequest in Powershell:

So off to the PowerShell prompt I went and ran Invoke-WebRequest -Uri http://firstresponderkit.org/ to start looking at the object returned so I could see what I needed to parse out to find my way to the true download URL.

Then Firefox (my default browser) opened, and I was staring at https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/tree/master.

I was expecting an HTTP 30X redirect status code which, based upon previous experience, Invoke-WebRequest would honor. Instead, I got a 200 OK which is the web server saying “yep, here’s your stuff, HAND!”
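
Andy works through this in PowerShell; as a rough Python analogue of the same symptom, the requests library also follows only true HTTP redirects, so a meta-refresh or JavaScript redirect has to be dug out of the page body (the regex below is a simplistic sketch and may not match every page):

import re

import requests

r = requests.get('http://firstresponderkit.org/')
print(r.status_code, len(r.history))  # 200 with an empty history: no 30x redirect happened

# If the page redirects via a <meta http-equiv="refresh"> tag, the target
# URL lives in the HTML rather than in any HTTP header.
match = re.search(r'http-equiv=["\']?refresh["\']?[^>]*url=([^"\'>]+)', r.text, re.IGNORECASE)
if match:
    print('Meta-refresh target:', match.group(1))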

Read on for the solution.

Out Of User Memory Quota

Jack Li troubleshoots an In-Memory OLTP error:

[INFO] HkDatabaseTryAcquireUserMemory(): Database ID: [7]. Out of user memory quota: requested = 131200; available = 74641; quota = 34359738368; operation = 1.

This is my first time seeing this error. As usual, I relied on source code to find answers. The message is a result of enforcing the memory quota for In-Memory OLTP usage. As documented in “In-Memory OLTP in Standard and Express editions, with SQL Server 2016 SP1”, SQL Server 2016 SP1 started to allow In-Memory OLTP to be used in all editions, but enforces memory quotas for editions other than Enterprise Edition. The above message is simply telling you that you have reached the quota, and whatever operation you attempted was denied.
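
For scale, it helps to convert the byte counts from Jack's message (the Standard Edition comparison is my observation, not part of the post):

requested = 131200            # bytes the operation asked for (~128 KB)
available = 74641             # bytes left under the quota (~73 KB)
quota = 34359738368           # the per-database cap

print(quota / 2**30)          # 32.0 GB, matching the Standard Edition In-Memory OLTP quota
print(requested > available)  # True, which is why the operation was denied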

Jack provides more context around the error as well.
