Press "Enter" to skip to content

October 30, 2019

Offset and Limit with Cosmos DB

Hasan Savran takes us through the OFFSET and LIMIT clauses in Cosmos DB:

The OFFSET LIMIT clause is one of the latest additions to Azure Cosmos DB. The Skip/Take function was a big request from users, and the Cosmos DB team listened to users and delivered this functionality. If you think Cosmos DB is missing a feature or you have a new idea, you can use the Feedback Forums to give feedback to the Cosmos DB team.

The OFFSET LIMIT clause lets you skip x results and then take y values from the query. The counts for OFFSET and LIMIT are integers, and both are required. In other words, you must use LIMIT if you use OFFSET.

A common use for this is paging. I’d be interested to see if this shares the issues that the SQL Server version has: you may only return 20 rows, but you’re potentially scanning N + 20 rows each time.
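
As a rough illustration of the syntax, here is a minimal paging sketch against a hypothetical orders container (the container alias and properties are my own invention, not from the post):

```sql
-- Fetch the third page of 20 results, ordered by order date.
-- OFFSET and LIMIT both take integer counts, and OFFSET requires LIMIT.
SELECT c.id, c.customerId, c.orderDate
FROM c
ORDER BY c.orderDate
OFFSET 40 LIMIT 20
```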

Kafka Consumer Delivery Semantics

Sylvester Daniel continues a series on Apache Kafka:

enable.auto.commit
This defines how offsets are committed to Kafka; by default, enable.auto.commit is set to true. When this property is set to true, you may also want to set how frequently offsets should be committed using auto.commit.interval.ms.

By default, auto.commit.interval.ms is set to 5,000ms (5 seconds). When enable.auto.commit is set to true, consumer delivery semantics are “at most once,” and commits are async.

Click through for more information on what this means, as well as several other important settings.

Date Buckets in SQL Server

Andy Mallon needs to create groups of date ranges in T-SQL:

Here’s where my math nerd-iness comes out… There’s a little trick for “bucketizing” numbers (in this case, turning “Months” into “Month Buckets”):

– Take a number
– Divide it by your bucket size
– Round that number down to a whole number (we’ll call this the “divided number”)
– Multiply the “divided number” by the bucket size; this is your bucket floor
– Add one to the “divided number” and then multiply that by your bucket size; this is your bucket ceiling

I was promised that there would be no math.
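
For the math-averse, a minimal T-SQL sketch of those steps, assuming a 6-month bucket size and example values of my own (not Andy’s):

```sql
DECLARE @Months int = 14,      -- the number to bucketize
        @BucketSize int = 6;   -- months per bucket

-- Integer division rounds down, giving the "divided number"
DECLARE @DividedNumber int = @Months / @BucketSize;

SELECT @DividedNumber * @BucketSize       AS BucketFloor,    -- 12
       (@DividedNumber + 1) * @BucketSize AS BucketCeiling;  -- 18
```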

Creating a Better Index Maintenance Script

Erik Darling, despite being on Team Profiler, has something important to say:

If you’re the kind of person who cares about various caches on your server, like the buffer pool or the plan cache, then you’d wanna measure something totally different. You’d wanna measure how much free space you have on each page, because having a bunch of empty space on each page means your data will take up more space in memory when you read it in there from disk.

You could do that with the column avg_page_space_used_in_percent.

BUT…

Read the whole thing.
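
For reference, a sketch of what reading that column might look like; this is a generic sys.dm_db_index_physical_stats call, not Erik’s script, and note the column is only populated in SAMPLED or DETAILED mode:

```sql
SELECT OBJECT_NAME(ps.object_id) AS TableName,
       ps.index_id,
       ps.avg_page_space_used_in_percent,  -- how full each page is
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ps
WHERE ps.alloc_unit_type_desc = 'IN_ROW_DATA'
ORDER BY ps.avg_page_space_used_in_percent;
```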

Validating Upstream Data Quality with T-SQL

Ed Elliott has a pattern to try:

We can select those rows which we want into our real table, which will either already have constraints enabled, or we can enable the constraints after we load the data. There are a few problems with this approach. The first problem is that it isn’t very easy to debug: when you realise that your load process only loaded half of the expected rows, why didn’t it load the rest? Other problems include what to do with any failed rows – i.e., which rows failed? To answer this we need another query that is the reverse of the “get good rows” query. Finally, this is quite a lot of SQL for a file with four columns and three rules; what if the rules are more complex and our data file has lots of columns? I tell you what happens if the rules are more complex: “a mess of SQL jank,” that’s what.

So what instead? 

Read on to learn about the alternative, which is an interesting approach. This is another case where a good SQL-based solution is almost a pivot of a good imperative solution: instead of thinking in rows, focus on columns first and let the rows sort themselves out.
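
To make the pattern concrete, here is a minimal sketch with a hypothetical two-column staging table and two rules (Ed’s actual example uses a four-column file):

```sql
-- Good rows: pass every rule, so they go into the real table.
INSERT INTO dbo.Customers (CustomerId, Email)
SELECT s.CustomerId, s.Email
FROM dbo.CustomersStaging AS s
WHERE s.CustomerId IS NOT NULL
  AND s.Email LIKE '%@%';

-- Failed rows: the reverse query, spelled out rule by rule
-- (a bare NOT(...) would silently drop rows with NULL emails).
SELECT s.CustomerId, s.Email
FROM dbo.CustomersStaging AS s
WHERE s.CustomerId IS NULL
   OR s.Email IS NULL
   OR s.Email NOT LIKE '%@%';
```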

Plan Hashes and Ad-Hoc Workloads

Erin Stellato has an experiment for us:

Borrowing and adapting code from a previous post, Examining the Performance Impact of an Adhoc Workload, we will first create two stored procedures. The first, dbo.RandomSelects, generates and executes an ad hoc statement, and the second, dbo.SPRandomSelects, generates and executes a parameterized query.

Erin then shows how to review query stats and group together executions which are the same save for a change in literals. Read the whole thing.
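
The grouping itself can be sketched against sys.dm_exec_query_stats, since statements that differ only in literal values share a query_hash; this is a generic query, not Erin’s exact script:

```sql
SELECT qs.query_hash,
       COUNT(*)                  AS CachedStatements,  -- variants differing only by literals
       SUM(qs.execution_count)   AS TotalExecutions,
       SUM(qs.total_worker_time) AS TotalCPUTime
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
ORDER BY CachedStatements DESC;
```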

Being a SQL Server Product Owner

Kevin Chant has an interesting role:

Now, I have had a few people ask me what a Product Owner actually does. Some say that it sounds like an architect role.

In reality, the role is one that’s mainly related to newer working practices like Scrum.

A Product Owner’s list of responsibilities includes talking to all the stakeholders for your team in the business and organising the priorities on your backlog board.

The concept makes sense, though this is the first time I’ve heard of such a role for a tool the engineers use rather than a product offered for sale.

Dealing with NULLs in Java with SQL Server 2019

Niels Berglund covers changes in SQL Server Machine Learning Services around Java code execution:

In the null values post mentioned above, I mentioned that there are differences between SQL Server and Java in how they handle null. So, when we call into Java from SQL Server, we may want to treat null values the same way as we do in SQL Server.

I wrote about this in the SQL Server 2019 Extensibility Framework & Java – Null Values post mentioned above. However, that post was written before SQL Server 2019 CTP 2.5. In CTP 2.5 Microsoft introduced the Java SDK, and certain things changed. Amongst the things that changed is the way we handle nulls when we receive datasets from SQL Server in our Java code.

Read on to learn how it works today.
