The Prevalence of Persistent XSS

Adrian Colyer has a review of a security-minded paper:

Does your web application make use of local storage? If so, then like many developers you may well be making the assumption that when you read from local storage, it will only contain the data that you put there. As Steffens et al. show in this paper, that’s a dangerous assumption! The storage aspect of local storage makes possible a particularly nasty form of attack known as a persistent client-side cross-site scripting attack. Such an attack, once it has embedded itself in your browser one time (e.g. that one occasion you quickly had to jump on the coffee shop wifi), continues to work on all subsequent visits to the target site (e.g., once you’re back home on a trusted network).

In an analysis of the top 5000 Alexa domains, 21% of sites that make use of data originating from storage were found to contain vulnerabilities, of which at least 70% were directly exploitable using the models described in this paper.

Adrian’s been on a security paper kick the last few days, so be sure to check those out.

CSV Data Ingestion with Spark

Jean Georges Perrin shows how you can easily load CSV data with Spark:

Fortunately for you, Apache Spark offers a variety of options for ingesting those CSV files. Ingesting CSV is easy and schema inference is a powerful feature.
Let’s have a look at more advanced examples with more options that illustrate the complexity of CSV files in the outside world. You’ll first look at the file you’ll ingest, and understand its specifications. You’ll then have a look at the result and finally build the mini-application to achieve the result. This pattern repeats for each format.

It’s good to see some of the lesser-used features pop up like date format and multi-line support (which I hadn’t even known about).

The Benefits of Partitioning in CosmosDB

Hasan Savran explains why partitioning data in CosmosDB is so important:

In any Introduction level CosmosDB talk, Presenter will suggest you pay more attention to the Partitioning part of the talk. Microsoft wants you to create great solution by using Azure CosmosDB and there are a lot of resources out there for developers. The challenge in Azure CosmosDB is, you don’t need a DBA in CosmosDB and developers may not pay attention to details like Partitioning when they create databases in Azure CosmosDB.
      It is crucial to pick the right partition key for your databases in CosmosDB simply because you cannot repartition your databases. If you pick a wrong partition key for your data model, it will be simply too late to change it later. You might have good amount of data in your databases, that means you spent good amount of RU to insert this data and only way to fix this problem will be start from scratch and upload all data back again to CosmosDB with the right partition key.

Read on for a couple of analogies and why this is so important.

SSMS Autorecover

Michelle Haarhues takes us through auto-recovery in Management Studio:

Periodically there is a crash, power surge, or sudden reboot of your computer.  Of course, you had SQL Server Management Studio (SSMS) open and you were working on something important.  That seems to be the only time there is a crash/reboot.  You can lose work that that was open in SSMS but has not saved.  There is an AutoRecover feature in SSMS so all may not be lost.  

Read on to learn more about auto-recovery. It only restores on crashes, though there are third-party plugins you can get to restore after start. Azure Data Studio restores on restart as well, so kudos to the ADS team for that.

When SQL Server Replication Ignores Tables

Matt Slocum takes us through a tricky replication scenario (hint, they all are):

There are occasions when Updates, Inserts, and Deletes on a replicated table do not replicate out to the Subscriber.  You’ve verified that the table is listed in the Articles included in the Publication, and that there is at least one Subscription on the Publication. 

The strange thing is that there are likely other tables in the same Publication that are properly being replicated to the same Subscriber.

What is happening here?  Why is replication ignoring this table?

Read on to see Matt’s explanation and fix.

Rollback’s Effect on Identity Columns

Adrian Buckman explains that rollbacks on identity columns still burn those identity values:

As I say – This is just what I has seen people do and it was only the other day when I saw a similar situation but with an insert instead, The user believed that because the changes were made within a transaction this would rollback EVERYTHING however they did not consider the impact on the Identity column on the table they made the insert in.

Here is an example to demonstrate how a rollback on an insert will not rollback your identity seed on your table.

Click through for the demo. Sequences behave in practice the same way: once you pull that next sequence ticket, you can’t put it back into the machine just by rolling back the transaction. That’s why identity columns and sequences aren’t good for situations where you absolutely need contiguous data, such as invoice numbers or check numbers.

Persistent Storage and Kubernetes

Chris Adkin explains the concepts behind persistent storage in containers:

A question that often crops up is “Can I use local storage”, the answer is “It depends”. Kubernetes is essentially a container scheduler at its most basic and fundamental level. The ‘Pod’ is the unit of scheduling, containers in the same pod share the same life cycle and always run on the same node. For stateless pods life is reasonably simple and straight forward, for state-full pods, life is a bit more nuanced. If for any reason a node fails, the pods that ran on that node have to be rescheduled to run on a working node, and their  storage needs to follow them. This involves un-mounting the volume from the failed node and then mounting it on the node the pod(s) are rescheduled to run on. With basic vanilla hyper-converged storage, i.e. storage and compute in the same chassis, this will ultimately lead to scheduling problems. However, software defined solutions exist that enable this kind of infrastructure to be turned into a storage cluster which allows state to follow pods around the cluster. Some people automatically associated HDFS with local storage, the reason for this is probably because “Back in the day”, the most cost efficient way for Google to scale out its infrastructure was via commodity servers with local disks. 

Read the whole thing.

Finding Column Usage Anywhere on an Instance

Pamela M. has a script for a difficult scenario:

This is what I use when these moments happen. This script will go out and search for occurrences of the search parameter in every table, view, synonym, stored procedure and function on the instance and will return the results in a single set.

Click through for the script, which Pamela warns us will be slow.

Saving Time with Choclatey

Craig Porteous explains how you can use Chocolatey to manage Windows packages:

In my own words, it’s the quickest way to get a machine up and running, ready to start working. Chocolatey is a package manager for Windows that allows you to script out the installation of (what feels like) almost anything. I was able to get up and running on a new laptop in just a few hours, while I worked on something else. Forget constant Next > Next > Next installation wizards. I had all the tools I needed to do my job, not just apps but PowerShell modules too, all installed automatically.

I’ve pulled packages but never created one. It’s definitely something that should go on my to-learn list given how often I build and rebuild machines and VMs.


April 2019
« Mar