The two main types of stream processing are:
1. Stateless: every event is handled completely independently of the preceding events.
2. Stateful: a “state” is shared between events, so past events can influence the way current events are processed.
Stateless stream processing is easy to scale because events are processed independently. Stateful stream processing is harder to scale because the “state” needs to be shared across events.
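To make the distinction concrete, here is a minimal sketch in plain Python. The event shape (a key and a numeric value) and both function names are assumptions made for illustration; they are not taken from the original post or from any particular streaming framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (key, value).
Event = Tuple[str, float]


def to_fahrenheit(events: Iterable[Event]) -> Iterator[Event]:
    """Stateless: each output depends only on the event being processed."""
    for key, celsius in events:
        yield key, celsius * 9 / 5 + 32


def running_total(events: Iterable[Event]) -> Iterator[Event]:
    """Stateful: each output depends on earlier events with the same key."""
    totals: Dict[str, float] = defaultdict(float)  # the shared "state"
    for key, value in events:
        totals[key] += value
        yield key, totals[key]


events = [("a", 10.0), ("b", 20.0), ("a", 30.0)]
stateless_out = list(to_fahrenheit(events))
stateful_out = list(running_total(events))
```

The stateless function could be run on any number of workers with the events split arbitrarily. The stateful one only gives correct totals if every event for a given key reaches the worker that holds that key's entry in `totals`, which is why it is harder to scale.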
Himanshu does point out alternatives, but this isn’t a comparison exercise.
We used a single topic for our write operations with a partition count set to either 3 or 6, depending on the number of brokers in each test cluster. As the test clusters were regular Aiven services, the partitions and replicas were spread out across availability zones.
Messages were produced via the librdkafka_performance tool with a message size of 512 bytes, a default batch size of 10,000 and no compression. Continuing our quest to simulate real-world use, client connections were made over TLS.
We used Kafka version 2.1 running with Java 8; as a side note, it’ll be interesting to benchmark Aiven Kafka running with Java 11 in future tests because we expect Java improvements to positively impact its performance.
During the test, we kept increasing the number of producing clients until we reached the maximum throughput rate each plan tier’s cluster could accept. To verify our readings, we left the load running for some time.
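The ramp-up procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `measure` stands in for whatever reports the cluster's accepted throughput at a given client count, `simulated_cluster` is a made-up stand-in with invented numbers, and the 2% gain threshold is arbitrary. None of these come from the original benchmark.

```python
from typing import Callable, Tuple


def find_saturation(
    measure: Callable[[int], float],
    max_clients: int = 64,
    min_gain: float = 0.02,
) -> Tuple[int, float]:
    """Add producing clients one at a time until throughput stops improving.

    Returns the last client count that still improved throughput by at
    least `min_gain` (as a fraction), together with the rate measured there.
    """
    best_clients, best_rate = 1, measure(1)
    for clients in range(2, max_clients + 1):
        rate = measure(clients)
        if rate < best_rate * (1 + min_gain):
            break  # no meaningful gain: treat the cluster as saturated
        best_clients, best_rate = clients, rate
    return best_clients, best_rate


def simulated_cluster(clients: int) -> float:
    """Hypothetical cluster: 50,000 msg/s per client, capped at 200,000."""
    return min(clients * 50_000.0, 200_000.0)


saturation = find_saturation(simulated_cluster)
```

In a real run, `measure` would hold the load steady for a while before reporting a rate, which matches the post's note about leaving the load running to verify the readings.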
There are some interesting results here.
This post is about something new I tried last week. The goal was to create a simulated streaming data source, feed it into Power BI as a streaming dataset, create a report from the streaming dataset, and then embed it in a web application. With proper directions provided by my teammates, I finished the implementation from end to end within 1.5 hours. I was so impressed by how easy it is to implement that I want to share those directions with you.
The source data is simulated but the process is the same with real data sets.
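As a rough idea of what the simulated source might look like, the sketch below builds rows and posts them as a JSON array to a streaming dataset's push URL. The column names (`sensor`, `value`, `timestamp`) and the `PUSH_URL` placeholder are assumptions for illustration; the real URL and schema come from the streaming dataset you define in Power BI, and the original post may take a different approach.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Dict, List

# Placeholder only: substitute the push URL shown for your streaming dataset.
PUSH_URL = "https://example.invalid/push-url-placeholder"


def make_row(sensor: str, value: float, when: datetime) -> Dict[str, object]:
    """Build one row; the keys must match the columns defined on the dataset."""
    return {"sensor": sensor, "value": value, "timestamp": when.isoformat()}


def build_request(push_url: str, rows: List[Dict[str, object]]) -> urllib.request.Request:
    """Prepare a POST whose body is a JSON array of row objects."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        push_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_rows(push_url: str, rows: List[Dict[str, object]]) -> int:
    """Send the rows and return the HTTP status code (performs a network call)."""
    with urllib.request.urlopen(build_request(push_url, rows)) as response:
        return response.status


sample_row = make_row("s1", 21.5, datetime(2019, 3, 1, 12, 0, tzinfo=timezone.utc))
sample_request = build_request(PUSH_URL, [sample_row])
```

Calling `push_rows` in a loop with a short sleep between batches would give a simple simulated stream; swapping in real data only changes how the rows are produced.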
One of the new phrases coming out of Microsoft is that “SQL is just SQL” regardless of what operating system it resides on. This was echoed during the keynote at SQL Bits 2019 by the Microsoft team, which you can watch here. Later that weekend, I gave a session about database internals. My presentation is about how data is structured within a row and why that matters. Understanding the internals of table structures is important even in today’s age of technology, including with SQL Server 2019 (which will be released in Q3/Q4 of 2019). During my session, a question came up about how a data page is structured if SQL Server is sitting on top of a Linux server, such as Ubuntu. Does the data page have the same size and shape in Linux as it does in Windows?
It does. Click through to see John prove it.
If you run SQL Server Reporting Services, part of your DR plan needs to include a backup of the encryption key for SSRS. Sadly, this is an all too often overlooked part of the solution, even though it is incredibly easy to do. If you don’t have a backup of the encryption key during a restore, the report server will never be able to decrypt the encrypted content (connection strings, passwords, etc.) stored in the database, and your only recourse would be to delete the encrypted content and recreate it manually or through a redeployment of data sources.
Jonathan includes a couple of links to good resources. Your backups are only good if they include all of the keys and certificates you used. But keep those certificates stored someplace other than where the backups are stored.
What about a separate Power BI Date table?
This setup is built for consistency of comparison. As people go deeper into Power BI, they typically add a separate Date table as part of a more robust data model and add relationships between tables. At the same time, they disable the default Auto Date/Time built-in hierarchies. This more advanced setup with a separate Date table allows several conveniences as well as performance and storage benefits. It’s especially true with larger models that include many fact tables that each join to Date and other possible dimension tables. Tableau doesn’t currently have a comparable data model. We’ll stay conveniently away from that setup in Power BI because we only have one simple sample table.
I think both of them make this an easy operation, though Tableau is probably easier here.
The mssqluser named volume is going to be mounted as /var/opt/sqlserver and the mssqlsystem volume is going to be mounted as /var/opt/mssql. This is the key to the databases automatically being attached in the new container: /var/opt/mssql is the location of the system databases.
If we didn’t mount a named volume for the system databases, any changes to those databases (particularly the master database) would not be persisted, so the new container would have no record of any user databases created.
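The volume layout described above could be expressed as a `docker run` invocation along these lines. This is a sketch, not the command from the original post: the function name, container name, password and image tag are all assumptions, and only the two volume mappings come from the text. It builds the argument list without running anything.

```python
from typing import List


def sql_container_args(
    name: str,
    sa_password: str,
    # Assumed image tag; use whichever SQL Server on Linux image you run.
    image: str = "mcr.microsoft.com/mssql/server:2019-latest",
) -> List[str]:
    """Build a `docker run` argument list that reuses both named volumes."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", "ACCEPT_EULA=Y",
        "-e", f"SA_PASSWORD={sa_password}",
        # System databases (master and friends) live under /var/opt/mssql.
        "-v", "mssqlsystem:/var/opt/mssql",
        # User databases are created under /var/opt/sqlserver.
        "-v", "mssqluser:/var/opt/sqlserver",
        image,
    ]


# Hypothetical names for the replacement container.
new_container = sql_container_args("sqlserver2", "Example-Passw0rd")
```

Passing the list to `subprocess.run(new_container, check=True)` would start the container. Because both named volumes outlive any one container, a replacement started with the same mappings finds the master database and, through it, the user databases.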
Read the whole thing.
The March 2019 release of Power BI Desktop has brought us keyboard-accessible visual interactions. One of Power BI’s natural strengths is that you can click on a data point within a visual and have it cross-highlight or cross-filter the other visuals on a page. Until now, though, keyboard-only users weren’t able to use this feature. This change greatly improves the accessibility of the Power BI report consumption experience.
Click through to see a few of these shortcuts in action.