As a Clojure startup, we’re thrilled that Nubank has elevated the status of Clojure and Datomic. Datomic is clearly more mature, but keep in mind that our aims are distinct from Datomic’s. Datomic was built for AWS and for a business climate in which the cost of running and depending on these cloud services is reasonable, so only the most profitable, but small, segment of the market can afford it as a backend database. If Datomic is open-sourced, it will be open-sourced with those design goals in mind. We, on the other hand, want to use Datalog as a distributed systems environment and bring Datahike to all endpoints, including web browsers and IoT devices. Creating an open-source version of Datomic was never our primary goal; it simply made a lot of sense as a first step. We would welcome Datomic being open-sourced so that our work could be combined, but given the current governance models of Clojure and Datomic we cannot expect open-sourcing alone to solve our problems. There are a few areas where we have an advantage over Datomic:
For example, we have invested in ClojureScript support for a long time, which Datomic has not. Coming from replikativ to Datahike, this made it much easier for us to adapt and reuse our stack.
During quarantine, we put in a lot of time and effort to improve our maturity in the following areas:
1. Write throughput has increased substantially and is now approaching Datomic’s performance (a release is coming soon).
2. We’ve released an early beta of our server API, which will be developed further in the coming months to support Datomic-style local querying.
3. Our Java bindings have just been published: https://lambdaforge.io/2020/05/25/java-api.html
Over the last year we have established a cooperative of more than five full-time employees, and we want to grow further in the coming year to bring Datalog into the mainstream. We will get there much faster if Datomic is genuinely open-sourced.
Thanks for elaborating. Open-sourcing Datomic would not solve the problem of Datomic’s costs overnight, so a lightweight and cost-effective alternative would be ideal. How would you handle security when accessing a Datahike backend from the browser? I have used Datomic from the browser for internal tools, but for external access it was not obvious how to restrict queries to the parts of the database the user had access to. IoT is also a hot topic right now; do you think Datahike has any advantages over Datomic for dealing with time-series data?

Handling time-series data is an intriguing idea, and we should be able to provide custom indices and extend Datalog with better query primitives if the need arises. What use case do you have in mind? The query engine will ultimately need to know how to efficiently join external index data structures, as I did a few years ago in Datomic with HDF5 binary blobs holding tensors of experimental recordings (parameter evolution in spiking neural networks).

Security is currently addressed by sharding: we provide several databases, one per user, with separate access privileges and encryption. This is obviously not the most efficient approach, but it is the most general one. Where users’ access credentials and data can be shared structurally, we can factorize even further, and we expect to be able to join hundreds of distributed Datahike instances in a global address space during a single query. Encrypting portions of a single index for different users, on the other hand, is counterproductive: it violates the optimality guarantees of the B+ trees and leads to very bad scan behavior over large ranges of encrypted Datoms. Since the indices are amortized data structures, this should not be necessary as long as the data is partitioned in some way. It is a fascinating problem to think about.
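To make the sharding idea concrete, here is a minimal sketch of what per-user databases and a cross-database query could look like. It assumes Datahike’s in-memory store and that the query engine accepts multiple database sources via :in $1 $2, as DataScript does; the config, schema, and attribute names are made up for illustration.

    ;; A rough sketch of the one-database-per-user sharding described
    ;; above, using Datahike's in-memory store. Config maps, schema and
    ;; attribute names are illustrative, not from a real deployment.
    (require '[datahike.api :as d])

    (def schema [{:db/ident       :item/title
                  :db/valueType   :db.type/string
                  :db/cardinality :db.cardinality/one}])

    (defn user-conn [id]
      (let [cfg {:store {:backend :mem :id id}
                 :initial-tx schema}]
        (d/create-database cfg)
        (d/connect cfg)))

    ;; A query can name several database sources, so joining two users'
    ;; shards is an ordinary Datalog join:
    (defn shared-titles [db-alice db-bob]
      (d/q '[:find ?title
             :in $1 $2
             :where
             [$1 _ :item/title ?title]
             [$2 _ :item/title ?title]]
           db-alice db-bob))

    ;; usage: (shared-titles @(user-conn "alice") @(user-conn "bob"))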
In addition, we may expose the datahike-server query endpoint directly, so access control can be implemented through static checks: since Datalog queries are data, we already restrict them to safe features, and you can easily do the same for more elaborate access rules. There has been some progress in this direction already: https://github.com/theronic/eacl. Datalog engines have strong query planners, and we can also limit the runtime to a restricted budget, so we do not have to worry about denial-of-service attacks when doing this over the internet.
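Since Datalog queries are plain data, such a static check can be an ordinary function over the query form before it ever reaches the engine. A minimal sketch, assuming queries arrive in map form and using made-up attribute names:

    ;; Whitelist of attributes a client may touch; anything else is
    ;; rejected up front. A real server would also bound clause count,
    ;; rule usage, runtime budget, etc.
    (def allowed-attrs #{:user/name :user/items :item/title})

    (defn safe-query? [query]
      ;; Look at the attribute position of every data pattern in :where;
      ;; anything not whitelisted (including variable attributes and
      ;; predicate clauses) is rejected.
      (->> (:where query)
           (filter vector?)
           (map second)
           (every? allowed-attrs)))

    (safe-query? '{:find [?n] :where [[?e :user/name ?n]]})  ;=> true
    (safe-query? '{:find [?v] :where [[?e :secret/key ?v]]}) ;=> false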
We sorted Datoms by timestamp by applying an order-preserving mapping that encodes (entity-id, timestamp, attribute) into a large integer. We could then use seek-datoms to fetch Datoms with specific timestamps, for example, but performance was bad. A custom index would be really beneficial in this situation, in my opinion. We also ran into issues as the database grew in size, which forced periodic manual sharding. A Datalog equivalent of TimescaleDB (which adds time-series optimizations and time-based partitioning to Postgres) would be wonderful.
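For reference, such an order-preserving mapping can be as simple as bit-packing the three components into one long so that numeric order equals (entity, time, attribute) order. A sketch with made-up bit widths:

    ;; Pack (entity-id, unix timestamp, small attribute code) into one
    ;; long. Sorting the longs sorts by entity first, then by time, then
    ;; by attribute. Bit widths are illustrative: entity-id < 2^23
    ;; (keeps the sign bit clear), timestamp < 2^32, attr-code < 2^8.
    (defn encode-key ^long [entity-id unix-ts attr-code]
      (bit-or (bit-shift-left (long entity-id) 40)
              (bit-shift-left (long unix-ts) 8)
              (long attr-code)))

    ;; All Datoms of entity 42 from timestamp t onward then form one
    ;; contiguous range starting at (encode-key 42 t 0), which is the
    ;; shape a seek-datoms-style range scan wants.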
Many GraphQL frameworks handle access control by letting you define access rules for clients, and I attempted to express this with Datalog rules. For example: :user/items and :item/blabla may be read by any user with access via :user/items, so a user X can access [X :user/items Z]. A few of my experiments showed promise, but I could not get them to work together: the queries were taking too long.
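A rough sketch of what such rules can look like, using the attribute names from this example (a simplification, not a full access-control model):

    ;; A user can access an item it owns via :user/items and, by the
    ;; second rule, any value hanging off such an item. The second rule
    ;; is a crude stand-in for "X may also read the attributes of Z".
    (def access-rules
      '[[(can-access ?user ?z)
         [?user :user/items ?z]]
        [(can-access ?user ?v)
         [?user :user/items ?z]
         [?z _ ?v]]])

    ;; Used from a Datahike/DataScript-style query:
    ;; (d/q '[:find ?x
    ;;        :in $ % ?user
    ;;        :where (can-access ?user ?x)]
    ;;      db access-rules user-eid)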
I see, so your issue was that you needed something like an EVAT index to scan all Datoms for a single entity over a time range? Adding additional indices of this kind should be easy in Datahike. Many systems keep costs down by having their own restricted way of expressing and tracking rules; even so, I think it is better to keep everything in Datalog and optimize the evaluation of these (possibly restricted) rules and relations.

> It wasn’t apparent how to limit the searches to the sections of the database that the user had access to for external access.

You can apply a filter on the server that excludes a subset of the Datoms in your database.

I tried that, but it was too slow for the huge datasets I was working with.
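For completeness, the filter approach looks roughly like this, assuming a Datomic/DataScript-style d/filter is available; the ownership query is made up. It also hints at why this gets slow: the predicate runs once per Datom touched.

    ;; Sketch of the server-side filter approach. Returns a view of db
    ;; that only exposes Datoms whose entity the given user owns via
    ;; :user/items. Real access rules would be richer.
    (require '[datahike.api :as d])

    (defn user-view [db user-eid]
      (let [owned (into #{}
                        (map first)
                        (d/q '[:find ?z :in $ ?u
                               :where [?u :user/items ?z]]
                             db user-eid))]
        (d/filter db (fn [_db datom] (contains? owned (:e datom))))))

    ;; queries then run against the filtered view:
    ;; (d/q some-query (user-view db user-eid))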