FFI Refactoring in Rust

FFI refactoring is an alternative to microservices and can be used to rewrite small pieces of code in a more efficient language.

Rust is a relatively new programming language that is fast and statically typed, and it performs about as well as C and C++ do. It may be tempting to rewrite an old monolithic app in Rust, but complete rewrites are risky for a number of reasons. They frequently take much longer than anticipated, introduce new bugs, expose how poorly the old system's behavior was understood, and do not by themselves fix problems with the way the system is designed. A team that starts a rewrite without understanding what the existing code does runs the risk of looking foolish. Complete rewrite projects almost always run into some sort of difficulty.

The main idea of this article is that FFI refactoring can be used as an alternative to microservices: take a small piece of code, rewrite it in a more efficient language (in this example, Rust), and connect it back to the original codebase through a foreign function interface (FFI). The underlying technology is the C FFI, although the binding libraries we will be using hide most of it and make the job easier. The two languages have descriptive names: the old one is the host language and the new one is the guest language. When is the FFI refactoring strategy appropriate? If the place where we work already has a robust infrastructure for running a large number of microservices, a microservice should be considered first.

If there is already an established pattern, it is preferable to follow it rather than invent a new one. Adding another language to a program also has costs that need to be discussed: deployments become more difficult, and we have to watch for bugs at the translation boundary. In this example that means the quirks of Python and of Rust, and the quirks of converting Python data structures into Rust data structures. We can also still introduce errors, just as in a complete rewrite, only on a more manageable scale.

What makes a good FFI refactoring project? The trade-off is the same as between microservices and monolithic applications. Microservices can be scaled, updated, and deployed independently. If you have a monolith and want to make it a little faster, or if you need to keep everything in memory and in a single process, FFI refactoring is worth considering. You should also consider your host language, because some languages are considerably slower than Rust and therefore have more to gain.

Rust’s performance is comparable to that of C and C++ most of the time, and several languages, such as Ruby, Python, Node.js, and Lua, have excellent tools for integrating with Rust. Languages in the upper right quadrant of the chart, that is, those that have good tooling and stand to gain a lot of performance, are good choices for an FFI refactor. Go is an interesting case: its performance is already comparable to Rust's, its tooling for this purpose is less impressive, and an FFI link between Go and Rust incurs a significant performance cost. If your host language lacks good tools or will not get much of a performance boost from being refactored to Rust, you should think about other options.

The example in this article is a small web service. It takes a list of numbers from the JSON body of a POST request and calculates several statistical properties: the range, the quartiles, the mean (the average of all the numbers), and the standard deviation. This is the code that the FFI refactor will target.

If you are interested in Rust in general, you should read “The Rust Programming Language” by Steve Klabnik and Carol Nichols, which is available for free at doc.rust-lang.org/book.


We start a new Rust project by typing cargo new --lib rstats into the terminal. This creates two new files for us: Cargo.toml and src/lib.rs. Rust’s standard library is much smaller than Python’s: it covers operating system primitives, files, threads, basic timer functionality, and some networking code, but no statistics. We therefore add version 0.15 of the statrs crate, a statistics library, as a dependency. We also add version 0.16 of the pyo3 crate, which provides the bindings that allow Rust and Python to communicate with one another. Finally, we change the crate type to cdylib. This makes Cargo build a dynamic library that exposes the C calling convention, which is required so that the Python interpreter can load it and call our functions.
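
To make "the C calling convention" concrete, here is a minimal hand-written sketch of what a C-compatible function looks like in Rust. It is only an illustration: pyo3 generates this kind of glue for us, and the function name rstats_mean and its pointer-plus-length signature are assumptions made for this example, not part of the project. A function that is really exported from a cdylib would also carry the no_mangle attribute so that its symbol name is preserved.

```rust
use std::slice;

/// Hypothetical C-compatible function: the mean of `len` doubles at `ptr`.
/// `extern "C"` selects the C calling convention, so the arguments are a raw
/// pointer and a length instead of a Rust Vec or slice.
///
/// # Safety
/// `ptr` must point to `len` valid, initialized `f64` values.
pub unsafe extern "C" fn rstats_mean(ptr: *const f64, len: usize) -> f64 {
    if ptr.is_null() || len == 0 {
        // C has no Option type, so "no data" is signalled with NaN here.
        return f64::NAN;
    }
    // Rebuild a safe slice view over the caller's buffer.
    let data = unsafe { slice::from_raw_parts(ptr, len) };
    data.iter().sum::<f64>() / len as f64
}

fn main() {
    let numbers = [1.0, 2.0, 3.0, 4.0];
    // Call it the way a C-compatible host would: raw pointer plus length.
    let mean = unsafe { rstats_mean(numbers.as_ptr(), numbers.len()) };
    println!("mean = {mean}");
}
```

Everything crossing this boundary has to be expressed in C terms, which is exactly the tedious part that a binding crate such as pyo3 takes over.
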
The overall flow is as follows. Flask calls the Python HTTP handler, and the Python handler deserializes the JSON data in the request body. The statistics are then computed by code written in Rust, and the result is converted back into JSON and sent to the HTTP client. The basic premise is that we can pick a single piece of functionality and rewrite it in Rust. For this to work, structured data has to be passed between the two languages. We will write a new function called compute_stats, which accepts a Vec of f64s as its argument and returns a struct holding the computed statistics.
The Python code returns a JSON object with the following properties: range, quartiles, mean, and stddev, which stands for standard deviation. In Rust we use a struct for this, with a field of the appropriate type for each property. We create a new struct called StatisticsResponse with three f64 fields for the range, the mean, and the standard deviation. We also need to import Data, Distribution, Max, Min, and OrderStatistics from the statrs crate, which are what we need to compute the statistics.
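
As a rough sketch of the shape described above, the following uses only the standard library so that it runs on its own. The hand-written formulas stand in for the statrs calls, and two choices are assumptions of this sketch: the standard deviation is the population form (dividing by n), and an empty input yields NaN in every field.

```rust
/// The three numeric fields described in the text.
#[derive(Debug, PartialEq)]
struct StatisticsResponse {
    range: f64,
    mean: f64,
    stddev: f64,
}

/// Stand-in for the statrs-based function, written with std only.
fn compute_stats(numbers: Vec<f64>) -> StatisticsResponse {
    if numbers.is_empty() {
        // Assumption of this sketch: no data means "not a number" everywhere.
        return StatisticsResponse { range: f64::NAN, mean: f64::NAN, stddev: f64::NAN };
    }
    let n = numbers.len() as f64;
    let min = numbers.iter().copied().fold(f64::INFINITY, f64::min);
    let max = numbers.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let mean = numbers.iter().sum::<f64>() / n;
    // Population variance: average squared distance from the mean.
    let variance = numbers.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    StatisticsResponse { range: max - min, mean, stddev: variance.sqrt() }
}

fn main() {
    let stats = compute_stats(vec![2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]);
    println!("{stats:?}");
}
```

A statistics crate may use a different definition of the standard deviation (for example the sample form), so the numbers from this sketch should not be expected to match a library's output exactly.
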
The compute_stats function in Rust takes the vector of numbers and wraps it in the Data type from the statrs crate. The data is declared mutable (mut) because computing some of these statistics reorders the elements in place. In languages such as Python, Ruby, and Java, a library will typically make a copy of the input buffer whenever items need to be rearranged. That does not happen here. Rust takes the opposite approach: if a function is going to change your data behind the scenes, its signature tells you so. This means that if the original order of your data buffer is not required, you do not have to make any defensive copies at all, which saves time and memory. This lets the code work more quickly with large sets of numbers, such as vectors containing more than a gigabyte of data.
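
The point about visible mutation can be sketched with the standard library alone. The helper median_in_place below is hypothetical and is not the statrs API; it only shows how a &mut parameter announces that the caller's buffer may be reordered, leaving the decision to copy with the caller.

```rust
/// Sorts the caller's buffer in place and returns its median.
/// The `&mut` in the signature is the visible warning that the data
/// may come back in a different order.
fn median_in_place(data: &mut [f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    data.sort_by(|a, b| a.total_cmp(b));
    let mid = data.len() / 2;
    if data.len() % 2 == 0 {
        Some((data[mid - 1] + data[mid]) / 2.0)
    } else {
        Some(data[mid])
    }
}

fn main() {
    let mut numbers = vec![3.0, 1.0, 2.0];

    // Option 1: the original order matters, so the caller pays for a copy.
    let mut scratch = numbers.clone();
    println!("median = {:?}, original = {:?}", median_in_place(&mut scratch), numbers);

    // Option 2: the order does not matter, so no copy is made at all.
    println!("median = {:?}, reordered = {:?}", median_in_place(&mut numbers), numbers);
}
```

With a gigabyte-sized vector, the second option avoids allocating and filling a second gigabyte just to protect an ordering nobody needs.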

Two points matter here: Rust has no concept of null, and the mean function does not return a number directly. Instead, it returns an Option, which can be thought of as a container that either holds a value or holds nothing. In many other languages, the special value null can be assigned to a variable of any type, so you need to insert checks all over your code if you want it to handle null correctly. In Rust, when we want to work with the value inside an Option, we can use a match expression, which also forces us to handle the case in which there is nothing there. Because an Option is a strongly typed value, the necessary check can be performed in a single location.

Unwrap is a method on Option that returns the contained value if there is one and panics if there is not. It keeps example code short, but in production code you should make it a habit to handle Options properly, for example with a match expression. To learn more about Options, read chapter six of “The Rust Programming Language.”
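
A small sketch of both styles, using only the standard library. The mean and describe helpers are hypothetical and written for this example; they are not the statrs functions.

```rust
/// Hypothetical helper: the mean of a slice, or None when it is empty.
fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        None
    } else {
        Some(data.iter().sum::<f64>() / data.len() as f64)
    }
}

/// Handles both cases of the Option in one place with `match`.
fn describe(data: &[f64]) -> String {
    match mean(data) {
        Some(m) => format!("mean = {m:.1}"),
        None => String::from("no data"),
    }
}

fn main() {
    println!("{}", describe(&[1.0, 2.0, 3.0]));
    println!("{}", describe(&[]));

    // unwrap is fine when we know a value is present...
    let m = mean(&[4.0, 6.0]).unwrap();
    println!("unwrapped mean = {m:.1}");
    // ...but mean(&[]).unwrap() would panic at run time.
}
```

The compiler rejects a match that leaves out the None arm, which is what makes the check hard to forget.
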
So far, compute_stats is a Rust function that returns a Rust data type, StatisticsResponse. What we need is a function that can be called from Python, specifically from our Flask HTTP handler. To get there, we import some additional types from pyo3, the Rust crate that lets us write bindings between Python and Rust. To create a Python module that can be imported, we write a function named rstats, the same name as our crate, and put the pymodule attribute macro on top of it.

During compilation, the pymodule macro expands into the C-compatible glue code that the Python interpreter needs in order to load our library as a module. Because of the way the pymodule macro is defined, the function has to take a couple of parameters, whether or not we use them. The first parameter has the type Python and represents holding the Python interpreter's Global Interpreter Lock (GIL); we do not use it here. The second parameter, m, is the Python module being defined. Within this function, we add our new compute_stats function to that module. The return type of the module function is PyResult<()>.

A few interesting things are happening in this function signature, so we will go over them briefly. The Result type is how Rust handles errors. It is strongly typed and has two variants: Ok for the success path and Err for the error path. The () inside it, an open parenthesis followed by a close parenthesis, is the unit type, which is an empty tuple. The unit type has exactly one value, also written (), and it is used when there is nothing meaningful to return.
PyResult is a wrapper type that the pyo3 crate uses for return values; its error side is always a pyo3 Python error (PyErr). Because this function only needs to report whether it worked or not, the success side is set to the unit type. At its simplest, the body of the function contains only Ok wrapping the unit value, which represents a successful outcome. This function defines a Python module named rstats, which the Python code will then try to import.
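
The same pattern can be reproduced with the standard library. In this sketch, DemoError and DemoResult are made-up stand-ins for pyo3's PyErr and PyResult (PyResult<T> is an alias for Result<T, PyErr>), and register is a hypothetical function that succeeds with the unit value.

```rust
/// Made-up error type standing in for pyo3's PyErr.
#[derive(Debug)]
struct DemoError(String);

/// Same shape as PyResult<T>: a Result whose error side is fixed.
type DemoResult<T> = Result<T, DemoError>;

/// Hypothetical function with nothing to return on success,
/// so the success side is the unit type `()`.
fn register(name: &str) -> DemoResult<()> {
    if name.is_empty() {
        return Err(DemoError(String::from("module name must not be empty")));
    }
    Ok(())
}

fn main() {
    for name in ["rstats", ""] {
        match register(name) {
            Ok(()) => println!("registered {name:?}"),
            Err(e) => println!("failed: {}", e.0),
        }
    }
}
```

Fixing the error type in an alias keeps signatures short while the caller still has to deal with both outcomes.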

At this point Python cannot yet import a module named rstats, because all that exists is a directory of Rust source code, not a compiled Python module. To fix this, we install and run maturin, a build tool for Python packages written in Rust. It compiles our Rust code and generates the Python bindings for it. Once that is done, the flask run command starts normally. We also need a few more attributes in our Rust code: the pyclass attribute macro on top of our StatisticsResponse struct, the pyo3(get) attribute on its fields, and the pyfunction attribute on compute_stats. These convert what goes in and what comes out into types that Python can handle.

To summarize the binding code: we need the module definition function, the add_function call on the module, the wrap_pyfunction macro around the compute_stats function, and the error handling that ties them together. It is a lot of ceremony, but all of these steps are required. At this point we have rewritten the functionality, created the Python bindings, told Python how to use them, and created a Python class through which we can access our fields. After that, we recompile our Rust code and make some modifications to our Python handler. In the end, the handler calls rstats.compute_stats and passes it our numbers, which are simply the plain numbers that come from the JSON body of the request.
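
Put together, the binding layer might look roughly like the following. This is a sketch under stated assumptions, not the article's exact code: it targets the pyo3 0.16 API named earlier (newer pyo3 releases changed the module function signature), it needs the pyo3 dependency and the cdylib crate type to build, and the statistics are computed with plain standard-library code instead of statrs to keep the example short.

```rust
use pyo3::prelude::*;

/// Exposed to Python as a class; `#[pyo3(get)]` makes each field readable.
#[pyclass]
struct StatisticsResponse {
    #[pyo3(get)]
    range: f64,
    #[pyo3(get)]
    mean: f64,
    #[pyo3(get)]
    stddev: f64,
}

/// Callable from Python; pyo3 converts a Python list of floats to Vec<f64>.
/// The formulas are a std-only stand-in for the statrs calls (assumption).
#[pyfunction]
fn compute_stats(numbers: Vec<f64>) -> StatisticsResponse {
    let n = numbers.len() as f64;
    let min = numbers.iter().copied().fold(f64::INFINITY, f64::min);
    let max = numbers.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let mean = numbers.iter().sum::<f64>() / n;
    let variance = numbers.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    StatisticsResponse { range: max - min, mean, stddev: variance.sqrt() }
}

/// Module definition: the function name must match the crate name.
#[pymodule]
fn rstats(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<StatisticsResponse>()?;
    m.add_function(wrap_pyfunction!(compute_stats, m)?)?;
    Ok(())
}
```

Each ? forwards a failure as a Python error, and the final Ok(()) is the unit-valued success described earlier.
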
The last step is to pass each field of the StatisticsResponse into Flask’s jsonify function. Because pyo3 does not automatically generate JSON-serializable Python classes, we have to copy the fields over by hand. We then use curl to compare the results of the original handler, written entirely in Python, with those of the refactored handler. The results show that the quartiles fields differ between the Python and Rust versions. This demonstrates that statistical libraries use different methods for calculating the quartiles of a data series. To resolve the difference, we need to step back and think about what the rest of our system actually requires. Depending on those requirements, there are a few different approaches we can take.
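
To see how two reasonable definitions can disagree, here is a standard-library sketch of two common quartile conventions. These are illustrative choices only; nothing here claims to be the method used by statrs or by the Python library in the original handler. Both functions assume a non-empty slice that is already sorted.

```rust
/// Convention A: linear interpolation at position p * (n - 1).
fn quantile_interpolated(sorted: &[f64], p: f64) -> f64 {
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Median of a sorted, non-empty slice.
fn median(sorted: &[f64]) -> f64 {
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Convention B: the lower quartile is the median of the lower half
/// (for an odd length, the middle element is left out of the half).
fn lower_quartile_of_halves(sorted: &[f64]) -> f64 {
    median(&sorted[..sorted.len() / 2])
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    println!("interpolated Q1 = {}", quantile_interpolated(&data, 0.25));
    println!("median-of-halves Q1 = {}", lower_quartile_of_halves(&data));
}
```

On the eight values above, the first convention gives 2.75 and the second gives 2.5, so a mismatch between two libraries does not by itself mean that either one is wrong.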