From Pythonista to Rustacean
An introduction to Rust for Pythonistas
I’m legally obliged to start this post with an unnecessary introduction.
Python is a general-purpose, dynamically typed, interpreted language that aims to be easy to both learn and use. It supports classes with all the beauty of the OOP paradigm and features a plentiful ecosystem of packages. Its package manager is called pip.
Due to its popularity, there are a lot of tutorials and learning resources. Personally I consider it great for prototyping and learning new concepts.
Rust is a general-purpose, statically typed, compiled language. It focuses on ergonomics, performance and memory safety.
It features lovely error messages, great documentation, the all-mighty turbofish (::<>), the glorious match expression and a thriving ecosystem of packages (crates). Its package manager is called cargo.
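Rust’s secret sauce is the borrow checker. Rust achieves memory safety by enforcing certain rules at compile time. These rules prevent common memory-related bugs such as dereferencing dangling pointers, use-after-free and data races (unsynchronized concurrent reads/writes to the same memory). Memory leaks are harder to cause by accident, but they are not ruled out entirely.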
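To make those rules a bit more concrete, here is a minimal sketch of the difference between borrowing a value and taking ownership of it. The function names total and consume are made up for this illustration.

```rust
// Borrows the values: the caller keeps ownership and can keep using them.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Takes ownership: the vector is moved into the function and freed when the
// function returns, so the caller can no longer use it afterwards.
fn consume(values: Vec<i32>) -> usize {
    values.len()
}
```

Calling total(&v) twice in a row is fine, but using v after consume(v) is rejected by the compiler because the value has been moved. That is the kind of mistake that would be a use-after-free in a language without these checks.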
Python syntax vs Rust syntax
# python syntax
number = 420
string = "hello there!"
my_list = [0, 1, 2]
my_tuple = ("John", 42)
my_dict = {"alice": 600, "bob": 200}

for i in range(10):
    print(i)

n = 0
while n < 3:
    print("I'm inside a while loop!")
    n += 1

foo = 0
if foo == 1:
    pass  # foo is 1
elif foo < 1:
    pass  # foo is less than 1
else:
    pass  # everything else
// rust syntax
use std::collections::HashMap;

// the statements below live inside a function such as main()
let number = 420;
let string: String = String::from("hello there!");
let my_list = vec![0, 1, 2];
let my_tuple: (&str, i32) = ("John", 42);
let mut my_dict: HashMap<&str, i32> = HashMap::new();
my_dict.insert("alice", 600);
my_dict.insert("bob", 200);

for i in 0..10 {
    println!("{}", i);
}

let mut n = 0;
while n < 3 {
    println!("I'm inside a while loop!");
    n += 1;
}

let foo = 0;
if foo == 1 {
    // foo is 1
} else if foo < 1 {
    // foo is less than 1
} else {
    // everything else
}

// or
match foo {
    1 => {
        // foo is 1
    }
    foo if foo < 1 => {
        // foo is less than 1
    }
    _ => {
        // everything else
    }
}
A Simple Function
In Python you can define an add function like this:
def add(a: int, b: int):
    return a + b
Now in Rust:
fn add(a: i32, b: i32) -> i32 {
    a + b
}
Since Rust is an expression-based language, we can simply return a + b without the return keyword. Notice that the last expression has no semicolon at the end; adding one would turn it into a statement, and the function would no longer return the value.
Early returns still use the return keyword. Let’s take a look at another example.
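Here is a small sketch that puts both ways of returning side by side. The helpers safe_div and describe are hypothetical and exist only for this illustration.

```rust
// Hypothetical helper, used only to illustrate the two ways of returning.
fn safe_div(a: i32, b: i32) -> Option<i32> {
    if b == 0 {
        return None; // early return needs the `return` keyword
    }
    Some(a / b) // final expression, no semicolon: this is the return value
}

// `if` is an expression too, so the whole function body can be a single `if`.
fn describe(n: i32) -> &'static str {
    if n % 2 == 0 { "even" } else { "odd" }
}
```

Both branches of the if in describe must produce the same type, because the if as a whole evaluates to a single value.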
References and Strings
def get_first_char(string: str) -> str | None:  # Optional[str]
    if len(string) == 0:
        return None
    return string[0]
fn get_first_char(string: &str) -> Option<char> {
    if string.len() == 0 {
        return None;
    }
    string.chars().next() // ?
}
Ok, so what is &str doing here? &str represents a reference to a string slice.
You can think of it as two separate components: & means reference and str means string slice. In a few words, you can think of a reference as a pointer to the original value.
You may be wondering why we are using string.chars().next() instead of string[0] (which does not even compile in Rust), and there’s a good reason for that.
It turns out that some characters take more storage than others.
What do I mean by that? Here’s a good example from the Rust book:
String::from("Hola"); // characters: 4, bytes: 4
String::from("Здравствуйте"); // characters: 12, bytes: 24
In the previous code, the first string takes 4 bytes to encode in UTF-8 while the second one takes 24. We can’t index into the string like we’d do with an array because we might land in the middle of a character.
Instead we need to use the .chars() method to access the characters. For a more in-depth explanation please read this.
- Fun fact: chars() returns a char iterator
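To see the byte/character difference in action, here is a minimal sketch. The helper names byte_and_char_len and nth_char are made up for this post; len(), chars(), count() and nth() are standard library methods.

```rust
// Returns (number of bytes, number of chars) for a string slice.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Returns the n-th character (not the n-th byte), if there is one.
// This walks the string from the start, so it is O(n), unlike array indexing.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}
```

Slicing by byte range is still possible, but a range that cuts a character in half, such as &s[0..1] on the Cyrillic string, panics at runtime. The non-panicking s.get(0..1) returns None in that case.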
Going back to the get_first_char function, we can do better: the explicit length check is redundant, because next() already returns None for an empty string.
For example, we could use the glorious match expression:
fn get_first_char(string: &str) -> Option<char> {
    match string.chars().next() {
        Some(ch) => Some(ch),
        None => None,
    }
}
Or even better!
fn get_first_char(string: &str) -> Option<char> {
    string.chars().next()
}
Let’s take a look at an example that mutates values inside a function.
In Python we can define a User class and a function to change its name like this:
class User:
    name: str
    is_admin: bool

    def __init__(self, name, is_admin) -> None:
        self.name = name
        self.is_admin = is_admin

def change_username(user: User, username: str):
    user.name = username

user = User("Bob", True)
change_username(user, "Cool Bob")
Equivalent code in Rust:
struct User {
    name: String,
    is_admin: bool,
}

fn change_username(user: &mut User, username: &str) {
    user.name = username.to_string();
}

// inside main()...
let mut user = User {
    name: String::from("Bob"),
    is_admin: true,
};
change_username(&mut user, "Cool Bob");
Ok, looks nice. We know what &str means, but what is &mut doing here?
Same as before, & represents a reference, and mut indicates that the function is allowed to mutate the value behind that reference. So &mut User means a mutable reference to a User struct.
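The sketch below contrasts a shared reference with a mutable one. It uses a trimmed-down User with only a name field, and the helpers greeting and rename are hypothetical names chosen for this illustration.

```rust
// Trimmed-down version of the struct above, just for this sketch.
struct User {
    name: String,
}

// Shared reference: any number of these can exist at once, all read-only.
fn greeting(user: &User) -> String {
    format!("Hello, {}!", user.name)
}

// Mutable reference: exclusive while it is in use. Returns the previous name;
// std::mem::replace stores the new value and hands back the old one.
fn rename(user: &mut User, new_name: &str) -> String {
    std::mem::replace(&mut user.name, new_name.to_string())
}
```

Mutability has to be opted into twice: the binding must be declared with let mut, and the call site must pass &mut user. If either is missing, the compiler refuses to build the program.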
We can also create a “constructor” function for the User struct like this (you can mentally replace Self with User):
impl User {
    fn new(name: &str, is_admin: bool) -> Self {
        Self { name: String::from(name), is_admin }
    }
}

// main()...
let mut user = User::new("Bob", true);
change_username(&mut user, "Cool Bob");
Functions with Lifetime Annotations
from typing import List, Optional

def split_by(slice: str, delimiter: str) -> Optional[List[str]]:
    result = slice.split(delimiter, maxsplit=1)
    return result if len(result) > 1 else None

pair = split_by("snakes & ducks!", "&")
print(pair)  # ['snakes ', ' ducks!']
type Pair<'a> = [&'a str; 2];

fn split_by(slice: &str, delimiter: char) -> Option<Pair> {
    match slice.split_once(delimiter) {
        Some((left, right)) => Some([left, right]),
        None => None,
    }
}

let pair = split_by("ducks & crabs!", '&');
println!("{:?}", pair); // Some(["ducks ", " crabs!"])
The Python version of split_by splits the slice once by a delimiter. It returns the list if it contains more than one item; otherwise it returns None.
The Optional here means that this function may return None, and List[str] means a list of strings (Python doesn’t have a char type).
The Rust version does virtually the same as the previous function. Everything looks fine until you realize that Pair<'a> has a weird generic parameter <'a>. And why does 'a appear after & but before str?
I’m glad you noticed it. You can think of 'a as an annotation that indicates how long a reference lives (or how long it is valid).
Let’s decompose &'a into smaller components: & represents a reference, and 'a represents how long this reference is valid.
So &'a is a reference that’s valid for the lifetime 'a.
//                v--- reference's lifetime
type Pair<'a> = [&'a str; 2];
//        ^- generic lifetime
If we take a look at the split_once definition, we will find lifetime annotations:
//                        v-- input lifetime  v-- Pattern of lifetime 'a
pub fn split_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a str, &'a str)>
//                           str reference of lifetime 'a ---^
where // <-- generic constraint
    P: Pattern<'a>,
The generic parameter 'a takes the lifetime of the &str (&'a self) that the method is called on and basically says: I may or may not return a tuple of &str, and if I do, they are valid for as long as the original &str they came from.
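A small sketch can show that the returned slices really are views into the original string rather than copies. The helper first_word is hypothetical and written only for this illustration.

```rust
// Hypothetical helper: returns the text before the first space, or the whole
// input if there is no space. The lifetime 'a ties the result to the input.
fn first_word<'a>(text: &'a str) -> &'a str {
    match text.split_once(' ') {
        Some((head, _rest)) => head,
        None => text,
    }
}
```

If you call first_word(&owned) on a String and compare word.as_ptr() with owned.as_ptr(), both point to the same address: no new string is allocated. That is also why the compiler refuses to let owned be dropped while the returned slice is still in use.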
Going back to the example: the reason why we don’t need to annotate every reference with 'a is a little thing called lifetime elision.
Without it the example would look like this:
type Pair<'a> = [&'a str; 2];

fn split_by<'a>(slice: &'a str, delimiter: char) -> Option<Pair<'a>> {
    match slice.split_once(delimiter) {
        Some((left, right)) => Some([left, right]),
        None => None,
    }
}
Lifetime elision is what allows us to omit the annotations. You can think of it as a set of rules / patterns that the compiler uses to infer lifetimes.
As such, the compiler is totally ok without any lifetime annotations in some cases. In more complex cases the compiler leaves it up to the programmer.
If you’re curious about lifetime elision, consider reading this chapter of the Rust book.
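As a rough illustration of where elision stops working, compare the two functions below. Both are hypothetical examples and not part of split_by.

```rust
// Two input references and one output reference: the elision rules cannot
// decide which input the output borrows from, so the annotation is required.
fn longest<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() { left } else { right }
}

// One input reference: the output can only borrow from it, so elision
// applies and no annotation is needed.
fn trim_prefix(text: &str) -> &str {
    text.trim_start_matches('#')
}
```

Removing the <'a> annotations from longest produces a compile error asking for a lifetime specifier, which is the compiler leaving the decision up to the programmer.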
For the sake of completeness, I’d like to refactor split_by.
Instead of match we could simply use map to extract the values of the tuple and return the array.
fn split_by(slice: &str, delimiter: char) -> Option<Pair> {
    slice
        .split_once(delimiter)
        .map(|(left, right)| [left, right])
}