From Pythonista to Rustacean

  • #python
  • #rust
  • #tutorial
  • #lifetimes
  • #beginner

Introduction to Rust for Pythonistas

I’m legally obliged to start this post with an unnecessary introduction. Python is a general purpose, dynamically typed, interpreted language that aims to be both easy to learn and easy to use. It supports classes with all the beauty of the OOP paradigm, features a plentiful ecosystem of packages, and its package manager is called pip.

Due to its popularity, there are a lot of tutorials and learning resources. Personally I consider it great for prototyping and learning new concepts.

Rust is a general purpose, statically typed, compiled language. It focuses on ergonomics, performance and memory safety.

It features lovely error messages, great documentation, the all-mighty turbofish (::<>), the glorious match statement and a thriving ecosystem of packages (crates). Its package manager is called cargo.

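Rust’s secret sauce is the borrow checker. Rust achieves memory safety by enforcing a set of ownership and borrowing rules at compile time. These rules prevent common memory-related bugs such as dereferencing dangling pointers, use-after-free, and data races (unsynchronized concurrent reads and writes to the same memory). Memory leaks are harder to cause by accident, but safe Rust does not rule them out.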
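To make those rules a bit more concrete, here is a minimal sketch of ownership and borrowing. The helper names total_len and push_exclamation are made up for this illustration; they are not part of any library.

```rust
/// Borrows the slice immutably: the caller keeps ownership and can
/// keep using the vector afterwards. (Hypothetical helper.)
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Borrows one string mutably: only one such borrow may exist at a time.
/// (Hypothetical helper.)
fn push_exclamation(word: &mut String) {
    word.push('!');
}

fn main() {
    let mut words = vec![String::from("hello"), String::from("there")];

    // Shared (read-only) borrow; `words` is still usable afterwards.
    let len = total_len(&words);
    println!("total length: {}", len);

    // Exclusive (mutable) borrow of a single element.
    push_exclamation(&mut words[1]);
    println!("{:?}", words);

    // Moving the vector transfers ownership to `moved`.
    let moved = words;
    println!("{:?}", moved);

    // Uncommenting the next line should fail to compile with a
    // "borrow of moved value" error, because `words` no longer owns the data.
    // println!("{:?}", words);
}
```

The interesting part is the commented-out line: the mistake is rejected by the compiler instead of showing up at runtime.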

Python syntax vs Rust syntax

# python syntax

number = 420
string = "hello there!"
my_list = [0, 1, 2]
my_tuple = ("John", 42)
my_dict = {"alice": 600, "bob": 200}

for i in range(10):
    print(i)

n = 0

while n < 3:
    print("I'm inside a while loop!")
    n += 1

foo = 0

if foo == 1:
    pass  # foo is 1
elif foo < 1:
    pass  # foo is less than 1
else:
    pass  # everything else

// rust syntax

use std::collections::HashMap; // needed for my_dict below

let number = 420;
let string: String = String::from("hello there!");
let my_list = vec![0, 1, 2];
let my_tuple: (&str, i32) = ("John", 42);

let mut my_dict: HashMap<&str, i32> = HashMap::new();
my_dict.insert("alice", 600);
my_dict.insert("bob", 200);

for i in 0..10 {
    println!("{}", i);
}

let mut n = 0;

while n < 3 {
    println!("I'm inside a while loop!");
    n += 1;
}

let foo = 0;

if foo == 1 {
    // foo is 1
} else if foo < 1 {
    // foo is less than 1
} else {
    // everything else
}

// or

match foo {
    1 => {
        // foo is 1
    }
    foo if foo < 1 => {
        // foo is less than 1
    }
    _ => {
        // everything else
    }
}

A Simple Function

In Python you can define an add function like this:

def add(a: int, b: int):
    return a + b

Now in Rust:

fn add(a: i32, b: i32) -> i32 {
    a + b
}

Since Rust is an expression-based language, we can simply return a + b without the return keyword. Notice that the last expression has no semicolon at the end; adding one would turn it into a statement, and the function would no longer return the value. Early returns still use the return keyword. Let’s take a look at another example.
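As a small sketch of both styles side by side, the hypothetical describe function below uses an early return for one case and a trailing expression for the rest. Blocks and if are expressions too, which is what makes this work.

```rust
/// Hypothetical helper: classifies a number.
fn describe(n: i32) -> &'static str {
    if n == 0 {
        return "zero"; // early return needs the keyword
    }

    // `if` is an expression, so its value becomes the function's value.
    if n < 0 { "negative" } else { "positive" }
}

fn add(a: i32, b: i32) -> i32 {
    a + b // no semicolon: this expression is the return value
}

fn main() {
    // A block is an expression as well: its last line is its value.
    let doubled = {
        let sum = add(2, 3);
        sum * 2
    };
    println!("{} is {}", doubled, describe(doubled));
}
```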

References and Strings

def get_first_char(string: str) -> str | None: # Optional[str]
    if len(string) == 0:
        return None

    return string[0]

fn get_first_char(string: &str) -> Option<char> {
    if string.is_empty() {
        return None;
    }

    string.chars().next() // ?
}

OK, so what is &str doing here? &str is a reference to a string slice. You can think of it as two separate components: & means reference and str means string slice. In a few words, you can think of a reference as a pointer to the original value.


You may be wondering why we are using string.chars().next() instead of string[0], and there’s a good reason for that: it turns out that some characters take more storage than others.

What do I mean by that? Here’s a good example from the Rust book:

String::from("Hola"); // characters: 4, bytes: 4
String::from("Здравствуйте"); // characters: 12, bytes: 24

In the previous code, the first string takes 4 bytes to encode in UTF-8 while the second one takes 24 bytes. We can’t index into the string like we’d do with an array because we might land in the middle of a character.

Instead we need to use the .chars() method to access the characters. For a more in-depth explanation please read this.
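Here is a quick sketch that checks those numbers and shows what happens at a byte boundary. The helper byte_and_char_counts is hypothetical; len, chars and get are standard str methods. Note that "character" here means a Unicode scalar value (Rust's char), which is an assumption that holds for these two strings.

```rust
/// Hypothetical helper: returns (number of bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let hola = "Hola";
    let hello = "Здравствуйте";

    println!("{:?}", byte_and_char_counts(hola));
    println!("{:?}", byte_and_char_counts(hello));

    // `str::get` is the non-panicking way to slice by byte range:
    // it returns None when a boundary falls inside a character.
    println!("{:?}", hello.get(0..1)); // byte 1 is in the middle of 'З'
    println!("{:?}", hello.get(0..2)); // 'З' occupies bytes 0 and 1

    // `chars()` walks whole characters regardless of their byte width.
    println!("{:?}", hello.chars().next());
}
```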


Going back to the get_first_char function, we can do better. For example, we could use the glorious match statement:

fn get_first_char(string: &str) -> Option<char> {
    match string.chars().next() {
        Some(ch) => Some(ch),
        None => None,
    }
}

Or even better!

fn get_first_char(string: &str) -> Option<char> {
    string.chars().next()
}

Let’s take a look at an example that mutates values inside a function.

In Python we can define a User class and a function to change its name like this:

class User:
    name: str
    is_admin: bool

    def __init__(self, name, is_admin) -> None:
        self.name = name
        self.is_admin = is_admin


def change_username(user: User, username: str):
    user.name = username

user = User("Bob", True)
change_username(user, "Cool Bob")

Equivalent code in Rust:

struct User {
    name: String,
    is_admin: bool,
}

fn change_username(user: &mut User, username: &str) {
    user.name = username.to_string();
}

// inside main()...
let mut user = User {
    name: String::from("Bob"),
    is_admin: true,
};

change_username(&mut user, "Cool Bob");

OK, looks nice. We know what &str means, but what is &mut doing here? Same as before, & represents a reference, and mut indicates that this function may mutate the value behind the reference. So &mut User means a mutable reference to a User struct.
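To contrast the two kinds of reference, here is a minimal sketch that reuses the User struct from above. The summary function is a hypothetical addition that only needs to read the user, so a shared & reference is enough for it.

```rust
struct User {
    name: String,
    is_admin: bool,
}

/// Shared reference: can read the user, cannot modify it.
/// (Hypothetical helper, not part of the original example.)
fn summary(user: &User) -> String {
    let role = if user.is_admin { "admin" } else { "user" };
    format!("{} ({})", user.name, role)
}

/// Mutable reference: can modify the caller's value in place.
fn change_username(user: &mut User, username: &str) {
    user.name = username.to_string();
}

fn main() {
    // `mut` on the binding is required before we can take `&mut user`.
    let mut user = User {
        name: String::from("Bob"),
        is_admin: true,
    };

    println!("{}", summary(&user));
    change_username(&mut user, "Cool Bob");
    println!("{}", summary(&user));
}
```

If user were declared without mut, the call to change_username(&mut user, ...) would be rejected at compile time.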

We can also create a “constructor” function for the User struct like this (you can mentally replace Self with User):

impl User {
    fn new(name: &str, is_admin: bool) -> Self {
        Self { name: String::from(name), is_admin }
    }
}

// main()...
let mut user = User::new("Bob", true);
change_username(&mut user, "Cool Bob");

Functions with Lifetime Annotations

from typing import List, Optional


def split_by(slice: str, delimiter: str) -> Optional[List[str]]:
    result = slice.split(delimiter, maxsplit=1)
    return result if len(result) > 1 else None


pair = split_by("snakes & ducks!", "&")
print(pair) # ['snakes ', ' ducks!']

type Pair<'a> = [&'a str; 2];

fn split_by(slice: &str, delimiter: char) -> Option<Pair> {
    match slice.split_once(delimiter) {
        Some((left, right)) => Some([left, right]),
        None => None,
    }
}

let pair = split_by("ducks & crabs!", '&');
println!("{:?}", pair); // Some(["ducks ", " crabs!"])

The Python version of split_by splits the slice once by a delimiter. It returns the list if it contains more than 1 item, otherwise it returns None. The Optional here means that this function may return None, and List[str] means a list of strings (Python doesn’t have a char type).

The Rust version does virtually the same as the previous function. And everything looks fine until you realize that Pair<'a> has a weird generic parameter <'a>. And why does 'a sit after & but before str?

I’m glad you noticed. You can think of 'a as an annotation that indicates how long a reference lives (or how long it is valid). Let’s decompose &'a str into smaller components: & represents a reference, and 'a represents how long this reference is valid. So &'a str is a reference to a str that’s valid for the lifetime 'a.

//                v--- reference's lifetime
type Pair<'a> = [&'a str; 2];
//        ^- generic lifetime

If we take a look at the split_once definition, we will find lifetime annotations:

//                 v-- input lifetime         v-- Pattern of lifetime 'a
pub fn split_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a str, &'a str)>
//                                    str reference of lifetime 'a ---^
where // <-- generic constraint
    P: Pattern<'a>,

The generic parameter 'a takes the lifetime of the &str the method is called on (&'a self) and basically says: I may or may not return a tuple of &str, and if I do, they are only valid for as long as the original &str they came from.


Going back to the example, the reason we don’t need to annotate every reference with 'a is a little thing called lifetime elision. Without it, the example would look like this:

type Pair<'a> = [&'a str; 2];

fn split_by<'a>(slice: &'a str, delimiter: char) -> Option<Pair<'a>> {
    match slice.split_once(delimiter) {
        Some((left, right)) => Some([left, right]),
        None => None,
    }
}

Lifetime elision is what allows us to omit the annotations. You can think of it as a set of rules / patterns that the compiler uses to infer lifetimes.

As such, the compiler is totally OK without any lifetime annotations (in some cases). In more complex cases the compiler leaves it up to the programmer.
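One case where the compiler does ask for help is a function that takes two references and returns one. The sketch below uses two hypothetical functions, longest and first_word, to show where an explicit 'a is needed and where elision covers it.

```rust
/// Two input references and one output reference: elision cannot decide
/// which input the result borrows from, so `'a` has to be written out.
fn longest<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() { left } else { right }
}

/// One input reference: elision ties the output lifetime to it for us.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let ducks = String::from("ducks");
    let crabs = String::from("crabs!");

    println!("{}", longest(&ducks, &crabs));
    println!("{}", first_word("snakes & ducks!"));
}
```

Removing <'a> and the annotations from longest should produce a "missing lifetime specifier" error, which is the compiler leaving the decision to us.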

If you’re curious about lifetime elision, consider reading this chapter of the Rust book.


For the sake of completeness, I’d like to refactor split_by. Instead of match we could simply use map to extract the values of the tuple and return the array.

fn split_by(slice: &str, delimiter: char) -> Option<Pair> {
    slice
        .split_once(delimiter)
        .map(|(left, right)| [left, right])
}
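Because split_by returns an Option, the same map trick also works on its result. As a rough sketch, the hypothetical split_and_trim below builds on split_by to strip the whitespace around both halves. The '_ in Pair<'_> is just the elided lifetime spelled out as a placeholder.

```rust
type Pair<'a> = [&'a str; 2];

fn split_by(slice: &str, delimiter: char) -> Option<Pair<'_>> {
    slice
        .split_once(delimiter)
        .map(|(left, right)| [left, right])
}

/// Hypothetical helper: same as `split_by`, but trims both halves.
fn split_and_trim(slice: &str, delimiter: char) -> Option<Pair<'_>> {
    split_by(slice, delimiter).map(|[left, right]| [left.trim(), right.trim()])
}

fn main() {
    // `if let` unpacks the Some case without a full match.
    if let Some([left, right]) = split_and_trim("ducks & crabs!", '&') {
        println!("left: {:?}, right: {:?}", left, right);
    }

    // No delimiter in the input means we get None back.
    println!("{:?}", split_by("no delimiter here", '&'));
}
```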

Resources