Hello,

I have been learning Rust and wanted to make something exciting, so I thought: why not build a Lisp compiler in Rust? B)

So here's what I did today: https://gist.github.com/anon2834678263/bcaa06e934f7b478be79203553f170ee

The tokenizer isn't ready and might have horrible bugs, but at least I got comfortable with variables being immutable by default and with not surrounding stuff with parentheses unnecessarily. Oh, and also enums, which people say are the most powerful thing in Rust.

I am still not satisfied though since the code looks more like C than Rust xD

maybe some experienced people can correct me :)

[-] TehPers@beehaw.org 3 points 2 days ago* (last edited 2 days ago)

maybe some experienced people can correct me

Well first thing, I'd recommend saving the gist with a .rs extension so we get syntax highlighting :)

You should convert your loop to iterate over input.chars() instead. Your current loop will have issues if someone writes, for example, "naïveté" due to some of those letters being multiple bytes long in UTF-8 (which is the encoding str and String use). What you can do is:

let mut input = input.chars().peekable();
while let Some(ch) = input.next() {
    // ...
}

This also lets you input.peek() in the body to look at the next character without taking it from the iterator. input.peek().is_some() tells you if there's more data (and is_none tells if you're at the end of the input).
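
As a rough sketch of how peek() helps in practice, here is one way to read a multi-digit number without consuming the character that follows it. The names read_number and numbers_in are made up for this example and are not from the gist:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Collects a run of ASCII digits that begins with `first`.
/// `rest` is left positioned on the first non-digit character.
fn read_number(first: char, rest: &mut Peekable<Chars<'_>>) -> String {
    let mut number = String::from(first);
    // Look at the next character without taking it from the iterator.
    while let Some(&next) = rest.peek() {
        if !next.is_ascii_digit() {
            break;
        }
        number.push(next);
        rest.next(); // now actually consume the digit
    }
    number
}

/// Hypothetical driver: returns every digit run found in `input`.
fn numbers_in(input: &str) -> Vec<String> {
    let mut chars = input.chars().peekable();
    let mut found = Vec::new();
    while let Some(ch) = chars.next() {
        if ch.is_ascii_digit() {
            found.push(read_number(ch, &mut chars));
        }
    }
    found
}
```

Because the loop walks chars rather than bytes, non-ASCII input such as "naïveté" passes through without any slicing problems.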

Even in C, I'd have made the big if-chain use if-else to avoid evaluating conditions that are known to be false. However, here I'd convert it to a big match statement:

match ch {
    ' ' => {}
    ';' => {
        // ...
    }
    _ if ch.is_numeric() => {
        // ...
    }
    // ...
}
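
To make that shape a bit more concrete, here is a small self-contained sketch. CharKind and classify are hypothetical names used only for illustration; the arms in the actual tokenizer would do the real work instead of returning a label:

```rust
/// Hypothetical classification of a single character (not from the gist).
#[derive(Debug, PartialEq)]
enum CharKind {
    Whitespace,
    Open,
    Close,
    Comment,
    Digit,
    Other,
}

fn classify(ch: char) -> CharKind {
    // Arms are tried top to bottom and only the first match runs,
    // so no later condition is evaluated once one has matched.
    match ch {
        ' ' | '\t' | '\n' => CharKind::Whitespace,
        '(' => CharKind::Open,
        ')' => CharKind::Close,
        ';' => CharKind::Comment,
        _ if ch.is_numeric() => CharKind::Digit,
        _ => CharKind::Other,
    }
}
```

The compiler also checks that the match is exhaustive, which is why the final catch-all arm is needed.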

As an optional (and more advanced) thing, you can return slices into the input instead of copies of those slices:

fn tokenize(input: &str) -> Vec<(TokenType, &str)>

If you want to do this, you should do .char_indices() instead of .chars() so you know where to slice the input string at.
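
A minimal sketch of the slice-returning version, assuming a simplified TokenType with only three variants (the enum in the gist will differ):

```rust
/// Simplified, hypothetical token kinds for this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
enum TokenType {
    LParen,
    RParen,
    Atom,
}

fn tokenize(input: &str) -> Vec<(TokenType, &str)> {
    let mut tokens = Vec::new();
    // char_indices yields (byte_offset, char), so the offsets are
    // always valid places to slice the original string.
    let mut chars = input.char_indices().peekable();
    while let Some((start, ch)) = chars.next() {
        match ch {
            c if c.is_whitespace() => {}
            '(' => tokens.push((TokenType::LParen, &input[start..start + 1])),
            ')' => tokens.push((TokenType::RParen, &input[start..start + 1])),
            _ => {
                // Extend `end` until whitespace or a parenthesis is next.
                let mut end = start + ch.len_utf8();
                while let Some(&(idx, next)) = chars.peek() {
                    if next.is_whitespace() || next == '(' || next == ')' {
                        break;
                    }
                    end = idx + next.len_utf8();
                    chars.next();
                }
                tokens.push((TokenType::Atom, &input[start..end]));
            }
        }
    }
    tokens
}
```

The returned slices borrow from input, so the token list cannot outlive the string it was built from.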

Otherwise, you can use std::mem::take(&mut current_token) to replace current_token with an empty string and take (without cloning) the existing value out of that variable:

use std::mem::take;

tokens.push((blah, take(&mut current_token)));
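
For context, a small runnable sketch of that pattern. split_words is a hypothetical stand-in for the tokenizer loop and only splits on whitespace:

```rust
use std::mem::take;

/// Hypothetical helper: splits `input` on whitespace into owned strings.
fn split_words(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current_token = String::new();
    for ch in input.chars() {
        if ch.is_whitespace() {
            if !current_token.is_empty() {
                // Moves the accumulated String out and leaves an
                // empty String behind, without cloning.
                tokens.push(take(&mut current_token));
            }
        } else {
            current_token.push(ch);
        }
    }
    // Flush whatever is left at the end of the input.
    if !current_token.is_empty() {
        tokens.push(current_token);
    }
    tokens
}
```

take works for any type that implements Default, which for String means the replacement value is an empty string.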
[-] ghodawalaaman@programming.dev 3 points 2 days ago

Thank you kind stranger!

I will take notes and make these changes. After improving this I will move on to creating the AST, which should be more fun! Again, thanks :)

[-] TehPers@beehaw.org 2 points 2 days ago

FYI once you're done you should take a look at some of the parsing libraries out there. Some I'd recommend looking at:

  • pest - grammar based
  • lalrpop - more traditional LR(1) parser generator
  • winnow or nom - parsing combinators, probably the easiest of these to use (and most flexible)
[-] ghodawalaaman@programming.dev 3 points 2 days ago

I just realized that it doesn't work with operators :(

Just ignore that, I am aware of the error and will fix it. But the good news is that it works with nested lists :D

Also, I am checking ch.is_alphabetic(), which clearly won't work with operators like +, -, *, / etc. I wonder how I can fix it. I thought that if a token isn't a Number or a String it should be a Symbol, but dummy me forgot that a Symbol can contain special chars, not just letters, huh :(
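
One possible way to express that idea is to define a symbol character by what it is not, rather than by is_alphabetic(). This is only a sketch: is_symbol_char is a hypothetical helper, and the set of delimiters is an assumption about the Lisp dialect, not something taken from the gist:

```rust
/// Hypothetical predicate: anything that is not whitespace and not one
/// of the assumed delimiters ( ) " ; may appear inside a symbol.
fn is_symbol_char(ch: char) -> bool {
    !ch.is_whitespace() && !matches!(ch, '(' | ')' | '"' | ';')
}
```

With this, operators such as +, -, * and / count as symbol characters. Numbers and strings would still need to be recognized first, so that only the leftover tokens fall through to Symbol.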

this post was submitted on 06 Jun 2026

Rust Programming
