11
submitted 22 hours ago by INeedMana@piefed.zip to c/rust@programming.dev

I hope it's ok to ask for some feedback here. Of not, please let me know. The rules did not sound like against it

I'm very new to Rust. And while in general my coding background is ok, Rust still feels alien to me. I think I'm just still at "how to think in Rust" part of the curve.

So I would like to ask here for opinions on the following bit of code. I know that those unwrap are too optimistic for production, and I could figure out how to pass io::Error from the function all the way up to the shell. But what are other choices that I don't see?
What would you write differently? What looks like Pythonisms/C++isms? Or is missing the mark completely?

use std::{fs, io};  
use std::path::PathBuf;  
use std::convert::TryFrom;  

use clap::Parser;  
use parquet::file::reader::SerializedFileReader;  
use parquet::record;  
use csv::WriterBuilder;  

#[derive(Debug, Parser)]  
#[command(version, about, long_about = None)]  
struct Args {  
    dir: String,  
    #[arg(default_value = "0")]  
    count: usize  
}  

fn get_files_in_dir(dir: &str) -> Option<Vec<PathBuf>>  
{  
    let dir = fs::read_dir(dir);  
    if dir.is_err() {  
        return None  
    };  
    let files = dir.unwrap()  
        .map(|res| res.map(|e| e.path()))  
        .collect::<Result<Vec<_>, _>>();  
    if files.is_err() {  
        return None  
    }  
    files.ok()  
}  

fn read_parquet_dir(entries: &Vec<String>) ->  impl Iterator<Item = record::Row> {  
    entries.iter()  
        .map(|p| SerializedFileReader::try_from(p.clone()).unwrap())  
        .flat_map(|r| r.into_iter())  
        .map(|r| r.unwrap())  
}  
                            
fn main() -> Result<(), io::Error> {  
    let args = Args::parse();  
    let entries = match get_files_in_dir(&args.dir)  
    {  
        Some(entries) => entries,  
        None => return Ok(())  
    };  


    let mut wtr = WriterBuilder::new().from_writer(io::stdout());  
    for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {  
        let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();  
        if idx == 0 {  
            wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;  
        }  
        wtr.serialize(values)?;  
        if args.count>0 && idx+1 == args.count {  
            break;  
        }  
    }  
    
    Ok(())  
}  
top 7 comments
sorted by: hot top controversial new old
[-] shape_warrior_t@programming.dev 4 points 8 hours ago

Not very familiar with the libraries, and BB_C has already given a lot of more detailed feedback, but here are some small things:

  • Even without rewriting get_files_in_dir as a single chain of method calls, you can still replace
    if dir.is_err() {  
        return None  
    };
    // use dir.unwrap()
    
    with
    let Ok(dir) = dir else {
        return None;
    };
    // use dir
    
    In general, if you can use if let or let else to pattern match, prefer that over unwrapping -- Clippy has a lint relating to this, actually.
  • Although the formatting doesn't seem horrible, it doesn't seem like Rustfmt was used on it. You might want to make use of it for slightly more consistent formatting. I mainly noticed the lack of trailing commas, and the lack of semicolons after return None in get_files_in_dir.
[-] INeedMana@piefed.zip 2 points 5 hours ago

Thank you very much! TIL about Clippy :D

[-] BB_C@programming.dev 7 points 12 hours ago

Let's do this incrementally, shall we?

First, let's make get_files_in_dir() idiomatic. We will get back to errors later.

fn get_files_in_dir(dir: &str) -> Option<Vec<PathBuf>> {
    fs::read_dir(dir)
        .ok()?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, _>>()
        .ok()
}

Now, in read_parquet_dir(), if the unwraps stem from confidence that we will never get errors, then we can confidently ignore them (we will get back to the errors later).

fn read_parquet_dir(entries: &Vec<String>) ->  impl Iterator<Item = record::Row> {
    // ignore all errors
    entries.iter()
        .cloned()
        .filter_map(|p| SerializedFileReader::try_from(p).ok())
        .flat_map(|r| r.into_iter())
        .filter_map(|r| r.ok())
}

Now, let's go back to get_files_in_dir(), and not ignore errors.

fn get_files_in_dir(dir: &str) -> Result<Vec<PathBuf>, io::Error>
{
    fs::read_dir(dir)?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, _>>()
}
 
 fn main() -> Result<(), io::Error> {
     let args = Args::parse();
-    let entries = match get_files_in_dir(&args.dir)
-    {
-        Some(entries) => entries,
-        None => return Ok(())
-    };
-
+    let entries = get_files_in_dir(&args.dir)?;
 
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
     for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {


Now, SerializedFileReader::try_from() is implemented for &Path, and PathBuf derefs to &Path. So your dance of converting to display then to string (which is lossy btw) is not needed.

While we're at it, let's use a slice instead of &Vec<_> in the signature (clippy would tell you about this if you have it set up with rust-analyzer).


fn read_parquet_dir(entries: &[PathBuf]) ->  impl Iterator<Item = record::Row> {
    // ignore all errors
    entries.iter()
        .filter_map(|p| SerializedFileReader::try_from(&**p).ok())
        .flat_map(|r| r.into_iter())
        .filter_map(|r| r.ok())
}
     let entries = get_files_in_dir(&args.dir)?;
 
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
-    for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
+    for (idx, row) in read_parquet_dir(&entries).enumerate() {
         let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
         if idx == 0 {
             wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;



Now let's see what we can do about not ignoring errors in read_parquet_dir().


Approach 1: Save intermediate reader results

This consumes all readers before getting further. So, it's a behavioral change. The signature may also scare some people ๐Ÿ˜‰

fn read_parquet_dir(entries: &Vec<PathBuf>) ->  Result<impl Iterator<Item = Result<record::Row, ParquetError>>, ParquetError> {
    Ok(entries
        .iter()
        .map(|p| SerializedFileReader::try_from(&**p))
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .flat_map(|r| r.into_iter()))
}

Approach 2: Wrapper iterator type

How can we combine errors from readers with flat record results?

This is how.

enum ErrorOrRows {
    Error(Option<ParquetError>),
    Rows(record::reader::RowIter<'static>)
}

impl Iterator for ErrorOrRows {
    type Item = Result<record::Row, ParquetError>;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Self::Error(e_opt) => e_opt.take().map(Err),
            Self::Rows(row_iter) => row_iter.next(),
        }
    }
}

fn read_parquet_dir(entries: &[PathBuf]) ->  impl Iterator<Item = Result<record::Row, ParquetError>>
{
    entries
        .iter()
        .flat_map(|p| match  SerializedFileReader::try_from(&**p) {
            Err(e) => ErrorOrRows::Error(Some(e)),
            Ok(sr) => ErrorOrRows::Rows(sr.into_iter()),
        })
}
 
     let mut wtr = WriterBuilder::new().from_writer(io::stdout());
     for (idx, row) in read_parquet_dir(&entries).enumerate() {
+        let row = row?;
         let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
         if idx == 0 {
             wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;

Approach 3 (bonus): Using unstable #![feature(gen_blocks)]

fn read_parquet_dir(entries: &[PathBuf]) ->  impl Iterator<Item = Result<record::Row, ParquetError>> {
    gen move {
        for p in entries {
            match SerializedFileReader::try_from(&**p) {
                Err(e) => yield Err(e),
                Ok(sr) => for row_res in sr { yield row_res; }
            }
        }
    }
}
[-] INeedMana@piefed.zip 1 points 5 hours ago

Oh wow! Thank you very much for such a deep dive

So try_from(&**p) is not a code smell/poor form in Rust?

[-] BB_C@programming.dev 3 points 3 hours ago

So try_from(&**p) is not a code smell/poor form in Rust?

No. It's how you (explicitly) go from ref to deref.

Here:

  • p is &PathBuf
  • *p is PathBuf
  • **p is Path (Deref)
  • And &**p is &Path.

Since what you started with is a reference to a non-Copy value, you can't do anything that would use/move *p or **p. Furthermore, Path is an unsized type (just like str and [T]), so you need to reference it (or Box it) in any case.

Another way to do this is:

let p: &Path = p.as_ref();

Some APIs use AsRef in signatures to allow passing references of different types directly (e.g. File::open()), but that doesn't apply here.

[-] NGram@piefed.ca 5 points 18 hours ago

Your usage and implementation of get_files_in_dir is quite awkward. It is returning None on errors and then you're returning Ok(()) when it returns None, ignoring a bunch of errors. Why not just have get_files_in_dir return a Result<Vec<PathBuf>, io::Error>? You could drastically simplify the function by just ?ing the results.

[-] INeedMana@piefed.zip 1 points 5 hours ago

Yes, thank you. Mainly, I've noticed this opportunity after I cleaned up the code from all the previous back and forth. I decided to leave it as it was to get an opinion on that. Also, I clearly had wrong understanding about how real Crustaceans use ?

this post was submitted on 02 Feb 2026
11 points (100.0% liked)

Rust

7726 readers
76 users here now

Welcome to the Rust community! This is a place to discuss about the Rust programming language.

Wormhole

!performance@programming.dev

Credits

  • The icon is a modified version of the official rust logo (changing the colors to a gradient and black background)

founded 2 years ago
MODERATORS