how to optimize this kind of process (lemmy.eco.br)
Whatever you do, as long as the data frame fits in memory it should usually be pretty fast. Depending on the functions you're using, applymap on slices of columns might be faster, but code readability will suffer.
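For context, a minimal sketch of the usual first optimization: replacing a row-wise apply with a vectorized operation on whole columns. The frame and column names here are made up, not from the OP's data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the OP's data.
df = pd.DataFrame({"a": np.random.rand(1_000_000),
                   "b": np.random.rand(1_000_000)})

# Slow: row-wise apply calls a Python function once per row.
df["ratio"] = df.apply(lambda row: row["a"] / row["b"], axis=1)

# Fast: the same result as one vectorized operation on whole columns.
df["ratio"] = df["a"] / df["b"]
```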
How big is your dataset? If it's huge or your needs are complex, you'll get way more performance by switching from Pandas to Polars dataframes than by trying to optimize Pandas operations.
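A rough sketch of what the Polars version could look like, assuming a lazy scan of the CSV; the file name and column names are placeholders since the OP's schema isn't shown:

```python
import polars as pl

# scan_csv builds a lazy, multi-threaded query plan instead of
# materializing the whole file up front; collect() runs it.
out = (
    pl.scan_csv("data.csv")  # placeholder path
      .with_columns(
          (pl.col("a") / pl.col("b")).alias("ratio"),  # placeholder derived column
      )
      .collect()
)
```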
6M rows (it grows by approx. 35K rows a month), 6 columns; after the function it goes to 17 columns and then finally to 9, where I start processing. Currently pd.read_csv() takes 8 min and creating the columns takes 20 min. I would like to reduce that 20-minute process.
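On the 8 min pd.read_csv() side, a hedged sketch of two standard fixes: a faster parser engine, and caching to Parquet for repeated runs. The file names and dtypes are assumptions, and engine="pyarrow" needs pandas >= 1.4 plus the pyarrow package:

```python
import pandas as pd

# Faster parse: the pyarrow engine is multi-threaded, and passing
# explicit dtypes skips type inference. Schema here is hypothetical.
df = pd.read_csv(
    "data.csv",
    engine="pyarrow",
    dtype={"id": "int32", "value": "float32"},
)

# For repeated runs, converting once to Parquet makes reloads much faster.
df.to_parquet("data.parquet")
df = pd.read_parquet("data.parquet")
```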