overview for burntsushi

Jiff 0.2.0 is released by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 4 points 1 year ago

As the author of Jiff, I don't like it either. But I didn't see any other feasible way to improve Rust's datetime offering. (The sibling comment links to a more in-depth answer.)

It would be better if there was one datetime library. It would be better if chrono had just been done "right" from the start. But it wasn't. So I can either go to the chrono maintainers and say, "please let me, a non-expert in datetimes, have full creative control over the project" or I can go build something on my own and, in the process, become an expert. Otherwise, we stay stuck in our local optimum.

As a fan of xkcd, this is probably my least favorite xkcd. On the one hand, yes, it aptly expresses frustration. On the other, it's easy to use as a club against progress itself. Sometimes you need to start fresh to move the needle. Rust is a perfect example of that itself.

uv: Unified Python packaging by burntsushi in c/python@programming.dev

burntsushi@programming.dev 7 points 2 years ago

I'm on the uv team. I am quite partial to this approach as well. Alas, it's difficult culturally to pull this off in a pre-existing ecosystem. And in the case of Python at least, it's not totally clear to me that it would avoid the need for solving NP-hard problems. See my other comment in this thread about simplifying PEP 508 marker expressions.

Besides avoiding the need for a SAT solver to resolve dependencies, the other thing I like about Go's approach is that it makes it very difficult to "lie" about the dependencies you support. In a maximal environment, it's very easy to declare a dependency on foo 1.0 when you actually need foo 1.1, without any issue appearing immediately.
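
A minimal sketch of the "pick the minimum" idea, under heavy simplifying assumptions: versions are plain (major, minor) pairs, every requirement is a lower bound, and there is no transitive graph walk. The names `Version` and `select_minimal` are hypothetical, and this is not a description of how Go or uv actually resolve dependencies.

```rust
use std::collections::BTreeMap;

/// A toy version as (major, minor). Hypothetical; real versions are richer.
type Version = (u32, u32);

/// For each package, select the highest of the declared lower bounds.
/// That is the oldest version that still satisfies every requirement.
fn select_minimal(requirements: &[(&str, Version)]) -> BTreeMap<String, Version> {
    let mut selected: BTreeMap<String, Version> = BTreeMap::new();
    for &(name, min) in requirements {
        selected
            .entry(name.to_string())
            // Keep the larger of the bound seen so far and this one.
            .and_modify(|v| *v = (*v).max(min))
            .or_insert(min);
    }
    selected
}
```

Under this scheme, a package that declares foo 1.0 is built against foo 1.0 unless something else asks for more, so a lower bound that is too low tends to fail early rather than being masked by a newer release.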

uv: Unified Python packaging by burntsushi in c/python@programming.dev

burntsushi@programming.dev 8 points 2 years ago

Interestingly, dependency resolution is not the only NP-hard problem uv tries to solve. During development, it also became clear that we needed some way to simplify PEP 508 marker expressions and ask questions like, "are these marker expressions disjoint?"

See: https://github.com/astral-sh/uv/blob/72bd12716225ae48d1e46ec6254d7daf134bdc94/crates/pep508-rs/src/marker/algebra.rs
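
To give a feel for what a disjointness question looks like, here is a toy sketch restricted to a single variable (think `python_version`), where a marker is modeled as a union of half-open version ranges. Real marker expressions combine several variables with `and`/`or`, which this does not attempt to model. The `Range` type and `disjoint` function are hypothetical and are not taken from the linked uv code.

```rust
/// A half-open range [lo, hi) of toy (major, minor) versions.
/// `hi == None` means the range is unbounded above.
#[derive(Clone, Copy, Debug)]
struct Range {
    lo: (u32, u32),
    hi: Option<(u32, u32)>,
}

impl Range {
    /// A range with an upper bound at or below its lower bound is empty.
    fn is_empty(&self) -> bool {
        matches!(self.hi, Some(hi) if hi <= self.lo)
    }

    /// Two non-empty half-open ranges overlap when each one starts
    /// before the other one ends.
    fn overlaps(&self, other: &Range) -> bool {
        if self.is_empty() || other.is_empty() {
            return false;
        }
        starts_before_end(self, other) && starts_before_end(other, self)
    }
}

/// True when `a` starts strictly before the end of `b`.
fn starts_before_end(a: &Range, b: &Range) -> bool {
    b.hi.map_or(true, |hi| a.lo < hi)
}

/// Each slice is a union (logical OR) of ranges over the same variable.
/// The two unions are disjoint when no pair of ranges overlaps.
fn disjoint(a: &[Range], b: &[Range]) -> bool {
    a.iter().all(|x| b.iter().all(|y| !x.overlaps(y)))
}
```

With this model, "less than 3.8" and "at least 3.8" come out disjoint, while "less than 3.8" and "at least 3.7" do not.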

uv: Unified Python packaging by burntsushi in c/python@programming.dev

burntsushi@programming.dev 5 points 2 years ago

uv 0.3 introduces a cross platform lock file: https://docs.astral.sh/uv/concepts/projects/#lockfile

More precise details on the compatibility of uv pip with pip are documented here: https://docs.astral.sh/uv/pip/compatibility/

Jiff is a new date-time library for Rust that encourages you to jump into the pit of success by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 6 points 2 years ago

You should absolutely not need to handle ISO 8601 and RFC 3339 manually. They are supported via the Display and FromStr trait implementations on every main type in Jiff (Span, Zoned, Timestamp, civil::DateTime, civil::Date and civil::Time). It's technically an implementation of a mixture of ISO 8601, RFC 3339 and RFC 9557, but the grammar is specified precisely by Temporal. See: https://docs.rs/jiff/latest/jiff/fmt/temporal/index.html

Jiff is a new date-time library for Rust that encourages you to jump into the pit of success by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 2 points 2 years ago

The original name I wanted was gigawatt or some variation thereof. :)

Jiff is a new date-time library for Rust that encourages you to jump into the pit of success by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 3 points 2 years ago* (last edited 2 years ago)

Is the cache invalidated if system tzdata is updated?

Yes, although at present, there is a TTL. So an update may take "time" to propagate. jiff::tz::db().reset() will force the cache to be invalidated. I expect the cache invalidation logic to get tweaked as we get real experience with it.

And what effect does the answer have on the example from “Jiff supports detecting time zone offset conflicts” if both zoned datetimes used the system timezone which got updated between 1. opening 2. parsing the two zoned datetimes.

It's hard to know precisely what you mean. But once you get a jiff::tz::TimeZone, that value is immutable: https://docs.rs/jiff/latest/jiff/tz/struct.TimeZone.html#a-timezone-is-immutable

New updates to tzdb are only observed when you do a tzdb lookup.
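
The behavior described above (a TTL, a forced reset, and immutable values once handed out) can be illustrated with a toy cache. This is only a sketch of the general pattern and not Jiff's implementation: `ZoneData`, `TtlCache` and the `load` callback are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```rust
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for time zone data read from disk.
#[derive(Debug)]
struct ZoneData {
    version: u32,
}

/// A toy cache holding a single entry that expires after `ttl`.
struct TtlCache {
    ttl: Duration,
    entry: Option<(Instant, Arc<ZoneData>)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entry: None }
    }

    /// Returns the cached value if it is still fresh at `now`.
    /// Otherwise calls `load` and caches the result.
    fn get(&mut self, now: Instant, load: impl FnOnce() -> ZoneData) -> Arc<ZoneData> {
        if let Some((loaded_at, data)) = &self.entry {
            if now.duration_since(*loaded_at) < self.ttl {
                return Arc::clone(data);
            }
        }
        let data = Arc::new(load());
        self.entry = Some((now, Arc::clone(&data)));
        data
    }

    /// Drops the cached entry so that the next `get` reloads.
    fn reset(&mut self) {
        self.entry = None;
    }
}
```

A value returned from `get` is a shared, read-only handle, so a later reload never changes it. Only a new lookup can observe new data.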

In this section, wouldn’t it be more realistic for chrono users to use timezone info around the wire instead of on the wire, rather than using Local+FixedOffset?

That's kinda my point. How do they do that? And does it work with chrono-tz and tzfile? And what happens if tzdb updates lead to a serialized datetime with an incorrect offset in a future update of tzdb? There are all sorts of points of failure here that Jiff will handle for you by virtue of tighter integration with tzdb as a first class concept.

81

Jiff is a new date-time library for Rust that encourages you to jump into the pit of success (github.com)

submitted 2 years ago by burntsushi@programming.dev to c/rust@programming.dev

26 comments

What are you working on this week? (June. 16, 2024) by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 2 points 2 years ago

How are you doing a date/time library without platform dependencies like libc or windows-sys? Are you rolling your own bindings in order to get the local time zone? (Or perhaps you aren't doing that at all.)

Trying to invent a better substring search algorithm by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 8 points 2 years ago

Disclosure: I'm the author of the memchr crate.

You mention the memchr crate, but you don't seem to have benchmarked it. Instead, you benchmarked the needle crate (last updated 7 years ago). Can you explain a bit more about your methodology?

The memchr crate in particular doesn't just use Rabin-Karp. It also uses Two-Way. And SIMD (with support for x86-64, aarch64 and wasm32).
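
For readers unfamiliar with Rabin-Karp, here is a textbook-style sketch of the rolling-hash idea. It is not the memchr crate's implementation (which, as noted above, also uses Two-Way and SIMD). The constant `BASE` and the function names are arbitrary choices made for illustration.

```rust
/// Multiplier for the polynomial hash. An arbitrary illustrative choice.
const BASE: u32 = 257;

/// Polynomial hash of `bytes`, computed with wrapping (mod 2^32) arithmetic.
fn hash(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |h, &b| h.wrapping_mul(BASE).wrapping_add(b as u32))
}

/// Returns the byte offset of the first occurrence of `needle` in `haystack`.
fn rabin_karp(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let n = needle.len();
    if n == 0 {
        return Some(0);
    }
    if n > haystack.len() {
        return None;
    }
    // BASE^(n-1): the weight of the byte that leaves the window.
    let high = (1..n).fold(1u32, |p, _| p.wrapping_mul(BASE));
    let want = hash(needle);
    let mut have = hash(&haystack[..n]);
    let mut start = 0;
    loop {
        // Hashes can collide, so confirm a candidate with a real comparison.
        if have == want && &haystack[start..start + n] == needle {
            return Some(start);
        }
        if start + n >= haystack.len() {
            return None;
        }
        // Slide the window: drop the leftmost byte, append the next one.
        have = have
            .wrapping_sub((haystack[start] as u32).wrapping_mul(high))
            .wrapping_mul(BASE)
            .wrapping_add(haystack[start + n] as u32);
        start += 1;
    }
}
```

Calling `rabin_karp(b"aaab", b"ab")` walks the window one byte at a time and only does a full comparison when the hashes agree.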

A comprehensive guide to the dangers of Regular Expressions in JavaScript by burntsushi in c/programming@programming.dev

burntsushi@programming.dev 3 points 2 years ago

Both Perl and Python use backtracking regex engines and are thus susceptible to similar problems as discussed in the OP.

aho-corasick (and thus the regex crate too) now uses SIMD on aarch64 (e.g., Apple silicon) to greatly accelerate some searches by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 17 points 2 years ago

Cross-posting from reddit:

The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 mac mini while searching a 5.5GB file.

This one is just a case insensitive search. A case insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):

$ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
3055

real    8.208
user    7.731
sys     0.467
maxmem  5600 MB
faults  191

$ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
3055

real    1.137
user    0.695
sys     0.430
maxmem  5904 MB
faults  203
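
The remark that the literal prefixes of a case insensitive pattern can be enumerated is easy to illustrate. The sketch below handles ASCII only (matching the "ignoring Unicode" caveat), and `case_variants` is a hypothetical helper, not a function from the regex or aho-corasick crates.

```rust
/// Enumerates every ASCII case variant of `prefix`.
/// The result has 2^k entries, where k is the number of ASCII letters.
fn case_variants(prefix: &str) -> Vec<String> {
    let mut variants = vec![String::new()];
    for ch in prefix.chars() {
        let lower = ch.to_ascii_lowercase();
        let upper = ch.to_ascii_uppercase();
        let mut next = Vec::with_capacity(variants.len() * 2);
        for v in &variants {
            // Extend each partial variant with the lowercase form.
            let mut with_lower = v.clone();
            with_lower.push(lower);
            next.push(with_lower);
            // Characters without case (spaces, digits) add only one branch.
            if upper != lower {
                let mut with_upper = v.clone();
                with_upper.push(upper);
                next.push(with_upper);
            }
        }
        variants = next;
    }
    variants
}
```

A three-letter prefix like `She` yields 8 literals. The set doubles with every letter, which is why this only helps when the prefix is kept short.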

And of course, using multiple literals explicitly also uses this optimization:

$ time rg-before-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
3804

real    9.055
user    8.580
sys     0.474
maxmem  4912 MB
faults  11

$ time rg-after-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
3804

real    1.121
user    0.697
sys     0.422
maxmem  4832 MB
faults  11

And it doesn't just work for prefixes. It works for inner literals too:

$ time rg-before-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
773

real    9.065
user    8.586
sys     0.477
maxmem  6384 MB
faults  11

$ time rg-after-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
773

real    1.124
user    0.702
sys     0.421
maxmem  6784 MB
faults  11

If you're curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rising popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.

47

aho-corasick (and thus the regex crate too) now uses SIMD on aarch64 (e.g., Apple silicon) to greatly accelerate some searches (github.com)

submitted 2 years ago by burntsushi@programming.dev to c/rust@programming.dev

2 comments

Leadership change in the Rust Infrastructure Team by burntsushi in c/rust@programming.dev

burntsushi@programming.dev 5 points 2 years ago

Shortly after we resigned, the top-level team leads, project directors to the Foundation, core team members and the new mods got together to form an interim leadership cohort, sometimes called the "leadership chat." That then evolved into the Leadership Council by way of an RFC on governance.