apparia@discuss.tchncs.de 2 points 1 hour ago

Yeah, even though I have a bit of background, I can't really make heads or tails of that OpenSearch doc at a glance; it's dense stuff.

In my experience knowing the keywords to stick in a search engine is often half the battle; there are plenty of resources out there on "vector databases". "Semantic search" from the lede of the OpenSearch doc might be another good one to have around.

Feel free to ask me any other questions and I'll try to answer to the best of my abilities, though again, I'm not an expert, and honestly I've never actually used these myself beyond toy examples.

apparia@discuss.tchncs.de 2 points 2 hours ago* (last edited 2 hours ago)

I'm not an expert, but it sounds like you want an embedding model plus a vector database. This setup essentially takes the part of an LLM that "understands" (loaded term, note the quotes) the text you put in, stores that "understanding" as a vector, and then does lookups directly on those vectors, so it's very good at matching alternate phrasings or slightly differing questions.

There's no actual text generation involved, and no need to retrain anything when adding new questions.
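To make the shape of that concrete, here's a minimal sketch of the lookup loop. Big caveat: `embed` below is a toy word-count vector I made up so the example runs with no dependencies; it only matches shared words, whereas a real embedding model is what would catch genuinely different phrasings. `TinyVectorStore` and the sample questions are likewise hypothetical, not anything from OpenSearch.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model (assumption): word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory 'vector database': a list plus a linear scan."""

    def __init__(self):
        self.items = []  # (vector, question, answer)

    def add(self, question, answer):
        # Adding a question just stores its vector; nothing is retrained.
        self.items.append((embed(question), question, answer))

    def search(self, query, k=1):
        # Embed the query, then rank stored questions by similarity to it.
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [(question, answer) for _, question, answer in ranked[:k]]


store = TinyVectorStore()
store.add("How do I reset my password?", "Use the 'forgot password' link.")
store.add("What are your opening hours?", "9 to 5 on weekdays.")

best_question, best_answer = store.search("I forgot my password, how can I reset it?")[0]
print(best_question, "->", best_answer)
```

A real vector database would replace the linear scan with an approximate nearest-neighbour index, but the add/search flow should look roughly like this.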

OpenSearch has an implementation (which I learned about just now while writing this comment and thus cannot vouch for); you could start there.


I have never had a LinkedIn account, both out of general anti-data-vacuuming-social-media, and specifically anti-whatever-the-fucking-corphead-psychos-are-doing-on-LinkedIn tendencies, and managed to find a decent job out of uni just fine (software field). I'm now looking for a job again and the number one piece of advice I'm being given by concerned parties is "get on LinkedIn".

I'm curious how many people into the whole "privacy" thing have had to make this choice, and which way you went with it.

Do the advantages (which it seems mainly boil down to "networking") outweigh the icky feeling I'd get making an account? Of course only I can actually answer that question, but it sums up my conundrum.

apparia@discuss.tchncs.de 2 points 1 week ago

No mention of how it compares to existing spatial indexing methods such as R[*]-trees. That was my first thought reading the article, but they only give a comparison to naïve N×M testing. I assume this method is still an improvement in the presence of sharding, but doubt it's the 400× quoted.
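For a sense of why the naïve baseline is such a soft target, here's a rough sketch. Assumptions all mine: I'm guessing at a simple "find all pairs within radius `r`" workload (the article's actual test may differ), and I'm using a plain uniform grid because it fits in a few lines; it is not an R-tree and not the article's method.

```python
from collections import defaultdict
from itertools import product


def naive_pairs(a_pts, b_pts, r):
    """Test every point in A against every point in B: N*M distance checks."""
    return {
        (i, j)
        for i, (ax, ay) in enumerate(a_pts)
        for j, (bx, by) in enumerate(b_pts)
        if (ax - bx) ** 2 + (ay - by) ** 2 <= r * r
    }


def grid_pairs(a_pts, b_pts, r):
    """Bucket B into r-sized cells; check each A point against 3x3 nearby cells."""
    cells = defaultdict(list)
    for j, (bx, by) in enumerate(b_pts):
        cells[(int(bx // r), int(by // r))].append(j)

    found = set()
    for i, (ax, ay) in enumerate(a_pts):
        cx, cy = int(ax // r), int(ay // r)
        # Any point within distance r must sit in this cell or a neighbour.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                bx, by = b_pts[j]
                if (ax - bx) ** 2 + (ay - by) ** 2 <= r * r:
                    found.add((i, j))
    return found
```

Both functions return the same pairs; the grid version just skips most of the distance checks when points are spread out, which is the kind of baseline I'd have liked the article to compare against.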

apparia

joined 1 week ago