YouTube seems to be blocking access to a seriously large number of publicly listed videos (lemmy.blahaj.zone)

submitted 1 day ago* (last edited 9 hours ago) by Lyra_Lycan@lemmy.blahaj.zone to c/piracy@lemmy.dbzer0.com

I don't know what to think, really.

The Dekaif channel has 434 videos, but YouTube is only showing 275 to clients, whether logged in or not, whether yt-dlp or official access.

This isn't the first channel where I've seen this, and weirder stuff too. Another example is this video - "Belt" meme - which is accessible on Grayjay, yet not on YouTube, meaning (I think) that publicly shared videos are being deindexed even though they are still hosted.

You used to be able to take the video code from the URL (everything after '?v=' and before '&') and get the exact video in search results. Not now. The second YouTuber, Sparky, has 35 uploads, only 9 of which are visible. And I can attest that at least one of the remaining 26 is hosted, but invisible. I don't even know how it came up using Grayjay but not YouTube or ReVanced.
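
For anyone who wants to repeat the lookup, here is a minimal sketch of pulling the video code out of a watch URL the way it is described above (everything after '?v=' and before '&'). It only uses Python's standard library; the function name and the example URL are made up for illustration and do not point at a real video.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the 'v' query parameter, or None if it is absent."""
    query = urlparse(url).query          # e.g. "v=abc123XYZ_-&t=42s"
    values = parse_qs(query).get("v")    # parse_qs maps each key to a list of values
    return values[0] if values else None


# Hypothetical URL, for illustration only.
print(extract_video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=42s"))
```

Parsing the query string rather than slicing the URL by hand means the code is still found when 'v' is not the first parameter.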

Basically, there's a TON of shady underhanded shit happening at YTHQ and everyone needs to jump ship to Odysee, Peertube or some platform that won't be clogged with AI. This is bad for everyone.

I'm posting it here mainly because I verified my findings with yt-dlp, and this new bs is successfully thwarting my attempts to archive.

3rd Oct edit: I am seeing massive differences between indexed videos and archived videos. I am still aggregating, but the share of definitely affected videos ranges from 10% to 50%.
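
A rough sketch of how that kind of comparison can be done, assuming you already have two lists of video IDs for a channel: the ones sitting in your archive and the ones the channel page currently lists. The function name and the sample IDs are placeholders, not real data, and this is not necessarily how the numbers above were produced.

```python
def missing_share(archived_ids, listed_ids):
    """Return (sorted IDs that are archived but no longer listed, their share of the archive)."""
    archived = set(archived_ids)
    listed = set(listed_ids)
    missing = archived - listed  # present in the archive, absent from the listing
    share = len(missing) / len(archived) if archived else 0.0
    return sorted(missing), share


# Placeholder IDs, for illustration only.
missing, share = missing_share(["a", "b", "c", "d"], ["a", "c"])
print(missing, f"{share:.0%}")
```

Plugging in the Dekaif counts from the post gives (434 - 275) / 434, or roughly 37% of uploads not shown, which falls inside the 10% to 50% range.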

[+] riskable@programming.dev -19 points 1 day ago

It's ok: Google and all other ad-supported search is about to go the way of the dinosaur as soon as local AI search catches on. When your own PC runs a search for you, it basically googles on your behalf and you never see those ads.

It's going to change everything.

[-] fubbernuckin@lemmy.dbzer0.com 10 points 1 day ago

It's not going to change everything. Why would you ever use an LLM for anything information related ever? I can make up wrong answers just as fast as it can.

I really hope that this is a joke and I'm making a fool of myself.

[-] riskable@programming.dev 3 points 21 hours ago* (last edited 21 hours ago)

Google search: "scientific articles about (whatever)" Then you get tons of ads and irrelevant results.

LLM search: "Find me scientific articles about (whatever)" Then you get just the titles and links (with maybe a short summary).

It's 100% better and you don't have to worry about hallucinations, since it wasn't actually trying to find an answer... just helping you perform a search.

[-] Coopr8@kbin.earth 1 points 1 day ago

You're joking, right? "Making up answers" in the case of search results just means a dead link. If you get a good link 99% of the time and don't have to use an enshittified service, that's good enough for 99% of people. Having to try again is the worst-case scenario.

[-] fubbernuckin@lemmy.dbzer0.com 1 points 19 hours ago

Finding search terms is the one task I consistently use LLMs for. That is not what they said, though. They talked about replacing traditional search with LLMs, and said that traditional search is about to "go the way of the dinosaur". I don't trust any local LLM to accurately recall anything it has read.

Not to mention that once we gain dependence on LLMs, which is something big tech is trying really hard to achieve right now, it will not be all that difficult for the creators to introduce biases that give us many of the same problems as search engines. Product placement, political censorship, etc. There would not be billions of dollars in investment if they thought they weren't going to get anything out of it.

[-] Coopr8@kbin.earth 1 points 14 hours ago

(The best) local LLMs are FOSS though. If bias is introduced, it can be detected and the user base can shift to another version, unlike centralized cloud LLMs, which are private silos.

I also don't think LLMs of any kind will fully replace search engines, but I do think they will be one of a suite of ML tools that will enable running efficient local (or distributed) indexing and search of the web.

[-] fubbernuckin@lemmy.dbzer0.com 1 points 4 hours ago

First of all, they are not FOSS. I know it seems tangential to the discussion, but it's important because biases cannot be reliably detected without the starting data. You should also not trust humans to see bias because humans themselves are quite biased and will generally assume that the LLM is behaving correctly if it aligns with their biases, which can be shifted in various ways over time, too.

Second, local LLMs don't have the benefit of free software where we can modify them freely or make forks if there are problems. Sure, there's fine tuning, but you don't get full control that way, and you need access to your own tuning data set. We would really just have the option to switch products, which doesn't put us much further ahead than using the closed off products available online.

I'm all for adding them to the arsenal of tools, but they are deceptively difficult to use correctly, which makes it so hard for me to be excited about them. I hardly see anyone using these tools for the purposes they are actually good for, and the things they are good for are also deceptively limited.

[-] prole@lemmy.blahaj.zone 2 points 19 hours ago* (last edited 19 hours ago)

Yeah, no thanks. I'll pass.

[-] floofloof@lemmy.ca 10 points 1 day ago* (last edited 1 day ago)

Someone would have to pay for the API calls though. And that tends to mean either paying a subscription or viewing ads. There's no technical reason your local LLM couldn't call a search engine's API to give you an ad-free search experience, and in fact you don't need an LLM to run a local ad-free search frontend. But there is a commercial reason, namely that whoever runs the search engine API will want payment. It would be some progress to have an ad-free search subscription, but it wouldn't get around all the megacorp fuckery that decides what search results you get.

[-] gravitywell@sh.itjust.works 3 points 1 day ago* (last edited 1 day ago)

APIs are the compromise that sites have to make if they don't want the much more resource-heavy scraping methods used.

The most they could do is rate limit IP addresses, and that doesn't work too well when it's individual users who can just request a new IP any time.
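
To make that point concrete, here is a toy fixed-window rate limiter keyed by IP address. It is only a sketch under assumed numbers (2 requests per 60 seconds) with a made-up class name, not how any real site does it, and the addresses come from a documentation-only range.

```python
class PerIPRateLimiter:
    """Allow at most max_requests per window_seconds for each IP address."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._windows = {}  # ip -> (window_start_time, requests_seen_in_window)

    def allow(self, ip, now):
        # An IP that has never been seen starts a fresh window with zero requests.
        start, count = self._windows.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the old window expired, so start a new one
        if count >= self.max_requests:
            self._windows[ip] = (start, count)
            return False
        self._windows[ip] = (start, count + 1)
        return True


limiter = PerIPRateLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("203.0.113.5", t) for t in (0, 1, 2)])  # third request is refused
print(limiter.allow("203.0.113.9", 3))  # same client on a new IP gets a fresh budget
```

Because the budget is tracked per address, a client that picks up a new IP is indistinguishable from a new visitor, which is why this kind of limit is weak against individual users.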

[-] Coopr8@kbin.earth 2 points 1 day ago

Not to mention that the scraped indexes can and should be shared. Unfortunately, what OP is seeing may be a move to thwart this type of brute-force scraping, and it might resolve into dynamically assigned domain addresses: the URL of a given object is assigned temporarily and streamed only to the single IP address or group of IP addresses that request it within a given timeframe, then rotated out until the object is found in search again and assigned a new URL, and so on. This is a frankly stupid use of resources, but it can effectively prevent crowdsourced indexes from proliferating. It can also be used to punish IPs, or even MAC addresses or browser fingerprints, associated with downloading and reuploading videos, which almost certainly have steganographic fingerprinting embedded that ties each copy to whoever it was served to at the time it was downloaded.

[-] Coopr8@kbin.earth 5 points 1 day ago

Also, you know what would make this all even worse? Laws requiring that people prove their identity in order to consume content or pull videos... just like age verification laws now being passed in several countries. What a coincidence.

[-] Lyra_Lycan@lemmy.blahaj.zone 1 points 20 hours ago

I agree with local search, but I prefer more of a traditional algorithm-based search to generative AI. A solution I've seen (that is far more attainable than building your own search engine) is hosting a metasearch engine, which collates results from search engines, based on your own preferences of results. Or perhaps using someone else's established server if their preferences align with yours. Localised (on-device) search will be a gamechanger in many ways, but I believe a meaningful version of that is far off and potentially impractical to implement.
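
As a rough idea of what "collating results based on your own preferences" could look like, here is a small sketch that merges ranked URL lists from several engines, using a per-engine weight you choose yourself. The engine names, weights, URLs and the scoring rule (weight divided by rank) are all assumptions for illustration; real metasearch engines may work quite differently.

```python
from collections import defaultdict


def merge_results(results_by_engine, engine_weights):
    """Merge ranked URL lists into one list, best score first.

    Each URL earns weight / rank from every engine that returned it,
    so a result found by several engines, or by a preferred one, rises.
    """
    scores = defaultdict(float)
    for engine, urls in results_by_engine.items():
        weight = engine_weights.get(engine, 1.0)  # unknown engines get a neutral weight
        for rank, url in enumerate(urls, start=1):
            scores[url] += weight / rank
    # Sort by descending score, breaking ties alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical engines and URLs, for illustration only.
results = {
    "engine_a": ["https://example.org/x", "https://example.org/y"],
    "engine_b": ["https://example.org/y", "https://example.org/z"],
}
weights = {"engine_a": 1.0, "engine_b": 2.0}
print(merge_results(results, weights))
```

The weights are where personal preference comes in: raising one engine's weight pushes its results up without discarding what the others found.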