I don’t fully understand, what is going on? ELI5???
Why would you train your model on Reddit?
Is it really more effective to search Reddit with “site:reddit.com bla bla” than with Reddit’s own search?
Then what is this post about?
https://apnews.com/article/reddit-openai-chatgpt-bd2291fcc226bc737a44dbef4a31563f
What’s a good alternative to Reddit? Mastodon?
I’ve been planning to make my own search engine that only returns results from forum sites like reddit. Should I open it up to the public?
(you should register at 404media, they have great content!)
fuck off
yandex.com still works with site:reddit.com
It hasn’t been corrupted by Israel just yet. It is far superior to western junk ad networks. It doesn’t just show the biggest company that relates to the words you entered like Google.
Yes, then we could use DuckDuckGo bangs like “!r webdev” to search Reddit
It has an awesome search - if you export the whole fucking thing into a database - otherwise it’s awful atrocious shite
“Reddit Believes in an Open Internet”
proceeds to block any search engine apart from Google…
Not quite. Reddit responds with a different robots.txt file based on certain factors.
“There are folks like the Internet Archive, who we’ve talked to already, who will continue to be allowed to crawl Reddit.”
See https://www.reddit.com/r/redditdev/comments/1doc3pt/updating_our_robotstxt_file_and_upholding_our/
For example, the Internet Archive will see a different robots.txt file.
One such result at the time of this posting:
https://web.archive.org/web/20240426002810/https://www.reddit.com/robots.txt
Same idea if you visit it from:
From my link to the redditdev sub, “If you are a good-faith actor, we want to work with you, and you can reach us here.”
It looks like this change was pretty recent. I’m guessing other search engines will soon follow.
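For anyone wondering what "a different robots.txt file based on certain factors" could look like in practice, here is a rough Python sketch. To be clear, this is a guess at the general idea, not Reddit's actual logic: the allowlist, the crawler names in it, and the function names are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist. Which crawlers Reddit really allows, and how it
# identifies them, is an assumption here and not something the thread states.
ALLOWED_CRAWLERS = ("googlebot", "archive.org_bot")

ALLOW_ALL = "User-agent: *\nAllow: /\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body a server might send to this requester."""
    ua = user_agent.lower()
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return ALLOW_ALL
    return DISALLOW_ALL


def may_crawl(user_agent: str, url: str) -> bool:
    """Parse the robots.txt this requester would receive and check the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(user_agent).splitlines())
    return parser.can_fetch(user_agent, url)
```

So two crawlers asking for the same /robots.txt path can get opposite answers, which is why the archived copy can look different from what another bot sees.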
*discouraged
Technically robots.txt doesn’t block anything
Robots.txt is a suggestion, not a rule. It can be, and is often, ignored.
Bad bots don’t care about robots.txt so …
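To make the "suggestion, not a rule" point concrete: the check only happens if the crawler's author bothers to write it. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the bot name, and the two crawler functions are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that asks every crawler to stay away (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_crawler_will_fetch(user_agent: str, url: str) -> bool:
    """A well-behaved crawler consults robots.txt and obeys the answer."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_crawler_will_fetch(user_agent: str, url: str) -> bool:
    """A crawler that ignores robots.txt simply never performs the check."""
    return True
```

Nothing on the server side stops the second function from existing. Actually blocking a bot takes something else, like rate limiting or IP and user-agent filtering.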
Yes, they hold copyright, but that doesn’t mean that other people aren’t legally allowed to index them. Search engine indexing and holding of previews has been ruled fair use.
Do all crawlers respect robots.txt files?
Reddit believes in an open internet
LOL
yeah and elon musk believes in free speech
supposed to prevent ai training
Supposed to prevent further AI training
Big players already built their databases