mikelove Posted March 2, 2025 at 01:30 PM Sorry about that; there seem to have been two causes: 1) A huge spike in traffic that was either a DDoS attack or a bunch of badly configured bots - this is a small enough site that either one of those could have set it off. 2) The server running out of storage space, due to a combination of 1) and a configuration mistake on my end when setting the thing up (some very large log files were not being cleared regularly). We've addressed the configuration issue for #2 - and purged the files - and we moved our DNS over to Cloudflare to address #1, which will allow us both to manage traffic spikes better and to block overexcited bots more effectively. We can also expand the server to a larger one pretty easily if needed. But basically, unexplained traffic spike + configuration error = sad server.
黄有光 Posted March 2, 2025 at 02:09 PM Thanks so much for the update! The ongoing enshittification of the internet had me worried that I had returned to the forums just in time to see them mysteriously vanish forever. Glad to see my worries were entirely unfounded.
mikelove Posted March 2, 2025 at 03:08 PM Update: it seems like it was the latter case in #1 - a couple of AI bots were making an inordinate number of requests per second, so for the moment I've turned on proactive blocking for those (literally half of the requests we received in the first hour after going back online were from a single IP address belonging to a large American tech company). I would not presume to decide for the entire site whether to allow AI training - though that's probably a discussion we ought to have - but when they degrade service for everyone else, they become a server administration problem, and at that point it's an easy block.
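For anyone curious how a "half of all requests came from one IP" observation can be checked, here is a minimal sketch. It is not the site's actual tooling: the function name `top_client_share`, the sample log lines, and the assumption that each access-log line begins with the client IP (as in the common and combined log formats) are all illustrative. The addresses come from the ranges reserved for documentation.

```python
from collections import Counter


def top_client_share(log_lines):
    """Return (ip, share_of_requests) for the busiest client.

    Assumption: each non-empty line starts with the client IP,
    as in the common/combined access-log formats.
    """
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    if not counts:
        return None, 0.0
    ip, hits = counts.most_common(1)[0]
    return ip, hits / sum(counts.values())


# Hypothetical sample data, not taken from the real server logs.
sample = [
    '203.0.113.7 - - [02/Mar/2025:14:00:01 +0000] "GET /topic/1 HTTP/1.1" 200 512',
    '203.0.113.7 - - [02/Mar/2025:14:00:01 +0000] "GET /topic/2 HTTP/1.1" 200 498',
    '198.51.100.4 - - [02/Mar/2025:14:00:02 +0000] "GET / HTTP/1.1" 200 1024',
    '192.0.2.15 - - [02/Mar/2025:14:00:03 +0000] "GET /topic/9 HTTP/1.1" 200 730',
]

ip, share = top_client_share(sample)
print(f"{ip} made {share:.0%} of requests")
```

A single client with a share anywhere near this high on a public forum is a reasonable signal to look at more closely, though where to set a blocking threshold is a judgment call for the administrator.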
Dawei3 Posted March 2, 2025 at 03:35 PM Human-to-human exchanges are used to train AI. Reddit recently made lucrative deals regarding this: https://www.reddit.com/r/investing/comments/1gfadkb/reddits_shares_surge_on_ai_data_licensing_deals/ One thing that surprised me is that the email system from Enron (which went bankrupt >20 years ago) is being used to teach many AI systems. The book I read didn't explain why Enron's email was used, but I'm guessing it probably became public during litigation over the bankruptcy. The Wall St Journal noted that the internet is not big enough to train the next version of ChatGPT, because the internet lacks sufficient info. AI is likely now looking in every spot to teach itself. Hence, it's easily understandable that AI wants to use Chinese forums for training. It's too bad the site can't benefit from this and instead is being hurt by it. It's why so many sites now have captchas - to prove the reader is human and deter AI.
Lu Posted March 2, 2025 at 04:05 PM On 3/2/2025 at 4:08 PM, mikelove said: "I would not presume to decide for the entire site whether to allow AI training - though that's probably a discussion we ought to have" Absolutely not, if you ask me. I do not give permission to have my 20 years' worth of posts used to train any AI.
mikelove Posted March 2, 2025 at 04:37 PM It's not entirely possible to stop AI training, because this is an open website anybody can scrape - indeed, we want it to be accessible to search engines like Google because they send us traffic. Basically the most we can do about AI specifically is a) ask them not to crawl us in robots.txt and b) proactively block them by monitoring behavior and banning any client that looks like an AI bot. After this incident we're now doing both of those things. One other measure we could theoretically take would be to require a login to view posts, but I don't know whether that's good on balance, because it also imposes a burden on people who find their way here through a web search looking for information, a group that I believe most people here would be happy to help. This site is not under a Creative Commons license or in any other legal way structured so that anybody has a *positive* right to train an AI on it - they only have the right to do it if courts decide that it's fair use (TBD), or if they obtain permission from the copyright holder. I haven't revised Roddy's site T&C, and I don't believe that in their current form they require anybody to assign copyright to anything they've posted. As a matter of principle I would be very much against that anyway, so I don't believe that as currently structured I would have the right to license everybody's content to an AI company even if I wanted to, which I very much don't.
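To illustrate option a), here is a small sketch of how a well-behaved crawler would read a robots.txt that turns away one AI bot while leaving the site open to everyone else. The bot names `ExampleAIBot` and `ExampleSearchBot`, the `example.com` URL, and the rules themselves are made up for illustration and are not this site's actual robots.txt. The parsing uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named AI crawler is disallowed everywhere,
# every other user agent may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/topic/1"  # placeholder URL

ai_allowed = parser.can_fetch("ExampleAIBot", url)
search_allowed = parser.can_fetch("ExampleSearchBot", url)

print("ExampleAIBot allowed:", ai_allowed)
print("ExampleSearchBot allowed:", search_allowed)
```

Note that robots.txt is only a request: a crawler that ignores it is not stopped by it, which is why option b), blocking on observed behavior, is needed as well.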
Lu Posted March 2, 2025 at 06:24 PM On 3/2/2025 at 5:37 PM, mikelove said: "so I don't believe that as currently structured I would have the right to license everybody's content to an AI company even if I wanted to, which I very much don't." Glad to hear it. I understand the site is in general open to Google and other crawlers, and that is still a good thing, I think (how else would people find us?).