[00:05] i can upload my g4tv.com crap again :-D
[00:17] godane, how many gigs you got left to upload?
[00:23] a lot
[00:23] it's in the 100s i know
[04:41] spinning up some boxes :)
[05:03] wonder if i should increase concurrency
[05:03] gr, wrong room
[06:18] wow, what is yahoo trying to prove this time
[06:19] they are good at destroying data?
[06:22] what's happening now
[06:28] To help focus our efforts on core Yahoo! product experiences, we will discontinue Yahoo! Message Boards by 31st March 2013
[06:29] hm, what might the core experience...?
[06:29] be?
[06:35] ads and shutting down products
[06:37] wait... *WHAT????*
[06:37] are they fucking serious?
[06:38] What service is this now?
[06:38] there goes about 5 groups i'm subscribed to as well as the old gameboy group which i no longer can access because the admin is awol
[06:38] GLaDOS: yahoo is killing their groups/messageboards
[06:38] i.e. yahoo mailing lists etc
[06:39] it is 60 million urls in google
[06:39] NOT THE MAILING LISTS
[06:39] yes, those
[06:40] but not groups.yahoo.com
[06:40] jonas__: wait... hold on
[06:40] they're NOT killing groups.yahoo.com?
[06:40] only messages.yahoo.com
[06:41] correct
[06:41] groups is going next quarter
[06:41] * chronomex ducks
[06:41] So, in other words, not as much panic for us?
[06:41] yahoo.com has 700 million urls according to google site:...
[06:42] groups is 400 of that
[06:42] talk about surface area
[06:42] and messages is 60
[06:42] ok. most of my interest personally is in groups.yahoo.com
[06:42] wow. yahoo will have no content by the end of this year anymore except a home page full of ads
[06:42] Well, time to start scraping groups URLs..
[06:42] chronomex: i hope you're not serious about groups being killed next quarter
[06:43] I have no inside line
[06:43] yahoo is yahoo
[06:43] however planning for the inevitable fall of groups is not a bad idea
[06:43] right
[06:44] I've spent some time wandering around groups with an eye towards archiving
[06:45] by amount of content answers.yahoo comes first
[06:46] geocities, before, had 30 million urls; answers has 120 ^_^
[06:46] good thing answers isn't going down
[06:47] ..yet
[06:47] :D
[06:47] i wonder why messages.yahoo isn't listed here http://www.alexa.com/siteinfo/yahoo.com
[06:47] oh thank god, I still need to find out how babby is formed
[06:49] lol just looking at the posts on there. so much spam :( http://messages.yahoo.com/Computers_%26_Internet/threadview?m=tm&bn=17887109&tid=3437&mid=3437&tof=6&frt=2 but this.. lol
[06:49] swebb: vincentchu: @archiveteam Check this out: https://t.co/IRBwrM6QN5 [less than one minute ago]
[06:49] They're watching us...
[06:49] haha
[06:50] guess they took it off by request, but is there a way to find a snapshot of http://www.alexa.com/siteinfo/yahoo.com ?
[06:51] that's really interesting GLaDOS
[06:51] site:answers.yahoo.com viagra is only 0.1million urls, while site:answers.yahoo.com science is 10million so that's a good sign ;)
[06:56] (while recent users may be spam more often) # http://www.youtube.com/watch?v=_f9oJikx0-I
[06:58] https://twitter.com/vincentchu/status/308467595416334336/photo/1
[06:58] that's nice :D
[07:04] Yahoo is taking the google approach to products. They will do a once a year sweep in the spring and that will be it. From a business standpoint if a product is costing money, you should get rid of it. Would we be as upset if as soon as they announced it they had uploaded a full copy to IA, or dropbox or on bittorrent?
[07:04] I know that sites like stackoverflow and stackexchange spoil us but it took years for ideas like that to evolve.
[07:05] stackexchange == gamified, more bragging rights, moderated usenet
[07:05] Also I have a spider poking around in messages.yahoo.com to see if I can get a full list
[07:08] instead of shutting it down or even uploading it somewhere they could just keep it in place but terminate new messages
[07:08] *the ability to post new messages
[07:08] only
[07:08] Yes, but seemingly it doesn't work that way.
[07:22] or they could permanently forward the urls to archive.org after making sure they exist there
[07:25] in favor of the people who still click links to them every second...
[07:43] i think for legal liability reasons they can't just GIVE us the raw databases
[07:43] what if god forbid some cartilage-brained moron posted their bank info there
[07:44] or social security number
[07:44] etc
[08:11] even when it's already public and the only difference that action would make would be to help archive.org archive the site more efficient/complete?
[08:12] more complete
[08:12] :D
[08:19] i mean i can only imagine that they can't provide the data as a whole when the service being shut down had TOS that would only allow the messages to be published on its website exclusively and when archive.org crawlers would be locked out from it
[08:20] but apparently they agree with archival / archive.org
[08:50] This is getting more and more into philosophical fantasy magic land.. Put it in #archiveteam-bs
[08:50] no, not really bs material
[08:50] -bs is for random off topic side chats
[08:50] not philosophy of archivism
[08:52] I'd say it is, when it's not actively contributing to archiving things. Oh well.
[08:53] I should really figure out how to wire up the off topic siren in weechat
[08:56] i just questioned that there are really legal reasons that they can't provide whole data dumps like Lord_Nigh assumed
[08:57] if not you have to assume it's only that they don't care or lack a relationship with archive.org waybackmachine to just get it down easily along with their yearly shutdown activities.
[09:02] i assumed not to bother anyone with that wondering, since the ongoing work seems to happen in #preposterus
[09:05] You're not being a bother in particular, it's just that this conversation is such an epitome of boringness and has been had countless times. The next topic you can talk about is "Why is IA not fixing the robots issue?" which would be equally boring
[09:07] Let's discuss it more.
[09:07] Whichever archive team members don't die of boredom, get to reproduce
[09:08] We can also talk about copyright and trademarks!
[09:08] And filing standards
[09:09] SketchCow: or say "forget irc I'm going to have sex for a while"
[09:09] which I suppose is the same thing
[13:40] Ehm, guys, Mimi12 is an automatic bot
[13:41] would you be so kind as to get rid of it?
[13:41] thank you
[13:42] ironically it worked pretty dumb
[13:42] eh, damn, forgot to change a setting, brb
[13:43] Anyone have the bitly url list already downloaded?
[13:44] I was thinking you could run a search on that using the NYT green blog base url and have a list of articles to pull down since it is going offline
[13:44] I was going to download it but 33GB is going to take a while
[13:44] omf_: There's the URLTeam dataset. Available over torrent
[13:44] Oh, hm.
[13:45] I am going to set it to download later today
[13:45] I am just throwing the idea out in case anyone else wanted to try something like it
[13:45] I might download it and store it on my VPS..
[13:46] So people can access the dataset over HTTP, and grab only chunks of it
[13:48] You can already do file selection via the torrent
[23:18] So, everyone.
[23:18] I'm uploading most of the TOSEC I found
[23:18] Everything except late Nintendo is going up
[23:39] is there anything we can do to help?
[23:43] Yes.
[23:43] OK, so shortly I'll have them ALL up.
[23:43] All that was in this set.
[23:43] But I'd like help to find if I'm missing some sets.
[23:43] They claim they're at 3.8t
[23:43] I had 100g
[23:43] Granted, maybe they mean unpacked.
[23:44] twitter killed tweetdeck
[23:44] they announced it by posting on their posterous blog
[23:44] which they are also killing
[23:44] hahahahaha
[23:44] lololol
[23:44] really
[23:44] https://tweetdeck.posterous.com/an-update-on-tweetdeck
[23:44] perfect.
[23:50] omf_
[23:50] i have the torrent downloaded
[23:51] what do you want me to run against it?
[23:52] http://archive.org/details/tosec
[23:52] So if people want to see stuff as I'm adding it, there you go
[23:52] New stuff showing up every few minutes
[23:54] This is relatively big news in the software world, as far as I'm concerned.