[00:25] alard: Around?
[00:27] SketchCow: i uploaded the g4 forums
[00:28] Thanks
[00:28] i still have to get images and everything
[00:29] there is also the hided forums that i'm grabing
[00:37] Someone help me.
[00:37] Why did we decide to go after xanga again?
[00:40] Haha
[00:40] Good question. Lots of data/small files?
[00:40] I'm just wondering. We got 3 terabytes in on it.
[00:40] I'm now uploading those.
[00:40] I mean, they're perfectly fine grabs.
[00:40] But I know we paused. Maybe because we didn't need it or something else?
[00:41] Regardless, 3tb on FOS undermines our ability to be reactive to other horseshit
[00:41] So I'm emptying FOS of everything I can
[00:41] Hmm, wasn't it a proof of concept? I can't remember unfortunally.
[00:41] We got 3tb into a proof of concept?
[00:41] We're crazy
[00:42] We're crazy.
[00:42] Because fuck you, that's why
[00:42] :-)
[00:42] http://i0.kym-cdn.com/photos/images/newsfeed/000/318/627/9b0.jpg
[00:43] bwahaha
[00:43] Xanga doesn't seem dead, so.. That's great I guess.
[00:45] Yeah, no idea what was up.
[00:51] my cat did that when he was younger
[00:52] he got mad at the paper towel roll on the table
[00:52] then we find it every where on the floor the next morring
[00:54] ADD Moment: one of the g4 forum members died
[00:54] a user named Loki
[02:01] We have more than one ongoing project. http://www.archiveteam.org/index.php?title=Current_Projects
[02:05] omf_ well, planning to set up a permament warrior as soon I get my new serverparts. :)
[02:07] plux_, that is good to hear. We also have long term projects like urlteam as well
[02:08] ArchiveTeam has a simple goal but many avenues to it
[03:28] plux_: seems like a good idea to work on posterous http://www.archiveteam.org/index.php?title=Posterous#Goal
[03:33] anyone here have some experience with https://github.com/internetarchive/warc ?
[03:34] arkhive: well, yes. That why I have a warrior working on that.
[05:02] this is complicated.
[05:04] chronomex, what are trying to do to a warc?
[05:04] write a warc from a collection of python HTTPResponse objects
[05:05] I'm having trouble composing the appropriate things in warc.py
[05:19] * chronomex shrugs
[05:19] going to work on this later, no more today
[05:43] Odd question: For the Posterous archive project, why not download the spam accounts? It's all a part of the community. Shouldn't we save that too? Might sound stupid, but i think it's a question worth raising. Thoughts?
[05:44] And I guess that goes to all websites we save. Too me it's different then a spam/junk email. I don't know...
[05:45] at they moment they are being downloaded
[05:45] And also not only the spam accounts, but the banned ones as well. (If they are accessible)
[05:45] banned ones aren't accessible (they are jsut blank pages)
[05:46] the problem is that there are so many spam accounts and we're spending time downloading them, priority should be given to users with actual posts
[05:46] Oh. Will the spam ones be deleted before setting up a mirror or place to put them?
[05:47] probably not, too hard to sort/filter them
[05:55] Oh
[05:56] we'd like to download non-spam first
[05:56] obviously
[05:56] it's hard to sort them by spamminess given that we only have a list of namse
[06:02] chronomex, are you writing a python spider with warc support?
[06:02] I'm writing a http proxy that dribbles everything to warc
[06:02] for personal use
[06:02] intend to publish
[06:05] chronomex, wouldn't it be faster to just write a plugin for squid to dump the cache at intervals to warc? All the heavy lifting is already done
[06:05] heavy lifting in this case means what, exactly?
[06:06] also warc kind of requires virgin request/responses, and I suspect that "cached" has been munged
[06:11] well lets see. There is handling requests from multiple users, cache checking, purging and locking. Smart url redirection management etc.. How are you going to do cache locking when a page is out of date but current being requested by users
[06:11] users in this case == myself
[06:11] no caching, just a straight passthrough
[06:12] let the browser do caching if he cares
[06:14] Oh. I thought you were writing a standalone proxy that could handle many users.
[06:15] noper
[06:15] @chronomex | for personal use
[06:16] reading is a valuable skill, motherfucker
[06:19] Is that okay if I run Warrior Posterous
[06:19] sure
[06:19] poke "go"
[06:19] What?
[06:19] oh
[06:19] yup
[06:19] was that a joke?
[06:19] or do i need to poke something
[06:19] (confused)
[06:19] no, just tell it to start
[06:19] or however you do that
[06:20] poke a button, or whatever
[06:20] ah
[06:20] my warrior has been acting weird. About a month ago i started doing 2 instances of URL Team
[06:21] But only for like an hour.. because i couldn't get it working. Anyway, When I started it up the warrior was still running it ...so I shut it down and started 6 instances of posterous
[06:22] but in the current projects nothing has changed. So i am not sure if it is working.
[06:26] nevermind. I think I got it. Anyway it's 12:25 here in Colorado and I have to be up at 5:30 in the morning. Goodnight.
[06:28] k
[06:28] nite
[06:34] http://4chandata.org/
[06:34] So if you want to keep any data you should do it now because everything is gonna be destroyed. For any question/proposal, please send me an e-mail at contact@4chandata.org
[06:34] WARNING: 4chandata will probably have to close in the weeks to come. Indeed, because of 4chan's reputation and because of its non-filtration, no ad network accepted to put ads on this site. As it hosts more than 400 000 images and about 10 millions messages, the hosting fees are getting too important.
[06:44] Anyone contacted them yet?
[06:50] joepie92: what happened to joepie91
[06:51] Do we have a form letter to send to sites that are shutting down?
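(Editor's note on the [05:04] to [06:12] WARC exchange: a minimal sketch of what "writing a response to WARC" involves, using only the Python standard library. It does not use the internetarchive/warc library's API. The function name `build_warc_response_record` and its parameters are hypothetical, and it assumes you still hold the unmodified bytes of the HTTP response, which is the "virgin request/responses" point raised at [06:06].)

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri, raw_http_response, when=None):
    """Wrap raw HTTP response bytes in a single WARC/1.0 'response' record.

    raw_http_response must be the status line, headers and body exactly as
    received; `when` must be a timezone-aware UTC datetime (defaults to now).
    Hypothetical helper for illustration, not the warc library's interface.
    """
    when = when or datetime.now(timezone.utc)
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        "WARC-Record-ID: <urn:uuid:%s>" % uuid.uuid4(),
        "WARC-Date: %s" % when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI: %s" % target_uri,
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts only the record block (the HTTP bytes).
        "Content-Length: %d" % len(raw_http_response),
    ]
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Every WARC record is terminated by two CRLF pairs.
    return head + raw_http_response + b"\r\n\r\n"
```

A passthrough proxy like the one described could append one such record per response to a file opened in binary append mode; request records and gzip compression are left out of this sketch.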
[06:51] I thought SketchCow contacted most of the sites
[07:02] omf_: It's mostly other people contacting sites that shut down, rather then SketchCow
[07:03] chronomex: Yeah, I have some experiance with https://github.com/internetarchive/warc
[07:04] chronomex: You could check out my ugly ass piece of shit code at https://github.com/ersi/warcshotter btw
[07:09] Yes, it's a terrible idea to have me contact them
[07:10] ersi: cool thx
[07:12] chanarchive.org has changed their layout awhile back to be like reddit which kind of sux..but they have been archiving 4chan stuff for awhile too
[07:12] i wish he would post the size of the data
[07:12] i also wonder what kind of legal bs he gets but assumingly not much
[07:18] omf_ u going to contact them?
[07:18] If GLaDOS doesn't want to. He asked first
[07:19] omf_: Go ahead, I'm sure they'll just ignore an email from canadianpharmacyonline@glados.me anyway.
[07:21] you really need a new email address.
[07:22] okay I am emailing 4chandata
[07:24] heres hoping they actually respond
[09:28] 4chandata responded. They are going to make us a backup
[09:28] I emailed asking for size info and method of transfer
[09:29] \o/
[09:30] ᕙ(⇀‸↼‶)ᕗ
[09:30] \o/
[09:31] According to the email I got the main issue is copyright crap
[09:31] pffff
[09:32] yeah like I care about that
[09:33] Save all the things
[09:33] please do
[09:41] ahoy maties
[09:42] argggh
[09:42] zyphlar, cast off you land lubber :)
[09:43] graaargh, me mead is running low
[09:43] Looking for a new project?
[09:44] nah, i just got warrior running. much nicer than google video
[09:44] warrrrioar
[09:45] lol is that what it's named after?
[09:45] nah
[09:45] tonight we dine in THE CLOUD
[09:45] * chronomex roars
[09:45] THE BUTT
[09:46] freeglucosemetersfordiabclick... i will archive you.
[09:46] even though dicks don't need glucose meters.
[09:47] I think that says click..
[09:48] bad font kerning :)
[09:48] Indeed.
[09:54] Please go to #preposterus for Posterous talk
[10:32] the 4chandata is 104gb
[10:32] The guy is wondering if archive.org will take it because of legal concerns
[10:33] dude is retarded
[10:33] internet archive takes anything
[10:33] I will take it.
[10:33] om nom nom
[10:33] Send him my way if he wants.
[10:33] jscott@archive.org if he needs some legit
[10:40] At least this guy is very prompt about responding to email
[10:46] and there is a constructive dialogue going on. I couldn't really ask for more than that
[11:28] hi
[12:01] CiaranR: heya
[12:01] C-Keen* heya
[12:01] ersi: hey!
[12:04] I was about to run warrior to help save posterous content, but it says to ask in here first. So this is me asking in here
[12:05] marc: There's a possibility that you might get banned from Posterous by running the warrior with that project. If you're fine with that, proceed.
[12:05] what do they ban by, IP address ?
[12:05] marc: It's however very unlikely you'll get banned these days, since we've taken measures to rate limit and so on
[12:05] Yeah, IP. And it's for a few hours to some days.
[12:05] that's fine
[12:05] thanks
[12:06] off it goes
[12:06] Feel free to read the FAQ at http://archiveteam.org/index.php?title=Posterous - it got a few bits on this and other stuff :)
[12:06] will do
[12:06] And join up #preposterus, on the same network - for the project specific channel (that's where all the posterous talk and updates go) :)
[12:06] Thanks for joining & welcome around
[13:12] Hi, I just spun up an archive team VM to help with the Posterous download. The link in the admin panel says 'Posterous will ban you' and to ask here before starting...what do I need to do?
[13:13] read the FAQ and probably nothing else.... http://archivingyoursh.it/Posterous
[13:13] Thanks.
[13:14] yw!
[13:40] Hi! I just set up the Archive Team Warrior and saw the Posterous project said to check in here before starting it. Is there anything I should know?
[13:48] Check #preposterus
[13:50] Will do
[15:47] hey all. :)
[16:00] hi
[16:00] hullo
[18:46] and there is a constructive dialogue going on. I couldn't really ask for more than that
[18:46] this makes it a smoother experience than 99% of the archiveteam projects so far
[18:46] :P
[18:47] joepie92: what happened to joepie91
[18:47] he died :(
[18:47] oh look he's back
[20:05] https://twitter.com/jarxg/status/311930704495788032
[20:05] \o/
[20:05] Makin' friends
[20:05] SketchCow: Hi.
[20:08] I like the reply, SketchCow })i({
[20:17] Ubu dude and I are going to work together in mail.
[20:17] Now I have to remember why I annoyed alard
[20:17] Some of the stuff in your directory was already uploaded.
[20:17] Luckily I caught it
[20:17] Perhaps it was because you discovered the Xanga directory.
[20:17] Well, that might be it.
[20:17] Damn, dude, 3tb
[20:17] We can make more if you want.
[20:19] Hey guys; are you familiar with http://my.opera.com ?
[20:19] It's a fair-sized social network run by the people at Opera Software
[20:19] does photo-sharing, blogging, forums...
[20:20] and word has it, it's going to close down in a few months.
[20:22] really? even the support forums?
[20:23] I'm not entirely sure how they'll be handled, but I know some form of support forums will exist in future.
[20:23] Gnat: I heard that same rumor elsewhere recently
[20:23] Not sure whether the same software, databases, etc. will remain
[20:23] or whether they'll just re-open a new forum
[20:25] There's no restrictions on archive.org indexing the site, so that's good.
[20:25] But I'm not sure how comprehensive that covers the content on the site.
[20:26] I'm not so confident in Opera Software's capacity for giving users options to export their data
[20:32] how does archive.org deal with copyright on non-web materials?
[20:34] look up who uses the phrase "no copyright intended"
[20:36] Oh, alard - that was the question. Why did we start on Xanga? Why did we stop?
[20:40] chronomex: No, I mean, for material that isn't found on the web.
[20:42] for example, something like this radio show: http://archive.org/details/afropop_worldwide
[20:42] presumably at least some of that music is copyright-encumbered
[20:47] SketchCow: We started because you said "Someone look at Xanga", we stopped because you said "COntinue like download it? Not download it. We weren't specifically graing Xanga yet. We just took an assessment. Xanga hasn't shown an explicit death or decay yet."
[20:48] SketchCow: I have to admit that I kept the warriors on it for a little bit longer after that. :)
[20:49] But yes, we started when someone suggested that Xanga might be aging, to make an assessment of the size.
[20:54] Ha ha
[20:54] 3tb of "WOAH NELLY"
[20:54] (skiiiiiiiiiiiiiiiiiiiiiiiiid)
[20:54] Anyway, impressive.
[20:54] why not archive the whole thing when we don't have other stuff to do? :P
[20:55] (right now we DO have other stuff to do)
[20:55] But totally clogging the drive.
[20:55] or would that upset IA?
[20:56] SketchCow: As far as I'm concerned, feel free to start uploading. The 50GB-directory-making-script should work for Xanga. (And you can run it again later, should you want to, and it will continue filling the current/ directory.)
[20:57] At some point I wrote that "the current Xanga estimate is 35TB."
[20:58] Already doing it.
[20:58] http://archive.org/details/archiveteam_xanga
[20:58] HAVE to.
[20:58] I need to clear FOS
[21:00] if FOS is clogged, we can't react to ANYTHING
[21:00] So I'm clearing it out except active projects.
[21:09] We're at 7tb of disk space free, so we're in good shape
[21:21] so that explans the g4tv not being grabed by you
[21:21] FOS is full :P
[21:51] Ciao
[21:52] I'd like to run a ArchiveTeam Warrior to help with the posterous closing.
[21:52] The VM tells me that I should ask here first.
[21:52] Do I do something wrong if I just let it run?
[21:52] habi: Welcome. More info in #preposterus
[21:53] No, the main reason for that warning is that you might get blocked by Posterous. The chance is pretty small, but it could happen.
[21:56] OK. I'm getting a bunch of 502 errors intermittent with "Downloaded XX URLs".
[21:57] Am I correct in thinking that the message "Tracker rate limiting is in effect. Retrying after 30 seconds..." refers to an overload of the archiveteam server(s), as opposed to an overload of the servers of posterous or any of the other projects?
[21:58] No, the tracker is fine, but we limit the number of items it gives out each minute. We don't want to overload Posterous.
[21:58] ahh, ok
[21:58] What about if it gives that message on other projects? Same idea?
[22:03] Yes. If the tracker isn't responding you get a different message.
[23:23] hi kids.
[23:23] http://www.theverge.com/2013/3/13/4101144/google-shuts-down-reader-rss-aggregation-service
[23:23] we can has public feed downloads?
[23:24] what the FUCK
[23:24] example: http://www.google.com/reader/view/#directory-search/clowns//0
[23:25] I guess that makes it oh so much easier to migrate off of Google now. Let me just get my mail and I'm out
[23:26] http://www.google.com/reader/view/#directory-page/0 - the "bundles" specifically (which i admit i never even knew were in there..."
[23:26] that first link i sent not sure if it's collected aggregations of rss feeds users have put in (Seems to be), or if there's some index of user-shared feed "bundles"
[23:29] so yeah, there is this concept of user-created feed bundles. here, made one: http://www.google.com/reader/bundle/user/03937704522554951353/bundle/Finance%20Test
[23:30] which they give embed code for. so there's going to be a lot of broken empty whitespace/errors in websites when they stop rendering that thing.
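(Editor's note on the tracker explanation at [21:58] and [22:03]: a hedged sketch of the behaviour described there, where a tracker hands out a limited number of items each minute and a client that is refused waits 30 seconds and retries. This is not the actual ArchiveTeam tracker or warrior code. The names `PerMinuteLimiter` and `request_item` and the fixed-window approach are assumptions made for illustration.)

```python
import time


class PerMinuteLimiter:
    """Allow at most `limit` hand-outs per 60-second window (hypothetical)."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock              # injectable so the logic can be tested
        self.window_start = clock()
        self.handed_out = 0

    def try_acquire(self):
        now = self.clock()
        if now - self.window_start >= 60:
            # A new minute has started: reset the counter.
            self.window_start = now
            self.handed_out = 0
        if self.handed_out < self.limit:
            self.handed_out += 1
            return True
        return False


def request_item(limiter, retry_after=30, sleep=time.sleep, max_tries=5):
    """Client side: retry every `retry_after` seconds while rate limited."""
    for _ in range(max_tries):
        if limiter.try_acquire():
            return True
        sleep(retry_after)
    return False
```

The point of limiting on the tracker side, as stated in the log, is to protect the site being archived rather than the tracker itself, so the limit would be chosen per project.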
[23:33] here's the full sunsetting list: http://googleblog.blogspot.com/2013/03/a-second-spring-of-cleaning.html
[23:37] Fuck Google
[23:37] the week I start using Reader, they announce its going away.
[23:39] first half of 2013 is a very unfortunate time for the Web...
[23:40] Google is becoming Yahoo!
[23:40] I LOVED Google Wave
[23:40] and they've shut down so many services/websites/communities
[23:41] and they got rid of google labs. I feel like the 'Google Spirit' is gone. Now a more traditional corporation.
[23:42] at some point someone's going to look at the services and ask "is this making money?", and in the event of a corporation if the answer is no, benevolence on the web unfortunately has a finite shelf life at the whims of profits
[23:42] Seriously. Google Labs was awesome. That was one way they were so unique.
[23:44] Ya. I get that. My Dad is a CFO of a global manufacturing corporation and he kind of explained that to me.
[23:44] 90+ day notice and a well done data export tool is a lot more than the norm we've been seeing. not apologizing for them.
[23:44] Ya
[23:44] We should've totally done Archive@Home
[23:44] But it still sucks at the same time
[23:44] yeah but MANY people use this.
[23:44] a lot more than used Posterous or Yahoo Messages
[23:45] Yeah. I hear you. It just sucks. :P
[23:45] balrog_, I doubt your claim reader is bigger than Yahoo Messages. If you could provide some facts to back that statement up
[23:45] I wouldn't be surprised if there are more Google Reader users than there were Geocities users
[23:45] many people use it with Reeder and other apps
[23:45] bigger by what metric
[23:45] most people use it to view, not to post.
[23:46] I didn't claim. I said "wouldn't be surprised"
[23:46] a F@H-style archiving program would be quite cool glados
[23:47] but isn't there one already?
[23:47] let me check
[23:47] Have accounts, leaderboards, etc.
[23:47] warrior doesn't count?
[23:47] There's Majestic-12, which isn't quite the same, it's a distributed search engine.
[23:47] But similar concept.
[23:47] Warrior is, well, different.
[23:47] Distributed bandwidth kind of thing.
[23:47] kind of anyway
[23:48] * GLaDOS renames Warrior to Archive@Home
[23:48] Example: http://i.imgur.com/GEjauht.png
[23:48] Aaaand the homepage: http://www.majestic12.co.uk/
[23:48] Although I do vaguely remember something more similar to what's being suggested from '02 or so
[23:48] warrior requires a heavy VM
[23:49] F@H is fairly lightweight
[23:49] Heh, not if you're running the GPU client :P
[23:49] btw an antivirus shouldn't affect what goes into the WARC, does it?
[23:49] Warrior legitimately uses less RAM than the Folding@Home GPU core does for some work units.
[23:49] Sadly, BOINC only supports AMD and Nvidia GPUs
[23:49] Warrior doesn't "require" a heavy VM, it's just the chosen implementation. there's nothing stopping someone from doing a similar functionality in a lightweight win* client
[23:49] * GLaDOS has yet to buy a GPU, uses an embedded Ivy Bridge
[23:50] the vm is because it's easier to provide a consistent environment
[23:50] * wp494 already has a 660 ti
[23:50] I know, but why can't that portion be rolled into a lightweight handler process? that might require a ton of work though
[23:51] did I hear you just volunteer? ;)
[23:52] no time :(
[23:52] at least not right now
[23:54] is there any way run a VM entirely in a browser at this point? A lightweight hypervisor (java? html5?) that can pull down an OVF and run it right there? click this link, boom, you're running warrior?
[23:54] I'm going to get a Titan when I can afford it, which should be soon.
[23:54] Graphics Card Power (W): 250
[23:54] That's less than the 690..
[23:56] http://bellard.org/jslinux/
[23:56] presto?
[23:58] that's a PC emu in and of itself i know, but can someone make the underlying warrior tasks push down in the base disk image and run?
[23:59] looks like it doesnt emulate any network stack at the moment