[00:01] Can I help?
[00:02] dashcloud, we also have an etherpad http://pad.archivingyoursh.it/
[00:03] great!
[00:04] Also, it looks like the Posterous tracker hasn't processed anything new for hours.
[00:07] want a 1 gb internet speed in 1 second to help?
[00:08] This is possible
[00:09] anyone is interested in this connection?
[00:09] n00b, what do you want to help with?
[00:10] Raw speed doesn't seem to be an issue, but someone else feel free to correct me if I'm wrong.
[00:13] n00b928: I'm not a regular here, but I think you could go get the Warrior, let it do the Archiveteam's choice.
[00:15] There's even a video http://youtu.be/_nzD-QpmePE
[00:16] hi, anyone who wants to add urls for nwnet.co.uk, here's an etherpad: http://pad.archivingyoursh.it/p/nwnet.co.uk
[00:20] Does anyone have a good way to take a url and crawl that looking for other nwnet links on it?
[00:21] Like I can generate http://www.nwnet.co.uk/abbey/ for example my script found.
[00:33] So, is upcoming done? Any point in keeping my upcoming pipeline running?
[00:39] SketchCow, here is another item for the ftp collection 34gb - http://archive.org/details/ftp.ea.com_2012-05-06
[01:03] NSFW: http://www.nwnet.co.uk/blue/
[01:04] omf_: Added
[01:05] Will I still be able to upload more files to that item
[01:05] I am doing a 2013 ftp grab
[01:06] Yes
[01:06] Well, sort of
[01:06] Yes
[01:06] For your purposes, yes
[01:08] collection=software&mediatype=ftpsites <-- wrong way round
[01:08] Did I do that
[01:09] <3
[01:09] Fixed
[01:28] Oh man, these nwnets are AMAZING
[01:28] A M A Z I N G
[01:29] nwnets?
[01:31] This is a site I pulled down a while back which is another old ISP that has some cool pages - http://archive.org/details/userpages.monmouth.com
[01:35] SketchCow: There are some really cool ones!
[01:36] http://www.nwnet.co.uk/craig/ The design on this is so modern compared to many others!
[01:36] The reviews of IRC and ICQ clients on here http://www.nwnet.co.uk/cooper/irc.htm are awesome.
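The 00:20 question (crawl a page and pick out other nwnet links) could be approached with a small link extractor like the one below. It is a minimal sketch, not the script mentioned in the log, and it rests on assumptions: that a member site is the first path segment under www.nwnet.co.uk (as in http://www.nwnet.co.uk/abbey/), and that paths are case-insensitive, as the channel notes at 05:00. The names `LinkCollector`, `nwnet_roots`, and `NWNET_HOSTS` are hypothetical. Fetching pages is left out; you would pass in HTML you have already downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumption: a member site lives in the first path segment under nwnet.co.uk,
# e.g. http://www.nwnet.co.uk/abbey/, and paths are case-insensitive.
NWNET_HOSTS = {"nwnet.co.uk", "www.nwnet.co.uk"}


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found on one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def nwnet_roots(page_url, html):
    """Return the sorted member-site roots that a page links to."""
    collector = LinkCollector()
    collector.feed(html)
    roots = set()
    for link in collector.links:
        parts = urlsplit(urljoin(page_url, link))  # resolve relative links
        if (parts.hostname or "") not in NWNET_HOSTS:
            continue  # skips other sites, mailto:, javascript:, etc.
        segments = [s for s in parts.path.split("/") if s]
        # "/craig/" or "/cooper/irc.htm" name a member dir; "/index.gif" does not.
        if len(segments) > 1 or (segments and parts.path.endswith("/")):
            roots.add("http://www.nwnet.co.uk/%s/" % segments[0].lower())
    return sorted(roots)
```

Feeding each newly found root back in as the next page to fetch turns this into a simple breadth-first crawl; keeping a set of already-visited roots stops it from looping.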
[01:37] reminds me of this page http://www.techniqueretreat.com/ which rotates color schemes
[01:38] Yeah, I can see that.
[01:39] How does the warrior work? Do you feed it URLs and then it scrapes or do you need to feed it urls and it just does a wget (or similar) on them?
[01:40] ohhhh man pirch 98
[01:47] holy 90s page design
[01:48] http://forum.playfish.com/showthread.php?t=2968222
[01:50] the game has 10 million likes
[01:59] I'm going to let this bad boy run overnight. Hopefully, when I wake up tomorrow I'll have a few hundred URLs for us to use.
[03:30] Is there a go-to process for grabbing a single Deja group?
[03:33] hah shaqfu. i was just thinking of that game the other day
[03:34] I want to poke through a few NGs, but GGroups interface is so abhorrent that I'd rather just grab+grep
[03:37] I have been working on gg for a while
[03:37] for scrape and save
[03:39] omf_: what's your process to grab a group?
[03:41] the problem is getting banned
[03:41] they have good detection
[03:41] omf_: even for small tactical grabs?
[03:42] 5-7 pages of results before they notice and that is with a scripting language. I have moved on to automating browsers and so far it looks promising
[03:55] I wonder if various web CEOs get together and complain about SketchCow
[04:00] shaqfu, what group are you looking to get
[04:00] omf_: A bunch of old MtG ones
[04:01] They might already be archived. I have found pockets of that all over
[04:01] let me check around
[04:01] Yeah, it showed up in a lot of alt.recs
[04:01] but I can run my software on actual group names if you got them
[04:01] And they're the only source on the game's very early development
[04:22] I'm stuck in "The warrior is beginning work on a project" when using the ArchiveTeam's Choice and a blank "Current project" page. Is this expected?
[04:23] wow, communication skills.
I meant to say that my current project is blank when using archiveteam's choice
[04:25] select posterous and set the world afire
[04:37] Same issue here... Problems connecting to tracker?
[04:40] "archiveteam's choice" isn't working, but if you select a project (posterous seems like the best) it will get to work
[04:41] of course all of my threads are hitting the rate limit
[04:57] SketchCow: dashcloud, et al, anyone working on nwnet
[04:57] http://archive.org/~abuie/nwnet.txt
[04:57] Those are all the nwnet top-level things we know about
[04:57] (IA)
[04:57] [extracted from the cdx hadoop cluster]
[04:58] nice
[04:59] looks like it has dupes
[05:00] apparently capitalization doesn't matter to nwnet
[05:00] Which threw off my sort -u
[05:00] omf_: Fixed dupes
[05:01] also added http://archive.org/~abuie/nwnet_all.txt
[05:02] underscor, once you strip off the port number, there are still dupes
[05:02] :80 is the default
[05:03] Fixed again
[05:03] Also, I know. That's just how wayback stores them
[05:03] yeah I was trying to fix it on my local copy and it took me a minute to realize the line ending was tripping me up
[05:04] CR instead of LF
[05:08] i'm seeing about grabbing click online urls
[05:08] by using google to do it
[05:12] now i get the continue urls
[05:12] but i think i got 1000 urls
[05:18] gcr, TeeCee: Still got problems with the ArchiveTeam's Choice in the Warrior?
[05:26] ersi: Dunno... I have changed to that project now, so we'll see when the current items are done.. :)
[05:33] TeeCee: Alright :)
[06:15] Who added 600k of accounts to the Posterous tracker :P
[06:17] we need a way to get spreecast
[06:18] spreecast is used for the blazecast shows from the editors of theblaze
[06:18] it also has an archived chat in real time
[06:19] it plays back the chat in real time
[06:19] http://www.spreecast.com/users/scott--50
[06:20] maybe a good idea to make tools to grab this stuff
[07:05] What's the highest priority project right now?
[07:05] Posterous?
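The dedupe problems raised between 04:59 and 05:04 (capitalization, a redundant `:80` as stored by Wayback, and CR line endings) all defeat a plain `sort -u`. A minimal sketch of normalizing each URL before deduplicating is shown below. It assumes nwnet paths are case-insensitive (per the 05:00 remark) and only treats `:80` as redundant for `http`; `normalize` and `dedupe` are hypothetical names, not the script used to produce nwnet.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonicalize one line of a URL list so equivalent entries compare equal."""
    parts = urlsplit(url.strip())  # strip() drops a stray CR left by CRLF endings
    scheme = parts.scheme.lower()
    host = parts.hostname or ""    # .hostname is already lowercased
    # :80 is the default port for http, so "host:80" and "host" are the same site.
    if parts.port is not None and not (scheme == "http" and parts.port == 80):
        host = "%s:%d" % (host, parts.port)
    # Assumption: nwnet ignores capitalization, so fold the path to lower case too.
    path = parts.path.lower() or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))  # fragment dropped


def dedupe(lines):
    """Equivalent of `sort -u`, but applied after normalizing each URL."""
    return sorted({normalize(line) for line in lines if line.strip()})
```

Lowercasing the path is only safe for servers that really ignore case; for a case-sensitive site you would drop that line and keep the rest.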
[07:09] My Warrior is told to work on Formspring...
[07:25] i'm still seeing tracker rate limiting. is that indicative of being blocked by posterous?
[07:26] Either that or posterous asked us to slow down again
[07:30] just going off of SketchCow's tweets, it seems that they've started blocking. i switched back over to help out and am seeing the rate limiting stuff like before.
[07:31] is it possible to use an http_proxy for a warrior?
[07:34] as far as i can tell i'm not blocked but the tracker is rate limiting me
[07:35] not sure if blocked hosts are eating up slots or something
[07:35] i'll leave mine running as i head to bed
[08:31] lukeman: Rate-limiting messages come from the ArchiveTeam Tracker.
[09:09] href: yes, the options are shown if you hit the advanced button
[09:10] eh I feel stupid now, i looked at the advanced button. maybe not enough :p thanks!
[09:12] hmm I looked again and there is nothing about http proxies
[09:14] Since when do we have http_proxy settings under advanced? :o
[09:15] you don't :p
[09:15] Oh wait
[09:15] that's for setting a password and username
[09:15] doh :<
[09:16] And that's your introduction to Smiley ;D
[09:17] :-D
[09:51] 'k apparently configuring the proxy for wget (in ~/.wgetrc) is enough. cool :)
[09:53] :)
[09:54] neat :)
[10:10] here we go, four more warriors :)
[12:05] I've updated http://pad.archivingyoursh.it/p/nwnet.co.uk with about 40 more. I have a meeting tonight, but after work and the meeting, I hope to hack more on my script and improve its collection ability.
[14:13] hmm posterous is heavily rate limited?
[14:14] anyone know of a channel where archive.org folk hang?
[14:17] closure: #internetarchive if anything
[14:17] oh, on this network. surprised
[14:18] looks a lot like this channel :P
[14:18] Yes, but it's a dedicated channel for IA conversations
[14:19] flaushy: Yes. Also #preposterus - project channel :)
[14:19] yeah, suddenly realized ;)
[14:20] ^_^
[14:41] uhm...
http://arstechnica.com/business/2013/04/yahoo-yanks-saturday-night-live-from-hulu/
[15:19] moving off of hulu is a relatively good thing. when archived videos were made available it was only for short periods of time. i would often find links to videos that had been pulled from hulu.
[15:20] lots of folks who work at snl have mentioned they have the entire archive of sketches easily available internally all tagged up and with video from both air and dress rehearsal. maybe this is a step towards more of that being exposed.
[15:23] Moving off from Hulu into Yahoo is not a good thing.
[15:24] It means Yahoo wants to continue to be a content hub
[15:24] Or
[15:24] OR
[15:24] People could just get tired of watching that shit
[15:31] SketchCow: posterous needs attention
[15:31] I have a full window of "Tracker rate limiting is in effect. Retrying after 30 seconds..."
[15:31] welp, so far still banned
[15:31] I'll try creating a new username/password and verifying the email later
[15:31] also, it appears that AT requests go to five posterous servers while other requests go to the rest of the servers, not including those five
[15:34] We need alard or one of the people running the tracker. I do not run the tracker.
[15:35] Do we have a nwnet channel?
[15:35] There's a second domain name for these folks, we have more to scan.
[15:35] http://telinco.co.uk/
[15:36] http://www.telinco.co.uk
[15:36] doesn't work without www :/
[15:39] Right
[15:39] Searching google with that gets some of the stuff.
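The 09:51 tip that setting the proxy in `~/.wgetrc` was enough can be captured in a small helper. `use_proxy`, `http_proxy`, and `https_proxy` are standard wgetrc commands; everything else here is an assumption. `http://proxy.example:3128/` is a placeholder, the function names are hypothetical, and the log only reports that this worked for one user, without saying which wget build or environment their pipeline used.

```python
from pathlib import Path


def wgetrc_proxy_stanza(proxy_url):
    """Return wgetrc lines that send wget's HTTP(S) traffic through a proxy."""
    # use_proxy, http_proxy and https_proxy are standard wgetrc commands.
    return (
        "use_proxy = on\n"
        "http_proxy = %s\n"
        "https_proxy = %s\n" % (proxy_url, proxy_url)
    )


def append_proxy_to_wgetrc(proxy_url, rc_path=None):
    """Append the proxy stanza to a wgetrc file (default: ~/.wgetrc)."""
    rc = Path(rc_path) if rc_path else Path.home() / ".wgetrc"
    with rc.open("a", encoding="utf-8") as fh:
        fh.write(wgetrc_proxy_stanza(proxy_url))
    return rc
```

Appending rather than overwriting keeps any settings already in the file; if an earlier `http_proxy` line exists, check which value your wget actually ends up using.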
[15:43] I am creating a nwnet wiki page
[15:43] someone pick an irc channel name please
[15:44] #nwnyet
[15:55] http://www.archiveteam.org/index.php?title=Nwnyet
[15:55] For the record, generally we make the wiki page the real name
[15:56] Maybe a reference to the funny name
[15:56] And of course a link to the IRC
[15:56] irc is in the template
[15:56] Right
[15:56] I'm saying that normally we have the normal name, and we reference the funny name and the IRC channel, which has the funny name.
[15:57] got it
[15:58] (based on the google index) all nwnet.co.uk sites are subdirectories, while many telinco sites are subdomains
[15:58] fixed it with a redirect
[15:58] http://www.archiveteam.org/index.php?title=Nwnet_telinco
[17:51] SketchCow: you alive?
[17:51] Need to talk re:posterous - why have we so heavily limited ourselves?
[17:52] I don't know, frankly
[17:53] .....
[17:53] Orders to raise the limit sir?
[17:54] No one has left any messages in any of the channels explaining it, I can only guess someone fudged typing 100 or 1000 and hit 10 accidentally.
[17:55] Can you do it?
[17:55] Turn that fucker up
[17:56] SIR YES SIR!
[17:56] Set to 1000, watching now
[17:56] 55 9737 100
[17:56] 56 4977 1500
[17:56] as I said, it appears that requests with the archive team UA are hitting FIVE servers at posterous
[17:56] 54 9968 100
[17:56] Minute/requested/granted
[17:57] So my "Archive Team's Choice" stuff is still going to URLTeam... I gather we can't hit Posterous any faster?
[17:57] ussjoin: we can try...
[17:57] choose it manually in the projects list
[17:58] Tried that and got AT rate limited.
[17:59] I've *just* increased the rate by 10000*
[17:59] I've *just* increased the rate by 10000%
[17:59] the rate limit is still in effect
[17:59] Smiley: I know you just did, and I'm just trying it. And still, AT Tracker Rate Limiting.
[18:00] take a look at http://tracker.archiveteam.org/posterous/ though
[18:00] btw, are these all with the AT user agent?
[18:00] balrog: I don't know what's happening with the AT agent atm
[18:00] so i got more episodes of tekzilla being uploaded right now
[18:00] and yes, the tracker shows k killing it as normal?
[18:00] mine just picked up a task for the first time
[18:00] we are pushing out 1000 grants a minute now
[18:01] I expect some nice love notes from posterous shortly
[18:01] yah
[18:02] SketchCow: it looks like posterous has five servers set up for us
[18:02] rather than just two
[18:02] Good
[18:02] balrog: or we aren't using the UA anymore?
[18:02] wonder if the guy saw what was going on and decided to quietly help
[18:02] If we have 5, hell, we finally have a chance.
[18:02] I thought we were, but a few people weren't
[18:03] Maybe we need to see how the five hold up
[18:03] Who's doing k?
[18:03] k is kennethre
[18:04] btw k might not be using the UA so you can kind of ignore him
[18:04] weeeeee warriors are now spilling in with data too :)
[18:05] down to a normal ~100 requests a minute
[18:06] anyway I did a bunch of curls with the AT UA and determined that there are five, and they're different from those without the AT UA
[18:07] hm it slowed down some
[18:07] balrog: sweet, can you put that info somewhere secure and let me, alard, omf_ and SketchCow know.
[18:07] as in you're getting limited?
[18:08] In fact, let's go to #preposterus
[19:23] <-- is the seth working on the pull request, SketchCow
[19:24] I figured
[19:24] What up
[19:27] I'm waiting on adam to change the folder names back to without spaces. But otherwise the branch checks out and I have it running in emulation.
[19:27] Great
[19:28] I realize we did some silliness to the original branch, but I sort of feel like we should keep it. Unless, seriously, we think it's no big deal. The top levels aren't canon.
[19:28] When I first read it, I thought we were crippling code and items.
[19:29] Nope, just renaming spaces to underscores
[19:30] if it isn't a big deal, I can do a few more sanity checks, clean up the docs a bit, and the PR can be accepted. Adam is willing to move stuff back, but apparently GNU make freaks out and has to be handled in some way. So going back to spaces would be more work for him.
[19:30] Maybe we should just fucking do that.
[19:30] Whichever you prefer
[19:30] I can't imagine the original filenames had that long of folder names
[19:30] We're not messing with code
[19:30] No, I GUARANTEE it didn't. I was THERE
[19:30] I DID IT
[19:30] I just misread the thing as him changing code and removing things
[19:31] I'm not comfortable refactoring code for modern things, I'm comfortable duplicating code for new factors.
[19:31] And this isn't either.
[19:34] This is mangling folder names for a new piece of code to build the old code
[19:34] Yeah, but it's not mangling
[19:34] Oh, damn, do I need to look at this again.
[19:35] If you are cool with the renaming, I will make that happen. If you do not want the folders renamed, I will make sure the makefile is changed to match.
[19:35] These are the top ones, right
[19:35] I will also ensure that no code is changed
[19:35] Just the top ones. The rest are fine
[19:35] They must be. The top ones were me being me
[19:37] Ok, I will make it happen
[20:32] If Posterous has AT-specific servers, we might have taken them down.
[20:34] It's taken over 15 minutes for a request on an empty page.
[20:35] pronoiac: I don't think we are using the AT servers atm
[20:35] might be good if alard can put in a switch for it on the admin panel?
[20:40] Ah, the requests came through, finally.
[20:40] Smiley: From my browser at home, the responses on the pages were quick.
[20:41] So I figured there was rate-limiting or some other setup on their side.
[20:48] Smiley: Should we dial it back a bit? If responses take 15 minutes, well, that's not a good sign, is it?
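The renaming discussed from 19:29 to 19:37 (spaces in the top-level folder names replaced with underscores, code untouched) is the kind of change GNU make pushes you toward, because make splits target and prerequisite lists on whitespace. A minimal sketch of such a rename is below. The repository and its folder names are not shown in the log, so `underscore_top_level` and the example names are hypothetical; the log's alternative, keeping the spaces and adjusting the makefile instead, is not covered here.

```python
from pathlib import Path


def underscore_top_level(root):
    """Rename top-level directories under root, replacing spaces with underscores.

    Only directory names change; no file contents are touched.
    Returns (old_name, new_name) pairs for the directories that were renamed.
    """
    renamed = []
    for entry in sorted(Path(root).iterdir()):  # list first, then rename safely
        if entry.is_dir() and " " in entry.name:
            target = entry.with_name(entry.name.replace(" ", "_"))
            if target.exists():
                raise FileExistsError(target)  # never clobber an existing folder
            entry.rename(target)
            renamed.append((entry.name, target.name))
    return renamed
```

Returning the old/new pairs makes it easy to record the mapping (for example in the docs) so the original names stay recoverable.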
[20:51] pronoiac: well fpffft
[20:51] Smiley: Check the tracker, look at the graph for today, and notice that it's dropped to a fraction of what it was an hour ago.
[20:51] it means you're banned.
[20:51] Ah.
[20:51] banned as well :/
[20:51] you can see when I turned the limit up (and it went thru the roof)
[20:52] and then they banned everyone.
[20:52] Not much I can do about that I'm afraid guys :/
[20:52] work on formspring while we wait is all I can suggest. Bans do time out randomly.
[20:52] i work on both :)
[20:52] :)
[20:52] I read about the 10000% increase and thought "uh-oh." :)
[20:53] I think we need to poke the warrior a bit, have a backup project maybe.
[20:53] However not much actually bans us so far.
[20:53] more projects would be sweet
[20:53] eg urlteam could run in parallel like always
[20:53] Oh yeah: is Upcoming totally done?
[20:54] I got an "outdated" warning on mine, but Github doesn't show anything new to pull.
[20:54] * Smiley checks
[20:54] Main queue 0 items
[20:55] There are no claims in the queue either, so yes, completely done :)
[20:55] \o/ great job!
[20:58] I have Formspring going. It's uploading 1G right now.
[21:01] \o/
[21:03] Yes, Upcoming is done. Today I redownloaded the incomplete and/or missing events, hence the update warning. The final batch has just finished uploading and is now being indexed: http://archive.org/details/archiveteam_upcoming_20130425151228
[21:07] Woot!
[21:15] Archive.org reports 685 million URLs in 3.5 TB of archives.
[21:32] why doesn't http://archive.org/details/archiveteam_upcoming_20130425151228 show up under http://archive.org/details/archiveteam ?
[21:32] i guess it hasn't been made into a subcollection yet
[21:32] (it's under "this just in", i guess)
[21:35] shouldn't this be tar not tar.gz: http://archive.org/details/ARCHIVETEAM-YV-1400000-1499989
[21:48] If you are new to ArchiveTeam please take a moment to view our current projects list - http://www.archiveteam.org/index.php?title=Current_Projects
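The tracker behaviour in this log (per-minute rows of minute/requested/granted such as "55 9737 100", and clients told "Tracker rate limiting is in effect. Retrying after 30 seconds...") is consistent with a per-minute cap on grants. The sketch below is a hypothetical fixed-window limiter that illustrates the idea. It is not the ArchiveTeam tracker's code, whose algorithm the log does not show, and it will not necessarily reproduce every row quoted above. `MinuteWindowLimiter` and `tally` are made-up names.

```python
class MinuteWindowLimiter:
    """Fixed-window limiter: at most `limit` grants per clock minute.

    Hypothetical illustration only; not the ArchiveTeam tracker's code.
    """

    def __init__(self, limit, retry_after=30):
        self.limit = limit              # grants allowed per minute
        self.retry_after = retry_after  # seconds a refused client waits
        self.minute = None              # minute currently being counted
        self.granted = 0                # grants handed out in that minute

    def request(self, now):
        """Return (granted, wait_seconds) for a request made at `now` seconds."""
        minute = int(now // 60)
        if minute != self.minute:       # a new minute starts a fresh window
            self.minute, self.granted = minute, 0
        if self.granted < self.limit:
            self.granted += 1
            return True, 0
        return False, self.retry_after


def tally(limiter, minute, requested):
    """Send `requested` evenly spaced requests within one minute; count grants."""
    start = minute * 60
    step = 60 / requested
    return sum(limiter.request(start + i * step)[0] for i in range(requested))
```

With `limit=100`, 9737 requests in one minute yield 100 grants and 9637 "retry after 30 seconds" answers. Raising the limit (as at 17:56) lets more work through the tracker, but as the 20:51 messages show, it does nothing to stop the target site from banning the extra traffic.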