#archiveteam 2013-04-25,Thu

Time Nickname Message
00:01 🔗 n00b928 Can I help?
00:02 🔗 omf_ dashcloud, we also have an etherpad http://pad.archivingyoursh.it/
00:03 🔗 dashcloud great!
00:04 🔗 pronoiac Also, it looks like the Posterous tracker hasn't processed anything new for hours.
00:07 🔗 n00b928 want a 1 gb internet speed in 1 second to help?
00:08 🔗 n00b928 This is possible
00:09 🔗 n00b928 anyone is interested in this connection?
00:09 🔗 noahc n00b, what do you want to help with?
00:10 🔗 noahc Raw speed doesn't seem to be an issue, but someone else feel free to correct me if I'm wrong.
00:13 🔗 pronoiac n00b928: I'm not a regular here, but I think you could go get the Warrior, let it do the Archiveteam's choice.
00:15 🔗 jk[SVP] There's even a video http://youtu.be/_nzD-QpmePE
00:16 🔗 dashcloud hi, anyone who wants to add urls for nwnet.co.uk , here's an etherpad: http://pad.archivingyoursh.it/p/nwnet.co.uk
00:20 🔗 noahc Does anyone have a good way to take a url and crawl that looking for other nwnet links on it?
00:21 🔗 noahc Like I can generate http://www.nwnet.co.uk/abbey/ for example my script found.
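One way to approach noahc's question is a small link extractor that collects every nwnet.co.uk link on a fetched page. The sketch below is an illustration only, not the script noahc was running. It uses Python's standard library and assumes that the first path segment of an nwnet.co.uk link is a user directory, which is a heuristic. The names `LinkCollector`, `nwnet_user_dirs` and `crawl_page` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def nwnet_user_dirs(html: str, base_url: str, domain: str = "nwnet.co.uk") -> set[str]:
    """Return top-level directories (e.g. /abbey/) linked on the domain."""
    parser = LinkCollector()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        # Resolve relative links such as ../cooper/irc.htm against the page URL.
        parts = urlsplit(urljoin(base_url, href))
        host = parts.hostname or ""  # urlsplit already lowercases hostname
        if host != domain and not host.endswith("." + domain):
            continue  # off-site link, or a mailto: with no host
        first = parts.path.strip("/").split("/")[0]
        # Heuristic: a first segment containing a dot is a file, not a user dir.
        if first and "." not in first:
            found.add(f"http://{host}/{first}/")
    return found


def crawl_page(url: str) -> set[str]:
    """Fetch one page and return the nwnet user directories it links to."""
    with urlopen(url, timeout=30) as resp:
        html = resp.read().decode("latin-1")  # latin-1 never fails to decode
    return nwnet_user_dirs(html, url)
```

Calling `crawl_page("http://www.nwnet.co.uk/abbey/")` returns a set of new directory URLs. Those can then be fed back into `crawl_page` to expand the list.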
00:33 🔗 daxelrod So, is upcoming done? Any point in keeping my upcoming pipeline running?
00:39 🔗 omf_ SketchCow, here is another item for the ftp collection 34gb - http://archive.org/details/ftp.ea.com_2012-05-06
01:03 🔗 noahc NSFW: http://www.nwnet.co.uk/blue/
01:04 🔗 SketchCow omf_: Added
01:05 🔗 omf_ Will I still be able to upload more files to that item
01:05 🔗 omf_ I am doing a 2013 ftp grab
01:06 🔗 SketchCow Yes
01:06 🔗 SketchCow Well, sort of
01:06 🔗 SketchCow Yes
01:06 🔗 SketchCow For your purposes, yes
01:08 🔗 DFJustin collection=software&mediatype=ftpsites <-- wrong way round
01:08 🔗 SketchCow Did I do that
01:09 🔗 SketchCow <3
01:09 🔗 SketchCow Fixed
01:28 🔗 SketchCow Oh man, these nwnets are AMAZING
01:28 🔗 SketchCow A M A Z I N G
01:29 🔗 BlueMax nwnets?
01:31 🔗 omf_ This is a site I pulled down a while back which is another old ISP that has some cool pages - http://archive.org/details/userpages.monmouth.com
01:35 🔗 noahc SketchCow: There are some really cool ones!
01:36 🔗 noahc http://www.nwnet.co.uk/craig/ The design on this is so modern compared to many others!
01:36 🔗 noahc The reviews of IRC and ICQ clients on here http://www.nwnet.co.uk/cooper/irc.htm is awesome.
01:37 🔗 omf_ reminds me of this page http://www.techniqueretreat.com/ which rotates color schemes
01:38 🔗 noahc Yeah, I can see that.
01:39 🔗 noahc How does the warrior work? Do you feed it URLs and then it scrapes or do you need to feed it urls and it just does a wget (or similar) on them?
01:40 🔗 chronomex ohhhh man pirch 98
01:47 🔗 InitHello holy 90s page design
01:48 🔗 jjonas http://forum.playfish.com/showthread.php?t=2968222
01:50 🔗 jjonas the game has 10 million likes
01:59 🔗 noahc I'm going to let this bad boy run over night. Hopefully, when I wake up tomorrow I'll have a few hundred URLs for us to use.
03:30 🔗 shaqfu Is there a go-to process for grabbing a single Deja group?
03:33 🔗 illunatic hah shaqfu. i was just thinking of that game the other day
03:34 🔗 shaqfu I want to poke through a few NGs, but GGroups interface is so abhorrent that I'd rather just grab+grep
03:37 🔗 omf_ I have been working on gg for a while
03:37 🔗 omf_ for scrape and save
03:39 🔗 shaqfu omf_: what's your process to grab a group?
03:41 🔗 omf_ the problem is getting banned
03:41 🔗 omf_ they have good detection
03:41 🔗 shaqfu omf_: even for small tactical grabs?
03:42 🔗 omf_ 5-7 pages of results before they notice and that is with a scripting language. I have moved on to automating browsers and so far it looks promising
03:55 🔗 Rexxar I wonder if various web CEOs get together and complain about SketchCow
04:00 🔗 omf_ shaqfu, what group are you looking to get
04:00 🔗 shaqfu omf_: A bunch of old MtG ones
04:01 🔗 omf_ They might already be archived. I have found pockets of that all over
04:01 🔗 omf_ let me check around
04:01 🔗 shaqfu Yeah, it showed up in a lot of alt.recs
04:01 🔗 omf_ but I can run my software on actual group names if you got them
04:01 🔗 shaqfu And they're the only source on the game's very early development
04:22 🔗 gcr I'm stuck in "The warrior is beginning work on a project" when using the ArchiveTeam's Choice and a blank "Current project" page. Is this expected?
04:23 🔗 gcr wow, communication skills. I meant to say that my current project is blank when using archiveteam's choice
04:25 🔗 omf_ select posterous and set the world afire
04:37 🔗 TeeCee Same issue here... Problems connecting to tracker?
04:40 🔗 lukeman "archiveteam's choice" isn't working, but if you select a project (posterous seems like the best) it will get to work
04:41 🔗 lukeman of course all of my threads are hitting the rate limit
04:57 🔗 underscor SketchCow: dashcloud, et al, anyone working on nwnet
04:57 🔗 underscor http://archive.org/~abuie/nwnet.txt
04:57 🔗 underscor Those are all the nwnet top-level things we know about
04:57 🔗 underscor (IA)
04:57 🔗 underscor [extracted from the cdx hadoop cluster]
04:58 🔗 omf_ nice
04:59 🔗 omf_ looks like it has dupes
05:00 🔗 underscor apparently capitalization doesn't matter to nwnet
05:00 🔗 underscor Which threw off my sort -u
05:00 🔗 underscor omf_: Fixed dupes
05:01 🔗 underscor also added http://archive.org/~abuie/nwnet_all.txt
05:02 🔗 omf_ underscor, once you strip off the port number, there are still dupes
05:02 🔗 omf_ :80 is the default
05:03 🔗 underscor Fixed again
05:03 🔗 underscor Also, I know. That's just how wayback stores them
05:03 🔗 omf_ yeah I was trying to fix it on my local copy and it took me a minute to realize the line ending was tripping me up
05:04 🔗 omf_ CR instead of LF
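The three dedup problems raised above can all be handled in one pass: case-insensitive paths, the explicit :80 default port that Wayback keeps, and CR-only line endings. The following is a minimal Python sketch. It assumes every line is an absolute http:// URL, and that lowercasing the whole URL is safe for nwnet, based on underscor's observation that capitalization doesn't matter there. `normalize` and `dedupe` are hypothetical helper names, not anything from the IA tooling.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Trim stray CR/LF, lowercase, and drop the default :80 port."""
    url = url.strip().lower()  # assumption: case never matters on nwnet
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    netloc = host if port in (None, 80) else f"{host}:{port}"
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    # Fragments are dropped; they never reach the server anyway.
    return f"{parts.scheme}://{netloc}{path}{query}"


def dedupe(text: str) -> list[str]:
    """Normalize each line and keep the first occurrence, preserving order."""
    seen = set()
    out = []
    # splitlines() treats \r, \n and \r\n all as line breaks.
    for line in text.splitlines():
        if not line.strip():
            continue
        url = normalize(line)
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out
```

Unlike `sort -u`, this version keeps the original order. It also collapses `/Abbey/` and `/abbey/` into one entry.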
05:08 🔗 godane i'm seeing about grabbing click online urls
05:08 🔗 godane by using google to do it
05:12 🔗 godane now i get the continue urls
05:12 🔗 godane but i think i got 1000 urls
05:18 🔗 ersi gcr, TeeCee: Still got problems with the ArchiveTeam's Choice in the Warrior?
05:26 🔗 TeeCee ersi: Dunno... I have changed to that project now, so we'll see when the current items are done.. :)
05:33 🔗 ersi TeeCee: Alright :)
06:15 🔗 BlueMax Who added 600k of accounts to the Posterous tracker :P
06:17 🔗 godane we need a way to get spreecast
06:18 🔗 godane spreecast is used for the blazecast shows from the editors of theblaze
06:18 🔗 godane it also has a archived chat in real time
06:19 🔗 godane it plays back the chat in real time
06:19 🔗 godane http://www.spreecast.com/users/scott--50
06:20 🔗 godane maybe a good idea to make tools to grab this stuff
07:05 🔗 underscor What's the highest priority project right now?
07:05 🔗 underscor Posterous?
07:09 🔗 TeeCee My Warrior is told to work on Formspring...
07:25 🔗 lukeman i'm still seeing tracker rate limiting. is that indicative of being blocked by posterous?
07:26 🔗 BlueMax Either that or posterous asked us to slow down again
07:30 🔗 lukeman just going off of SketchCow's tweets, it seems that they've started blocking. i switched back over to help out and seeing the rate limiting stuff like before.
07:31 🔗 href is it possible to use a http_proxy for a warrior ?
07:34 🔗 lukeman as far as i can tell i'm not blocked but the tracker is rate limiting me
07:35 🔗 lukeman not sure if blocked hosts are eating up slots or something
07:35 🔗 lukeman i'll leave mine running as i head to bed
08:31 🔗 ersi lukeman: Rate-limiting messages come from the ArchiveTeam Tracker.
09:09 🔗 Smiley href: yes, the options are shown if you hit the advanced button
09:10 🔗 href eh I feel stupid now, i looked at the advanced button. maybe not enough :p thanks!
09:12 🔗 href hmm I looked again and there is nothing about http proxies
09:14 🔗 ersi Since when do we have http_proxy settings under advanced? :o
09:15 🔗 href you don't :p
09:15 🔗 Smiley Oh wait
09:15 🔗 Smiley thats for setting a password and username
09:15 🔗 Smiley doh :<
09:16 🔗 ersi And that's your introduction to Smiley ;D
09:17 🔗 href :-D
09:51 🔗 href 'k apparently configuring the proxy for wget (in ~/.wgetrc) is enough. cool :)
09:53 🔗 Smiley :)
09:54 🔗 ersi neat :)
10:10 🔗 href here we go, four more warriors :)
12:05 🔗 noahc I've updated http://pad.archivingyoursh.it/p/nwnet.co.uk with about 40 more. I have a meeting tonight, but after work and the meeting, I hope to hack more on my script and improve its collection ability.
14:13 🔗 flaushy hmm posterous is heavily rate limited?
14:14 🔗 closure anyone know of a channel where archive.org folk hang?
14:17 🔗 GLaDOS closure: #internetarchive if anything
14:17 🔗 closure oh, on this network. surprised
14:18 🔗 closure looks a lot like this channel :P
14:18 🔗 ersi Yes, but it's a dedicated channel for IA conversations
14:19 🔗 ersi flaushy: Yes. Also #preposterus - project channel :)
14:19 🔗 flaushy yeah, suddenly realized ;)
14:20 🔗 ersi ^_^
14:41 🔗 balrog uhm... http://arstechnica.com/business/2013/04/yahoo-yanks-saturday-night-live-from-hulu/
15:19 🔗 lukeman moving off of hulu is a relative good thing. when archived videos were made available it was only for short periods of time. i would often find links to videos that had been pulled from hulu.
15:20 🔗 lukeman lots of folks who work at snl have mentioned they have the entire archive of sketches easily available internally all tagged up and with video from both air and dress rehearsal. maybe this is a step towards more of that being exposed.
15:23 🔗 SketchCow Moving off from Hulu into Yahoo is not a good thing.
15:24 🔗 SketchCow It means Yahoo wants to continue to be a content hub
15:24 🔗 SketchCow Or
15:24 🔗 SketchCow OR
15:24 🔗 SketchCow People could just get tired of watching that shit
15:31 🔗 balrog SketchCow: posterous needs attention
15:31 🔗 balrog I have a full window of Tracker rate limiting is in effect. Retrying after 30 seconds...
15:31 🔗 WiK welp, so far still banned
15:31 🔗 WiK i'll try creating a new username/password and verifying the email later
15:31 🔗 balrog also, it appears that AT requests go to five posterous servers while other requests go to the rest of the servers, not including those five
15:34 🔗 SketchCow We need alard or one of the people running the tracker. I do not run the tracker.
15:35 🔗 SketchCow Do we have a nwnet channel?
15:35 🔗 SketchCow There's a second domain name for these folks, we have more to scan.
15:35 🔗 SketchCow http://telinco.co.uk/
15:36 🔗 balrog http://www.telinco.co.uk
15:36 🔗 balrog doesn't work without www :/
15:39 🔗 SketchCow Right
15:39 🔗 SketchCow Searching google with that gets some of the stuff.
15:43 🔗 omf_ I am creating a nwnet wiki page
15:43 🔗 omf_ someone pick an irc channel name please
15:44 🔗 SketchCow #nwnyet
15:55 🔗 omf_ http://www.archiveteam.org/index.php?title=Nwnyet
15:55 🔗 SketchCow For the record, generally we make the wiki page the real name
15:56 🔗 SketchCow Maybe a reference to the funny name
15:56 🔗 SketchCow And of course a link to the IRC
15:56 🔗 omf_ irc is in the template
15:56 🔗 SketchCow Right
15:56 🔗 SketchCow I'm saying that normally we have the normal name, and we reference the funny name and the IRC channel, which has the funny name.
15:57 🔗 omf_ got it
15:58 🔗 balrog (based on the google index) all nwnet.co.uk sites are subdirectories, while many telinco sites are subdomains
15:58 🔗 omf_ fixed it with a redirect
15:58 🔗 omf_ http://www.archiveteam.org/index.php?title=Nwnet_telinco
17:51 🔗 Smiley SketchCow: you alive?
17:51 🔗 Smiley Need to talk re:posterous - why have we so heavily limited ourselves?
17:52 🔗 SketchCow I don't know, frankly
17:53 🔗 Smiley .....
17:53 🔗 Smiley Orders to raise the limit sir?
17:54 🔗 Smiley No one has left any messages in any of the channels explaining it, I can only guess someone fudged typing 100 or 1000 and hit 10 accidentally.
17:55 🔗 SketchCow Can you do it?
17:55 🔗 SketchCow Turn that fucker up
17:56 🔗 Smiley SIR YES SIR!
17:56 🔗 Smiley Set to 1000, watching now
17:56 🔗 Smiley 55 9737 100
17:56 🔗 Smiley 56 4977 1500
17:56 🔗 balrog as I said, it appears that requests with the archive team UA are hitting FIVE servers at posterous
17:56 🔗 Smiley 54 9968 100
17:56 🔗 Smiley Minute/requested/granted
17:57 🔗 ussjoin So my "Archive Team's Choice" stuff is still going to URLTeam... I gather we can't hit Posterous any faster?
17:57 🔗 Smiley ussjoin: we can try...
17:57 🔗 Smiley choose it manually in the projects list
17:58 🔗 ussjoin Tried that and got AT rate limited.
17:59 🔗 Smiley I've *just* increased the rate by 10000*
17:59 🔗 Smiley I've *just* increased the rate by 10000%
17:59 🔗 balrog the rate limit is still in effect
17:59 🔗 ussjoin Smiley: I know you just did, and I'm just trying it. And still, AT Tracker Rate Limiting.
18:00 🔗 balrog take a look at http://tracker.archiveteam.org/posterous/ though
18:00 🔗 balrog btw, are these all with the AT user agent?
18:00 🔗 Smiley balrog: I don't know whats happening with the AT agent atm
18:00 🔗 godane so i got more episodes of tekzilla being upload right now
18:00 🔗 Smiley and yes, the tracker shows k killing it as normal?
18:00 🔗 pft mine just picked up a task for the first time
18:00 🔗 Smiley we are pushing out 1000 grants a minute now
18:01 🔗 SketchCow I expect some nice love notes from posterous shortly
18:01 🔗 Smiley yah
18:02 🔗 balrog SketchCow: it looks like posterous has five servers set up for us
18:02 🔗 balrog rather than just two
18:02 🔗 SketchCow Good
18:02 🔗 Smiley balrog: or we aren't using the UA anymore?
18:02 🔗 balrog wonder if the guy saw what was going on and decided to quietly help
18:02 🔗 Smiley If we have 5, hell, we finally have a chance.
18:02 🔗 balrog I thought we were, but a few people weren't
18:03 🔗 SketchCow Maybe we need to see how the five hold up
18:03 🔗 SketchCow Who's doing k?
18:03 🔗 balrog k is kennethre
18:04 🔗 Smiley btw k might not be using the UA so you can kind of ignore him
18:04 🔗 Smiley weeeeee warriors are now spilling in with data too :)
18:05 🔗 Smiley down to a normal 100~ requests a minute
18:06 🔗 balrog anyway I did a bunch of curls with the AT UA and determined that there are five, and they're different from those without the AT UA
18:07 🔗 balrog hm it slowed down some
18:07 🔗 Smiley balrog: sweet, can you put that info somewhere secure and let me, alard, omf_ and SketchCow know.
18:07 🔗 Smiley as in you're getting limited?
18:08 🔗 Smiley Infact, lets go to #preposterus
19:23 🔗 sethish <-- is the seth working on the pull request, SketchCow
19:24 🔗 SketchCow I figured
19:24 🔗 SketchCow What up
19:27 🔗 sethish I'm waiting on adam to change the folder names back to without spaces. But otherwise the branch checks out and I have it running in emulation.
19:27 🔗 SketchCow Great
19:28 🔗 SketchCow I realize we did some silliness to the original branch, but I sort of feel like we should keep it. Unless, seriously, we think it's no big deal. The top levels aren't canon.
19:28 🔗 SketchCow When I first read it, I thought we were crippling code and items.
19:29 🔗 sethish Nope, just renaming spaces to underscores
19:30 🔗 sethish if it isn't a big deal, I can do a few more sanity checks, clean up the docs a bit, and the PR can be accepted. Adam is willing to move stuff back, but apparently gnu make freaks out and has to be handled in some way. So going back to spaces would be more work for him.
19:30 🔗 SketchCow Maybe we should just fucking do that.
19:30 🔗 sethish Whichever you prefer
19:30 🔗 sethish I can't imagine the original filenames had that long of folder names
19:30 🔗 SketchCow We're not messing with code
19:30 🔗 SketchCow No, I GUARANTEE it didn't. I was THERE
19:30 🔗 SketchCow I DID IT
19:30 🔗 SketchCow I just misread the thing as him changing code and removing things
19:31 🔗 SketchCow I'm not comfortable refactoring code for modern things, I'm comfortable duplicating code for new factors.
19:31 🔗 SketchCow And this isn't either.
19:34 🔗 sethish This is mangling folder names for a new piece of code to build the old code
19:34 🔗 SketchCow Yeah, but it's not mangling
19:34 🔗 SketchCow Oh, damn, do I need to look at this again.
19:35 🔗 sethish If you are cool with the renaming, I will make that happen. If you do not want the folders renamed, I will make sure the makefile is changed to match.
19:35 🔗 SketchCow These are the top ones, right
19:35 🔗 sethish I will also ensure that no code is changed
19:35 🔗 SketchCow Just the top ones. The rest are fine
19:35 🔗 SketchCow They must be. The top ones were me being me
19:37 🔗 sethish Ok, I will make it happen
20:32 🔗 pronoiac If Posterous has AT-specific servers, we might have taken them down.
20:34 🔗 pronoiac It's taken over 15 minutes for a request on an empty page.
20:35 🔗 Smiley pronoiac: I don't think we are using the AT servers atm
20:35 🔗 Smiley might be good if alard can put in a switch for it on the admin panel ?
20:40 🔗 pronoiac Ah, the requests came through, finally.
20:40 🔗 pronoiac Smiley: From my browser at home, the responses on the pages were quick.
20:41 🔗 pronoiac So I figured there was rate-limiting or some other setup on their side.
20:48 🔗 pronoiac Smiley: Should we dial it back a bit? If responses take 15 minutes, well, that's not a good sign, is it?
20:51 🔗 Smiley pronoiac: well fpffft
20:51 🔗 pronoiac Smiley: Check the tracker, look at the graph for today, and notice that it's dropped to a fraction of an hour ago.
20:51 🔗 Smiley it means you're banned.
20:51 🔗 pronoiac Ah.
20:51 🔗 flaushy banned as well :/
20:51 🔗 Smiley you can see when I turned the limit up and it went thru the roof
20:52 🔗 Smiley and then they banned everyone.
20:52 🔗 Smiley Not much I can do about that I'm afraid guys :/
20:52 🔗 Smiley work on formspring while we wait is all I can suggest. Bans do time out randomly.
20:52 🔗 flaushy i work on both :)
20:52 🔗 Smiley :)
20:52 🔗 pronoiac I read about the 10000% increase and thought "uh-oh." :)
20:53 🔗 Smiley I think we need to poke the warrior a bit, have a backup project maybe.
20:53 🔗 Smiley However not much actually bans us so far.
20:53 🔗 flaushy more projects would be sweet
20:53 🔗 flaushy eg urlteam could run in parallel like always
20:53 🔗 pronoiac Oh yeah: is Upcoming totally done?
20:54 🔗 pronoiac I got an "outdated" warning on mine, but Github doesn't show anything new to pull.
20:54 🔗 * Smiley checks
20:54 🔗 Smiley Main queue 0 items
20:55 🔗 Smiley There are no claims in the queue either, so yes, completely done :)
20:55 🔗 flaushy \o/ great job!
20:58 🔗 pronoiac I have Formspring going. It's uploading 1G right now.
21:01 🔗 Smiley \o/
21:03 🔗 alard Yes, Upcoming is done. Today I redownloaded the incomplete and/or missing events, hence the update warning. The final batch has just finished uploading, is now being indexed: http://archive.org/details/archiveteam_upcoming_20130425151228
21:07 🔗 pronoiac Woot!
21:15 🔗 alard Archive.org reports 685 million URLs in 3.5 TB of archives.
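For scale, alard's figures work out to an average of roughly 5 kB of archive per captured URL. Here is a quick check. It assumes decimal units (1 TB = 10^12 bytes) and treats the 3.5 TB as the stored archive size.

```python
urls = 685_000_000          # URLs reported by archive.org
archive_bytes = 3.5e12      # 3.5 TB, assuming decimal terabytes
avg = archive_bytes / urls  # average archive bytes per captured URL
print(f"{avg:,.0f} bytes per URL")  # prints "5,109 bytes per URL"
```

With binary units (3.5 TiB) the same calculation gives about 5.6 kB per URL. Either way, the order of magnitude is the same.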
21:32 🔗 jdunck why doesn't http://archive.org/details/archiveteam_upcoming_20130425151228 show up under http://archive.org/details/archiveteam ?
21:32 🔗 jdunck i guess it hasn't been made into a subcollection yet
21:32 🔗 jdunck (it's under "this just in", i guess)
21:35 🔗 godane shouldn't this be tar not tar.gz: http://archive.org/details/ARCHIVETEAM-YV-1400000-1499989
21:48 🔗 omf_ If you are new to ArchiveTeam please take a moment to view our current projects list - http://www.archiveteam.org/index.php?title=Current_Projects
