#archiveteam-bs 2014-08-15,Fri

Time Nickname Message
02:19 ๐Ÿ”— xmc following up on some words in #archivebot here
02:20 ๐Ÿ”— xmc I used to be very much in the "yes we grab everything" position
02:20 ๐Ÿ”— xmc I've got a slightly finer point on it recently
02:20 ๐Ÿ”— xmc not sure what I mean exactly, putting this out in case someone wishes to discuss
04:18 ๐Ÿ”— godane i'm starting to upload more funny or die videos and hp manuals
09:14 ๐Ÿ”— schbirid https://business.twitter.com/en-gb/products/pricing -> You'll only be charged when people follow your Promoted Account or retweet, reply, favourite or click on your Promoted Tweets. You'll never be charged for your organic activity on Twitter.
09:15 ๐Ÿ”— * schbirid has been replying to each and every promoted tweet since i found that
09:21 ๐Ÿ”— midas schbirid: for some reason i think it would be awesome to combine https://twitter.com/markovs with promoted tweets
09:23 ๐Ÿ”— schbirid ooooh hohoho >:D
12:50 ๐Ÿ”— SadDM xmc: I hear you re:Archivebot... we've thrown some HUGE stuff at it without much thought. That really ties it up, and when something like Ferguson happens and we really need it, it's busy downloading Linux kernel mailing lists or Edgar Rice Burroughs fan sites.
12:51 ๐Ÿ”— midas maybe we need 1 pipeline empty for shit that is going down now
12:51 ๐Ÿ”— SadDM Maybe we need to ask ourselves why folks are using it instead of running a wget themselves.
12:52 ๐Ÿ”— midas also an option
12:52 ๐Ÿ”— midas most likely, ease of usage
12:53 ๐Ÿ”— SadDM yeah, definitely, but what are the parts that it makes easy?
12:53 ๐Ÿ”— SadDM for example...
12:53 ๐Ÿ”— SadDM I *LOVE* that it automatically grabs media hosted on other domains.
12:55 ๐Ÿ”— SadDM If somebody smarter than me could extract that bit of magic from archivebot and add a description of how to do it to the wiki's "mirroring with wget" page, I'd probably do a *bunch* more small-medium sized grabs on my own.
12:55 ๐Ÿ”— midas I think the biggest issue is the steep learning curve of wgetting a complete domain + warc + ignore patterns and uploading it to ia
12:55 ๐Ÿ”— midas yeah
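(A rough sketch of the kind of WARC-producing wget run being discussed here; the domain, CDN host, reject pattern, wait time and contact address are placeholders, not archivebot's actual settings:)

    # Mirror one site into a WARC, including page requisites (images, CSS, JS)
    # hosted on the extra domains whitelisted via --domains.
    wget --mirror --page-requisites --span-hosts \
         --domains=example.com,cdn.example.com \
         --warc-file=example.com-2014-08-15 --warc-cdx \
         --reject-regex='(action=edit|/trackback/)' \
         --wait=1 --random-wait -e robots=off \
         --user-agent='archive run; contact you@example.org' \
         http://example.com/

wget appends .warc.gz to the --warc-file prefix, and that file (plus the CDX index) is what would then get uploaded to IA.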
12:55 ๐Ÿ”— schbirid we should all be able to run our own archivebot
12:56 ๐Ÿ”— SadDM if it's easy enough to set up.. yeah
12:56 ๐Ÿ”— SadDM I suppose the one thing that it does that is really magical is the way it uploads the warcs on a daily basis.
12:57 ๐Ÿ”— SadDM If we were all to do our own little captures then we'd have to constantly be bugging SketchCow to move them for us.
12:58 ๐Ÿ”— midas well, we could ask SketchCow to create a dumpcollection or 1 rsync target we dump it to
12:58 ๐Ÿ”— SadDM hmm, that's a thought
12:58 ๐Ÿ”— midas (dump it to? that sounds way too dutch)
13:41 ๐Ÿ”— yipdw so
13:41 ๐Ÿ”— yipdw midas: that already exists
13:42 ๐Ÿ”— yipdw in some form, at least -- that's the idea behind separate !ao pipelines, and !ao < FILE, and was also the idea behind pipeline IDs (which admittedly are not yet all that usable since they're auto-generated, Zooko's Triangle etc)
13:51 ๐Ÿ”— SadDM yipdw: would it be possible for a person to set up an autonomous archivebot pipeline... one that doesn't talk to the main control channel or report to the public dashboard?
13:52 ๐Ÿ”— yipdw yeah
13:52 ๐Ÿ”— yipdw I do that for testing
13:52 ๐Ÿ”— yipdw it is somewhat documented in INSTALL; however there's a lot of bits in the bot that should really just be CLI tools
13:53 ๐Ÿ”— yipdw so there's a dependency on an IRC server (and CouchDB server for that matter) that is a bit odd
13:53 ๐Ÿ”— SadDM wow really? I didn't expect that answer... I expected something along the lines of "Pffft... go figure it out yourself. I'm busy doing God's work" :-D
13:53 ๐Ÿ”— yipdw there's a branch in the archivebot repo that is aimed at fixing this
13:53 ๐Ÿ”— SadDM Nice... I'll be keeping an eye on that
13:54 ๐Ÿ”— yipdw it's the taco-bell branch
13:55 ๐Ÿ”— SadDM O_o interesting name
13:55 ๐Ÿ”— yipdw http://widgetsandshit.com/teddziuba/2010/10/taco-bell-programming.html
13:56 ๐Ÿ”— yipdw it's not really as extreme as that post espouses but it is nevertheless a simplification
13:59 ๐Ÿ”— SadDM I've never heard that term, but the concept is familiar... "You have simple yet powerful tools... use them"
14:01 ๐Ÿ”— SadDM so, which pieces are you looking to simplify out (just out of curiosity)?
14:02 ๐Ÿ”— yipdw cogs was a pretty big mess of objects that also leaked a lot of memory
14:02 ๐Ÿ”— yipdw that's now a few pipelines
14:02 ๐Ÿ”— yipdw (and doesn't leak)
14:03 ๐Ÿ”— yipdw the dashboard used to do a fair amount of JSON processing before it output data; that's mostly gone now and the dashboard is also part of a pipeline
14:03 ๐Ÿ”— SketchCow Wut
14:03 ๐Ÿ”— yipdw those were changes done out of necessity to keep the bot from destroying its host
14:04 ๐Ÿ”— yipdw everything else is really more of an aesthetic thing -- "I don't like that this code is duplicated here, so I'm going to make it common"
14:04 ๐Ÿ”— yipdw so less urgent :P
14:07 ๐Ÿ”— SadDM I am so thankful that the world is filled with intelligent people who have a bit of time on their hands and are into cool stuff.
14:20 ๐Ÿ”— yipdw SadDM: yeah, me too
14:20 ๐Ÿ”— yipdw archivebot wouldn't really exist without redis+wpull
14:54 ๐Ÿ”— midas urgh
14:55 ๐Ÿ”— midas that first
14:55 ๐Ÿ”— midas now, i don't like my colleagues anymore
14:55 ๐Ÿ”— midas one of them kinda broke my great deployment idea from git
14:58 ๐Ÿ”— midas they made a new repo containing multiple folders before getting to the source of the files
14:59 ๐Ÿ”— deathy empty folders?
15:00 ๐Ÿ”— midas nope
15:00 ๐Ÿ”— midas well sort of
15:00 ๐Ÿ”— midas it's project/public_html/files <-- i want to clone the files directly
15:01 ๐Ÿ”— midas hm maybe i can branch it
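(Two possible ways around that layout, sketched with placeholder paths and a placeholder remote URL — either check out only the files directory, or split it onto its own branch:)

    # Option 1: sparse checkout, so only public_html/files ends up on disk
    git clone --no-checkout git@example.com:project.git
    cd project
    git config core.sparseCheckout true
    echo 'public_html/files/*' > .git/info/sparse-checkout
    git checkout master

    # Option 2: "branch it" -- split that directory out onto its own branch,
    # whose root is the files directory itself
    git subtree split --prefix=public_html/files -b files-only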
15:03 ๐Ÿ”— joepie91 https://imgur.com/gallery/Qd9ksk5
15:06 ๐Ÿ”— norbert79 Archive.org material :)
15:10 ๐Ÿ”— xmc derployment
15:14 ๐Ÿ”— joepie91 norbert79: yes, was thinking that
15:23 ๐Ÿ”— swebb Happy friday! https://www.youtube.com/watch?v=8PVal8Fy7CM
15:33 ๐Ÿ”— joepie91 .t
15:33 ๐Ÿ”— botpie91 Fri, 15 Aug 2014 15:33:46 GMT
15:33 ๐Ÿ”— joepie91 um..
15:34 ๐Ÿ”— joepie91 .t https://www.youtube.com/watch?v=8PVal8Fy7CM
15:34 ๐Ÿ”— joepie91 no?
15:34 ๐Ÿ”— * joepie91 boggles
15:34 ๐Ÿ”— joepie91 .title
15:34 ๐Ÿ”— botpie91 joepie91: My Name is John Daker - BEST VERSION w/ SUBTITLES - YouTube
15:34 ๐Ÿ”— joepie91 ah there we go
17:02 ๐Ÿ”— godane just so you know, some videos of funny or die say the description twice
17:03 ๐Ÿ”— godane this is because some videos show up twice in my xml dump, but i added code so i could get it all onto one line
17:03 ๐Ÿ”— godane keywords also appear twice with these videos too
17:05 ๐Ÿ”— godane also i'm past 26k
17:05 ๐Ÿ”— godane in godaneinbox
17:06 ๐Ÿ”— godane also i'm close to getting number 46k for the manuals collection
18:02 ๐Ÿ”— phuzion Anyone know if IA offers downloads of files by anything other than HTTP? rsync? FTP? I know about the torrents
18:02 ๐Ÿ”— phuzion I wanna get the WL insurance C file, and it's friggin huge
18:05 ๐Ÿ”— aaaaaaaaa It doesn't appear so, but you could try an accelerator like axel.
18:05 ๐Ÿ”— aaaaaaaaa I think they got rid of ftp downloads a while ago.
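(A minimal sketch of the accelerator route; the item and file names are placeholders, not the actual insurance-file URL:)

    # Fetch a large archive.org file over plain HTTP with several parallel
    # connections (axel -n sets the connection count).
    axel -n 8 -o wlinsurance.aes256 \
        https://archive.org/download/EXAMPLE_ITEM/EXAMPLE_FILE.aes256
    # wget -c <same url> can also resume a partial download if it drops.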
18:05 ๐Ÿ”— DFJustin SadDM: another thing is that the archivebot machines mostly have way better connectivity; if I crawled a 100gb site myself it would take weeks to upload to ia
18:06 ๐Ÿ”— DFJustin and it's more work which means less likely to actually get done
20:06 ๐Ÿ”— godane i'm up to 202k files that i have uploaded
20:21 ๐Ÿ”— joepie91 phuzion: I wonder how hard it'd be to build an rsync proxy for IA...
20:21 ๐Ÿ”— phuzion joepie91: Not sure. Wanna try?
20:22 ๐Ÿ”— phuzion I'll test it on the wikileaks insurance file if you wanna blow 325GB of data on it :)
20:26 ๐Ÿ”— joepie91 heh
20:26 ๐Ÿ”— joepie91 pft, 325GB :P
20:27 ๐Ÿ”— joepie91 phuzion: no rsyncd lib for node :(
20:28 ๐Ÿ”— phuzion nodejs?
20:29 ๐Ÿ”— joepie91 ya
20:34 ๐Ÿ”— yipdw for some reason this conversation got me interested in implementing archivebot on Plan 9
20:34 ๐Ÿ”— yipdw I don't know why
21:34 ๐Ÿ”— aaaaaaaaa On the off chance anyone knows the answer: How long should the whole BGP messing up routes last? I'm getting weird behavior the past few days that I think may be related but my ISP insists everything is fine.
21:45 ๐Ÿ”— yipdw aaaaaaaaa: indefinite, if you're referring to recent problems with routers not having enough memory
21:48 ๐Ÿ”— aaaaaaaaa Figured that's what I was going to get. Of course, they'd never admit there was a problem, but I've got packets that get stuck going in loops according to traceroute, or just disappear to nowhere, etc., and only for certain destinations.
21:48 ๐Ÿ”— aaaaaaaaa Oh well. That's the service you get from a duopoly.
21:55 ๐Ÿ”— yipdw aaaaaaaaa: which part, Comcast or AT&T
22:02 ๐Ÿ”— Smiley arketype: forever until they upgrade.
22:06 ๐Ÿ”— aaaaaaaaa Comcast, I've not seen a packet go through at&t on any traceroute
22:06 ๐Ÿ”— aaaaaaaaa I think my ISP is trying to route around them
22:07 ๐Ÿ”— aaaaaaaaa around at&t
22:09 ๐Ÿ”— yipdw that's one way to route around any network neutrality laws
22:09 ๐Ÿ”— yipdw "pay us for premium TCAM space"
22:09 ๐Ÿ”— aaaaaaaaa Usually my packets go through AT&T to level3 but now they seem to be going through comcast
22:13 ๐Ÿ”— aaaaaaaaa Oh well.
22:42 ๐Ÿ”— yipdw https://github.com/paypal/merchant-sdk-java/blob/master/merchantsample/src/main/java/com/sample/merchant/CheckoutServlet.java <-- this is what Java developers think is a reasonable "sample" program
23:23 ๐Ÿ”— deathy as a Java developer.. *sigh* ..no comment
23:24 ๐Ÿ”— deathy though it would be the same crap with a single .php file; servlets are simple...
