#archiveteam 2012-01-16,Mon

Time Nickname Message
00:10 ๐Ÿ”— chronomex bsmith094: mostly north america and europe and australia.
00:10 ๐Ÿ”— bsmith094 well that explains a lot
00:11 ๐Ÿ”— bsmith094 australia is 16hrs ahead of me in ny
00:11 ๐Ÿ”— bsmith094 eu is at least 6hrs ahead
00:17 ๐Ÿ”— chronomex seattle is 23 hours ahead
00:18 ๐Ÿ”— chronomex er, not quite, 21 hours ahead
01:20 ๐Ÿ”— bsmith094 maybe for u, for me , im in new york
01:29 ๐Ÿ”— chronomex I think the time offset between seattle and new york is more than simply a matter of perspective ...
01:32 ๐Ÿ”— bsmith094 whoops seattle wa is on the other coast, man i need to look at a map once in a while
01:33 ๐Ÿ”— chronomex hah
01:36 ๐Ÿ”— bsmith094 from several hrs ago (06:22:18 PM) bsmith094: repeat from earlier yipdw: seriously, how do i run this now, what do i use, hoard, ffgrab, what, is hoard expecting a file with id numbers, because i have that id really like an answer, this method seems to be MUCH faster than the other thing i was using
01:46 ๐Ÿ”— bsmith094 what im walkign about https://github.com/ArchiveTeam/ffnet-grab
01:46 ๐Ÿ”— bsmith094 *talking
01:54 ๐Ÿ”— balrog btw how IS the mac.com project going?
01:54 ๐Ÿ”— balrog or mobileme
05:11 ๐Ÿ”— yipdw bsmith094: as I said, it's not complete
05:11 ๐Ÿ”— yipdw bsmith094: however, the idea is that you run hoard and stalk simultaneously
05:11 ๐Ÿ”— bsmith094 ohh ok
05:11 ๐Ÿ”— yipdw hoard retrieves stories; stalk retrieves profiles
05:11 ๐Ÿ”— yipdw or will retrieve profiles
05:11 ๐Ÿ”— bsmith094 but hoard has no args?
05:11 ๐Ÿ”— yipdw it's supposed to pull story IDs from a Redis instance
05:11 ๐Ÿ”— yipdw the code to do that doesn't yet exist
05:12 ๐Ÿ”— yipdw long story short, that stuff doesn't work yet
05:12 ๐Ÿ”— yipdw I'm pretty sure it could be simpler but whatever
05:12 ๐Ÿ”— bsmith094 ah ok so ill stop trying to run it then
05:12 ๐Ÿ”— yipdw (I expect that nobody will want to distribute this)
05:12 ๐Ÿ”— yipdw distribute the workload that is
05:12 ๐Ÿ”— bsmith094 btw i synced the redis db with coderjoes instance, so thats fine right
05:12 ๐Ÿ”— yipdw yes
05:12 ๐Ÿ”— yipdw his is old; I've been rerunning ffgrab and have so far come up with about 3,000 more stories
05:12 ๐Ÿ”— yipdw but it's close enough
05:13 ๐Ÿ”— bsmith094 no such file to load -- connection_pool (LoadError) from stalk.rb:1
05:13 ๐Ÿ”— yipdw it requires the same load convention as hoard.rb
05:13 ๐Ÿ”— bsmith094 is there any way i could make this stop happening, ive installed all these things at least 5 times
05:13 ๐Ÿ”— yipdw yeah, make sure you're using the right Ruby installation
05:13 ๐Ÿ”— yipdw I guess I should modify ffgrab to insert a "date of latest fetch"
05:13 ๐Ÿ”— yipdw so that one can version fetches
05:14 ๐Ÿ”— bsmith094 apparently it remembers if i just use rvm use 1.9.2 so that works great then
05:14 ๐Ÿ”— bsmith094 and yes great idea
05:15 ๐Ÿ”— bsmith094 might as well update the db when we start scraping the site, as well
05:16 ๐Ÿ”— bsmith094 whats your local time?
05:16 ๐Ÿ”— yipdw I would version it in UTC
05:17 ๐Ÿ”— yipdw or perhaps use TAI64 just to be an asshole
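A minimal sketch of the "date of latest fetch" idea, recording each fetch as a UTC timestamp so that fetches can be versioned regardless of the fetcher's time zone. The helper names `fetch_stamp` and `record_fetch`, and the plain Hash standing in for the story database, are assumptions for illustration and are not taken from ffgrab.

```ruby
require "time"

# Format a moment as an ISO 8601 timestamp in UTC, independent of the
# local time zone of the machine doing the fetch.
def fetch_stamp(now = Time.now)
  now.getutc.iso8601
end

# Record the latest fetch time for a story ID. `store` is a plain Hash
# standing in for whatever database holds the story set (hypothetical).
def record_fetch(store, story_id, now = Time.now)
  store[story_id] = fetch_stamp(now)
  store
end

fetches = {}
# 00:16 in New York (UTC-5) is 05:16 UTC.
record_fetch(fetches, 7128202, Time.new(2012, 1, 16, 0, 16, 0, "-05:00"))
puts fetches[7128202] # prints 2012-01-16T05:16:00Z
```

Storing the stamp as a string keeps it sortable and readable; `getutc` is used instead of `utc` so the caller's `Time` object is not modified.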
05:19 ๐Ÿ”— bsmith094 just asking because youre usually on here either very late or early my time, gmt-5
05:22 ๐Ÿ”— bsmith094 thats why i couldnt find any definition for sid? it uses the redis instance? huh, neat
05:36 ๐Ÿ”— underscor <Hydriz> and BTW the Internet Archive just broke down
05:36 ๐Ÿ”— underscor Yeah, basically, the entire ia7* datacenter went offline
05:37 ๐Ÿ”— underscor The machine that manages IP address assignment for that DC died, afaik
05:37 ๐Ÿ”— underscor And that fucked everything
06:05 ๐Ÿ”— yipdw ha, that's annoying
06:05 ๐Ÿ”— yipdw http://www.fanfiction.net/s/7128202
06:05 ๐Ÿ”— yipdw "story not found" returned with HTTP 200
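Because the site reports a missing story with HTTP 200, a grabber cannot rely on the status code alone. A minimal sketch of one way to detect this, assuming the error page contains the phrase quoted above (the exact wording and casing on the site are an assumption, and `story_missing?` is a hypothetical helper):

```ruby
# Text that marks a missing story. The phrase is taken from the report
# above; the site's exact wording and casing are an assumption.
NOT_FOUND_MARKER = /story not found/i

# True when a response should be treated as "no such story", either
# because of a real 404 or because a 200 page carries the error text.
def story_missing?(status, body)
  return true if status == 404
  status == 200 && body.match?(NOT_FOUND_MARKER)
end

puts story_missing?(200, "<html>Story Not Found</html>") # prints true
puts story_missing?(200, "<html>Chapter 1</html>")       # prints false
```

A plain text match like this would also flag a real story that happens to contain the phrase, so a real check would probably need to look at a more specific part of the page.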
06:53 ๐Ÿ”— balrog hey all
06:55 ๐Ÿ”— chronomex y0
06:56 ๐Ÿ”— balrog what's up?
06:56 ๐Ÿ”— balrog oh not sure if any of you saw -- http://minnie.tuhs.org/pipermail/tuhs/2011-December/002538.html
06:56 ๐Ÿ”— balrog basically, deleted blocks are still worth preserving.
06:56 ๐Ÿ”— balrog :p
06:57 ๐Ÿ”— chronomex fascinating
07:04 ๐Ÿ”— balrog which makes me wonder; does anyone here have equipment to read old DEC media?
07:04 ๐Ÿ”— balrog disk packs and tapes
07:28 ๐Ÿ”— yipdw balrog: wow, that is cool
07:28 ๐Ÿ”— yipdw balrog: what makes it even cooler (for me) is that it's going through the archives of someone who's no longer here
07:28 ๐Ÿ”— yipdw makes you realize how long computing has been around
07:30 ๐Ÿ”— balrog yipdw: yup…
07:31 ๐Ÿ”— balrog I have access to stacks of archives
07:31 ๐Ÿ”— balrog but most, no way to read
07:31 ๐Ÿ”— balrog much is original DEC software too :/
07:31 ๐Ÿ”— yipdw neat
07:31 ๐Ÿ”— yipdw unfortunately I have no devices that can read those, nor do I think my employer has any
07:32 ๐Ÿ”— balrog they're rare
07:32 ๐Ÿ”— balrog :/
07:32 ๐Ÿ”— balrog I have some devices, but probably not all the interfacing stuff
07:41 ๐Ÿ”— chronomex I'm going to my local retrocomputing society meeting soon
07:41 ๐Ÿ”— chronomex I'll ask around
07:48 ๐Ÿ”— balrog ciik
07:48 ๐Ÿ”— balrog cool*
07:48 ๐Ÿ”— balrog mostly have here RL01 and RK05 packs, and tape
07:55 ๐Ÿ”— OneManArm http://www.youtube.com/watch?v=p0t7g38sd7Y
08:21 ๐Ÿ”— Hydriz great to see that Archive.org has resolved its issues
08:46 ๐Ÿ”— Hydriz The weirdness about the silence of this channel...
09:05 ๐Ÿ”— chronomex yeah?
09:26 ๐Ÿ”— ersi What's so weird about it?
09:27 ๐Ÿ”— ersi It's mostly silence in here, unless there's a big project going on - or someone is discussing a project
09:36 ๐Ÿ”— Hydriz I see the archives quite long, but when I am here its mostly silent
09:39 ๐Ÿ”— * ersi rolls eyes and goes back to work
09:48 ๐Ÿ”— arrith Hydriz: while you're waiting for irc chan activity you could work on the wiki :D
09:56 ๐Ÿ”— yipdw Hydriz: it's also early morning in the US, and a lot of people here are in those time zones
09:56 ๐Ÿ”— yipdw like me, and I gotta to bed :P
09:56 ๐Ÿ”— yipdw +get
09:57 ๐Ÿ”— chronomex ditto
10:11 ๐Ÿ”— Hydriz I would love to work on the wiki, but I don't have the sufficient right
10:11 ๐Ÿ”— Hydriz *rights
10:11 ๐Ÿ”— Hydriz well its close to evening here...
14:21 ๐Ÿ”— tef SketchCow: I need an rsync slot thingy for splinder; Nemo_bis has been nagging me to get my butt into gear and upload splinder stuffs
18:41 ๐Ÿ”— yipdw from Thin's example suite:
18:41 ๐Ÿ”— yipdw it 'should not fuck up on stupid fucked IE6 headers' do
18:42 ๐Ÿ”— yipdw BDD in action, ladies and gentlemen
19:04 ๐Ÿ”— bsmith094 just did a git pull on ffnetgrab, ive got a redis instance running, what do i run, in what order?
19:07 ๐Ÿ”— bsmith094 yipdw:
19:07 ๐Ÿ”— yipdw the scripts can be run in any order
19:07 ๐Ÿ”— yipdw but they are not ready for use
19:07 ๐Ÿ”— yipdw because they don't retrieve CSS and JS
19:08 ๐Ÿ”— yipdw due to b.fanfiction.net's always-gzip-even-if-not-requested behavior
19:08 ๐Ÿ”— bsmith094 also, minor thing hoard is pulling nothing from the redis queue
19:08 ๐Ÿ”— yipdw you probably have nothing in Redis
19:08 ๐Ÿ”— bsmith094 i synced the db yesterday
19:09 ๐Ÿ”— yipdw redis-cli scard stories
19:09 ๐Ÿ”— yipdw what does that return
19:09 ๐Ÿ”— SketchCow tef: msg me, and I'll do it
19:09 ๐Ÿ”— bsmith094 0? thats odd
19:10 ๐Ÿ”— yipdw if you're not starting up redis-server in the same directory as the dump file, or if your redis.conf doesn't point to one, then redis-server will not load the dump
19:10 ๐Ÿ”— yipdw the fanfiction.net grab database is not insubstantial: on my amd64 machine, Redis consumes 500 MB of memory for it
19:10 ๐Ÿ”— tef yipdw: gzip regardless is technically http/1.1 compliant
19:11 ๐Ÿ”— yipdw tef: not if the client doesn't send Accept-Encoding: gzip
19:11 ๐Ÿ”— bsmith094 wheres the dump file by default
19:12 ๐Ÿ”— tef yipdw: what are you sending in the accept-encoding field ?
19:12 ๐Ÿ”— yipdw tef: whatever wget is sending
19:12 ๐Ÿ”— tef because 'identity' was removed in the errata
19:12 ๐Ÿ”— yipdw (also, I don't think wget is an HTTP 1.1 client)
19:13 ๐Ÿ”— tef oh it doesn't send an accept-encoding header so
19:14 ๐Ÿ”— tef If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding.
19:14 ๐Ÿ”— yipdw yeah, I just read that
19:14 ๐Ÿ”— tef and yeah identity was removed
19:14 ๐Ÿ”— tef in the errata - I know the httpbis draft better than 2616 - I might be confused
19:14 ๐Ÿ”— tef although iirc you can also send chunked back if the client doesn't ask for it
19:16 ๐Ÿ”— yipdw well, all that said, even setting e.g. Accept-Encoding to chunked will still give you gzipped data
19:17 ๐Ÿ”— yipdw b.fanfiction.net really just sends that out, and wget can't handle it
19:17 ๐Ÿ”— yipdw where "handle it", for my purposes, means "decompresses the data and writes the decompressed data into a WARC"
19:18 ๐Ÿ”— yipdw it's important that wget do that because some WARC tools, like Wayback, prepend uncompressed data to archived assets
19:18 ๐Ÿ”— tef I could always hack warc2warc in warctools to have a 'write uncompressed http' option
19:18 ๐Ÿ”— yipdw :P
19:18 ๐Ÿ”— yipdw and if you do that to compressed data then really bad things happen
19:18 ๐Ÿ”— tef hmm ?
19:18 ๐Ÿ”— yipdw well, "really bad" meaning "it's unreadable"
19:18 ๐Ÿ”— tef no I mean post-processing the warcs to find http responses and decompress them if necessary, before recompressing the warc record
19:19 ๐Ÿ”— yipdw oh yeah
19:19 ๐Ÿ”— yipdw that'd work
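The post-processing step tef describes (find HTTP responses and decompress them if necessary) can be sketched for a single response as follows. This is only an illustration of the decoding step, not how warctools implements it: the `header` and `decode_body` helpers and the representation of headers as a Hash are assumptions, and WARC record parsing is left out entirely.

```ruby
require "zlib"
require "stringio"

GZIP_MAGIC = "\x1f\x8b".b # first two bytes of every gzip stream

# Look up a header value without caring about the key's capitalization.
def header(headers, name)
  pair = headers.find { |key, _| key.downcase == name.downcase }
  pair && pair[1]
end

# Return [headers, body] with a gzip-encoded body decompressed.
# Responses that are not gzip-encoded are returned unchanged.
def decode_body(headers, body)
  encoding = header(headers, "Content-Encoding").to_s.strip.downcase
  return [headers, body] unless encoding == "gzip" && body.b.start_with?(GZIP_MAGIC)

  gz = Zlib::GzipReader.new(StringIO.new(body))
  plain = gz.read
  gz.close

  # Drop the headers that described the compressed form, then state
  # the length of the decompressed body.
  kept = headers.reject do |key, _|
    %w[content-encoding content-length].include?(key.downcase)
  end
  [kept.merge("Content-Length" => plain.bytesize.to_s), plain]
end

css = "body { color: red }"
headers, body = decode_body({ "Content-Encoding" => "gzip" }, Zlib.gzip(css))
puts body # prints body { color: red }
```

Checking the gzip magic bytes as well as the header guards against responses that claim gzip but carry plain data; the decompressed record can then be recompressed at the WARC level without confusing tools that expect uncompressed HTTP payloads.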
19:19 ๐Ÿ”— SketchCow bsmith094: I assume the machine being down meant you couldn't finish uploading nickel.7z
19:19 ๐Ÿ”— bsmith094 yes
19:19 ๐Ÿ”— SketchCow It's back.
19:19 ๐Ÿ”— yipdw tef: alternatively, I should be able to retrieve the compressed assets separately and append them to the WARC, right?
19:19 ๐Ÿ”— SketchCow How big is it and how did you grab it all
19:19 ๐Ÿ”— bsmith094 thanks, rsync up
19:19 ๐Ÿ”— yipdw tef: the WARCs generated by downloading bits of fanfiction.net aren't too huge, so either approach will be fine
19:20 ๐Ÿ”— bsmith094 actually about 1.4gb, and with down them all
19:21 ๐Ÿ”— emijrp wikipedia is going to blackout the site on wednesday
19:21 ๐Ÿ”— emijrp anyone interested on creating a mirror ?
19:22 ๐Ÿ”— emijrp English Wikipedia
19:22 ๐Ÿ”— yipdw of the blacked-out site or of all of Wikipedia?
19:22 ๐Ÿ”— ersi emijrp: Calm down, it's just for a few hours/a day
19:22 ๐Ÿ”— emijrp only enwp
19:22 ๐Ÿ”— yipdw I don't think I have enough bandwidth to mirror the English Wikipedia in two days
19:22 ๐Ÿ”— yipdw let alone storage space
19:23 ๐Ÿ”— chronomex use ru. and google translate
19:23 ๐Ÿ”— yipdw and yeah, this isn't a permanent thing
19:23 ๐Ÿ”— chronomex it will be 100% same
19:24 ๐Ÿ”— emijrp ok, no bandwidth
19:25 ๐Ÿ”— emijrp a firefox addon to break the CSS/JS trick to hide the content
19:25 ๐Ÿ”— yipdw how do you know it's just hiding content?
19:25 ๐Ÿ”— yipdw also I can't see this as a really huge problem, sorry :P
19:26 ๐Ÿ”— emijrp i have privileged info
19:26 ๐Ÿ”— emijrp ; )
19:27 ๐Ÿ”— yipdw well if it really is just that, just use Firebug and disable the offending styles
19:27 ๐Ÿ”— yipdw (or whatever)
19:28 ๐Ÿ”— emijrp yes, but i want an easy method for the million visitors that will need to read wikipedia on wednesday
19:28 ๐Ÿ”— yipdw write one then
19:29 ๐Ÿ”— ersi Why don't you just roll your thumbs for a day and let them have their campaign?
19:29 ๐Ÿ”— ersi It's not like they're deleting everything
19:29 ๐Ÿ”— yipdw my opinion is that circumventing a blackout defeats its point
19:29 ๐Ÿ”— ersi Same here.
19:29 ๐Ÿ”— yipdw and I don't really care for SOPA/PIPA so
19:30 ๐Ÿ”— yipdw if Wikipedia can leverage their Alexa ranking to further the anti-SOPA/PIPA agenda, I'm all for letting them do that
19:30 ๐Ÿ”— bsmith094 ok ive got the dump file in the ffnet dir, what should hoard be doing because it populated the todo list and its just hanging there, also yes, let them have their blackout, make an anti sopa video or something
19:31 ๐Ÿ”— bsmith094 or link to the eff's video
19:31 ๐Ÿ”— yipdw bsmith094: paste console output, I have no idea what "hanging" means
19:31 ๐Ÿ”— bsmith094 I, [2012-01-16T14:28:06.883313 #10492] INFO -- : Populating todo queue.
19:31 ๐Ÿ”— bsmith094 ben@ben-laptop:~/ffnet-grab$ ruby hoard.rb
19:31 ๐Ÿ”— bsmith094 I, [2012-01-16T14:28:11.107122 #10492] INFO -- : Todo queue populated with 3658953 story IDs.
19:31 ๐Ÿ”— yipdw not here
19:31 ๐Ÿ”— yipdw oh well
19:31 ๐Ÿ”— bsmith094 damn line breaks
19:32 ๐Ÿ”— yipdw oh
19:32 ๐Ÿ”— yipdw I can tell you why
19:32 ๐Ÿ”— yipdw because there is no code to fetch anything
19:32 ๐Ÿ”— yipdw https://github.com/ArchiveTeam/ffnet-grab/blob/master/hoard.rb#L107-109
19:32 ๐Ÿ”— yipdw and that was intentional.
19:32 ๐Ÿ”— yipdw as I said, the scripts are not ready
19:33 ๐Ÿ”— yipdw they won't be ready until I can work out a method to retrieve usable CSS and JS, either via tef or some other way
19:33 ๐Ÿ”— bsmith094 ohhh, i see you commented out the cmd, sneaky, one line out, and it does nothing
19:33 ๐Ÿ”— yipdw I commented out nothing
19:33 ๐Ÿ”— yipdw there is simply no code there
19:36 ๐Ÿ”— yipdw #{...} in Ruby strings doesn't mean "comment", it means "interpolate"
19:36 ๐Ÿ”— yipdw if you're referring to https://github.com/ArchiveTeam/ffnet-grab/blob/master/hoard.rb#L72
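The difference between interpolation and a comment can be seen in a short standalone example. The variable names here are made up for illustration and are not taken from hoard.rb.

```ruby
story_id = 7128202

# Inside a double-quoted string, #{...} evaluates the expression and
# splices the result into the string.
url = "http://www.fanfiction.net/s/#{story_id}"

# Inside a single-quoted string, the same characters stay literal.
literal = 'http://www.fanfiction.net/s/#{story_id}'

# A comment only begins at a # that sits outside any string literal.
puts url     # prints http://www.fanfiction.net/s/7128202
puts literal # prints http://www.fanfiction.net/s/#{story_id}
```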
19:52 ๐Ÿ”— tef yipdw: fwiw there is now a -D option in warctools: warc2warc.py
19:53 ๐Ÿ”— tef so python warc2warc.py -D -Z in.warc > out.warc.gz
19:53 ๐Ÿ”— tef should work
19:53 ๐Ÿ”— tef it's been pushed to code.hanzoarchives.com
19:54 ๐Ÿ”— yipdw tef: awesome, thanks
19:54 ๐Ÿ”— tef i've tested it by hand on the warcs we create
19:54 ๐Ÿ”— tef haven't tested warc-wget
19:54 ๐Ÿ”— yipdw will that recompress the WARC record-by-record?
19:54 ๐Ÿ”— tef yes
19:54 ๐Ÿ”— yipdw cool
19:55 ๐Ÿ”— yipdw I think that should work; I'll let you know
19:55 ๐Ÿ”— tef if I am not around on irc, file a bug or email me at my work address: thomas.figg@hanzoarchives.com
19:56 ๐Ÿ”— yipdw tef: actually, am I missing something? commit history for warc-tools on code.hanzoarchives.com doesn't show any commits later than 2011-12-07 -> http://code.hanzoarchives.com/warc-tools/changesets
19:56 ๐Ÿ”— tef oh i'm a muppet
19:56 ๐Ÿ”— yipdw heh
19:56 ๐Ÿ”— tef *actually pushed now*
19:56 ๐Ÿ”— yipdw ah ha, there we go
19:56 ๐Ÿ”— yipdw cool
19:56 ๐Ÿ”— tef memo to self: work repo != public repo
21:59 ๐Ÿ”— SketchCow Serious talk underway to filter or banner archive.org for SOPA
22:00 ๐Ÿ”— rixard what the heck is this SOPA I keep hearing about? I suppose it's a US law?
22:00 ๐Ÿ”— rixard or is it an organization?
22:01 ๐Ÿ”— SketchCow US Law
22:01 ๐Ÿ”— SketchCow Being discussed
22:01 ๐Ÿ”— SketchCow It adds heinous restrictions to internet activity
22:01 ๐Ÿ”— SketchCow That's why people don't like it.
22:02 ๐Ÿ”— SketchCow The act of streaming a copyrighted movie is a felony
22:02 ๐Ÿ”— SketchCow There's lots of things
22:04 ๐Ÿ”— rixard I think the internet as we know it is worth fighting over for a number of reasons. One, it's the first time humanity can mass communicate ideas, words, thoughts etc in an unrestricted way. And yes it has some downfalls but looking to the big picture it's a really awesome invention. We can learn about stuff without it being filtered thru the regular media for example.
22:07 ๐Ÿ”— rixard I am reading about SOPA on wikipedia and man, it doesn't sound like anything I'd like to see the future internet become.
22:07 ๐Ÿ”— Coderjoe I think some in big content that crafted and are pushing for SOPA can see the "unintended" consequences, such as sites with user content like youtube thinking it is too much trouble and closing down. If such sites closed, they would again have most of the control on distribution to consumers.
22:09 ๐Ÿ”— yipdw SOPA is actually Spanish for soup, I dunno what all this internet talk is
22:09 ๐Ÿ”— nitro2k01 It's better in Swedish
22:09 ๐Ÿ”— nitro2k01 where it means trash
22:09 ๐Ÿ”— yipdw heh
22:09 ๐Ÿ”— Coderjoe http://theswash.com/liberty/10-technologies-that-congress-tried-to-kill
22:10 ๐Ÿ”— chronomex http://en.wikipedia.org/wiki/Red_Flag_Act
22:10 ๐Ÿ”— rixard I don't know in what world some politicians live in. SOPA reminds me of some EU politician who wanted email to have a postage fee much like regular mail *doh*
22:11 ๐Ÿ”— nitro2k01 If that could help stop spam, maybe...
22:11 ๐Ÿ”— nitro2k01 As in, the receiver has the right to charge a small fee
22:12 ๐Ÿ”— yipdw or something like hashcash
22:12 ๐Ÿ”— nitro2k01 Yeah, that might work too
22:12 ๐Ÿ”— rixard well I'd rather let my spam filters do their work than risk step by step being brought into something where sending email might require postage, a paypal account, a valid VISA card etc
22:13 ๐Ÿ”— rixard perhaps it's time to go back to BBSes
22:13 ๐Ÿ”— nitro2k01 And pay via your phone bill...
22:13 ๐Ÿ”— yipdw not sure how that solves the spam issue
22:14 ๐Ÿ”— rixard my referral to BBSes was to get away from SOPA.
22:14 ๐Ÿ”— yipdw oh
23:46 ๐Ÿ”— tef yipdw: any luck with warctools ?
23:46 ๐Ÿ”— yipdw tef: haven't tried; I'll give them a shot when I get hom
23:46 ๐Ÿ”— yipdw e
23:46 ๐Ÿ”— yipdw (hopefully in a half-hour or so when this test suite goes green)
23:50 ๐Ÿ”— tef cool
