#archiveteam 2011-12-05,Mon

Time Nickname Message
00:00 🔗 db48x2 let's see. error 7 is failure to connect to the host, and 56 is failure in receiving network data
00:01 🔗 bsmith093 xargs runs curl as many times as instance is set for, but the instant any one of them hits an error they are all left to finish up, then the script quits
00:01 🔗 db48x2 right
00:01 🔗 db48x2 that's how xargs works
00:02 🔗 bsmith093 oh ok, so is there a way to print an error and keep xargs going
00:03 🔗 bsmith093 http://tracker.archive.org/ff.net/numbers this file has the full list of ids to check, that's what the curl func does, xargs checks multiple at once, or at least it's supposed to
00:04 🔗 yipdw if you're getting error 7 or 56 "seemingly at random", it may be fanfiction.net
00:04 🔗 yipdw also, that script you posted is kind of nuts
00:05 🔗 bsmith093 i agree, not my script but the best arrith could do on short notice
00:05 🔗 yipdw you're creating a subshell and are evaluating a function and then executing it for every job
00:05 🔗 bsmith093 yipdw: but why is it nuts
00:05 🔗 yipdw there's a simpler way to do it :P
00:05 🔗 yipdw one moment
00:05 🔗 bsmith093 yay, so whats the better way
00:06 🔗 db48x2 well, you could put the contents of the function into another sh file, and call it that way :)
00:06 🔗 bsmith093 the numbers, cause they are
00:06 🔗 yipdw I'm still not quite sure why there even needs to be another function
00:06 🔗 db48x2 but that's a minor thing
00:07 🔗 bsmith093 this is why i gave the link, so you all could tweak and offer suggestions :P
00:13 🔗 yipdw oh, I see why
00:13 🔗 yipdw I guess xargs won't execute a shell function
00:13 🔗 db48x2 no
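The behaviour asked about above (report an error for one ID but keep the other parallel checks going) can be sketched outside xargs. This is a minimal Python sketch, not the script under discussion: `check_all` and `flaky_check` are hypothetical names, and `flaky_check` only simulates a network failure instead of calling fanfiction.net.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(ids, check, workers=4):
    """Run check(story_id) for every ID using a pool of worker threads.

    A failure in one check is recorded and does not stop the others.
    Returns (results, errors), two dicts keyed by story ID.
    """
    results, errors = {}, {}

    def safe_check(story_id):
        # Catch the error inside the task so the pool keeps running.
        try:
            return story_id, check(story_id), None
        except Exception as exc:
            return story_id, None, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for story_id, value, exc in pool.map(safe_check, ids):
            if exc is None:
                results[story_id] = value
            else:
                errors[story_id] = exc
    return results, errors


def flaky_check(story_id):
    # Hypothetical stand-in for the real HTTP check.
    if story_id == "0000003":
        raise ConnectionError("simulated network failure")
    return int(story_id) % 2 == 0


if __name__ == "__main__":
    ok, failed = check_all(["0000001", "0000002", "0000003", "0000004"], flaky_check)
    print("checked:", ok)
    print("failed:", failed)
```

Failed IDs end up in the second dict, so they can be retried later instead of aborting the whole run.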
00:14 🔗 yipdw bsmith093: out of curiosity, do you get the curl error if you run with instance_count=1
00:14 🔗 bsmith093 not sure hold on
00:15 🔗 yipdw also, is it always on the same IDs
00:16 🔗 yipdw finally, I'm wondering if it would be more efficient to do this via spidering the fanfiction.net story indices
00:16 🔗 yipdw 76.2 megabytes of IDs is a lot
00:16 🔗 yipdw ?
00:16 🔗 yipdw how many of those IDs actually reference stories
00:16 🔗 bsmith093 probably several million
00:16 🔗 bsmith093 and there's only 10 mil max so why not be methodical
00:18 🔗 bsmith093 appears not
00:18 🔗 yipdw finally, I don't think that script actually archives stories in full
00:19 🔗 bsmith093 inst count =1 no problem, but its not in parallel, so it kinda defeats the puroose
00:19 🔗 bsmith093 purpose
00:19 🔗 yipdw how does it behave on stories with multiple chapters, e.g. http://www.fanfiction.net/s/5909536/?
00:19 🔗 bsmith093 wait just crashed yep same error
00:20 🔗 bsmith093 it doesn't actually grab them just checks if the id is valid
00:20 🔗 bsmith093 im building a linklist
00:21 🔗 yipdw hm
00:21 🔗 yipdw I maintain it would be more efficient (for you and for them) to start at the roots on http://www.fanfiction.net/
00:21 🔗 yipdw and trace from there
00:21 🔗 yipdw by using WWW::Mechanize/Mechanize/etc.
00:21 🔗 yipdw I've got to run, though, so I can't provide an example
00:21 🔗 yipdw maybe later
00:22 🔗 yipdw usage of those tools does mean leaving bash and using Perl, Python, Ruby, or whatnot, but IMO those are better languages for this sort of stuff anyway
00:22 🔗 yipdw bbl for real
00:26 🔗 bsmith093 connection dropped out, what'd i miss
00:41 🔗 underscor Nothing
00:42 🔗 bsmith093 how do i get wget --spider to give up a linklist for the whole site
00:42 🔗 underscor Dunno off the top of my head
00:43 🔗 underscor On a side note, I'm almost to 1,500,000
00:43 🔗 underscor Simply using this
00:43 🔗 underscor for i in `cat numbers_[a-e][n-z] `;do var=`curl -A "ArchiveTeam/1.0 - Email archiveteam@k-srv.info for misbehavior or complaints" -I http://www.fanfiction.net/s/$i|grep Last`;echo -n "$i - ";if [ -z "$var" ]; then echo "Not a story";else echo "Story";echo $i>>stories_aa;fi;done
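The test in that one-liner treats an ID as a story when the HEAD response contains a line matching "Last". A small Python sketch of the same classification, working on header text as printed by `curl -I`; the function names are hypothetical, and narrowing the match to the `Last-Modified` header name is an assumption (the original greps for any line containing "Last").

```python
def parse_headers(raw):
    """Turn raw HTTP response header text into a dict with lowercase names."""
    headers = {}
    for line in raw.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def is_story(headers):
    # Heuristic taken from the one-liner: valid stories answer with Last-Modified.
    return "last-modified" in headers


def classify(story_id, raw):
    label = "Story" if is_story(parse_headers(raw)) else "Not a story"
    return f"{story_id} - {label}"


if __name__ == "__main__":
    sample = "HTTP/1.1 200 OK\r\nLast-Modified: Sat, 03 Dec 2011 00:00:00 GMT\r\n\r\n"
    print(classify("0000004", sample))
```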
01:01 🔗 chronomex k-srv.info, who is that?
01:07 🔗 arrith yipdw: took me like an hour to work out that xargs subshell thing. seriously.
01:08 🔗 arrith yipdw: they want you to put stuff into another script then have xargs in your original script run *that*
01:08 🔗 arrith yipdw: and i am quite proud of my (crazy) workaround :D
01:12 🔗 arrith yipdw: btw underscor is doing his own script that's more thorough, what i'm doing is just a dirty/fast grab for the stories as really just a proof of concept.
01:20 🔗 bsmith093 hey im running underscor's thing with his files from ffnet tracker, and its picking up where he left off
01:21 🔗 bsmith093 still 81 days though
01:22 🔗 underscor bsmith093: What do you mean where I left off?
01:22 🔗 bsmith093 the file stories aa is growning
01:22 🔗 bsmith093 growing
01:24 🔗 underscor I know, I'm saying I didn't leave off anywhere
01:25 🔗 arrith bsmith093: afaik that's a snapshot of his work, he's probably farther along than that
01:25 🔗 bsmith093 well ok then
01:25 🔗 underscor oh, yeah
01:25 🔗 underscor sorry
01:25 🔗 underscor my bad
01:26 🔗 underscor http://tracker.archive.org/ff.net/stories_0-1299999
01:26 🔗 underscor That might be of interest though
01:26 🔗 underscor Those are all the valid ones
02:16 🔗 bsmith094 underscor: im running your script now, since you're so much further ahead than me, and it keeps failing: Running storyinator on id 0000004 Let's get some metadata. Frontpage Gotten Title is Little Helper Writen by Sheryl Nantus, whose userid is 3284 Placed in tv>>X-Files Tags are Rated: K+, English, F. Mulder & D. Scully, P:3-16-99 Published 3-16-99, updated Story has 38 reviews, which is 3 pages chapters in this story Making dir
02:17 🔗 underscor That all looks correct
02:17 🔗 underscor Do you have php
02:17 🔗 underscor and do you have the php file xmlr.php?
02:17 🔗 bsmith094 yeah about that, could not open input file xmlr.php
02:18 🔗 underscor Did you download it?
02:18 🔗 underscor :P
02:18 🔗 bsmith094 yes
02:19 🔗 bsmith094 man php says yes i do have php, but maybe not the right version or something
02:20 🔗 arrith bsmith094: what's wrong with my script :(
02:21 🔗 bsmith094 still running arrith
02:21 🔗 arrith it gets the job done of finding IDs. and in parallel!
02:21 🔗 arrith oh
02:21 🔗 arrith bsmith094: looks like it's working?
02:21 🔗 bsmith094 apparently
02:21 🔗 bsmith094 using underscors numbers because he's got so many
02:24 🔗 bsmith094 now as for the actual downloading of the stories, well thats more complicated, according to whatever black arts this thing is using http://pastebin.com/e5e4tvK5
02:39 🔗 arrith bsmith094: Unknown Paste ID!
04:18 🔗 yipdw arrith: I see
04:19 🔗 yipdw arrith: at that point, in my opinion, it probably is clearer to switch to a different programming language and use e.g. a thread pool
04:22 🔗 arrith psssshh
04:23 🔗 arrith yeah. i have a very small and light and elegant reimplementation of that in python written by a friend of mine using a threadpool even but eh, i don't know python yet
04:23 🔗 yipdw or, more appropriately
04:23 🔗 yipdw a queue
04:23 🔗 yipdw thread pool being an implementation detail, obviously :P
04:24 🔗 arrith http://paste.pocoo.org/show/516501/
04:25 🔗 arrith ah, i don't think i know the difference between a pool and a queue
04:25 🔗 yipdw yeah, pretty much
04:25 🔗 yipdw they're different structures, not directly related
04:26 🔗 yipdw the idea being that you throw all of your tasks (IDs, in this case) into a queue, and then there exist multiple executors that dequeue a task, work on it, and then check it in
04:26 🔗 yipdw the thread pool is a way to limit the number of concurrent executors
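The queue-plus-executors idea described above can be sketched with the standard library. This is a minimal sketch and not the code from the paste: `run_workers` is a hypothetical name, and `handle` is assumed not to raise.

```python
import queue
import threading


def run_workers(tasks, handle, num_workers=4):
    """Put every task in a queue and let a fixed number of threads drain it."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()  # dequeue a task
            except queue.Empty:
                return  # nothing left, this executor is finished
            try:
                outcome = handle(task)  # work on it
                with results_lock:
                    results.append((task, outcome))
            finally:
                task_queue.task_done()  # check it in

    # num_workers caps how many tasks run at the same time.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(sorted(run_workers(range(10), lambda n: n * n, num_workers=3)))
```

The queue holds the work; the number of threads is the only knob that limits concurrency.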
04:26 🔗 arrith ah
04:26 🔗 arrith does python's multiprocess use a queue?
04:27 🔗 yipdw the map() function probably does
04:28 🔗 arrith ah
04:36 🔗 bsmith094 im back, is the python code just an example?
04:37 🔗 bsmith094 anyway, now im trying to get underscor's storyinator.sh to work
04:49 🔗 godane i have 193 episodes of crankygeeks now
04:50 🔗 godane i also have all crankygeeks episodes posts
04:50 🔗 bsmith094 publicly available yet
04:50 🔗 godane i have not uploaded anything yet
04:51 🔗 godane i have backed them up on dvd
04:51 🔗 godane i have md5sum file for making sure data is right
04:51 🔗 bsmith094 how many dvds
04:51 🔗 godane 3 so far
04:52 🔗 godane it will be at least 6 dvds when fully done
04:52 🔗 dnova nice. that's hardly anything
04:52 🔗 bsmith094 wow
04:52 🔗 godane *6 single layer dvds
04:53 🔗 godane this is only one format
04:53 🔗 dnova there's no reason to get the other formats
04:53 🔗 dnova if you're getting the best one
04:53 🔗 godane i'm getting ipod one
04:53 🔗 dnova the smaller ones can be recreated from those if necessary.
04:53 🔗 dnova wait... really?
04:53 🔗 bsmith094 so mp3
04:53 🔗 dnova oh they're just podcasts?
04:53 🔗 dnova I thought they were videos
04:54 🔗 godane they're just podcasts
04:54 🔗 dnova that should be fine then.
04:54 🔗 godane but its video podcasts
04:54 🔗 dnova ... uh
04:54 🔗 dnova ok... well, mp3 has no video
04:54 🔗 godane video with audio podcasts
04:55 🔗 dnova you should be getting the highest quality version of them
04:55 🔗 godane i did for the first 70
04:55 🔗 godane mpeg4 would have had to change to quicktime
04:56 🔗 godane cause mpeg4 became the ipod format
04:57 🔗 bsmith094 i thought mpeg4 was basically quicktime
04:58 🔗 godane there is .mp4 files then there is .mov files
05:00 🔗 dnova mp4 is independent of quicktime
05:01 🔗 godane anyways the videos are not that big
05:01 🔗 godane backing up something is better than nothing
05:01 🔗 Coderjoe the isomedia mp4 container format is largely based on the quicktime container format. (note, quicktime is like AVI in this regard: both can contain a number of different codecs. the mp4 container is a bit more limited)
05:02 🔗 godane you don't need to backup the 1.6gb 720p videos of podcasts
05:02 🔗 dnova godane: says who?
05:02 🔗 dnova I guess if it's not that important to you, that's fine
05:03 🔗 dnova I don't know anything about that podcast and I am personally not too concerned about it
05:03 🔗 dnova but if it's worth doing, it's worth doing right, isn't it?
05:03 🔗 godane it takes a long time to download and upload 1.6gb file
05:03 🔗 dnova what's the deadline?
05:04 🔗 godane also crankygeeks doesn't have HD
05:04 🔗 godane the biggest file is like 140ishmb
05:04 🔗 dnova so where did 1.6gb come from?
05:04 🔗 dnova get the 140mb or whatever the best quality files are
05:05 🔗 godane i'm just getting the ipod one
05:05 🔗 godane sorry
05:05 🔗 dnova I don't give a shit, but it sounds like you do
05:05 🔗 dnova but only enough to half-ass it
05:05 🔗 godane i watch the videos
05:05 🔗 godane there is not a big difference between the too
05:05 🔗 dnova maybe they'll upload them to youtube for near-term preservation and availability
05:05 🔗 godane *twno
05:06 🔗 bsmith094 dnova: hey, i don't particularly care for most of the stuff on ffnet, either, but im still saving them preemptively, cause it would really suck if that much creativity went into the bitbucket
05:07 🔗 dnova right
05:07 🔗 dnova and I don't care about splinder on a personal level either
05:07 🔗 bsmith094 nor me
05:07 🔗 dnova but I've spent lots of time and a decent amount of money to grab as much as I can
05:07 🔗 dnova and I'm still grabbing.
05:07 🔗 bsmith094 at all, but as long as it was that easy to help out, i did
05:08 🔗 dnova hell yeah man!
05:08 🔗 dnova they did a bang-up job with that.
05:08 🔗 bsmith094 now, ffnet, for being a fully automated site, is a pita to grab, all of it any way, pinging the ids just to see which urls are valid will take about a month
05:09 🔗 dnova I will help with that if possible. just let me know.
05:09 🔗 yipdw I'm still not sure why you're going through all the IDs
05:09 🔗 dnova it's not a huge time crunch with that project so don't stress too much about it
05:09 🔗 yipdw have you identified some problem with using the fanfiction.net indices?
05:09 🔗 yipdw e.g. have they blocked some stories from showing up?
05:09 🔗 bsmith094 we dont really have a script yet; we, and by that i mean underscor and arrith, have some tentative efforts going
05:10 🔗 dnova yeah.. when it's a little more fleshed out I'll throw some hardware at it
05:10 🔗 bsmith094 yipdw: uhh no, i just want to save them bc thats a lot of work to disappear
05:10 🔗 yipdw that's not the question I asked
05:10 🔗 dnova he means why are you brute forcing the ID list
05:10 🔗 yipdw yes
05:10 🔗 yipdw fanfiction.net has, from what I can see, a perfectly usable story index
05:10 🔗 bsmith094 oh well , umm, thats the easiest way ive found
05:10 🔗 * yipdw sighs
05:10 🔗 bsmith094 where?
05:11 🔗 yipdw their web page
05:11 🔗 yipdw one moment
05:11 🔗 yipdw I have some time now, let me whip up a demo
05:11 🔗 bsmith094 we could scrape the feed, but that goes forward not back
05:11 🔗 yipdw no no
05:11 🔗 yipdw I mean the page itself
05:11 🔗 yipdw e.g. http://www.fanfiction.net/play/
05:11 🔗 bsmith094 ok then whip away, cause u lost me
05:11 🔗 yipdw every story is linked from these lists
05:11 🔗 yipdw (as far as I can tell)
05:12 🔗 yipdw if you have some counterexamples, I would like to hear them
05:12 🔗 bsmith094 ummm, but its easier just to grab the story ids directly
05:13 🔗 bsmith094 all u need then is to find out how many chapters each one is, and thats on the first page of each one
05:13 🔗 yipdw the initial implementation is easier, but:
05:13 🔗 yipdw (1) it requires a pre-filtering step that (you say) will take a month
05:13 🔗 yipdw (2) it's really inconsiderate
05:13 🔗 yipdw (fanfiction.net isn't dying)
05:13 🔗 dnova heh
05:13 🔗 yipdw and the point of archiveteam, as far as I know, is to archive, not be assholes
05:14 🔗 yipdw the latter sometimes happens but not as an objective
05:14 🔗 dnova to be fair, he's not trying to be an asshole at all
05:14 🔗 yipdw bear in mind that every GET for a story ID likely requires a database lookup, unless ff.net has done some caching along those lines
05:14 🔗 yipdw I know he's not
05:14 🔗 yipdw I'm just saying that brute-forcing is a pretty inconsiderate way to do things
05:14 🔗 yipdw which is also why I'm writing up an alternative
05:14 🔗 bsmith094 im just using curl to scrape the head of the urls
05:16 🔗 bsmith094 yipdw: so what's your alternative
05:16 🔗 yipdw you have a set of roots, right
05:16 🔗 bsmith094 yeah
05:16 🔗 yipdw namely, the fanfiction categories on the main page
05:16 🔗 yipdw ok
05:16 🔗 bsmith094 with u so far
05:16 🔗 yipdw each root contains a set of categories
05:16 🔗 yipdw each category contains a set of stories
05:16 🔗 yipdw therefore, there is no need to test each ID
05:17 🔗 yipdw and you can begin archiving stories immediately
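The root, category, story walk described above can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions: `discover_stories` and `FAKE_SITE` are hypothetical, the paths are made-up stand-ins for index pages, and no real page is fetched or parsed.

```python
from collections import deque

# Hypothetical in-memory stand-in for the site's index pages:
# page -> (category links on that page, story links on that page)
FAKE_SITE = {
    "/book/": (["/book/Harry_Potter/"], []),
    "/play/": (["/play/Hamlet/"], []),
    "/book/Harry_Potter/": (["/book/Harry_Potter/?p=2"], ["/s/1/", "/s/2/"]),
    "/book/Harry_Potter/?p=2": ([], ["/s/2/", "/s/3/"]),
    "/play/Hamlet/": ([], ["/s/4/"]),
}


def discover_stories(roots, get_links):
    """Walk index pages breadth-first and return each story link once."""
    seen_pages = set(roots)
    pending = deque(roots)
    seen_stories = set()
    stories = []
    while pending:
        page = pending.popleft()
        categories, story_links = get_links(page)
        for story in story_links:
            if story not in seen_stories:  # skip stories listed twice
                seen_stories.add(story)
                stories.append(story)
        for category in categories:
            if category not in seen_pages:  # never revisit an index page
                seen_pages.add(category)
                pending.append(category)
    return stories


if __name__ == "__main__":
    print(discover_stories(["/book/", "/play/"], FAKE_SITE.__getitem__))
```

Only index pages are requested during discovery, so no request is spent on an ID that does not exist.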
05:17 🔗 bsmith094 uhuh, so what, wget --spider -m?
05:17 🔗 yipdw I don't know what tool you'll use; I'm writing a tool in Ruby at the moment
05:30 🔗 yipdw whoa my rubinius install is out of date
05:30 🔗 yipdw time to update
05:34 🔗 dnova update it good
05:41 🔗 bsmith094 so anyway, wget -m with a ua changed to firefox seems to be saving the same link structure as well, so no resorting of ids back into categories
05:47 🔗 Coderjoe did you apply a wait time (possibly with the random wait options as well?)
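For a custom crawler, the wait-plus-random-wait idea can be sketched in a few lines. wget documents `--random-wait` as varying the pause between 0.5 and 1.5 times the `--wait` value; the function below mirrors that range. `polite_delay` is a hypothetical name and the 2-second base wait is an arbitrary example value.

```python
import random
import time


def polite_delay(wait, rng=random):
    """Return a pause in seconds between 0.5 * wait and 1.5 * wait."""
    return rng.uniform(0.5 * wait, 1.5 * wait)


if __name__ == "__main__":
    for url in ["/s/1/", "/s/2/"]:  # placeholder work items
        print("would fetch", url)
        time.sleep(polite_delay(2.0))
```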
05:52 🔗 yipdw ok
05:53 🔗 yipdw https://gist.github.com/1432483
05:53 🔗 yipdw that doesn't actually save anything yet, but it can be extended to do so
05:53 🔗 yipdw the idea is to demonstrate a more targeted approach
05:54 🔗 yipdw if you run that (use Ruby 1.9.3, JRuby in 1.9 mode, or Rubinius 2.0.0 in 1.9 mode) you'll see how it works
05:54 🔗 yipdw attaching an example run log now
05:54 🔗 yipdw attached
05:54 🔗 yipdw note, too, that paginated categories are treated as just more categories
05:55 🔗 yipdw there's some deduplication work to be done there, but
05:55 🔗 yipdw eh
05:56 🔗 yipdw one possibility for saving with the script I linked is to save each story as its own WARC, reviews and all; that'd eliminate the need for a separate review queue
05:56 🔗 yipdw that assumes that the unit of work you want to save is the story
05:56 🔗 yipdw which I think is true.
05:57 🔗 underscor yipdw: That's pretty spiffy!
05:58 🔗 yipdw I think it's probably buggy
05:58 🔗 yipdw there are some duplicate names showing up; the link selection logic probably needs to be refined
05:58 🔗 yipdw but that's the idea
05:58 🔗 yipdw as a bonus, the number of instances you run can be carefully controlled by simply changing the size of the connection pool
06:01 🔗 arrith aha
06:02 🔗 arrith underscor: i was going to ping you to make sure you saw this discussion, yeah some interesting stuff
06:02 🔗 yipdw oh oops
06:02 🔗 yipdw my category-detection scheme fails on crossovers
06:08 🔗 yipdw heh, that's annoying
06:08 🔗 yipdw http://www.fanfiction.net/crossovers/movie/ has broken HTML
06:09 🔗 bsmith094 yipdw: well, its official, your ruby kicks my wget's ass
06:09 🔗 bsmith094 probably more efficient, too
06:10 🔗 yipdw keep in mind that this code doesn't actually save anything yet
06:10 🔗 yipdw I'm not sure how you want to do that
06:10 🔗 bsmith094 im fine with category/show/userid/story
06:11 🔗 bsmith094 and you have a repo, which makes updating SO much easier
06:11 🔗 yipdw also, I'm not sure how hard it would be to get wget-warc to do this
06:11 🔗 yipdw (haven't tried)
06:11 🔗 yipdw there are advantages to using that, such as making it easier to replicate fanfiction.net's structure
06:11 🔗 bsmith094 i still don't get why warc is important?
06:11 🔗 dnova why did we have to compile wget-warc for splinder?
06:12 🔗 yipdw dnova: there's no official release of wget + WARC capabilities
06:12 🔗 Coderjoe because the warc features are not in most distro's package repos yet
06:12 🔗 bsmith094 this code would work for fictionpress as well, since they're identical
06:12 🔗 yipdw bsmith094: I think it's important to capture not only the story data but also the circumstances under which the capture was done
06:12 🔗 Coderjoe the warc features have been accepted into wget's mainline, however
06:12 🔗 yipdw WARC provides that
06:12 🔗 dnova ah, interesting.
06:12 🔗 bsmith094 huh, well ok then
06:13 🔗 yipdw also, IA is set up to ingest WARCs, I think
06:13 🔗 Coderjoe yes, the wayback is set up to ingest warc pretty much automatically (once someone feeds the warc to it)
06:14 🔗 bsmith094 so something like Books/Harry Potter/1234567/2345678/blah.html
06:14 🔗 yipdw so do you just want to archive the text of the stories?
06:14 🔗 yipdw or are you after more than that?
06:14 🔗 bsmith094 ok its late or early, so gnight yall
06:14 🔗 yipdw because if it's just text, fanfiction.net's mobile site is actually better suited for this
06:14 🔗 yipdw (it's simpler)
06:15 🔗 Coderjoe yipdw: he's after just the text. I'd prefer a full warc set
06:15 🔗 yipdw Coderjoe: full WARC set of all stories, one story per WARC?
06:15 🔗 yipdw or a WARC archive of the whole site
06:15 🔗 Coderjoe well, IIRC, he wants the text, author comments, and reviews
06:15 🔗 yipdw ok
06:15 🔗 dnova a warc for the entire site would require LOTS of ram, I think
06:15 🔗 yipdw dnova: yeah
06:16 🔗 yipdw I guess what I should be asking is
06:16 🔗 bsmith094 actually that would be a great bonus but ill take just the stories if that's all i can grab
06:16 🔗 yipdw what's the objective here
06:17 🔗 yipdw is the idea to take e.g. http://www.fanfiction.net/s/6635497/1/Plotting_The_Unknown_Future and wrap it into a WARC, comments, reviews and all?
06:17 🔗 yipdw for ingestion into IA?
06:18 🔗 yipdw anyway, I'll clean up that ff Ruby code and dump it into an AT repo on github
06:18 🔗 yipdw that loop { sleep 5 } bullshit needs to go
06:19 🔗 yipdw PSA: if anyone is doing sleeps like that in threads and you're not waiting on a periodic source, you have sinned
06:20 🔗 dnova I'll take your word for it
06:20 🔗 yipdw arguably sleeping on periodic sources is a bad idea anyway
06:21 🔗 yipdw er, as a wait for
06:21 🔗 bsmith094 not a huge thing, and i feel like a jerk since i cant code worth a damn, but it would be just fantastic, if you could put the author profile page in there somewhere, as well as the reviews for each story as html, with the story
06:21 🔗 arrith yipdw: underscor is kinda leading the design on that
06:22 🔗 yipdw arrith: cool
06:22 🔗 yipdw again, this Ruby stuff is just a PoC
06:22 🔗 arrith i'm not sure what he's including but i'm hoping as much as possible
06:22 🔗 yipdw feel free to use or not use as needed
06:22 🔗 arrith once he's done with his bash+php+perl thing i want to look over it and try to convert it to python as much as possible, make sure it's getting everything comprehensively enough, then integrate it with the universal tracker for periodic scrapes
06:23 🔗 bsmith094 true, that's why i feel like a jerk, im throwing out ideas that equal more work for the rest of you guys, and i cant really contribute anything but bandwidth to run whatever scripts you finally come up with
06:23 🔗 arrith alright. i don't know ruby but it looks pretty neat. i'll try to make sense of it
06:23 🔗 Wyatt|Wor Before I forget yet again, SketchCow, can I have an rsync slot? I've got some berlios and a wayward chunk of Google Groups.
06:23 🔗 arrith bsmith094: you could glance over a python tutorial :P
06:23 🔗 Wyatt|Wor arrith: Ruby is perl in a dress.
06:23 🔗 arrith bsmith094: http://learnpythonthehardway.org/
06:23 🔗 dnova arrith: do you know a good one?
06:23 🔗 dnova beat me.
06:23 🔗 arrith dnova: http://learnpythonthehardway.org/
06:23 🔗 dnova LOL
06:23 🔗 yipdw arrith: it starts at the roots -- the major subdivisions of the site
06:23 🔗 SketchCow OK, one moment.
06:23 🔗 dnova thanks :P
06:24 🔗 yipdw arrith: each root is thrown into the discovery queue, which generates more categories or story URLs
06:24 🔗 arrith dnova: that one and How To Think Like A Computer Scientist
06:24 🔗 yipdw arrith: from there, categories are sent to the discovery queue, story URLs are sent to the grab queue
06:24 🔗 bsmith094 that, just now, was more activity in 2 min, than this feed has had in a week
06:24 🔗 yipdw arrith: there's four executors for each queue, and four HTTP connections shared amongst all queues
06:25 🔗 yipdw it's similar in structure to what one might do with the multiprocessing package in python
06:25 🔗 yipdw just different names.
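A rough Python rendering of that structure: a discovery queue, a grab queue, four executors per queue, and a cap of four simultaneous connections. This is a sketch and not a port of the Ruby gist. `crawl`, `list_page`, `grab_story` and `FAKE_SITE` are hypothetical, the callbacks are assumed not to raise, and a semaphore stands in for the shared HTTP connection pool.

```python
import queue
import threading

# Hypothetical stand-in for index pages: page -> (category links, story links)
FAKE_SITE = {
    "/book/": (["/book/Harry_Potter/"], []),
    "/play/": (["/play/Hamlet/"], []),
    "/book/Harry_Potter/": ([], ["/s/1/", "/s/2/", "/s/3/"]),
    "/play/Hamlet/": ([], ["/s/2/", "/s/4/"]),
}


def crawl(roots, list_page, grab_story, executors=4, connections=4):
    discovery, grab = queue.Queue(), queue.Queue()
    seen, saved = set(roots), []
    lock = threading.Lock()
    conn_pool = threading.BoundedSemaphore(connections)  # shared by both queues

    def discover_worker():
        while True:
            url = discovery.get()
            try:
                if url is None:  # sentinel: stop this executor
                    return
                with conn_pool:
                    categories, stories = list_page(url)
                with lock:
                    for target, links in ((discovery, categories), (grab, stories)):
                        for link in links:
                            if link not in seen:  # enqueue each URL only once
                                seen.add(link)
                                target.put(link)
            finally:
                discovery.task_done()

    def grab_worker():
        while True:
            url = grab.get()
            try:
                if url is None:
                    return
                with conn_pool:
                    result = grab_story(url)
                with lock:
                    saved.append(result)
            finally:
                grab.task_done()

    threads = [threading.Thread(target=worker)
               for worker in (discover_worker, grab_worker)
               for _ in range(executors)]
    for thread in threads:
        thread.start()
    for root in roots:
        discovery.put(root)
    discovery.join()  # every index page has been listed
    grab.join()       # every discovered story has been handled
    for _ in range(executors):
        discovery.put(None)
        grab.put(None)
    for thread in threads:
        thread.join()
    return saved


if __name__ == "__main__":
    print(sorted(crawl(["/book/", "/play/"], FAKE_SITE.__getitem__, lambda url: url)))
```

Blocking on `join()` lets the main thread wait for the queues to drain without polling in a sleep loop.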
06:25 🔗 dnova this looks great, I'm going to check it out, thanks arrith.
06:25 🔗 arrith dnova: good :)
06:25 🔗 arrith yipdw: hmm yeah i'm hoping that's not too difficult to translate into python
06:25 🔗 yipdw arrith: it shouldn't be, Python has much of the same tools
06:26 🔗 yipdw one second
06:26 🔗 yipdw updating support.rb with smarter logic
06:26 🔗 arrith Wyatt|Wor: sounds about right
06:31 🔗 bsmith094 while we're all here, has anyone checked out storyinator.sh from here, www.tracker.archive.org/ffnet
06:32 🔗 yipdw alrighty
06:32 🔗 yipdw https://gist.github.com/1432483/cdbfa4c8e9779e009838235da543fc0a08754862
06:32 🔗 Wyatt|Wor Oh? Hey now
06:32 🔗 Wyatt|Wor bsmith094: I'm getting 404
06:33 🔗 Wyatt|Wor Or not even 404
06:33 🔗 bsmith094 http://tracker.archive.org/ff.net
06:33 🔗 bsmith094 wrong link
06:33 🔗 Wyatt|Wor AH
06:34 🔗 arrith Wyatt|Wor: that's a bit of underscor's work so far
06:34 🔗 zetathust yeah
06:34 🔗 bsmith094 yeah i know
06:34 🔗 arrith mk
06:35 🔗 arrith just he's further along and it's a non portable proof of concept atm
06:37 🔗 arrith yipdw: do you generally prefer ruby to python for quick projects?
06:37 🔗 Wyatt|Wor non....portable? But it runs on anything with a bash interpreter...
06:37 🔗 Wyatt|Wor ;)
06:38 🔗 yipdw arrith: I've used Ruby more recently
06:38 🔗 yipdw so I find it easier to express programs in it
06:38 🔗 yipdw I have nothing against Python, though; I usually use it to script Blender
06:38 🔗 yipdw no complaints about Python there
06:38 🔗 arrith Wyatt|Wor: heh well, requires php. a novice user getting php up and running for a small script isn't the easiest
06:39 🔗 bsmith094 i have a layout idea for what to grab for the stories http://pastebin.com/W6tUR1VE
06:39 🔗 arrith yipdw: ah, i was wondering if you had experience with both or just knew ruby more
06:39 🔗 yipdw arrith: both :P
06:40 🔗 arrith yipdw: that's exactly why i'm writing things in bash and not python ;)
06:40 🔗 yipdw eh?
06:40 🔗 yipdw well
06:40 🔗 yipdw here's my problem with bash
06:40 🔗 yipdw the language is arcane as hell, it's not really THAT portable due to lots of differences between shell versions
06:40 🔗 yipdw and even if you have the same version, the installed utilities can differ
06:40 🔗 yipdw GNU du does not accept the same options as e.g. BSD du, for instance
06:40 🔗 yipdw so you end up coding abstractions for stuff like that
06:41 🔗 arrith oh yeah i have no defense for any of that
06:41 🔗 yipdw in the end I've found Python, Perl, Ruby to be more portable than bash :P
06:41 🔗 arrith yeah definitely
06:41 🔗 arrith i blame it on being 'raised wrong'. it's all i know!
06:41 🔗 arrith for now at least
06:41 🔗 Wyatt|Wor I love bash for the beauty that comes from some of the ugliest code on the planet.
06:42 🔗 bsmith094 ditto
06:42 🔗 bsmith094 i can actually follow most of it
06:42 🔗 Wyatt|Wor But I'm not going to pretend it's more than glue.
06:42 🔗 Wyatt|Wor Moreso than perl, even.
06:43 🔗 arrith i've had an unfortunate feedback loop of mainly knowing bash, so i start a project in it then google to fill in areas that i lack, and not just starting over doing the hard very beginning stuff with a new lang
06:43 🔗 yipdw for my next project, I'll get ArchiveTeam using Factor
06:43 🔗 no2p Why use bash when you can use ksh? ;)
06:43 🔗 yipdw http://factorcode.org/
06:44 🔗 Wyatt|Wor no2p: Actually people ask me this seriously here. I even developed an answer for it: Because Bash is everywhere.
06:44 🔗 no2p Oh, no doubt. I was joking in terms of 'looks'.
06:44 🔗 yipdw so is Java, but that hasn't helped much :P
06:44 🔗 yipdw well, that's unfair
06:45 🔗 yipdw in a server context it's fine
06:45 🔗 Wyatt|Wor yipdw: But Java is a boilerplate language not a glue language
06:45 🔗 yipdw re: portability
06:45 🔗 yipdw Wyatt|Wor: I don't understand the distinction
06:45 🔗 bsmith094 hey, i love java for its ubiquity
06:45 🔗 Coderjoe geh
06:46 🔗 Wyatt|Wor yipdw: Java you spend most of your time writing long strings of boilerplate code. Bash, you spend a lot of time gluing other things together until it does what you want.
06:46 🔗 bsmith094 not saying its good, or fast, but a jar will run on anything with a jvm
06:46 🔗 Coderjoe stop with all the esoteric languages. for the distributed downloading stuff, it should use a well-featured and widely-installed language (like python or perl)
06:46 🔗 yipdw I wasn't serious about Factor
06:46 🔗 bsmith094 nor me with java
06:46 🔗 arrith yipdw: AT needs easier code not harder code :P
06:47 🔗 bsmith094 **shudder**
06:47 🔗 arrith s'why i'm evangelizing python
06:47 🔗 yipdw python's fine
06:47 🔗 bsmith094 ive heard great things about it
06:47 🔗 yipdw Wyatt|Wor: I guess, although a lot of that applies to Java programming too; it's just that you write more to glue bits from libraries together
06:47 🔗 Coderjoe Wyatt|Wor: except bash relies on other userland tools (like gnu userland or bsd userland), which are not always completely compatible. Plus there were issues with your centos being out of date, buggy versions of grep, etc
06:47 🔗 arrith at least in terms of getting beginners up to speed and helping out with it
06:47 🔗 arrith i suppose if a person already knows java then other jvm-ish things might be easier for them
06:47 🔗 bsmith094 check out the archive box channel
06:48 🔗 Wyatt|Wor I prefer perl's flavour of sugar to python's, but that's personal preference.
06:48 🔗 bsmith094 .join #archivebox
06:48 🔗 yipdw arrith: I dunno, how many Java programmers do you know who have picked up Clojure :P
06:48 🔗 arrith yipdw: none that weren't told to, which i guess the AT would be doing heh
06:48 🔗 Wyatt|Wor Coderjoe: Right, gluing things together. I'm not advocating for doing all AT stuff in Bash, don't misunderstand
06:49 🔗 Wyatt|Wor (Or any of it, really)
06:50 🔗 Coderjoe (and I'm not saying python is free of problems either. I had to write some hacks recently to work around problems with python's win32 file api interaction...)
06:50 🔗 arrith Coderjoe: out of curiosity, what kind of problems?
06:50 🔗 Coderjoe for paths longer than 256 characters
06:50 🔗 arrith ah interesting
06:51 🔗 Coderjoe os.walk and such have the needed hacks in the main python code, but stat and open do not
06:52 🔗 Coderjoe https://gist.github.com/1432614
06:53 🔗 Coderjoe but it is partly windows' fault for being stupid with the paths
06:54 🔗 Wyatt|Wor That looks really bizarre
06:54 🔗 yipdw goddamnit, I just spent five minutes looking for my phone's TV-out cable and it was right next to me
06:55 🔗 Wyatt|Wor It's going to take a while to get used to this idea that mobile phones can output 1080p video over HDMI.
06:56 🔗 Coderjoe Wyatt|Wor: the \\?\ thing has to do with some win32 api hacks. see under "lpFileName" on http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx
06:57 🔗 Coderjoe I would have thought the unicode version would be free of this MAX_PATH stupidity, but apparently not
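The `\\?\` prefix mentioned above can be illustrated with a small helper. This is not the code from the gist, only a sketch of the prefixing rule as Microsoft documents it: the classic limit is MAX_PATH (260 characters), extended-length paths start with `\\?\`, and UNC paths become `\\?\UNC\server\share`. `extended_path` is a hypothetical name; `ntpath` is used so the string handling behaves the same on any platform.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"      # the four characters \\?\
UNC_PREFIX = "\\\\?\\UNC\\"      # \\?\UNC\ for network shares


def extended_path(path):
    """Return an absolute Windows path in extended-length form."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    # The prefix disables Win32 path normalisation, so convert forward
    # slashes and collapse "." and ".." segments before adding it.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        return UNC_PREFIX + path[2:]  # \\server\share -> \\?\UNC\server\share
    return EXTENDED_PREFIX + path


if __name__ == "__main__":
    print(extended_path("C:/archive/some/very/deep/tree/file.html"))
```

On Windows the result would then be handed to calls such as `open()` or `os.stat()`; whether that is needed depends on the Python version and system configuration.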
06:59 🔗 yipdw Wyatt|Wor: I'm syncing contacts between two phones; one phone's screen is shattered, so need TV-out to enable Bluetooth
06:59 🔗 yipdw I'm impressed that the sync actually seems to be working
06:59 🔗 yipdw (granted, they're both Nokia products, but even so)
06:59 🔗 Wyatt|Wor Ooh, bummer. What handset?
07:00 🔗 yipdw N900 and N9
07:00 🔗 Wyatt|Wor Ah, those are nice. A pal of mine really digs his.
07:00 🔗 yipdw their biggest problem is that they've both been left for dead :P
07:01 🔗 Wyatt|Wor I know; that's really sad.
07:02 🔗 Wyatt|Wor MeeGo, from what I've seen, is really nice, too
07:03 🔗 yipdw I like the UI paradigm; the infrastructure has some rough spots
07:03 🔗 yipdw like the capabilities framework
07:03 🔗 yipdw I think a lot of that is because it was never finished
07:03 🔗 yipdw but chronomex is probably gonna ring the off-topic bell on me so I'll shut up now :P
07:06 🔗 arrith yipdw: well, i don't want to be ot but i commend your bravery for going for the N9 after what happened with the N900 and especially all that's happened around it. i was eying an N900 for a long time but at this point i'm waiting for cyanogenmod to get more debianish or to see what tizen turns out to be
07:07 🔗 yipdw arrith: heh, not so much bravery as "ooh, shiny"
07:07 🔗 yipdw MeeGo (or more precisely Nokia's Harmattan layer) irks me in that I'm trying to fix some of its problems (like no generic Jabber support) but so much of it is closed-source
07:08 🔗 yipdw so there's a lot more "huh, I guess I'll just have to poke at it" than IMO is necessary
07:09 🔗 arrith ah dang
07:09 🔗 bsmith094 i realize this is kind of random, but does I have an official IRC channel?
07:09 🔗 bsmith094 IA
07:10 🔗 arrith maemo was the closest i've seen to 'debian on a phone' but it's gotten pretty weird since that point
07:10 🔗 bsmith094 feel free to bite my head off, but.... android
07:10 🔗 Coderjoe mmm
07:10 🔗 yipdw bsmith094: I don't think it does; poke underscor or SketchCow
07:10 🔗 Coderjoe 4k-16bit pngs of sintel are large
07:10 🔗 arrith bsmith094: #archive on freenode was mentioned back in 2005
07:11 🔗 bsmith094 underscor SketchCow
07:11 🔗 arrith Coderjoe: they still need to do a 4k render. this max of 720p is insulting.
07:11 🔗 Coderjoe they have a 4k and a 4k-16bit render
07:11 🔗 Coderjoe and a 1080p render
07:11 🔗 bsmith094 well its not here
07:11 🔗 arrith bsmith094: http://www.google.com/search?q=archive.org+irc+channel
07:12 🔗 arrith Coderjoe: eh? those weren't on the download page last i saw. they must be hidden
07:12 🔗 Coderjoe http://media.xiph.org/sintel/
07:12 🔗 arrith mm nice
07:13 🔗 bsmith094 it's empty, and automatedly dead
07:15 🔗 arrith bsmith094: you can bide your time with that python tutorial
07:17 🔗 bsmith094 k then, hey, speaking of sintel, whatever happened to that other foss movie, elephants dream, the dvd iso torrents are deader than luna, and i would really like them, yes i did check ia no they dont have it
07:18 🔗 Coderjoe i don't know. I have the 1080p pngs and flac audio for it, though
07:18 🔗 Coderjoe and I think I have the dvd somewhere too
07:20 🔗 bsmith094 well i found a torrent its running and i am so uploading this to IA when its finished in 2 days
07:21 🔗 dnova bsmith094: flesh out the FanFiction.Net wiki page please.
07:21 🔗 bsmith094 do i have edit rights?
07:22 🔗 dnova if you have any account yes
07:23 🔗 dnova I added it to http://archiveteam.org/index.php?title=Projects#Other_Projects
07:23 🔗 dnova but it needs some info
07:23 🔗 dnova even if it's very preliminary
07:25 🔗 bsmith094 <titleblacklist-forbidden-new-account>
07:26 🔗 bsmith094 so do i have an account or not
07:26 🔗 dnova ... are you logged in?
07:26 🔗 dnova did you make an account?
07:26 🔗 dnova I'm not sure what to say
07:26 🔗 bsmith094 i'm trying to create one and i keep getting that error
07:27 🔗 Coderjoe I see nothing recent for the user creation log
07:27 🔗 Coderjoe http://www.archiveteam.org/index.php?title=Special:Log/newusers
07:28 🔗 Coderjoe what account are you trying to make?
07:28 🔗 bsmith094 bsmith093
07:28 🔗 dnova there is no user "bsmith*"
07:28 🔗 Coderjoe i wonder if something is filtering it thinking it looks too much like a spambot username?
07:28 🔗 bsmith094 k then ill try something else
07:29 🔗 Coderjoe though we appear to have other spambots in the roost
07:29 🔗 Coderjoe http://www.archiveteam.org/index.php?title=Special:Contributions/Fdhbgj
07:30 🔗 bsmith094 EntropyWins, tried that, same error; only thing i can think of is i screwed up the captcha
07:30 🔗 bsmith094 but not 6 times in a row
07:31 🔗 Coderjoe i don't know then.
07:31 🔗 Wyatt|Wor New signups may be turned off for the moment because SketchCow was hunting another SEO spammer
07:31 🔗 Coderjoe and I should probably hop in the time machine and go to bed 2 hours ago
07:32 🔗 Wyatt|Wor That's my hypothesis, at least.
07:33 🔗 Coderjoe arrith: btw, if you go to media.xiph.org, the page there lists the sizes of the different versions
07:33 🔗 arrith ah alright
07:35 🔗 arrith bsmith094: try a different nick type
07:35 🔗 arrith i think spammers put numbers on their usernames at some point
07:38 🔗 bsmith094 ok i'm in as NonCoderBen, now what do i say
07:38 🔗 arrith bsmith094: what you must
07:40 🔗 bsmith094 check it now
07:43 🔗 SketchCow No, new signings should be fine.
07:49 🔗 Wyatt|Wor Huh.
07:49 🔗 Wyatt|Wor Okay then, weird.
07:49 🔗 Wyatt|Wor Oh yeah, rsync for me?
07:51 🔗 bsmith094 arrith: that curl script is running 400 at once
08:09 🔗 SketchCow Ah yes, slot
08:12 🔗 bsmith094 gnight/gmorning all
08:14 🔗 * kennethre yawns
08:26 🔗 SketchCow Sorry, got hung up on stupid thing.
08:26 🔗 SketchCow See, I moved bbsdocumentary.com to the new server, but it still has php infestation.
08:28 🔗 Wyatt|Wor No, no, it's cool. Those are nasty.
08:28 🔗 Wyatt|Wor What sort of infestation?
08:28 🔗 SketchCow php additions.
08:29 🔗 Wyatt|Wor Sorry to hear that. :/ Can you diff against a backup?
08:30 🔗 SketchCow Well, I can find the culprits, and I can shut off PHP on the new server.
08:30 🔗 SketchCow New server uses no PHP.
08:30 🔗 SketchCow PHP is garbage.
08:30 🔗 SketchCow People who like it like leaving "just one" door unlocked for convenience, but it's OK because all the other doors are locked.
08:31 🔗 SketchCow i.e. retards
08:31 🔗 chronomex PHP is the wrong tool for any job
08:32 🔗 chronomex kind of like a tin vise grip
08:32 🔗 Wyatt|Wor I've never been a fan and certainly haven't grown fonder. Wordpress has killed any good will it might have had from me.
08:32 🔗 chronomex heh
08:32 🔗 chronomex anyway.
08:35 🔗 SketchCow What the fuck is nef format
08:35 🔗 chronomex it's a raw file from a camera
08:36 🔗 chronomex I don't know what it stands for.
08:36 🔗 Wyatt|Wor Google sayeth Nikon Electronic Format.
08:36 🔗 chronomex https://www.google.com/search?q=nef+format
08:37 🔗 chronomex "Nikon exclusive NEF format"
08:37 🔗 SketchCow Well, I am excited to see what happens when I dump NEF format into archive.org
08:37 🔗 SketchCow Theory: Nothing
08:37 🔗 chronomex unlike products, guys, an "exclusive" designation on a file format is NOT a bonus.
08:37 🔗 Wyatt|Wor I still don't understand why there are so many different formats for raw image data.
08:37 🔗 db48x2 lol
08:38 🔗 * Wyatt|Wor never put any dots in photography
08:38 🔗 ersi SketchCow: Whoa man, that was a nice test.
08:39 🔗 SketchCow It'd better be, for $13k of new equipment!
08:39 🔗 ersi Wyatt|Wor: Because there's several Large Photo Corps and they all have different sensors, which most often just dumps out the raw sensor data
08:40 🔗 ersi SketchCow: Heh, might be that you grabbed Chris for the test as well ^_^
08:40 🔗 chronomex Wyatt|Wor: http://en.wikipedia.org/wiki/Raw_image_format#Rationale
08:40 🔗 SketchCow Right now, I'm cleaning up french magazines so this dead end with raw formats won't be miserable.
08:40 🔗 SketchCow Do you know Chris?
08:41 🔗 ersi SketchCow: No, but it feels like I do, now.
08:41 🔗 SketchCow I can fix a french magazine item in 5 seconds now.
08:41 🔗 Wyatt|Wor SketchCow: Is that using the new lights? If so, I totally agree with your decision to halogen.
08:41 🔗 SketchCow Gotta type fast, but I can do it.
08:41 🔗 SketchCow Well, the new lights are just new copies of the old lights.
08:41 🔗 SketchCow Same light as GET LAMP
08:41 🔗 Wyatt|Wor And it looked good there.
08:42 🔗 SketchCow Here's the command I'm doing:
08:42 🔗 SketchCow mv */* .;rmdir *;mv *.txt txt.txt;exit
08:42 🔗 * ersi shrugs
08:42 🔗 balrog SketchCow: does that free package support NEF?
08:43 🔗 SketchCow No idea
08:43 🔗 balrog dcraw
08:43 🔗 balrog http://www.cybercom.net/~dcoffin/dcraw/
08:43 🔗 SketchCow Let me look.
08:43 🔗 balrog yeah but cameras only
08:43 🔗 ersi "There are dozens of raw photo formats: CRW, CR2, MRW, NEF, RAF, etc. "RAW Format" does not exist; it is an illusion created by dcraw's ability to read all raw formats. "
08:43 🔗 balrog he provides this code for scanners: http://www.cybercom.net/~dcoffin/dcraw/scan.c
08:43 🔗 SketchCow Wait, wait
08:43 🔗 balrog the NEFs you have
08:43 🔗 SketchCow I THINK the donator donated .TIFFs as well
08:43 🔗 balrog are they from cameras or scanners
08:43 🔗 SketchCow In that case, who gives a shit, I'll include all three.
08:43 🔗 balrog well then probably use the tiffs
08:44 🔗 ersi Yeah, that's the best, really.
08:44 🔗 balrog the benefit of raw images are that you can make adjustments later
08:44 🔗 SketchCow .tif, .nef, and the thing one
08:44 🔗 balrog if you have the software to process them, that is.
08:44 🔗 ersi balrog: Or you just chuck them all in, nothing is lost that way
08:44 🔗 balrog but a .NEF from a scanner is no more useful than a .TIFF
08:44 🔗 balrog ersi: true
08:44 🔗 balrog :/
08:44 🔗 balrog scanner .NEFs don't have additional data, like camera ones do
08:44 🔗 chronomex balrog: tifs are really damn useful.
08:44 🔗 balrog idk why nikon even did that
08:44 🔗 SketchCow These are 190x newspapers the guy took photos of.
08:45 🔗 balrog SketchCow: photos with a camera?
08:45 🔗 SketchCow They're not 100% perfect but it's a nice collection to add.
08:45 🔗 SketchCow Yeah.
08:45 🔗 ersi Whoa, that be many.
08:45 🔗 balrog keep the .NEFs
08:45 🔗 chronomex balrog: you just don't like tif because it's a pain to view on windows, but tif is perfect for actually working with images.
08:45 🔗 balrog if someone needs to do white balance correction or such … will need them.
08:45 🔗 SketchCow 190x is the date, not the number
08:45 🔗 balrog chronomex: I didn't say I don't like .tif
08:45 🔗 balrog I actually do
08:45 🔗 balrog but raw camera images contain more data
08:45 🔗 chronomex 00:45:02 < balrog> but a .NEF from a scanner is no more useful than a .TIFF
08:45 🔗 chronomex looks like you said "tiff and nef are not useful"
08:45 🔗 balrog chronomex: I was saying that a scanner .NEF is junk
08:45 🔗 SketchCow /newspapers/Jimmy Swinnerton/On And Off The Ark - 1902/26b.tif' saved [2083747]
08:45 🔗 SketchCow 2mb TIF
08:45 🔗 balrog since it has nothing that the .tiff doesn't have
08:46 🔗 balrog yeah I have played with nikon scanners that generate .nefs
08:46 🔗 chronomex balrog: NEF is a special case of TIFF.
08:46 🔗 ersi SketchCow: Oh. Heh.
08:46 🔗 ersi SketchCow: That be plenty old then, sweet find
08:47 🔗 SketchCow So, just to explain what's going on.
08:47 🔗 SketchCow So archive.org chokes on some characters sets.
08:47 🔗 balrog :[
08:47 🔗 ersi chronomex: Well, he did say that NEFs from a Nikon SCANNER is bullshit. Since it does not provide any more information than a TIFF would. Nothing was said about NEFs vs. TIFF or anything.
08:47 🔗 SketchCow These French computer magazines? The filenames have some of those.
08:47 🔗 ersi Now I'll stop caring
08:47 🔗 balrog they're not utf-8?
08:47 🔗 SketchCow So I have this script.
08:47 🔗 chronomex ersi: good plan.
08:48 🔗 SketchCow It takes the .zip, unpacks it, drops me into a shell so I can "fix" them, then when I exit the shell, it re-packs, and re-uploads to archive.org.
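The unpack, fix, repack cycle described here can be sketched in Python. This is only an illustration, not SketchCow's actual script: `rework_zip` and the `fix` callback are hypothetical names, the callback stands in for the interactive shell session, and the re-upload to archive.org is left out entirely.

```python
import tempfile
import zipfile
from pathlib import Path


def rework_zip(src_zip, dest_zip, fix):
    """Unpack src_zip, let fix(workdir) rearrange the files, repack into dest_zip."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        with zipfile.ZipFile(src_zip) as zf:
            zf.extractall(workdir)
        # Stand-in for "drop me into a shell": any callable that edits workdir.
        fix(workdir)
        with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(workdir.rglob("*")):
                if path.is_file():
                    # Store paths relative to the work directory, with "/" separators.
                    zf.write(path, path.relative_to(workdir).as_posix())
```

Passing the fix step in as a function keeps the unpack and repack code identical for every archive, which is the part of the design chronomex calls out as worth remembering.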
08:48 🔗 ersi I would assume it's ISO-8859-*, because they're French
08:49 🔗 SketchCow This french magazine collection is unfortunately an embarrassment of riches, because they have a LOT of issues, and some of the filenames and other things have failures.
08:49 🔗 chronomex SketchCow: hm, that's a nice design pattern. I should remember that.
08:49 🔗 SketchCow What, the script?
08:49 🔗 balrog ok night all
08:49 🔗 chronomex SketchCow: yeah.
08:51 🔗 SketchCow It gets worse.
08:51 🔗 SketchCow Now I'm running my two step process THROUGH A LOOP
08:52 🔗 SketchCow So I am looping a two step process to make it less than 5 seconds because I'm no longer typing in the up arrow to make the slight number change.
08:52 🔗 SketchCow This is the ONLY way I can get so much done, as people seem to think I'm capable of superhuman productivity
08:53 🔗 chronomex ogod
08:54 🔗 SketchCow I just fixed 8 of them.
08:55 🔗 SketchCow This is going to add a brutal amount of material up, like a few thousand issues.
08:55 🔗 SketchCow All french, but still very good.
08:55 🔗 SketchCow Occasional sub-par scanning, wouldn't mind seeing some redone.
08:55 🔗 SketchCow Missing issues here and there, etc.
08:56 🔗 SketchCow Newspapers still downloading from the drop point - now that I see he made three versions of each page, it makes more sense.
08:57 🔗 SketchCow mv */* .;rmdir *;mv *.txt txt.txt;exit
08:57 🔗 SketchCow I mean
08:57 🔗 SketchCow for each in 133 132 131 130 129 128 127 126 125 124 123 122 121 120;do ./cleanorator.sh generation4_numero_${each}_images.zip;done
08:58 🔗 SketchCow See, do "cleanorator" to the .zip. Then next one
08:58 🔗 SketchCow In cleanorator, I then do this simple operation they all share.
08:59 🔗 SketchCow That mv. Which is "move them out of the weirdly named subdirectory, make the stupidly named .txt description file into a txt.txt file.
08:59 🔗 SketchCow "
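For readers who do not read shell fluently, the `mv */* .;rmdir *;mv *.txt txt.txt` one-liner does roughly the following. This is a Python sketch under assumptions: `flatten_issue` is a made-up name, and unlike the shell version it only renames the description file when exactly one `.txt` is present.

```python
import shutil
from pathlib import Path


def flatten_issue(workdir):
    """Rough equivalent of: mv */* . ; rmdir * ; mv *.txt txt.txt"""
    workdir = Path(workdir)
    for sub in [p for p in workdir.iterdir() if p.is_dir()]:
        for child in list(sub.iterdir()):
            # mv */* .  (pull every entry up out of the oddly named subdirectory)
            shutil.move(str(child), str(workdir / child.name))
        # rmdir *  (only succeeds because the directory is now empty)
        sub.rmdir()
    texts = list(workdir.glob("*.txt"))
    if len(texts) == 1 and texts[0].name != "txt.txt":
        # mv *.txt txt.txt  (give the description file a predictable name)
        texts[0].rename(workdir / "txt.txt")
```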
08:59 🔗 SketchCow Simple, but tedious
08:59 🔗 SketchCow But each one adds a 150-200pp magazine to the archive.
08:59 🔗 SketchCow So I'll do it.
09:00 🔗 SketchCow Yeah, see, up in the 12x range, the individual issues are 230pages
09:00 🔗 SketchCow Which is crazy
09:00 🔗 SketchCow "Generation 4" magazine
09:00 🔗 SketchCow Circa 1999-2000
09:03 🔗 SketchCow Mostly sharing this to give people insight into how I get so much stuff done.
09:03 🔗 zetathust the insight
09:03 🔗 SketchCow ...
09:03 🔗 SketchCow blah blah insight blah
09:03 🔗 SketchCow Weird.
09:04 🔗 kennethre Wyatt|Wor: they're 100% raw dumps from the sensors, so every camera has its own format
09:05 🔗 kennethre Wyatt|Wor: Adobe's DNG is the only open standard for archival of "raw" images
09:05 🔗 kennethre Wyatt|Wor : http://en.wikipedia.org/wiki/Digital_Negative
09:06 🔗 ersi Nice at joining ages later
09:06 🔗 kennethre nodded off ;)
09:07 🔗 ersi We're past RAW Formats since at least 20 min ago
09:07 🔗 kennethre better late than never
09:07 🔗 SketchCow kennethre: Mahdi came onto my Google Hangout. We chatted.
09:08 🔗 kennethre SketchCow: ah, nice. We're good pals
09:08 🔗 kennethre SketchCow: or he's just a crazy stalker and has me fooled
09:10 🔗 SketchCow Still slamming through issues of Generation 4 magazine.
09:12 🔗 SketchCow Good lord, some issues were 280p
09:14 🔗 SketchCow I see, just browsing an issue, that some games would get 4 page spreads.
09:14 🔗 SketchCow That'll do it.
09:15 🔗 SketchCow Wing Commander III article - 8 pages
09:19 🔗 ersi Awesome
09:25 🔗 SketchCow It's fascinating what a mess some of these archives are.
09:29 🔗 chronomex curation is a fuckload of work.
09:30 🔗 SketchCow Yeah, my big "innovation" is doing layered qualities of curation.
09:31 🔗 chronomex good curation is an order of magnitude harder than adequate curation
09:32 🔗 chronomex wonderful curation is an order of magnitude still
09:32 🔗 chronomex granted, "good" curation is maybe 5-10 minutes per item
09:33 🔗 SketchCow Yeah, for me, it's mostly concentrating on "was heading straight for oblivion" to "stable"
09:33 🔗 chronomex right
09:33 🔗 chronomex you're going from "fucked" to somewhere between "adequate" and "good"
09:34 🔗 SketchCow VERY occasionally I get people pushing back, and I say "motherfucker, this shit was going into a fire"
09:34 🔗 chronomex "you want it better, here go make it better"
09:34 🔗 SketchCow Hence metadata warriors
09:34 🔗 * chronomex nod
09:36 🔗 SketchCow http://www.archive.org/details/generation4-magazine
09:36 🔗 SketchCow And there we go!
09:36 🔗 SketchCow Now they're being rendered, added, etc.
09:37 🔗 SketchCow But they're not red rows anymore.
09:40 🔗 SketchCow Oo, oo... I can now do the magazines and clean them BEFORE going up!
09:40 🔗 SketchCow 93 issues of some magazine (Player One) in French... total size: 7.5gb of JPGs
09:40 🔗 SketchCow So heavy again.
09:43 🔗 SketchCow http://video.constantvzw.org/VJ13/
09:43 🔗 SketchCow There's my talk at the bottom (Jason)
09:52 🔗 SketchCow Yeah, bless you french archiving team - and your wild, WILD inconsistency from zip file to zip file.
09:55 🔗 chronomex <3
09:55 🔗 arrith SketchCow: if the issue is just how the filenames are you could convert the unicode to its compatible equivalent
09:55 🔗 arrith like the unicode snowman is xn--n3h
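The `xn--n3h` form arrith mentions is Punycode, the ASCII encoding defined for internationalized domain names. A minimal sketch using Python's built-in `punycode` codec shows where the string comes from. The helper names are made up for illustration, full IDNA also normalizes the label first (skipped here), and whether a domain-name encoding suits filenames is a separate question.

```python
def to_ace_label(label):
    """ASCII ("xn--") form of one non-ASCII domain label.

    Simplified: real IDNA processing normalizes the label first, and
    all-ASCII labels are left untouched rather than prefixed.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace_label(ace):
    """Inverse of to_ace_label for labels that carry the "xn--" prefix."""
    return ace[len("xn--"):].encode("ascii").decode("punycode")


snowman = "\u2603"                     # the Unicode snowman
print(to_ace_label(snowman))           # xn--n3h
print(from_ace_label("xn--n3h") == snowman)
```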
09:55 🔗 SketchCow I'm doing something similar.
09:56 🔗 SketchCow Sadly, there's little consistency to the inconsistencies.
09:56 🔗 SketchCow Obviously this was a weird labor of love dumped from all directions.
09:56 🔗 SketchCow I'm making them somewhat more negotiable.
09:57 🔗 SketchCow Sometimes it went into subdirectories, sometimes not.
09:57 🔗 SketchCow Sometimes two pages a scan, sometimes one.
09:57 🔗 SketchCow Sometimes it was with no weird characters. Sometimes so.
09:58 🔗 SketchCow Like, just now, someone included the included booklet as a subdirectory.
09:58 🔗 SketchCow Now I'm making it its own item.
10:00 🔗 SketchCow Bonus Thumbs.db!
10:01 🔗 arrith ah. well i wonder if there's enough in common with the majority that you could script those. then manually do the leftovers
10:01 🔗 arrith after a quick google i'm actually not quite sure how url unicode encoding is done, but it's done somehow
10:02 🔗 chronomex %-encoding of "unicode" values is a two-step process.
10:02 🔗 SketchCow Also, there's a bigger issue at hand.
10:02 🔗 chronomex first, the characters are turned into bytes somehow
10:02 🔗 chronomex the most common way is utf-8
10:02 🔗 chronomex then, the bytes are coded.
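The two steps chronomex describes can be shown with Python's standard `urllib.parse.quote`. The filename is an invented example and `percent_encode` is a hypothetical helper. The ISO-8859-1 variant is included because the same characters give different escapes depending on which byte encoding is chosen in step one.

```python
from urllib.parse import quote, unquote


def percent_encode(name, charset="utf-8"):
    """Step 1: characters -> bytes (via charset). Step 2: bytes -> %XX escapes."""
    raw = name.encode(charset)
    return quote(raw)


name = "Génération 4.zip"                     # example filename, not from the archive
print(percent_encode(name))                   # G%C3%A9n%C3%A9ration%204.zip
print(percent_encode(name, "iso-8859-1"))     # G%E9n%E9ration%204.zip
print(unquote(percent_encode(name)) == name)  # unquote assumes UTF-8 by default
```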
10:02 🔗 SketchCow It's only SOMETIMES that it's a unicode issue. Sometimes it's a directory structure issue, a filename issue.
10:03 🔗 SketchCow These really are quite a mess.
10:03 🔗 SketchCow I now have a thing where I can fix it and make it consistent in less than 20 seconds per issue.
10:03 🔗 db48x2 sounds like it's worse than the poetry archive
10:03 🔗 SketchCow Google Groups is the nightmare
10:03 🔗 arrith ah
10:04 🔗 arrith well i found, but don't really understand, this on how to: http://stackoverflow.com/questions/804336/best-way-to-convert-a-unicode-url-to-ascii-utf-8-percent-escaped-in-python
10:04 🔗 arrith if anyone is curious
10:09 🔗 SketchCow Oool, bonus for naming HALF the files in an archive .jpg and the other half .jpeg
10:09 🔗 chronomex SketchCow: I fucking hate that.
10:09 🔗 db48x2 heh
10:10 🔗 chronomex .jpg: because 8.3 is enough for anyone.
10:10 🔗 arrith i like it when there's JPG and JPEG and i forgot to handle case
10:10 🔗 chronomex .JPG: because your software REALLY misses 1972
10:10 🔗 arrith haha
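One way to avoid the mixed `.jpg` / `.jpeg` / `.JPG` trap is to normalize extensions before doing anything else. A small sketch, assuming `.jpg` is the spelling wanted everywhere (`normalized_name` is a hypothetical helper):

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def normalized_name(filename):
    """Return the bare filename with any JPEG-style suffix rewritten to '.jpg'."""
    p = Path(filename)
    # Compare case-insensitively so .JPG and .JPEG are caught as well.
    if p.suffix.lower() in JPEG_SUFFIXES:
        return p.with_suffix(".jpg").name
    return p.name


for example in ["page01.JPEG", "page02.jpg", "page03.JPG", "Thumbs.db"]:
    print(example, "->", normalized_name(example))
```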
10:10 🔗 SketchCow Gets better - some of these, they photograph pages 1-96, then 99-140
10:10 🔗 SketchCow WHY
10:10 🔗 chronomex shivvvv
10:14 🔗 SketchCow Now I'm blasting This American Life while slamming through these 96 issues.
10:16 🔗 SketchCow Up to 43.
10:17 🔗 Wyatt|Wor Oh, my rsyncs finished. Cool
10:23 🔗 db48x2 yea, my splinder upload finished as well
10:23 🔗 db48x2 18 gigs
10:23 🔗 SketchCow Damn, it is STILL downloading those newspaper issues.
10:23 🔗 SketchCow At 4mb a second.
10:23 🔗 db48x2 heh
10:24 🔗 Wyatt|Wor Oh yeah, I need to massage my Splinder stuff and consolidate it all in one place
10:25 🔗 db48x2 interesting
10:25 🔗 db48x2 I'm uploading mobileme at 2 MB/s
10:25 🔗 Wyatt|Wor Respectable
10:26 🔗 db48x2 especially since I only pay for 1 MB/s
10:26 🔗 Wyatt|Wor Haha
10:26 🔗 SketchCow root@teamarchive-0:/2/thenews# du -sh .
10:26 🔗 SketchCow 22G .
10:27 🔗 SketchCow And growing.
10:27 🔗 SketchCow 1.3G Jimmy Swinnerton
10:27 🔗 SketchCow 1.4G Frederick Opper
10:27 🔗 SketchCow 11G The World
10:27 🔗 SketchCow 9.2G Mutt n Jeff
10:27 🔗 Wyatt|Wor The world fits nicely on a spinning magnetic platter.
10:28 🔗 db48x2 heh
10:28 🔗 db48x2 actually, I'm surprised I can upload at all
10:28 🔗 db48x2 I expected comcast to cut me off already
10:29 🔗 db48x2 they've called me up to threaten me every month since I signed up
10:29 🔗 SketchCow http://www.archive.org/details/playerone-magazine-001
10:30 🔗 arrith db48x2: if you can afford it the business plans have no caps
10:30 🔗 arrith i don't know in particular how much it costs more
10:31 🔗 db48x2 gobs more
10:31 🔗 arrith ah ;/
10:31 🔗 db48x2 $200-300 more per month
10:31 🔗 db48x2 I've signed up with a dsl provider though
10:31 🔗 db48x2 half the cost for similar bandwidth
10:31 🔗 db48x2 and no caps
10:31 🔗 db48x2 wish I'd known about them before
10:32 🔗 Wyatt|Wor "I am inquiring about our website, awholeservices.com..." at which point I break down laughing.
10:32 🔗 db48x2 lol
10:32 🔗 db48x2 I need to finish up the poetry archive
10:32 🔗 db48x2 I still have 362 files that are duplicated, where one of the duplicates isn't a poem
10:32 🔗 db48x2 haven't figured out how to distinguish them reliably
10:34 🔗 Wyatt|Wor duplicated...in name?
10:34 🔗 arrith fuzzy duplicate finding is a tricky business
10:34 🔗 db48x2 Wyatt|Wor: sorta
10:34 🔗 Wyatt|Wor Ah, I think I see
10:34 🔗 db48x2 the poetry was originally downloaded by many people
10:34 🔗 db48x2 some of them downloaded the same thing
10:34 🔗 db48x2 so when I combined them into a single unified directory structure, I checked for duplicates and gave them sequential names
10:34 🔗 db48x2 [db48x@celebdil poems]$ ll ./000/901/103/
10:34 🔗 db48x2 total 48K
10:34 🔗 db48x2 drwxrwxr-x. 2 db48x db48x 4.0K Nov 25 04:42 .
10:34 🔗 db48x2 drwxrwxr-x. 1002 db48x db48x 20K Nov 25 04:42 ..
10:35 🔗 db48x2 -rw-r--r--. 1 db48x db48x 8 May 2 2011 000901103a.html
10:35 🔗 db48x2 -rw-r--r--. 1 db48x db48x 17K Nov 23 12:32 000901103.html
10:35 🔗 db48x2 is an example
10:35 🔗 db48x2 here the bad one is only 8 bytes of junk
10:37 🔗 Wyatt|Wor How have you been approaching it?
10:37 🔗 db48x2 I haven't; I've been putting it off
10:38 🔗 Wyatt|Wor Nonsense! You're just planning on how to Do It Right. :P
10:39 🔗 db48x2 lol
10:40 🔗 db48x2 actually, a quick check shows that all of these files are 8 bytes long
10:40 🔗 db48x2 all of the corrupt ones
10:40 🔗 db48x2 so I can just delete them all in one go
10:41 🔗 db48x2 that just leaves going through and renaming the ones that are left over
10:41 🔗 arrith hopefully not many of those
10:42 🔗 db48x2 arrith: there were 35349 files that were just 8 bytes of garbage
10:42 🔗 db48x2 there are 362 left
10:42 🔗 db48x2 all of them have an alternate file that at least has html in it
10:43 🔗 db48x2 and now there are none
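The cleanup db48x2 describes (drop the 8-byte junk copy wherever a real copy sits next to it) could look something like this in Python. It is a sketch, not the command actually used: `find_junk_duplicates` is a made-up name, and the "larger sibling" check is an added safety assumption so that a lone 8-byte file is never flagged.

```python
from pathlib import Path

JUNK_SIZE = 8  # bytes; per the chat, every corrupt copy was exactly this long


def find_junk_duplicates(root):
    """Return 8-byte files that share a directory with at least one larger file."""
    root = Path(root)
    junk = []
    directories = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in directories:
        files = sorted(f for f in directory.iterdir() if f.is_file())
        # Only treat small files as junk when a bigger alternate copy exists.
        if any(f.stat().st_size > JUNK_SIZE for f in files):
            junk.extend(f for f in files if f.stat().st_size == JUNK_SIZE)
    return junk
```

Listing the candidates first and deleting in a second pass (for example `for f in find_junk_duplicates("poems"): f.unlink()`) leaves room to eyeball the list before anything is removed.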
11:28 🔗 Wyatt|Wor All right, job done. Cheers, all!
11:42 🔗 emijrp Do you think that Archive Team is a bit English-centric?
11:46 🔗 SketchCow Somewhat
11:46 🔗 SketchCow But that will change
11:49 🔗 ersi It is, what you make of it
11:55 🔗 SketchCow http://www.archive.org/stream/l-atarien-magazine-01/l-atarien-01#page/n0/mode/2up
11:55 🔗 SketchCow The Magazine of Club Atari (French)
12:15 🔗 arrith hard enough to write the wiki let alone translate it
12:15 🔗 arrith although if people are up for translating, i think there are mediawiki plugins for that
12:16 🔗 db48x2 we just archived a big italian website
12:17 🔗 emijrp Sure, but there are 200+ countries and more than 6000+ languages in the world.
12:17 🔗 emijrp Only talking about it, not a complaint
12:19 🔗 arrith if someone wants to look into that i think they could. there's various pieces of software out there to ease translation
12:20 🔗 underscor SketchCow: Are you still awake from yesterday, or did you get up really early?
12:21 🔗 emijrp arrith: i spoke about archiving websites in other languages, not translating our wiki
12:21 🔗 SketchCow TRADE SECRET
12:23 🔗 db48x2 7700+
12:26 🔗 underscor SketchCow: :(
12:26 🔗 emijrp what is your case underscor ? you are in the US too, right?
12:26 🔗 underscor I'm up for school
12:26 🔗 underscor Although I'm not going because I'm awfully sick :(
12:26 🔗 emijrp ha
12:28 🔗 arrith oh
12:28 🔗 emijrp http://code.google.com/p/wikiteam/downloads/detail?name=archiveteamorg-20111203-history.xml.7z
12:31 🔗 SketchCow http://www.archive.org/details/cyberstratege-magazine&reCache=1
12:33 🔗 emijrp a google waves bots wiki http://code.google.com/p/wikiteam/downloads/detail?name=googlewavebotsinfo_wiki-20111201-current.xml.7z
12:36 🔗 emijrp musicmen in black
12:37 🔗 emijrp fucking window focus, searching for men in black OST on youtube
12:40 🔗 underscor lol
12:45 🔗 ersi emijrp: "< db48x2> we just archived a big italian website" <- how is that Not working on Non-english stuff?
12:46 🔗 emijrp man, what is your problem with me?
12:47 🔗 ersi That you apparently can't read :(
12:47 🔗 ersi And that you often post stuff without any context
12:47 🔗 ersi that's about it.
12:48 🔗 SketchCow Boys, boys
12:48 🔗 emijrp you just have to reply all my messages in bad mood, stop it
12:48 🔗 ersi But I wasn't being a cranky asshole this time, I just asked; How is *that* not working on non-english
12:49 🔗 ersi You asked, I replied. I've ignored you mostly
12:49 🔗 ersi Maybe we should take this in a PM
12:52 🔗 emijrp i have nothing to talk with you, /ignore ersi and end of story
12:55 🔗 ersi Truth hurts.
12:56 🔗 emijrp french open data http://www.data.gouv.fr/
12:57 🔗 emijrp (it is a new website)
14:21 🔗 emijrp I'm making a list of PDF linked from English Wikipedia.
14:22 🔗 emijrp An experiment with Spanish Wikipedia (800,000 articles) shows 70,000 different PDF linked.
14:23 🔗 emijrp English version is probably 5-10x bigger.
14:23 🔗 emijrp But about 50% of links will be 404 errors.
14:23 🔗 emijrp Anyone interested on this idea?
14:26 🔗 emijrp Around 500,000 random PDFs. Lol.
15:20 🔗 rude___ SketchCow re: NEF format- it's the least destructive format to manipulate if people are going to use those files to stitch together entire spreads or comic strips. If IA won't take NEF, converting them to TIFF 16-bit is the way to go, and Bibble is probably the best way to handle that batch conversion.
15:49 🔗 underscor Readability is pretty much the best thing ever
15:58 🔗 Paradoks Re: Archive Team being English-centric. While true, it seems odd to hear that when we've been spending most our resources archiving an Italian website.
16:00 🔗 Paradoks Personally, I occasionally try to find Spanish-language sites that I enjoy reading, but I'm just not immersed enough that I find out about things like I do with English things. So it also makes sense that I wouldn't hear about sites closing in Spain or Latin America.
16:01 🔗 Paradoks And it seems unlikely that that problem would entirely go away until we have lots of Archive-Team members who are immersed in lots of other languages.
17:27 🔗 SketchCow Also, english language is superior
17:40 🔗 SketchCow http://www.archive.org/details/computermagazinesfrench coming along.
17:41 🔗 SketchCow rude___: Agreed. It's just annoying, until I found out the guy had put up multiple versions regardless.
17:45 🔗 rude___ he did? I mean, I did?
17:45 🔗 SketchCow You did, there's TIFFs ahoy
17:46 🔗 SketchCow Pardon my complaining, we lash out to pass the time down here in the boiler room
17:46 🔗 SketchCow Look at this amazing utility I wrote
17:46 🔗 SketchCow Who is numero uno? I think we all know.
17:46 🔗 SketchCow root@teamarchive-0:/3/MAGS/FRENCH/magazines/PC Assemblage# ../numero.sh
17:46 🔗 SketchCow root@teamarchive-0:/3/MAGS/FRENCH/magazines/PC Assemblage#
17:47 🔗 yipdw well, it isn't root, because root is numero cero
17:47 🔗 SketchCow Fine, fine, it actually has use and converts filenames like pcassemblage_numero06.zip to pcassemblage_numero_06_images.zip
17:47 🔗 yipdw on some systems numero uno is daemon
17:47 🔗 SketchCow So my OTHER script can see that 06 and do the right thing, and _images.zip will make the archive.org machines turn it into all those previews.
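The renaming that `numero.sh` is said to perform can be approximated with one regular expression. The script's real contents are not shown in the log, so this is a guess at the behaviour based only on the single before/after example given. `archive_name` and the pattern are assumptions.

```python
import re

# <title>_numero<digits>.zip, with or without an underscore before the digits
PATTERN = re.compile(r"^(?P<title>.+)_numero_?(?P<num>\d+)\.zip$")


def archive_name(filename):
    """Map e.g. 'x_numero06.zip' to 'x_numero_06_images.zip'; None if no match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    # Leading zeros in the issue number are kept exactly as found.
    return f"{m['title']}_numero_{m['num']}_images.zip"


print(archive_name("pcassemblage_numero06.zip"))
# pcassemblage_numero_06_images.zip
```

A name that already ends in `_images.zip` does not match the pattern and returns `None`, so running the rename twice should leave converted files alone.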
17:48 🔗 yipdw that sounds like an import process I wrote for work -- it's a series of 27 Ruby scripts that all feed transformations into each other
17:48 🔗 Schbirid yay, found another quake ad http://www.quaddicted.com/_media/quake/quake_is_good_for_you_2pages.jpg
17:48 🔗 yipdw not exactly the fastest, but at least there's diagnostic output out the ass
17:48 🔗 yipdw correctness over speed, etc.
17:48 🔗 SketchCow Hurrah, http://www.archive.org/details/computermagazinesspanish is now populating.
17:49 🔗 rude___ no problem, some of the items were scanned hence going straight to TIFF. The newspapers were photographed so alls you get is NEF and lower res jpg proofs. Exporting TIFFs for everything would've turned my 20 gig upload into a 160 gig upload
17:51 🔗 yipdw on that note, I recently learned just how crazy good modern DSLRs are compared to readily-available flatbed scanners, assuming you have some knowledge of perspective, the right optics, and lighting
17:52 🔗 yipdw a friend wanted to archive a massive painting she's donating to Child's Play
17:52 🔗 yipdw we first tried a flatbed, which sucked
17:52 🔗 yipdw next try was a 5D Mark II
17:52 🔗 yipdw the sensor on that thing blows my mind every time I see things from it imported into Lightroom.
17:53 🔗 rude___ digital backs are good for that kind of stuff
17:53 🔗 yipdw we were using a pretty rudimentary lighting setup, too; just bounce flash
17:53 🔗 SketchCow That's what archive.org uses.
17:53 🔗 SketchCow For the mongo things, like rude's newspapers, they have an oversize from-above scanner.
17:53 🔗 yipdw seems like a good choice
17:54 🔗 SketchCow The last time I was in the scanning room, they were digitizing 1930s geological surveys.
17:54 🔗 yipdw I don't suppose IA does tours, do they :P
17:55 🔗 rude___ we attempted commissioning a scanner for the newspaper folios
17:55 🔗 rude___ the thing is, the pages were literally disintegrating
17:56 🔗 rude___ putting a plate on it didn't work so well
17:58 🔗 rude___ diy book scanning has really taken off since then so who knows what would be possible today
17:59 🔗 rude___ yipdw: what lens did you use for the painting?
17:59 🔗 yipdw rude___: 24-70 f/2.8L at 24mm, f/4, 1/80s, ISO 50
17:59 🔗 yipdw i would have preferred to use a tilt-shift to get a more rectangular projection, but
17:59 🔗 yipdw cost, etc.
17:59 🔗 yipdw Adobe's lens corrections seem to do a good enough job
18:00 🔗 yipdw er, 30mm
18:00 🔗 yipdw http://ashleyriot.com/childsplayre.jpg
18:01 🔗 yipdw that upload is a bit dark; I guess she took it into Photoshop
18:08 🔗 rude___ awesome
18:10 🔗 rude___ this is what the D1X yielded, http://bryanvaccaro.org/archive/Img4291.jpg
18:11 🔗 rude___ beautiful details in the berber carpet
18:11 🔗 yipdw eh
18:11 🔗 yipdw heh
18:11 🔗 yipdw how much have you noticed diffraction artifacts affecting that sort of work
18:11 🔗 yipdw ?
18:12 🔗 yipdw (EXIF tags on that image say f/16, which I usually never work at for photographic or archival purposes)
18:14 🔗 yipdw not so much because I hate small apertures, just that I usually hover around f/2.8 - f/5.6
18:14 🔗 yipdw and I've heard, but not tested, that diffraction begins to impact sharpness around f/11
18:16 🔗 SketchCow So I guess in photographic history, at the beginning, they were trying to set the lenses and focus and stops to be painterly.
18:16 🔗 SketchCow because everyone assumed they were like painting
18:16 🔗 SketchCow and some group called itself some sort of lens setting
18:17 🔗 SketchCow And they basically shot it up so high to such a level of detail to go "fuck you, lenses are superior"
18:18 🔗 SketchCow What the.... motherfucker, this set of issues of this magazine swaps between THREE DIFFERENT FILE STRUCTURES
18:18 🔗 rude___ yipdw: I don't think we put much thought into it at the time, but I recall that some of the lower f stops didn't look as sharp as f/16 for whatever reason
18:19 🔗 yipdw hmm interesting
18:19 🔗 yipdw I usually don't worry too much about it due to other factors generally being way more important to image quality :P
18:19 🔗 yipdw (e.g. composition, lighting, whether or not your subject is a ponce)
18:20 🔗 yipdw but for archiving it seems like a fun thing to test
18:20 🔗 rude___ it had something to do with the size of the content, the lens, and lighting situation
18:20 🔗 rude___ smaller items were shot at f/3.2, f/8
18:21 🔗 rude___ the simplest answer though is that I didn't know what I was doing
18:21 🔗 yipdw :P
18:21 🔗 SketchCow Whoop, here we go, structure #4
18:21 🔗 yipdw still works, I can make out the newspaper content
19:22 🔗 SketchCow How'd we do with Gamepro? They close in just over an hour.
20:41 🔗 SketchCow Jason and friends!
20:41 🔗 SketchCow You've been duly warned!
20:41 🔗 SketchCow http://cmdrtaco.net/2011/12/everything2-com-seeks-new-ownership/
20:47 🔗 SketchCow I'm jamming it up into archive.org's collection.
20:51 🔗 soultcer Weren't they the ones who complained when someone from archiveteam but a torrent of their posts online, because they can make backups without our help?
20:51 🔗 soultcer *put
20:51 🔗 dan_ Heads up: Everything2.com is up for sale. http://cmdrtaco.net/2011/12/everything2-com-seeks-new-ownership/
20:52 🔗 soultcer dan_: SketchCow posted this seconds before you, but thanks for the warning anyway ;-)
20:52 🔗 SketchCow http://www.archive.org/details/archiveteam-everything2
20:53 🔗 dan_ I shot off an e-mail, just thought i'd post in IRC just in case
20:53 🔗 SketchCow Your commitment is charming.
20:53 🔗 bsmith094 TO THE DOWNLOAD MANAGERS!!! Away!
20:54 🔗 bsmith094 honest question, though, whatever happened to the simple websites where a simple, easy wget -m would grab everything in nice, neat folders?
20:54 🔗 dan_ Funny, Rob Malda is also seeking employment. Pissing off archiveteam I don't think scores any points. :)
20:54 🔗 SketchCow Malda's kind of an idiot.
20:55 🔗 SketchCow You know that, right.
20:55 🔗 bsmith094 rob malda, is he an actor?
20:55 🔗 SketchCow He's the first Slashdot founder.
20:55 🔗 soultcer CmdrTaco
20:55 🔗 SketchCow I've met him.
20:55 🔗 SketchCow He's a fucking zero.
20:55 🔗 bsmith094 whoops, thinking of allen alda
20:56 🔗 SketchCow Matt Haughey is worth 4,000 Rob Maldas.
20:56 🔗 dan_ I live in his hometown (jason, many notacons ago we hung out in the lobby with tyger/froggy the night after the con ended)
20:56 🔗 SketchCow Yes indeed we did
20:57 🔗 bsmith094 the meta filter guy?
20:57 🔗 SketchCow Yes
20:58 🔗 bsmith094 (04:44:06 AM) SketchCow: There's my talk at the bottom (Jason)
20:58 🔗 bsmith094 SketchCow: whats this talk (04:43:59 AM) SketchCow: http://video.constantvzw.org/VJ13/
20:59 🔗 SketchCow Yes
20:59 🔗 SketchCow The one I gave in Belgium on Sunday
20:59 🔗 bsmith094 ah, really nice audio for a telepresence
20:59 🔗 SketchCow With bonus shout-outs, kicks, and the rest.
21:02 🔗 bsmith094 hey, here's a site worth saving, localroger.com, wget -m that, throw in ia, maybe 20mb if you squint, author's page, has most of his work on it
21:19 🔗 * underscor emails malda with an offer of $50
21:19 🔗 underscor hehe
21:30 🔗 bsmith094 im trying to edit the archives page to add my own scrapes of some websites ive had lying around, can someone check my syntax?
22:15 🔗 underscor SketchCow: Did we have anyone archiving GP?
22:15 🔗 underscor Pulling it at 80Mbps like a boss
22:31 🔗 bsmith094 underscor: gp?
22:32 🔗 underscor gamepro
22:32 🔗 SketchCow Gamepro was sort of being archived, but another shot is always welcome.
22:32 🔗 bsmith094 is it dead yet, and is there a script for that?
22:34 🔗 underscor no, and no
22:35 🔗 bsmith094 any particular folder u need archived
22:35 🔗 zetathust html archived
22:36 🔗 arrith bsmith094: that change to the wiki isn't appearing in the table
22:36 🔗 arrith bsmith094: try editing it and hitting 'preview' to try to get it to show
22:36 🔗 bsmith094 yes i know can you fix that please?
22:36 🔗 Paradoks bsmith: I re-arranged your entry on the archives page. It shows up, now, and the links work. I also made the assumption that "passage" had the standard two 's's, rather than three.
22:36 🔗 SketchCow Turns out I know someone who is going to be raiding the closets of GamePro
22:36 🔗 SketchCow Now going to talk about arranging for a set of people with a truck
22:37 🔗 SketchCow My job, why does it never end
22:37 🔗 Paradoks Yay!
22:37 🔗 instence gamepro is gone
22:37 🔗 instence just got switched over within the hour
22:37 🔗 SketchCow GAMEPRO is gone
22:37 🔗 SketchCow GAMEPRO the OFFICE is still there
22:38 🔗 bsmith094 wait the Systems closets ??! holy crap you lucked out
22:38 🔗 instence ...i was just saying the site gamepro.com got switched over, chill
22:39 🔗 SketchCow Ha ha
22:39 🔗 SketchCow Come to #archiveteam and tell people to chill
22:39 🔗 SketchCow Next go to #football and ask people to not be so opinionated
22:39 🔗 SketchCow #politics could use a telling off to "use less ad hominem attacks"
22:40 🔗 bsmith094 lol
22:40 🔗 instence ?
22:40 🔗 underscor hahaha
22:42 🔗 yipdw I dunno, watching Redis' MONITOR is a good way to max and relax
22:43 🔗 underscor ^
22:43 🔗 bsmith094 how do i run a python script from inside a shell script
22:43 🔗 bsmith094 using vars read in from a file
22:43 🔗 yipdw python [script name] and pass the variables as arguments
22:43 🔗 arrith bsmith094: some of the scripts i gave you earlier did that
22:43 🔗 yipdw or set them in the environment
22:44 🔗 bsmith094 yeah i know, and im trying to send linklist to downloader.py
22:44 🔗 arrith bsmith094: any script where you set your downloader.py location
22:44 🔗 bsmith094 they're in the same directory
22:44 🔗 bsmith094 while read num; do echo exec python downloader.py -f html $num; done < linklist.txt
22:45 🔗 bsmith094 what am i missing, because that just does the echo part?
22:45 🔗 zetathust echo effect is ten so tiny polecat
22:46 🔗 bsmith094 zetathust: ummm, what?
22:46 🔗 arrith i'd agree with that
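The one-liner at 22:44 only prints because `echo` is the command being run; `exec python downloader.py ...` is just text passed to it. Dropping `echo` but keeping `exec` would likely not help either, since `exec` replaces the shell and the loop would end after the first line. A minimal sketch of one way to run a command once per line follows, assuming `linklist.txt` holds one argument per line. The helper name `run_for_each` is hypothetical, and `downloader.py` with its `-f html` flag is taken from the log as-is, not verified.

```shell
#!/bin/sh
# Sketch only: run a command once for every non-blank line of a list file.
# run_for_each is a hypothetical helper name, not something from the log.
#
# Intended use (assumes downloader.py is in the current directory):
#   run_for_each linklist.txt python downloader.py -f html
run_for_each() {
    list=$1
    shift
    status=0
    while IFS= read -r item; do
        # Skip blank lines in the list.
        [ -n "$item" ] || continue
        # No "echo" and no "exec": the command really runs, and the loop
        # carries on afterwards. stdin comes from /dev/null so the command
        # cannot consume the remaining lines of the list.
        if ! "$@" "$item" < /dev/null; then
            echo "failed: $item" >&2
            status=1
        fi
    done < "$list"
    # 0 if every run succeeded, 1 if at least one failed.
    return "$status"
}
```

A failed run is reported on stderr and the loop keeps going, so one bad entry does not stop the rest of the list.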
23:04 🔗 SketchCow Well, OK, then.
23:05 🔗 SketchCow it appears that a set of friends of mine are poised to literally take everything out of the gamepro offices not nailed down
23:05 🔗 SketchCow Anyone in the SF area available at 11am thursday? E-mail me, jason@textfiles.com, I'll put you in touch
23:08 🔗 SketchCow First person sputtering at the mirror of everything2
23:09 🔗 PatC Yay! I got a "new" (old) computer for a storage box :)
23:14 🔗 pberry hola
