[07:07] oh my this is larger than I thought
[07:08] everyone is in Archive Team
[07:58] YOU ARE ALL IN ARCHIVE TEAM
[07:59] thats why i'm here
[07:59] ditto
[08:00] Assimilated
[08:00] i saved ao3 (most of it) ((well... the stories anyway))
[08:00] it was dl.tv not getting archived that pushed me
[08:01] i told you guys to archive it but then i got the "where is the post saying its going down"
[08:01] * brayden hasn't got the disk, or bandwidth.. or anything to archive
[08:01] Only thing that has bandwidth doesn't have the disk for it :o
[08:01] Damn Linode
[08:01] This is ARCHIVE TEAM.
[08:01] * omf_ kicks BlueMax off a cliff
[08:01] * BlueMax flies up and stabs omf_ in the nose
[08:01] eww
[08:02] * bsmith093 rescues omf_
[08:04] so omf_, have you made any progress on ff or are you going to kick me down a pit again
[08:04] i have approx 2.4 million stories off ff.net, ~100GB, downloading 2 at once at approx one chapter/sec. its nowhere near done, but ill gladly hand off the files and the downloader script ive co-opted
[08:05] one hundred gig?! holy christ man how much are you actually backing up
[08:07] i havent sent them anywhere yet and its been running for about 14 months on my laptop. just the stories in text form with a metadata block at the beginning.
[08:08] i have them saved in the file structure Fanfiction\category\status\category - author - title.txt
[08:08] I assume you have a ton of duplicates
[08:08] probably
[08:08] not worth sorting them out
[08:09] seriously wanna take over
[08:10] could I ask you to compress that text and see how much it comes out to with the smallest compression possible? I'd take over if I was able to confirm I could run the script constantly but the heat coming through might make it impossible
[08:10] fucking Australia
[08:10] i have no idea how much it would compress, and the job would take days
[08:10] Yeah our fanfic collection is a few hundred gigs
[08:11] omf_: what format is it
[08:11] It actually needs its own collection
[08:11] 50gb warc.gz files
[08:11] oy i have straight text files
[08:13] BlueMax: im running this http://code.google.com/p/fanficdownloader/ with a tweaked personal.ini file
[08:14] well I guess I can try and take over
[08:14] but my upstream is kinda shitty
[08:14] i give an xargs version of that a list of links and it just ticks through them
[08:15] Just remember not having the data in warc format limits its future uses.
[08:16] if you wanna send me your personal.ini file I'll see what I can do over here
[08:16] and how does it limit future uses omf_?
[08:16] omf_: im not throwing away a year-long download job, give me a place to throw it
[08:16] I am not saying that
[08:16] No warcs means no wayback machine
[08:17] or warc viewers
[08:17] Plain text grabs have many uses, like anything NLP, which is a huge area
[08:17] BlueMax: waiting for transfer
[08:17] Which leads to the next question I had bsmith093. Do you have an ftp with the data for us to get it?
[08:18] timeout
[08:18] i can send it to you, sure. where am i sending?
[08:18] Let me create an ftp account for you
[08:18] http://jetbytes.com will this site do?
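A rough answer to the compression question at [08:10] doesn't need spare disk: the tree can be streamed through gzip and only the byte count kept. A minimal sketch, assuming the stories live under a top-level Fanfiction directory as described at [08:08] (the directory name and the use of gzip are assumptions):

    # Stream the whole tree through gzip and count output bytes; nothing is
    # written to disk, so free space doesn't matter. It still has to read
    # all ~100GB once, so expect it to run for hours.
    tar -cf - Fanfiction | gzip -9 | wc -c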
[08:19] im not gonna compress because i only have 28gb left on the machine im using
[08:20] no problem
[08:20] BlueMax, as for your earlier question I am trying to write a little program that finds the new categories so we can add that to the map before doing an update grab
[08:20] BlueMax: http://files1.jetbytes.com/73d8ef234e2339a096f4f3601b71ee55
[08:21] thats the ini file
[08:23] man my connection is being a dick tonight, won't let me load jetbytes
[08:23] BlueMax, jetbytes is stalling out on me as well
[08:23] worked great for me
[08:23] ny usa
[08:23] sometimes that happens
[08:23] can you put it somewhere else?
[08:24] ummm where else
[08:24] duh pastebin
[08:24] just paste a copy into paste.archivingyoursh.it
[08:24] or that sure
[08:26] http://paste.archivingyoursh.it/nibacowidi.apache
[08:27] 503?
[08:28] also a bash script with this in it "cat $@ | egrep '^http:' | xargs -n1 python downloader.py -c personal.ini -f txt" no quotes
[08:29] run that script with a file containing a list of links as the only argument
[08:31] I can't get the pastebin to load
[08:31] Also don't have a list of links
[08:31] Oh
[08:31] It loaded
[08:31] My bad
[08:38] also bollocks, bash script? don't have a linux system on hand
[08:39] BlueMax: also python 2
[08:41] ack.
[08:41] BlueMax: any other ftp thing besides filezilla? its exploding when i try to load the queue with this much
[08:42] on windows?
[08:42] ubuntu
[08:43] man I could not tell you
[08:44] can't upload in folders? or do you have all the files in the same folder
[08:45] oh wait
[08:45] you said earlier
[08:45] yeah can you upload by category?
[08:45] all in one root folder fanfiction and im using nautilus's connect to server option
[08:45] can rsync do this omf_
[08:53] omf_: filezilla is choking to death on this folder, what else do i use for this
[08:53] BlueMax: id love to but filezilla is choking
[08:58] give it a good slap on the back then
[08:59] jesus 100gb of plain text?
[08:59] yep
[09:00] :D
[09:01] that's what, over 500,000 times the size of the complete works of shakespeare?
[09:01] gutenberg text shakespeare is 5 mb so yes
[09:02] I love how we're sort of accumulating all the insane packrats on the internet
[09:02] wait .... maybe
[09:02] damn straight... ive spent the past few days backing up shows to dvd that ill probably never watch again
[09:04] damn my HDD is too fragmented to get a good sized linux partition
[09:04] will have to run the script through a VM
[09:06] bsmith093, you said something about a list of links
[09:07] yes i have a thing to generate that, or i could just email it or something, its 70 mb
[09:07] the list
[09:07] upload it
[09:07] kk
[09:08] is there anything that changes the list or is it just a big list of stories? because it probably needs regenerating or something
[09:09] did my last sentence make any sense at all
[09:11] BlueMax omf_ ok ive sent the stuff to the server. the scripts and things are in 11111fanficstuff. personal.ini might need a tiny tweak to be ready
[09:13] you're going to have to walk me through it like a braindead child once I have Linux Mint ready
[09:14] ignore that, here they are: xaa xab xac, very large text files, hold on
[09:24] how much ram will it all take
[09:29] looks like the final missing eps of doctor who have been found!
[09:31] not much ram, maybe half a gig
[09:31] http://www.mirror.co.uk/tv/tv-news/106-doctor-who-episodes-uncovered-2343474
[09:32] Smiley: holy fish fingers and custard!!!?!?! ALL 107?!?!?
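For reference, the one-liner quoted at [08:28] amounts to a small driver script. The reconstruction below is a sketch only: the shebang, quoting, and comments are added here, and the filename auto.sh comes from later in the log; the pipeline itself is from the chat.

    #!/bin/bash
    # Feed one or more files of story links to fanficdownloader.
    # Usage: ./auto.sh xaa   (xaa/xab/xac are the link lists mentioned above)
    # Keep only http: lines, then hand each URL to downloader.py in turn;
    # -c names the config file, -f txt requests plain-text output.
    # Needs python 2 (per [08:39]).
    cat "$@" | egrep '^http:' | xargs -n1 python downloader.py -c personal.ini -f txt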
[09:33] omf im having upload problems with these xaa ab ac files, i think the servers busy with the massive job im running
[09:35] server load and RAM usage are very small
[09:35] very small? good
[09:36] bsmith093: apparently
[09:36] so yeah im getting critical error 550 when trying to add the lists to the server
[09:37] google genlist.py, i used that to make them
[09:39] I wouldn't have a clue how to do that bsmith093
[09:40] theyre in the format http://www.fanfiction.net/s/1234567 omf_
[09:43] well its almost 6AM local time, im going to bed, i'll see if i can upload them in a few hours
[09:44] well damn, could've kept my other downloads going
[09:44] omf currently the highest number link is below 10 million
[09:45] omf_ BlueMax bed now for me, bye, but still lurking
[18:28] //join #crapd
[21:48] omf_: how much do u have
[21:49] its still going on my end
[21:49] back of the envelope numbers say 6.45 days to completion
[21:50] at 200kb/s, since most of these files arent nearly big enough to stabilize the connection speed
[21:54] bsmith093, I got 25gb so far
[21:55] holy crap, i didnt realize it was nearly that much
[21:56] should be about 5 days then
[22:00] omf are you sure you dont mean 2.5GB, cause 25GB after 12 hours is 4MB/s which is nowhere near my upload speed
[22:03] the -z option of rsync compresses the files before they are sent over the net
[22:12] rsync -z is good shit
[22:12] rsync should saturate a connection even with many small files
[22:13] if you need faster you can do "tar | ssh" or "tar | nc"
[23:11] omf_: so there IS a compress-on-the-fly thing. huh, whoo then
[23:11] bsmith093, willing to help me get this VM running the fanficdownloader?
[23:11] BlueMax: sure
[23:12] alright just need to go turn it on
[23:12] is there anywhere i can drop the list files, though? the ftp server isnt letting me put them there for some reason
[23:13] man I have no idea
[23:16] gday
[23:17] bsmith093: I'm currently running a fresh installation of Linux Mint 15 with nothing extra installed outside of VMware Tools
[23:21] ive compressed the full thing into 4 mb. jetbytes again?
[23:22] if it works sure
[23:22] do I need to install any packages
[23:25] at least python 2.5
[23:26] here are the files for the downloader http://files1.jetbytes.com/eb4034b3121250a2c16ea5dbcf708f31
[23:28] doesn't seem to want to work for me today either
[23:28] if that failed im also using file dropper... man i cant believe how many of these free one-off file uploaders there **AREN'T**
[23:28] what about mediafire or something
[23:28] oh
[23:28] file dropper
[23:28] that works
[23:28] http://www.filedropper.com/fanficdownloader
[23:29] is python 2.7 good enough?
[23:29] that's what's installed on here
[23:29] i give you a file, you give me a link, it doesn't even have to be live for more than 20min... HOW HARD IS THAT!!!!?!
[23:29] yes
[23:30] honestly im trying to find the deps for this and i cant really find much, so probably just python
[23:31] also mercurial for updating
[23:34] mercurial is installed, files have been unzipped
[23:34] BM-Linux: when you get that, decompress it and put it in your home folder, then run screen -SL 1 ./auto.sh xaa
[23:34] what's the expected output
[23:35] what should I be seeing if I run it right, that is
[23:35] the x** files are the lists of links
[23:35] saving file to **filepath**
[23:35] i use a screen session because the full run, even counting where i left off, will probably take months
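Concretely, the transfer options tossed around between [22:03] and [22:13] look roughly like this; user, host, and destination path are placeholders, not details from the log:

    # rsync with -z compresses each file in transit; -a keeps the tree intact.
    rsync -avz Fanfiction/ user@host:/data/fanfiction/

    # For millions of tiny files, a single tar stream over ssh avoids the
    # per-file negotiation entirely (the "tar | ssh" from [22:13]).
    tar -cf - Fanfiction | ssh user@host 'tar -xf - -C /data/fanfiction'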
[23:35] should I run as root?
[23:36] NO
[23:36] unnecessary
[23:36] k
[23:36] theres a screen log file in there, thats the expected output
[23:36] screen is currently not installed
[23:36] did I miss something?
[23:37] oh yeah get screen
[23:37] apt-get update; apt-get install screen
[23:37] wait, mint does have apt still, right?
[23:37] its good for running jobs and detaching to do other things
[23:37] seriously I HEART SCREEN
[23:37] I just use Synaptic
[23:37] and xargs
[23:38] er, and bash as well, ive never used mint
[23:39] Must run suid root for multiuser support.
[23:39] That's after installing screen
[23:39] BM-Linux: you're not starting from the beginning, you're taking over from where i stopped, so the first link should be 3 million something
[23:40] yes i know, ive never had to run screen as root
[23:41] So what do I do, run as root?
[23:41] i split the full 1-10 million link file into 3 parts: xaa is 1-4 million, xab is 4-8 million and xac is 8-10 million
[23:41] not root
[23:41] screen -SL 1 ./auto.sh xaa
[23:42] Must run suid root for multiuser support.
[23:42] idk, anyone wanna take this omf_ GLaDOS
[23:43] oy wait, are the files in the folder owned by you? i forgot that
[23:44] sudo chown -R (username) fanficdownloader/
[23:45] didn't help
[23:47] ok then what do you have for long running jobs
[23:47] what do you mean
[23:48] the script auto.sh will take up a terminal for several months at least, do u mind?
[23:50] wow, several months? not sure if I can keep a computer on for more than a couple of days
[23:51] was that sarcasm, text doesnt convey that well
[23:51] also how much free space do u have
[23:51] no it wasn't
[23:51] and this PC has about 70GB free
[23:54] not that much in terms of power or heat, my laptop's been doing this for a year and i havent noticed a spike in usage
[23:54] BM-Linux: just run it as root, it's a vm.
[23:55] GLaDOS: whats with the multiuser support screen message though?
[23:55] Cannot identify account '.'.
[23:55] uemaxima@bluemaxima-virtual-machine ~/fanficdownloader $ sudo screen -SL ./auto.sh xaa
[23:55] umm, set the sticky bit, i just googled this and thats what it said
[23:55] sudo su; screen -SL ./auto.sh xaa
[23:56] (note: ; doesn't work with su)
[23:57] Cannot identify account ".".
[23:59] ok then drop screen, just note the last link downloaded when you're shutting down and update xaa to reflect that
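A plausible reading of the screen errors above: in sudo screen -SL ./auto.sh xaa, the word after -S becomes the session name, and a name containing a slash is treated as a multiuser user/session pair, which is where "Cannot identify account '.'" and the suid-root complaint can come from. If so, separating the flags and giving -S a plain, slash-free name should sidestep both problems without root; the session name ffgrab is invented here, and nohup is only offered as a fallback:

    # -L logs output to screenlog.0; -S sets an explicit, slash-free session
    # name ("ffgrab" is an invented name, not from the log).
    screen -L -S ffgrab ./auto.sh xaa

    # If screen stays broken, nohup also keeps a months-long job alive
    # across logouts; progress goes to a plain log file instead.
    nohup ./auto.sh xaa > ffgrab.log 2>&1 &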