#archiveteam 2013-10-06,Sun


Time Nickname Message
07:07 🔗 Pinstripe oh my this is larger than I thought
07:08 🔗 yipdw everyone is in Archive Team
07:58 🔗 BlueMax YOU ARE ALL IN ARCHIVE TEAM
07:59 🔗 godane thats why i'm here
07:59 🔗 bsmith093 ditto
08:00 🔗 brayden_ Assimilated
08:00 🔗 bsmith093 i saved ao3 (most of it) ((well... the stories anyway))
08:00 🔗 godane it was dl.tv not getting archived that pushed me
08:01 🔗 godane i told you guys to archive it but then i got the where is the post saying its going down
08:01 🔗 * brayden hasn't got the disk, or bandwidth.. or anything to archive
08:01 🔗 brayden Only thing that has bandwidth doesn't have the disk for it :o
08:01 🔗 brayden Damn Linode
08:01 🔗 omf_ This is ARCHIVE TEAM.
08:01 🔗 * omf_ kicks BlueMax off a cliff
08:01 🔗 * BlueMax flies up and stabs omf_ in the nose
08:01 🔗 brayden eww
08:02 🔗 * bsmith093 rescues omf_
08:04 🔗 BlueMax so omf_, have you begun any progress on ff or are you going to kick me down a pit again
08:04 🔗 bsmith093 i have approx 2.4 million stories off ff.net, ~100GB, downloading 2 at once at approx one chapter/sec. its nowhere near done, but ill gladly hand off the files and the downloader script ive co-opted
08:05 🔗 BlueMax one hundred gig?! holy christ man how much are you actually backing up
08:07 🔗 bsmith093 i havent sent them anywhere yet and its been running for about 14 months on my laptop. just the stories in text form with a metadata block at the beginning.
08:08 🔗 bsmith093 i have them saved in the file structure Fanfiction\category\status\category - author - title.txt
08:08 🔗 BlueMax I assume you have a ton of duplicates
08:08 🔗 bsmith093 probably
08:08 🔗 bsmith093 not worth sorting them out
08:09 🔗 bsmith093 seriously eanna take over
08:10 🔗 BlueMax could I ask you to compress that text and see how much it comes out to with the smallest compression possible? I'd take over if I was able to confirm I could run the script constantly but the heat coming through might make it impossible
08:10 🔗 BlueMax fucking Australia
08:10 🔗 bsmith093 i have no idea how much it would compress, and the job would take days
08:10 🔗 omf_ Yeah our fanfic collection is a few hundred gigs
08:11 🔗 bsmith093 omf_: what format is it
08:11 🔗 omf_ It actually needs its own collection
08:11 🔗 omf_ 50gb warc.gz files
08:11 🔗 bsmith093 oy i have straight text files
08:13 🔗 bsmith093 BlueMax: im running this http://code.google.com/p/fanficdownloader/ with a tweaked personal.ini file
08:14 🔗 BlueMax well I guess I can try and take over
08:14 🔗 BlueMax but my upstream is kinda shitty
08:14 🔗 bsmith093 i give a xargs version of that a list of links and it just ticks through them
08:15 🔗 omf_ Just remember not having the data in warc format limits its future uses.
08:16 🔗 BlueMax if you wanna send me your personal.ini file I'll see what I can do over here
08:16 🔗 BlueMax and how does it limit future uses omf_?
08:16 🔗 bsmith093 omf_: im not throwing away a year long download job, give me a place to throw it
08:16 🔗 omf_ I am not saying that
08:16 🔗 omf_ No warcs means no wayback machine
08:17 🔗 omf_ or warc viewers
08:17 🔗 omf_ Plain text grabs have many uses like anything NLP which is a huge area
08:17 🔗 bsmith093 BlueMax: waiting for transfer
08:17 🔗 omf_ Which leads to the next question I had bsmith093. Do you have an ftp with the data for us to get it?
08:18 🔗 BlueMax timeout
08:18 🔗 bsmith093 i can send it to you, sure where am i sending?
08:18 🔗 omf_ Let me create an ftp account for you
08:18 🔗 BlueMax http://jetbytes.com will this site do?
08:19 🔗 bsmith093 im not gonna compress because i only have 28gb left on the machine im using
08:20 🔗 omf_ no problem
08:20 🔗 omf_ BlueMax, as for your earlier question I am trying to write a little program that finds the new categories so we can add that to the map before doing an update grab
08:20 🔗 bsmith093 BlueMax: http://files1.jetbytes.com/73d8ef234e2339a096f4f3601b71ee55
08:21 🔗 bsmith093 thats the ini file
08:23 🔗 BlueMax man my connection is being a dick tonight, won't let me load jetbytes
08:23 🔗 omf_ BlueMax, jetbytes is stalling out on me as well
08:23 🔗 bsmith093 worked great for me
08:23 🔗 bsmith093 ny usa
08:23 🔗 BlueMax sometimes that happens
08:23 🔗 BlueMax can you put it somewhere else?
08:24 🔗 bsmith093 ummm where else
08:24 🔗 bsmith093 duh pastebin
08:24 🔗 omf_ just paste a copy into paste.archivingyoursh.it
08:24 🔗 bsmith093 or that sure
08:26 🔗 bsmith093 http://paste.archivingyoursh.it/nibacowidi.apache
08:27 🔗 BlueMax 503?
08:28 🔗 bsmith093 also a bash script with this in it "cat $@ | egrep '^http:' | xargs -n1 python downloader.py -c personal.ini -f txt " no quotes
08:29 🔗 bsmith093 run that script with a file containing a list of links as the only argument
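
For anyone without bash on hand, the one-line pipeline quoted above can be approximated in Python. This is only a sketch: `downloader.py`, `personal.ini` and the `-c` / `-f txt` flags are taken from the message as quoted, while the function names `story_links`, `build_commands` and `run_lists` are made up here for illustration.

```python
import subprocess


def story_links(lines):
    """Keep only lines that start with 'http:', like egrep '^http:'."""
    return [line.strip() for line in lines if line.startswith("http:")]


def build_commands(links, config="personal.ini", fmt="txt"):
    """Build one downloader invocation per link, like xargs -n1."""
    return [["python", "downloader.py", "-c", config, "-f", fmt, link]
            for link in links]


def run_lists(paths):
    """Read each list file and run the downloader once per link."""
    for path in paths:
        with open(path) as handle:
            for command in build_commands(story_links(handle)):
                # A failed story does not stop the rest of the list.
                subprocess.call(command)
```

Calling `run_lists(["xaa"])` from the downloader's folder would then tick through the list one story at a time, much as the bash script does.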
08:31 🔗 BlueMax I can't get the pastebin to load
08:31 🔗 BlueMax Also don't have a list of links
08:31 🔗 BlueMax Oh
08:31 🔗 BlueMax It loaded
08:31 🔗 BlueMax My bad
08:38 🔗 BlueMax also bollocks, bash script? don't have a linux system on hand
08:39 🔗 bsmith093 BlueMax: also python 2
08:41 🔗 BlueMax ack.
08:41 🔗 bsmith093 BlueMax: any other ftp thing besides filezilla? its exploding when i try to load the queue with this much
08:42 🔗 BlueMax on windows?
08:42 🔗 bsmith093 ubuntu
08:43 🔗 BlueMax man I could not tell you
08:44 🔗 BlueMax can't upload in folders? or do you have all the files in the same folder
08:45 🔗 BlueMax oh wait
08:45 🔗 BlueMax you said earlier
08:45 🔗 BlueMax yeah can you upload by category?
08:45 🔗 bsmith093 all in one root folder fanfiction and im using nautilus's connect to server option
08:45 🔗 bsmith093 can rsync do this omf_
08:53 🔗 bsmith093 omf_: filezilla is choking to death on this folder what else do i use for this
08:53 🔗 bsmith093 BlueMax: id love to but filezilla is choking
08:58 🔗 BlueMax give it a good slap on the back then
08:59 🔗 DFJustin jesus 100gb of plain text?
08:59 🔗 bsmith093 yep
09:00 🔗 DFJustin :D
09:01 🔗 BlueMax that's what, over 500,000 times the size of the complete works of shakespeare?
09:01 🔗 bsmith093 gutenberg text shakespeare is 5 mb so yes
09:02 🔗 DFJustin I love how we're sort of accumulating all the insane packrats on the internet
09:02 🔗 bsmith093 wait .... maybe
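
The "wait .... maybe" can be settled with the two figures quoted above. A quick check, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and taking "~100GB" and "5 mb" at face value:

```python
MB = 10**6
GB = 10**9

collection_bytes = 100 * GB   # "~100GB" of story text, as quoted
shakespeare_bytes = 5 * MB    # "gutenberg text shakespeare is 5 mb"

# How many copies of the 5 MB Shakespeare text fit into the collection.
ratio = collection_bytes // shakespeare_bytes
print(ratio)  # 20000
```

With those inputs the collection is about 20,000 times the size of the Shakespeare text, well short of 500,000.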
09:02 🔗 bsmith093 damn straight... ive spent the past few days backing up shows to dvd, that ill probably never watch again
09:04 🔗 BlueMax damn my HDD is too fragmented to get a good sized linux partition
09:04 🔗 BlueMax will have to run the script through a VM
09:06 🔗 BlueMax bsmith093, you said something about a list of links
09:07 🔗 bsmith093 yes i have a thing to generate that or i could just email it or something, its 70 mb
09:07 🔗 bsmith093 the list
09:07 🔗 omf_ upload it
09:07 🔗 bsmith093 kk
09:08 🔗 BlueMax is there anything that changes the list or is it just a big list of stories because it probably needs regenerating or something
09:09 🔗 BlueMax did my last sentence make any sense at all
09:11 🔗 bsmith093 BlueMax omf_ ok ive sent the stuff to the server. the scripts and things are in 11111fanficstuff. personal ini might need a tiny tweak to be ready
09:13 🔗 BlueMax you're going to have to walk me through it like a braindead child once I have Linux Mint ready
09:14 🔗 bsmith093 ignore that here they are xaa xab xac very large text files hold on
09:24 🔗 BlueMax how much ram will it all take
09:29 🔗 Smiley looks like the final missing eps of doctor who have been found!
09:31 🔗 bsmith093 not much ram maybe half a gig
09:31 🔗 Smiley http://www.mirror.co.uk/tv/tv-news/106-doctor-who-episodes-uncovered-2343474
09:32 🔗 bsmith093 Smiley: holy fish fingers and custard!!!?!?! ALL 107?!?!?
09:33 🔗 bsmith093 omf im having problems uploading these xaa ab ac files, i think the servers busy with the massive job im running
09:35 🔗 omf_ server load and RAM usage are very small
09:35 🔗 BlueMax very small? good
09:36 🔗 Smiley bsmith093: apparently
09:36 🔗 bsmith093 so yeah im getting critical error 550 when trying to add in the lists to the server
09:37 🔗 bsmith093 google genlist .py i used that to make them
09:39 🔗 BlueMax I wouldn't have a clue how to do that bsmith093
09:40 🔗 bsmith093 they're in the format http://www.fanfiction.net/s/1234567 omf_
09:43 🔗 bsmith093 well its almost 6AM local time , im going to bed, i'll see if i can upload them in a few hours
09:44 🔗 BlueMax well damn, could've kept my other downloads going
09:44 🔗 bsmith093 omf currently the highest number link is below 10 million
09:45 🔗 bsmith093 omf_ BlueMax bed now for me bye but still lurking
18:28 🔗 BiggieJon //join #crapd
21:48 🔗 bsmith093 omf_: how much do u have
21:49 🔗 bsmith093 its still going on my end
21:49 🔗 bsmith093 back of the envelope numbers say 6.45 days to completion
21:50 🔗 bsmith093 at 200kb/s since most of these files arent nearly big enough to stabilize the connection speed
21:54 🔗 omf_ bsmith093, I got 25gb so far
21:55 🔗 bsmith093 holy crap, i didnt realize it was nearly that much
21:56 🔗 bsmith093 should be about 5 days then
22:00 🔗 bsmith093 omf are you sure you dont mean 2.5GB, cause 25 GB after 12 hours is 4MB/s which is nowhere near my upload speed
22:03 🔗 omf_ the -z option of rsync compresses the files before they are sent over the net
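
The back-of-the-envelope numbers in this exchange can be rechecked with a few lines of Python. This is a sketch under stated assumptions: decimal units, "kb" read as kilobytes, roughly 12 hours elapsed, and the quoted sizes taken at face value. The helper names are made up for illustration. Because `rsync -z` compresses in transit, the 25 GB that has arrived on disk is more than what actually crossed the wire, so the first figure is an upper bound on the real upload rate.

```python
def rate_bytes_per_sec(total_bytes, hours):
    """Average rate needed to move total_bytes in the given hours."""
    return total_bytes / (hours * 3600.0)


def days_to_send(total_bytes, bytes_per_sec):
    """Days needed to move total_bytes at a steady rate."""
    return total_bytes / float(bytes_per_sec) / 86400.0


GB = 10**9

rate = rate_bytes_per_sec(25 * GB, 12)
print(round(rate / 10**6, 2))      # MB/s
print(round(rate * 8 / 10**6, 1))  # Mbit/s

# ~100 GB at a steady 200 kB/s, without compression.
print(round(days_to_send(100 * GB, 200 * 10**3), 1))  # days
```

Under these assumptions, 25 GB in 12 hours works out to roughly 0.58 MB/s (about 4.6 Mbit/s), and 100 GB at 200 kB/s takes a little under 6 days.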
22:12 🔗 xmc rsync -z is good shit
22:12 🔗 xmc rsync should saturate a connection even with many small files
22:13 🔗 xmc if you need faster you can do "tar | ssh" or "tar | nc"
23:11 🔗 bsmith093 omf_: so there IS a compress-on-the-fly thing. huh , whoo, then
23:11 🔗 BlueMax bsmith093, willing to help me get this VM running the fanficdownloader?
23:11 🔗 bsmith093 BlueMax: sure
23:12 🔗 BlueMax alright just need to go turn it on
23:12 🔗 bsmith093 is there anywhere i can drop the list files, though? the ftp server isnt letting me put them there for some reason
23:13 🔗 BlueMax man I have no idea
23:16 🔗 BM-Linux gdau
23:16 🔗 BM-Linux *gday
23:17 🔗 BM-Linux bsmith093: I'm currently running a fresh installation of Linux Mint 15 with nothing extra installed outside of VMware Tools
23:21 🔗 bsmith093 ive compressed the full thing into 4 mb. jet files again?
23:22 🔗 BM-Linux if it works sure
23:22 🔗 BM-Linux do I need to install any packages
23:25 🔗 bsmith093 at least python 2.5
23:26 🔗 bsmith093 here the files for the downloader http://files1.jetbytes.com/eb4034b3121250a2c16ea5dbcf708f31
23:28 🔗 BM-Linux doesn't seem to want to work for me today either
23:28 🔗 bsmith093 if that failed im also using file dropper... man i cant believe how many of these free one off file uploaders there **AREN'T**
23:28 🔗 BM-Linux what about mediafire or something
23:28 🔗 BM-Linux oh
23:28 🔗 BM-Linux file dropper
23:28 🔗 BM-Linux that works
23:28 🔗 bsmith093 http://www.filedropper.com/fanficdownloader
23:29 🔗 BM-Linux is python 2.7 good enough?
23:29 🔗 BM-Linux that's what's installed on here
23:29 🔗 bsmith093 i give you a file, you give me a link, it doesn't even have to be live for more than 20min... HOW HARD IS THAT!!!!?!
23:29 🔗 bsmith093 yes
23:30 🔗 bsmith093 honestly im trying to find the deps, for this and i cant really find much, so probably just python
23:31 🔗 bsmith093 also mercurial for updating
23:34 🔗 BM-Linux mercurial is installed, files have been unzipped
23:34 🔗 bsmith093 BM-Linux: when you get that, decompress and put it in user home folder, then run screen -SL 1 ./auto.sh xaa
23:34 🔗 BM-Linux what's the expected output
23:35 🔗 BM-Linux what should I be seeing if I run it right, that is
23:35 🔗 bsmith093 the x ** files are the lists of links
23:35 🔗 bsmith093 saving file to ** filepath**
23:35 🔗 bsmith093 i use a screen session because the full run even counting where i left off, will probably take months
23:35 🔗 BM-Linux should I run as root?
23:36 🔗 bsmith093 NO
23:36 🔗 bsmith093 unnecessary
23:36 🔗 BM-Linux k
23:36 🔗 bsmith093 theres a screen log file in there thats the expected outout
23:36 🔗 bsmith093 *output
23:36 🔗 BM-Linux screen is currently not installed
23:36 🔗 BM-Linux did I miss something?
23:37 🔗 bsmith093 oh yeah get screen
23:37 🔗 GLaDOS apt-get update; apt-get install screen
23:37 🔗 GLaDOS wait, mint does have apt still, right?
23:37 🔗 bsmith093 its good for running jobs and detaching to do other things
23:37 🔗 bsmith093 seriously I HEART SCREEN
23:37 🔗 BM-Linux I just use Synaptic
23:37 🔗 bsmith093 and xargs
23:38 🔗 bsmith093 er , and bash as well, ive never used mint
23:39 🔗 BM-Linux Must run suid root for multiuser support.
23:39 🔗 BM-Linux That's after installing screen
23:39 🔗 bsmith093 BM-Linux: you're not starting from the beginning, you're taking over from where i stopped, so the first link should be 3 million something
23:40 🔗 bsmith093 yes i know, ive never had to run screen as root
23:41 🔗 BM-Linux So what do I do, run as root?
23:41 🔗 bsmith093 i split the full 1-10 million link file, into 3 parts, xaa is 1- 4 million xab is 4-8 million and xac is 8-10 million
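
The names xaa, xab and xac look like the default output names of the coreutils `split` tool, which cuts a file by line count or size rather than by story ID. If the parts ever need regenerating by ID range instead, something like the following would do it. It is a sketch only: the range boundaries are read off the message above, which side each boundary value falls on is a guess, and `part_for` and `split_links` are hypothetical helper names.

```python
import re

# Story links look like http://www.fanfiction.net/s/1234567
STORY_ID = re.compile(r"/s/(\d+)")

# Assumed half-open ranges [low, high); the log only says
# "1-4 million", "4-8 million" and "8-10 million".
PARTS = [
    ("xaa", 1, 4000000),
    ("xab", 4000000, 8000000),
    ("xac", 8000000, 10000000),
]


def part_for(link):
    """Return the part name for a link, or None if it fits nowhere."""
    match = STORY_ID.search(link)
    if match is None:
        return None
    story_id = int(match.group(1))
    for name, low, high in PARTS:
        if low <= story_id < high:
            return name
    return None


def split_links(links):
    """Group links by part name, keeping their original order."""
    parts = {name: [] for name, _, _ in PARTS}
    for link in links:
        name = part_for(link)
        if name is not None:
            parts[name].append(link)
    return parts
```

Links without a recognisable story ID are dropped rather than guessed at.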
23:41 🔗 bsmith093 not root
23:41 🔗 bsmith093 screen -SL 1 ./auto.sh xaa
23:42 🔗 BM-Linux Must run suid root for multiuser support.
23:42 🔗 bsmith093 idk, anyone wanna take this omf_ GLaDOS
23:43 🔗 bsmith093 oy wait, are the files in the folder owned by you? i forgot that
23:44 🔗 bsmith093 sudo chown -R (username) fanficdownloader/
23:45 🔗 BM-Linux didn't help
23:47 🔗 bsmith093 ok then what do you have for long running jobs
23:47 🔗 BM-Linux what do you mean
23:48 🔗 bsmith093 the script auto.sh will take up a terminal for several months at least, do u mind?
23:50 🔗 BM-Linux wow, several months? not sure if I can keep a computer on for more than a couple of days
23:51 🔗 bsmith093 was that sarcasm, text doesnt cover that well
23:51 🔗 bsmith093 also how much free space do u have
23:51 🔗 BM-Linux no it wasn't
23:51 🔗 BM-Linux and this PC has about 70GB free
23:54 🔗 bsmith093 not that much in terms of power, or heat, my laptop's been doing this for a year and i havent noticed a spike in usage
23:54 🔗 GLaDOS BM-Linux: just run it as root, it's a vm.
23:55 🔗 bsmith093 GLaDOS: whats with the multiuser support screen message though?
23:55 🔗 BM-Linux Cannot identify account '.'.
23:55 🔗 BM-Linux uemaxima@bluemaxima-virtual-machine ~/fanficdownloader $ sudo screen -SL ./auto.sh xaa
23:55 🔗 bsmith093 umm, set the sticky bit, i just googled this and thats what it said
23:55 🔗 GLaDOS sudo su; screen -SL ./auto.sh xaa
23:56 🔗 GLaDOS (note: ; doesn't work with su)
23:57 🔗 BM-Linux Cannot identify account ".".
23:59 🔗 bsmith093 ok then drop screen then, just when your shutting down note the last link downloaded and update xaa to refelct that
23:59 🔗 bsmith093 *reflect
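
The manual resume step described above (note the last link downloaded, then trim the list) can be sketched in Python. The function name `remaining_links` is made up here, and the sketch assumes the noted link finished downloading; if the shutdown interrupted it mid-story, it would need to be put back at the front of the list.

```python
def remaining_links(links, last_downloaded):
    """Return the links that come after last_downloaded, in order."""
    cleaned = [link.strip() for link in links if link.strip()]
    if last_downloaded not in cleaned:
        # Nothing recognisable to skip, so keep the whole list.
        return cleaned
    return cleaned[cleaned.index(last_downloaded) + 1:]
```

Writing the returned list back over xaa before the next run means the downloader starts from the first story not yet fetched.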
