Time |
Nickname |
Message |
07:07
🔗
|
Pinstripe |
oh my this is larger than I thought |
07:08
🔗
|
yipdw |
everyone is in Archive Team |
07:58
🔗
|
BlueMax |
YOU ARE ALL IN ARCHIVE TEAM |
07:59
🔗
|
godane |
thats why i'm here |
07:59
🔗
|
bsmith093 |
ditto |
08:00
🔗
|
brayden_ |
Assimilated |
08:00
🔗
|
bsmith093 |
i saved ao3 (most of it) ((well... the stories anyway)) |
08:00
🔗
|
godane |
it was dl.tv not getting archived that pushed me |
08:01
🔗
|
godane |
i told you guys to archive it but then i got the "where is the post saying its going down" response |
08:01
🔗
|
* |
brayden hasn't got the disk, or bandwidth.. or anything to archive |
08:01
🔗
|
brayden |
Only thing that has bandwidth doesn't have the disk for it :o |
08:01
🔗
|
brayden |
Damn Linode |
08:01
🔗
|
omf_ |
This is ARCHIVE TEAM. |
08:01
🔗
|
* |
omf_ kicks BlueMax off a cliff |
08:01
🔗
|
* |
BlueMax flies up and stabs omf_ in the nose |
08:01
🔗
|
brayden |
eww |
08:02
🔗
|
* |
bsmith093 rescues omf_ |
08:04
🔗
|
BlueMax |
so omf_, have you begun any progress on ff or are you going to kick me down a pit again |
08:04
🔗
|
bsmith093 |
i have approx 2.4 million stories off ff.net, ~100GB, downloading 2 at once at approx one chapter/sec. its nowhere near done, but ill gladly hand off the files and the downloader script ive co-opted |
08:05
🔗
|
BlueMax |
one hundred gig?! holy christ man how much are you actually backing up |
08:07
🔗
|
bsmith093 |
i havent sent them anywhere yet and its been running for about 14 months on my laptop. just the stories in text form with a metadata block at the beginning. |
08:08
🔗
|
bsmith093 |
i have them saved in the file structure Fanfiction\category\status\category - author - title.txt |
08:08
🔗
|
BlueMax |
I assume you have a ton of duplicates |
08:08
🔗
|
bsmith093 |
probably |
08:08
🔗
|
bsmith093 |
not worth sorting them out |
08:09
🔗
|
bsmith093 |
seriously eanna take over |
08:10
🔗
|
BlueMax |
could I ask you to compress that text and see how much it comes out to with the smallest compression possible? I'd take over if I was able to confirm I could run the script constantly but the heat coming through might make it impossible |
08:10
🔗
|
BlueMax |
fucking Australia |
08:10
🔗
|
bsmith093 |
i have no idea how much it would compress, and the job would take days |
08:10
🔗
|
omf_ |
Yeah our fanfic collection is a few hundred gigs |
08:11
🔗
|
bsmith093 |
omf_: what format is it |
08:11
🔗
|
omf_ |
It actually needs its own collection |
08:11
🔗
|
omf_ |
50gb warc.gz files |
08:11
🔗
|
bsmith093 |
oy i have straight text files |
08:13
🔗
|
bsmith093 |
BlueMax: im running this http://code.google.com/p/fanficdownloader/ with a tweaked personal.ini file |
08:14
🔗
|
BlueMax |
well I guess I can try and take over |
08:14
🔗
|
BlueMax |
but my upstream is kinda shitty |
08:14
🔗
|
bsmith093 |
i give an xargs version of that a list of links and it just ticks through them |
08:15
🔗
|
omf_ |
Just remember not having the data in warc format limits its future uses. |
08:16
🔗
|
BlueMax |
if you wanna send me your personal.ini file I'll see what I can do over here |
08:16
🔗
|
BlueMax |
and how does it limit future uses omf_? |
08:16
🔗
|
bsmith093 |
omf_: im not throwing away a year long download job, give me a place to throw it |
08:16
🔗
|
omf_ |
I am not saying that |
08:16
🔗
|
omf_ |
No warcs means no wayback machine |
08:17
🔗
|
omf_ |
or warc viewers |
08:17
🔗
|
omf_ |
Plain text grabs have many uses like anything NLP which is a huge area |
08:17
🔗
|
bsmith093 |
BlueMax: waiting for transfer |
08:17
🔗
|
omf_ |
Which leads to the next question I had bsmith093. Do you have an ftp with the data for us to get it? |
08:18
🔗
|
BlueMax |
timeout |
08:18
🔗
|
bsmith093 |
i can send it to you, sure where am i sending? |
08:18
🔗
|
omf_ |
Let me create an ftp account for you |
08:18
🔗
|
BlueMax |
http://jetbytes.com will this site do? |
08:19
🔗
|
bsmith093 |
im not gonna compress because i only have 28gb left on the machine im using |
08:20
🔗
|
omf_ |
no problem |
08:20
🔗
|
omf_ |
BlueMax, as for your earlier question I am trying to write a little program that finds the new categories so we can add that to the map before doing an update grab |
08:20
🔗
|
bsmith093 |
BlueMax: http://files1.jetbytes.com/73d8ef234e2339a096f4f3601b71ee55 |
08:21
🔗
|
bsmith093 |
thats the ini file |
08:23
🔗
|
BlueMax |
man my connection is being a dick tonight, won't let me load jetbytes |
08:23
🔗
|
omf_ |
BlueMax, jetbytes is stalling out on me as well |
08:23
🔗
|
bsmith093 |
worked great for me |
08:23
🔗
|
bsmith093 |
ny usa |
08:23
🔗
|
BlueMax |
sometimes that happens |
08:23
🔗
|
BlueMax |
can you put it somewhere else? |
08:24
🔗
|
bsmith093 |
ummm where else |
08:24
🔗
|
bsmith093 |
duh pastebin |
08:24
🔗
|
omf_ |
just paste a copy into paste.archivingyoursh.it |
08:24
🔗
|
bsmith093 |
or that sure |
08:26
🔗
|
bsmith093 |
http://paste.archivingyoursh.it/nibacowidi.apache |
08:27
🔗
|
BlueMax |
503? |
08:28
🔗
|
bsmith093 |
also a bash script with this in it "cat $@ | egrep '^http:' | xargs -n1 python downloader.py -c personal.ini -f txt " no quotes |
08:29
🔗
|
bsmith093 |
run that script with a file containing a list of links as the only argument |
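A minimal Python sketch of what that one-liner appears to do, assuming one URL per line in the list file: keep only the lines that start with `http:` and build one `downloader.py` invocation per link. The function name `build_commands` is hypothetical, and the flags are copied from the command quoted in the chat rather than checked against the downloader itself.

```python
def build_commands(lines, ini="personal.ini", fmt="txt"):
    """Return one downloader argv per line that starts with 'http:'."""
    commands = []
    for line in lines:
        # Same filter as egrep '^http:' in the quoted pipeline.
        if line.startswith("http:"):
            url = line.strip()
            # Flags (-c, -f) are taken verbatim from the chat message.
            commands.append(["python", "downloader.py", "-c", ini, "-f", fmt, url])
    return commands


if __name__ == "__main__":
    sample = ["http://www.fanfiction.net/s/1234567\n", "# not a link\n"]
    for argv in build_commands(sample):
        print(" ".join(argv))
```

Each returned list could be handed to `subprocess.run` to mimic `xargs -n1`, which runs the command once per link.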
08:31
🔗
|
BlueMax |
I can't get the pastebin to load |
08:31
🔗
|
BlueMax |
Also don't have a list of links |
08:31
🔗
|
BlueMax |
Oh |
08:31
🔗
|
BlueMax |
It loaded |
08:31
🔗
|
BlueMax |
My bad |
08:38
🔗
|
BlueMax |
also bollocks, bash script? don't have a linux system on hand |
08:39
🔗
|
bsmith093 |
BlueMax: also python 2 |
08:41
🔗
|
BlueMax |
ack. |
08:41
🔗
|
bsmith093 |
BlueMax: any other ftp thing besides filezilla its exploding when i try to load the queue with this much |
08:42
🔗
|
BlueMax |
on windows? |
08:42
🔗
|
bsmith093 |
ubuntu |
08:43
🔗
|
BlueMax |
man I could not tell you |
08:44
🔗
|
BlueMax |
can't upload in folders? or do you have all the files in the same folder |
08:45
🔗
|
BlueMax |
oh wait |
08:45
🔗
|
BlueMax |
you said earlier |
08:45
🔗
|
BlueMax |
yeah can you upload by category? |
08:45
🔗
|
bsmith093 |
all in one root folder fanfiction and im using nautilus's connect to server option |
08:45
🔗
|
bsmith093 |
can rsync do this omf_ |
08:53
🔗
|
bsmith093 |
omf_: filezilla is choking to death on this folder what else do i use for this |
08:53
🔗
|
bsmith093 |
BlueMax: id love to but filezilla is choking |
08:58
🔗
|
BlueMax |
give it a good slap on the back then |
08:59
🔗
|
DFJustin |
jesus 100gb of plain text? |
08:59
🔗
|
bsmith093 |
yep |
09:00
🔗
|
DFJustin |
:D |
09:01
🔗
|
BlueMax |
that's what, over 500,000 times the size of the complete works of shakespeare? |
09:01
🔗
|
bsmith093 |
gutenberg text shakespeare is 5 mb so yes |
09:02
🔗
|
DFJustin |
I love how we're sort of accumulating all the insane packrats on the internet |
09:02
🔗
|
bsmith093 |
wait .... maybe |
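A quick check of the Shakespeare comparison, using only the two figures quoted in the chat (about 100 GB of text, about 5 MB for the Gutenberg Shakespeare) and assuming decimal units. On those numbers the ratio comes out near 20,000 rather than 500,000.

```python
CORPUS_BYTES = 100 * 10**9      # ~100 GB, as quoted in the chat
SHAKESPEARE_BYTES = 5 * 10**6   # ~5 MB, as quoted in the chat

# How many copies of the Shakespeare text fit in the corpus.
ratio = CORPUS_BYTES / SHAKESPEARE_BYTES
print(f"{ratio:,.0f}x")  # prints "20,000x"
```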
09:02
🔗
|
bsmith093 |
damn straight... ive spent the past few days backing up shows to dvd, that ill probably never watch again |
09:04
🔗
|
BlueMax |
damn my HDD is too fragmented to get a good sized linux partition |
09:04
🔗
|
BlueMax |
will have to run the script through a VM |
09:06
🔗
|
BlueMax |
bsmith093, you said something about a list of links |
09:07
🔗
|
bsmith093 |
yes i have a thing to generate that or i could just email it or something, its 70 mb |
09:07
🔗
|
bsmith093 |
the list |
09:07
🔗
|
omf_ |
upload it |
09:07
🔗
|
bsmith093 |
kk |
09:08
🔗
|
BlueMax |
is there anything that changes the list or is it just a big list of stories because it probably needs regenerating or something |
09:09
🔗
|
BlueMax |
did my last sentence make any sense at all |
09:11
🔗
|
bsmith093 |
BlueMax omf_ ok ive sent the stuff to the server. the scripts and things are in 11111fanficstuff. personal.ini might need a tiny tweak to be ready |
09:13
🔗
|
BlueMax |
you're going to have to walk me through it like a braindead child once I have Linux Mint ready |
09:14
🔗
|
bsmith093 |
ignore that here they are xaa xab xac very large text files hold on |
09:24
🔗
|
BlueMax |
how much ram will it all take |
09:29
🔗
|
Smiley |
looks like the final missing eps of doctor who have been found! |
09:31
🔗
|
bsmith093 |
not much ram maybe half a gig |
09:31
🔗
|
Smiley |
http://www.mirror.co.uk/tv/tv-news/106-doctor-who-episodes-uncovered-2343474 |
09:32
🔗
|
bsmith093 |
Smiley: holy fish fingers and custard!!!?!?! ALL 107?!?!? |
09:33
🔗
|
bsmith093 |
omf im having problems uploading these xaa ab ac files, i think the servers busy with the massive job im running |
09:35
🔗
|
omf_ |
server load and RAM usage are very small |
09:35
🔗
|
BlueMax |
very small? good |
09:36
🔗
|
Smiley |
bsmith093: apparently |
09:36
🔗
|
bsmith093 |
so yeah im getting critical error 550 when trying to add in the lists to the server |
09:37
🔗
|
bsmith093 |
google genlist .py i used that to make them |
09:39
🔗
|
BlueMax |
I wouldn't have a clue how to do that bsmith093 |
09:40
🔗
|
bsmith093 |
they're in the format http://www.fanfiction.net/s/1234567 omf_ |
09:43
🔗
|
bsmith093 |
well its almost 6AM local time , im going to bed, i'll see if i can upload them in a few hours |
09:44
🔗
|
BlueMax |
well damn, could've kept my other downloads going |
09:44
🔗
|
bsmith093 |
omf currently the highest number link is below 10 million |
09:45
🔗
|
bsmith093 |
omf_ BlueMax bed now for me bye but still lurking |
18:28
🔗
|
BiggieJon |
//join #crapd |
21:48
🔗
|
bsmith093 |
omf_: how much do u have |
21:49
🔗
|
bsmith093 |
its still going on my end |
21:49
🔗
|
bsmith093 |
back of the envelope numbers say 6.45 days to completion |
21:50
🔗
|
bsmith093 |
at 200kb/s since most of these files arent nearly big enough to stabilize the connection speed |
21:54
🔗
|
omf_ |
bsmith093, I got 25gb so far |
21:55
🔗
|
bsmith093 |
holy crap, i didnt realize it was nearly that much |
21:56
🔗
|
bsmith093 |
should be about 5 days then |
22:00
🔗
|
bsmith093 |
omf are you sure you dont mean 2.5GB, cause 25 GB after 12 hours is 4MB/s which is nowhere near my upload speed |
22:03
🔗
|
omf_ |
the -z option of rsync compresses the files before they are sent over the net |
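The transfer estimates above can be sanity-checked with a few lines of arithmetic. This sketch assumes decimal units and reads "200kb/s" as 200 kilobytes per second, which the chat does not confirm. On those assumptions, 25 GB in 12 hours works out to roughly 0.58 MB/s (about 4.6 Mbit/s), so the "4MB/s" figure looks closer to megabits than megabytes, and 100 GB at 200 kB/s takes a little under 6 days.

```python
def rate_bytes_per_sec(total_bytes, hours):
    """Average throughput needed to move total_bytes in the given hours."""
    return total_bytes / (hours * 3600)


def days_to_send(total_bytes, bytes_per_sec):
    """Days needed to move total_bytes at a steady bytes_per_sec."""
    return total_bytes / bytes_per_sec / 86400


# 25 GB received after roughly 12 hours (figures from the chat).
rate = rate_bytes_per_sec(25 * 10**9, 12)
# ~100 GB total at an assumed 200 kB/s upload.
days = days_to_send(100 * 10**9, 200 * 10**3)

print(f"{rate / 1e6:.2f} MB/s = {rate * 8 / 1e6:.1f} Mbit/s")
print(f"{days:.2f} days")
```

Because rsync -z compresses data in transit, the bytes landing on the server can outpace the raw upload speed, which would explain the gap between the two estimates.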
22:12
🔗
|
xmc |
rsync -z is good shit |
22:12
🔗
|
xmc |
rsync should saturate a connection even with many small files |
22:13
🔗
|
xmc |
if you need faster you can do "tar | ssh" or "tar | nc" |
23:11
🔗
|
bsmith093 |
omf_: so there IS a compress-on-the-fly thing. huh , whoo, then |
23:11
🔗
|
BlueMax |
bsmith093, willing to help me get this VM running the fanficdownloader? |
23:11
🔗
|
bsmith093 |
BlueMax: sure |
23:12
🔗
|
BlueMax |
alright just need to go turn it on |
23:12
🔗
|
bsmith093 |
is there anywhere i can drop the list files, though? the ftp server isnt letting me put them there for some reason |
23:13
🔗
|
BlueMax |
man I have no idea |
23:16
🔗
|
BM-Linux |
gdau |
23:16
🔗
|
BM-Linux |
*gday |
23:17
🔗
|
BM-Linux |
bsmith093: I'm currently running a fresh installation of Linux Mint 15 with nothing extra installed outside of VMware Tools |
23:21
🔗
|
bsmith093 |
ive compressed the full thing into 4 mb. jet files again? |
23:22
🔗
|
BM-Linux |
if it works sure |
23:22
🔗
|
BM-Linux |
do I need to install any packages |
23:25
🔗
|
bsmith093 |
at least python 2.5 |
23:26
🔗
|
bsmith093 |
here the files for the downloader http://files1.jetbytes.com/eb4034b3121250a2c16ea5dbcf708f31 |
23:28
🔗
|
BM-Linux |
doesn't seem to want to work for me today either |
23:28
🔗
|
bsmith093 |
if that failed im also using file dropper... man i cant believe how many of these free one off file uploaders there **AREN'T** |
23:28
🔗
|
BM-Linux |
what about mediafire or something |
23:28
🔗
|
BM-Linux |
oh |
23:28
🔗
|
BM-Linux |
file dropper |
23:28
🔗
|
BM-Linux |
that works |
23:28
🔗
|
bsmith093 |
http://www.filedropper.com/fanficdownloader |
23:29
🔗
|
BM-Linux |
is python 2.7 good enough? |
23:29
🔗
|
BM-Linux |
that's what's installed on here |
23:29
🔗
|
bsmith093 |
i give you a file, you give me a link, it doesn't even have to be live for more than 20min... HOW HARD IS THAT!!!!?! |
23:29
🔗
|
bsmith093 |
yes |
23:30
🔗
|
bsmith093 |
honestly im trying to find the deps, for this and i cant really find much, so probably just python |
23:31
🔗
|
bsmith093 |
also mercurial for updating |
23:34
🔗
|
BM-Linux |
mercurial is installed, files have been unzipped |
23:34
🔗
|
bsmith093 |
BM-Linux: when you get that, decompress and put it in user home folder, then run screen -SL 1 ./auto.sh xaa |
23:34
🔗
|
BM-Linux |
what's the expected output |
23:35
🔗
|
BM-Linux |
what should I be seeing if I run it right, that is |
23:35
🔗
|
bsmith093 |
the x ** files are the lists of links |
23:35
🔗
|
bsmith093 |
saving file to ** filepath** |
23:35
🔗
|
bsmith093 |
i use a screen session because the full run even counting where i left off, will probably take months |
23:35
🔗
|
BM-Linux |
should I run as root? |
23:36
🔗
|
bsmith093 |
NO |
23:36
🔗
|
bsmith093 |
unnecessary |
23:36
🔗
|
BM-Linux |
k |
23:36
🔗
|
bsmith093 |
theres a screen log file in there thats the expected outout |
23:36
🔗
|
bsmith093 |
*output |
23:36
🔗
|
BM-Linux |
screen is currently not installed |
23:36
🔗
|
BM-Linux |
did I miss something? |
23:37
🔗
|
bsmith093 |
oh yeah get screen |
23:37
🔗
|
GLaDOS |
apt-get update; apt-get install screen |
23:37
🔗
|
GLaDOS |
wait, mint does have apt still, right? |
23:37
🔗
|
bsmith093 |
its good for running jobs and detaching to do other things |
23:37
🔗
|
bsmith093 |
seriously I HEART SCREEN |
23:37
🔗
|
BM-Linux |
I just use Synaptic |
23:37
🔗
|
bsmith093 |
and xargs |
23:38
🔗
|
bsmith093 |
er , and bash as well, ive never used mint |
23:39
🔗
|
BM-Linux |
Must run suid root for multiuser support. |
23:39
🔗
|
BM-Linux |
That's after installing screen |
23:39
🔗
|
bsmith093 |
BM-Linux: you're not starting from the beginning, you're taking over from where i stopped, so the first link should be 3 million something |
23:40
🔗
|
bsmith093 |
yes i know, ive never had to run screen as root |
23:41
🔗
|
BM-Linux |
So what do I do, run as root? |
23:41
🔗
|
bsmith093 |
i split the full 1-10 million link file, into 3 parts, xaa is 1- 4 million xab is 4-8 million and xac is 8-10 million |
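A small sketch of how a list of story links could be cut into pieces named xaa, xab, xac. The chat does not say which tool produced the split; the names match the default output of the `split` utility, so that is the assumption here. The helper names (`split_names`, `chunk`, `story_links`) are hypothetical, and the demo uses 10 links in blocks of 4 to mirror the 4 million / 4 million / 2 million layout at a readable scale.

```python
from itertools import islice, product
from string import ascii_lowercase


def split_names(prefix="x"):
    """Yield xaa, xab, xac, ... in the style of split's default suffixes."""
    for a, b in product(ascii_lowercase, repeat=2):
        yield f"{prefix}{a}{b}"


def chunk(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def story_links(first, last):
    """Generate links in the format quoted in the chat."""
    return (f"http://www.fanfiction.net/s/{n}" for n in range(first, last + 1))


# Demo: 10 links in blocks of 4 gives three parts, the last one shorter.
parts = dict(zip(split_names(), chunk(story_links(1, 10), 4)))
for name, links in parts.items():
    print(name, len(links))
```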
23:41
🔗
|
bsmith093 |
not root |
23:41
🔗
|
bsmith093 |
screen -SL 1 ./auto.sh xaa |
23:42
🔗
|
BM-Linux |
Must run suid root for multiuser support. |
23:42
🔗
|
bsmith093 |
idk, anyone wanna take this omf_ GLaDOS |
23:43
🔗
|
bsmith093 |
oy wait are the files in the folder owned by you? i forgot that |
23:44
🔗
|
bsmith093 |
sudo chown -R (username) fanficdownloader/ |
23:45
🔗
|
BM-Linux |
didn't help |
23:47
🔗
|
bsmith093 |
ok then what do you have for long running jobs |
23:47
🔗
|
BM-Linux |
what do you mean |
23:48
🔗
|
bsmith093 |
the script auto.sh will take up a terminal for several months at least, do u mind? |
23:50
🔗
|
BM-Linux |
wow, several months? not sure if I can keep a computer on for more than a couple of days |
23:51
🔗
|
bsmith093 |
was that sarcasm, text doesnt cover that well |
23:51
🔗
|
bsmith093 |
also how much free space do u have |
23:51
🔗
|
BM-Linux |
no it wasn't |
23:51
🔗
|
BM-Linux |
and this PC has about 70GB free |
23:54
🔗
|
bsmith093 |
not that much in terms of power, or heat, my laptop's been doing this for a year and i havent noticed a spike in usage |
23:54
🔗
|
GLaDOS |
BM-Linux: just run it as root, it's a vm. |
23:55
🔗
|
bsmith093 |
GLaDOS: whats with the multiuser support screen message though? |
23:55
🔗
|
BM-Linux |
Cannot identify account '.'. |
23:55
🔗
|
BM-Linux |
uemaxima@bluemaxima-virtual-machine ~/fanficdownloader $ sudo screen -SL ./auto.sh xaa |
23:55
🔗
|
bsmith093 |
umm, set the sticky bit, i just googled this and thats what it said |
23:55
🔗
|
GLaDOS |
sudo su; screen -SL ./auto.sh xaa |
23:56
🔗
|
GLaDOS |
(note: ; doesn't work with su) |
23:57
🔗
|
BM-Linux |
Cannot identify account ".". |
23:59
🔗
|
bsmith093 |
ok then drop screen then, just when your shutting down note the last link downloaded and update xaa to refelct that |
23:59
🔗
|
bsmith093 |
*reflect |
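The manual resume step described above (note the last link downloaded, then trim the list file so the next run starts after it) could be sketched like this. The function name `remaining_links` is hypothetical, and the sketch assumes the list holds one link per entry in download order.

```python
def remaining_links(links, last_done):
    """Return the links after `last_done`; return all of them if it is absent."""
    try:
        idx = links.index(last_done)
    except ValueError:
        # Last link not found: safer to redo everything than to skip stories.
        return list(links)
    return links[idx + 1:]


if __name__ == "__main__":
    todo = [f"http://www.fanfiction.net/s/{n}" for n in range(3000000, 3000005)]
    left = remaining_links(todo, "http://www.fanfiction.net/s/3000002")
    print(left)
```

Writing the returned list back over the list file before the next run would give the same effect as editing xaa by hand.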