#archiveteam 2015-05-12,Tue

Time Nickname Message
00:03 🔗 sirdancea has quit IRC (Read error: Operation timed out)
00:18 🔗 SketchCow Verified - this f/win 6
00:32 🔗 primus104 has quit IRC (Leaving.)
00:39 🔗 mistym has quit IRC (Remote host closed the connection)
00:52 🔗 Nystrom has quit IRC (Ping timeout: 492 seconds)
00:54 🔗 mistym has joined #archiveteam
00:57 🔗 Nystrom has joined #archiveteam
01:28 🔗 Nystrom has quit IRC (Ping timeout: 492 seconds)
01:30 🔗 Nystrom has joined #archiveteam
01:32 🔗 Start has joined #archiveteam
01:48 🔗 Ymgve has quit IRC ()
02:00 🔗 dashcloud that seems like a sentence fragment- was it meant for this channel?
02:09 🔗 xmc i'm going to guess no
02:16 🔗 Sk1d has quit IRC (Ping timeout: 265 seconds)
02:19 🔗 Sk1d has joined #archiveteam
02:22 🔗 guest9000 has joined #archiveteam
02:28 🔗 guest9000 yipdw closure SketchCow balrog dcmorton is there a "rare stuff from the 90s that nobody can find a torrents link for" section yet, cause i have a ton of shows from that era. bill and ted, back to the future (both animated) 160-something eps of Tom and Jerry, jumanji *animated* etc, anywhere i can rsync it to?
02:32 🔗 lexicon has joined #archiveteam
02:34 🔗 lexicon has left WeeChat 1.1.1
02:35 🔗 bar_noone has joined #archiveteam
02:36 🔗 bar_noone is now known as lexicon
02:48 🔗 nertzy has joined #archiveteam
03:16 🔗 nertzy has quit IRC (This computer has gone to sleep)
03:17 🔗 SketchCow Yes, but bear in mind it could be restricted in access very quickly.
03:29 🔗 mistym has quit IRC (Remote host closed the connection)
03:57 🔗 mistym has joined #archiveteam
04:06 🔗 tsp_ has joined #archiveteam
04:08 🔗 * tsp_ heard something about massive fanfiction archive. If it's available somewhere, I'm interested in it
04:08 🔗 tsp_ Then there's this, someone's downloading fimfiction as epubs: https://www.fimfiction.net/user/Fimfarchive
04:09 🔗 guest9000 tsp that might be me i have 450+gb of it
04:10 🔗 tsp_ From where?
04:10 🔗 guest9000 tsp_: fanfiction.net
04:10 🔗 tsp_ Oh, I've got lots of that. Not as much as you do, though
04:11 🔗 guest9000 tsp_: want it?
04:11 🔗 tsp_ I'm not sure I have the space for it. How big is it compressed?
04:11 🔗 guest9000 no idea im actually still grabbing
04:11 🔗 tsp_ I'll wait a bit, it'll probably end up on archive.org like the last one
04:13 🔗 tsp_ I've got a bunch of .db files of the larger stories, been doing this for years on a small scale. But only story chapters; I can give you story ids you don't have
04:13 🔗 guest9000 speaking of, im the guy who uploaded basically all of ao3, and it turns out the ultra compressed archive file i had was corrupted when i uploaded it, anyone want to try fixing that? unfortunately i no longer have the original data
04:13 🔗 tsp_ where's that?
04:14 🔗 tsp_ and what's corrupted about it? I don't think that's too fixable
04:14 🔗 tsp_ AO3's epubs are broken, they say that in their known issues page but you have to dig for it.
04:15 🔗 guest9000 tsp here https://archive.org/details/Ao3ArchiveCrawl
04:16 🔗 guest9000 and here the 7zip recovery page, that may as well be in chinese for all the good it does me http://www.7-zip.org/recover.html
04:16 🔗 tsp_ Not my preferred settings, I'd at least prefer an html like format and more raw pages, but I take what I can get. Let's see
04:16 🔗 guest9000 tsp_: txt files
04:16 🔗 tsp_ 18gb... of txt files
04:17 🔗 guest9000 sorted by category/category - author - title.txt
04:17 🔗 tsp_ How long did that take? The downloader sleeps for 1s or so between chapters
04:17 🔗 tsp_ I hope you call it in parallel
04:17 🔗 tsp_ cause that thing is ultra slow
04:17 🔗 guest9000 tsp_: a few weeks i think, back when i grabbed afap
04:18 🔗 guest9000 2 at once
04:18 🔗 tsp_ what's afap?
04:18 🔗 guest9000 i have the same downloader going on fanfiction.net right now
04:18 🔗 guest9000 as fast as possible
04:18 🔗 tsp_ The sleep settings there are even longer
04:18 🔗 tsp_ You might be able to tweak them though
04:18 🔗 guest9000 its been heavily updated since then.
04:19 🔗 tsp_ Oh, fanficfare
04:19 🔗 guest9000 i realize that but its probably not going down any time soon and im in no rush. i use a quick fire and forget script
04:19 🔗 tsp_ How are you dealing with parallel downloads? I always fail at that
04:20 🔗 guest9000 ive got 9 million stories, only took me 2 years or so.
04:20 🔗 guest9000 split the list, run 2 instances at once
04:21 🔗 tsp_ I can guess, you're going from 1 to max, without scraping the category pages to see what's there?
04:21 🔗 guest9000 2 chapters per second 600k to go, ill be done in maybe a month or so
04:21 🔗 guest9000 theres only 11 million ids, i figure thats the easiest way
04:21 🔗 tsp_ What does it do if one fails?
04:21 🔗 guest9000 notes it , goes on to the next one
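
A minimal sketch of the approach guest9000 describes here: split the ID range, run one crawler per half, and note failures instead of stopping. The `fetch` callable is a hypothetical stand-in for whatever actually downloads a story (the log only says the real tool is fanficfare); nothing below is taken from guest9000's actual script.

```python
from typing import Callable, Iterable, List, Tuple


def split_ids(first: int, last: int, parts: int = 2) -> List[range]:
    """Split the inclusive ID range [first, last] into contiguous chunks."""
    total = last - first + 1
    base, extra = divmod(total, parts)
    chunks, start = [], first
    for i in range(parts):
        # The first `extra` chunks take one additional ID each.
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def crawl(ids: Iterable[int], fetch: Callable[[int], None]) -> Tuple[List[int], List[int]]:
    """Try every ID once; record failures and keep going."""
    done, failed = [], []
    for story_id in ids:
        try:
            fetch(story_id)  # hypothetical downloader call
        except Exception:
            failed.append(story_id)  # note it, go on to the next one
        else:
            done.append(story_id)
    return done, failed
```

Each chunk from `split_ids` could be handed to its own process or screen session; the `failed` list doubles as a retry list, and the last entry in `done` is a natural resume point if the script dies.
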
04:22 🔗 tsp_ I've got about 700k over a few years
04:22 🔗 guest9000 i have a screen session for both threads. it logs,
04:22 🔗 guest9000 the only downside is updating the collection is gonna be a bitch
04:23 🔗 tsp_ If you include the update date in the story (it usually does), you can scrape the category pages for the story ids every few months
04:24 🔗 tsp_ if the update dates are different, or the number of chapters are different, then update that story
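
The update rule tsp_ suggests can be sketched as a comparison between locally stored metadata and what a category listing reports. The `StoryMeta` record and its field names are assumptions for illustration; how the listing is scraped is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class StoryMeta:
    """Hypothetical per-story metadata; field names are illustrative."""
    story_id: int
    updated: str   # update date exactly as the site prints it
    chapters: int


def needs_update(local: Optional[StoryMeta], remote: StoryMeta) -> bool:
    """A story is stale if it is missing locally, or its date or chapter count differs."""
    if local is None:
        return True
    return local.updated != remote.updated or local.chapters != remote.chapters


def stories_to_refresh(local_index: Dict[int, StoryMeta],
                       listing: Iterable[StoryMeta]) -> List[int]:
    """Return the IDs from a scraped listing that should be downloaded again."""
    return [r.story_id for r in listing
            if needs_update(local_index.get(r.story_id), r)]
```

Comparing the date as an opaque string avoids parsing the site's date format; it only works if the same format is stored locally.
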
04:24 🔗 guest9000 actually the files are sorted by "category/status/category - author - title.txt" in that format status is either "In-Progress" or "Completed" is there a way to only grep those files?
04:24 🔗 tsp_ yeah, grep has a --from-files option
04:24 🔗 tsp_ or something
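
One way to search only the in-progress stories, sketched in Python rather than with grep flags. It assumes the layout guest9000 gives above, "category/status/category - author - title.txt", with the status directory named exactly "In-Progress" or "Completed"; the function names are made up for this example.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def files_with_status(root: Path, status: str = "In-Progress") -> List[Path]:
    """List every .txt file that sits under <root>/<category>/<status>/."""
    return sorted(root.glob(f"*/{status}/*.txt"))


def grep_files(paths: Iterable[Path], needle: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line number, line) for each line containing `needle`."""
    for path in paths:
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if needle in line:
                    yield path, line_no, line.rstrip("\n")
```

For example, `grep_files(files_with_status(Path("collection")), "Updated:")` would scan only the unfinished stories; `collection` is a placeholder directory name.
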
04:25 🔗 guest9000 tsp_: update date, i do.
04:25 🔗 mistym has quit IRC (Remote host closed the connection)
04:26 🔗 guest9000 it has update date, date published author story urls and summary in a block of text at the beginning of each file.
04:26 🔗 Sk1d has quit IRC (Read error: Operation timed out)
04:26 🔗 tsp_ is the story id in the file? How do you handle resumes
04:26 🔗 mistym has joined #archiveteam
04:26 🔗 tsp_ oh, duh, you can record the last id you got
04:26 🔗 guest9000 yes the story id is in the text block at the beginning, and what do you mean resumes?
04:26 🔗 tsp_ simple setup, simple solution
04:26 🔗 tsp_ like, if your script dies
04:27 🔗 guest9000 screen log
04:27 🔗 guest9000 i check every few days
04:27 🔗 aaaaaaaaa has quit IRC (Leaving)
04:27 🔗 tsp_ what version of 7zip did you use on this thing?
04:27 🔗 guest9000 realistically, if i miss a few hundred, i still have the largest and most comprehensive collection.
04:28 🔗 guest9000 err, i have no idea, the default that came with ubuntu ...10 i think?
04:28 🔗 guest9000 i used the manual's description of ultra settings
04:29 🔗 Sk1d has joined #archiveteam
04:32 🔗 guest9000 about the ao3 grab, i actually had an author contact me about some of her stories she had deleted and wanted back, and i had to disappoint her because i found out the file was bad.
04:35 🔗 tsp_ have you tried the same 7z version on the ubuntu 10 box?
04:36 🔗 tsp_ I looked at the header, the first 8 bytes are ok, the rest is all 0. After that it continues
04:37 🔗 guest9000 tsp_: seriously, 900mb of zeros?
04:37 🔗 tsp_ no
04:37 🔗 tsp_ I'm not sure yet
04:38 🔗 guest9000 oh, yeah that would be stupid!, so whats there?
04:38 🔗 guest9000 ive upgraded since then, im on mint 17 now
04:38 🔗 tsp_ the first 8 bytes seem correct, then the next 24 are 0, doing some quick math. Then a bunch of data
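
The check tsp_ describes can be sketched as follows. It relies on the 32-byte 7z signature header layout as I understand it from the 7-Zip format notes: a 6-byte signature, 2 version bytes, then a 4-byte start-header CRC, an 8-byte next-header offset, an 8-byte next-header size, and a 4-byte next-header CRC, all little-endian. Treat the field names as descriptive labels, not an official API.

```python
import struct

SEVENZ_SIGNATURE = b"7z\xbc\xaf\x27\x1c"


def inspect_signature_header(header: bytes) -> dict:
    """Decode the first 32 bytes of a .7z file and flag a zeroed start header."""
    if len(header) < 32:
        raise ValueError("need at least 32 bytes")
    # "<IQQI" = little-endian uint32, uint64, uint64, uint32 (24 bytes, no padding)
    start_crc, next_offset, next_size, next_crc = struct.unpack("<IQQI", header[8:32])
    return {
        "signature_ok": header[:6] == SEVENZ_SIGNATURE,
        "version": (header[6], header[7]),
        "start_header_crc": start_crc,
        "next_header_offset": next_offset,
        "next_header_size": next_size,
        "next_header_crc": next_crc,
        # True when the 24 bytes after the first 8 are all zero
        "start_header_zeroed": header[8:32] == bytes(24),
    }
```

Reading the header of a downloaded archive would look like `inspect_signature_header(open("ao3-1-700000.7z", "rb").read(32))`. A valid signature followed by 24 zero bytes matches what tsp_ reports seeing.
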
04:38 🔗 tsp_ So, let's see... I'll patch these bytes...
04:39 🔗 tsp_ Oh, that's described on the recovery page as something I should do.
04:40 🔗 guest9000 see, i would have tried whatever that means, but i have no idea how youre doing that
04:41 🔗 tsp_ I'll try it locally and not in this silly vps
04:41 🔗 guest9000 use this link https://archive.org/download/Ao3ArchiveCrawl/ao3-1-700000.7z i guarantee its faster
04:42 🔗 tsp_ I can only download at 500 kb/s max
04:42 🔗 tsp_ so it'll take half an hour
04:42 🔗 guest9000 ugh, i keep forgetting 100mbps isnt as common as id like it to be
04:43 🔗 guest9000 what city u in?
04:43 🔗 tsp_ canada
04:43 🔗 tsp_ hang on, oh here we go
04:43 🔗 guest9000 tsp_: what, find something?
04:44 🔗 tsp_ this situation can also happen if the archiving was interrupted for some reason
04:44 🔗 guest9000 im reasonably sure i just let it run overnight, but this was years ago
04:44 🔗 tsp_ this is like reading a google translated page
04:47 🔗 guest9000 now, i really wish id backed up the data somewhere, i have plenty of space now!
04:57 🔗 tsp_ I think the archive got interrupted before it finished
04:57 🔗 tsp_ but could be wrong
04:58 🔗 guest9000 so i'm most likely screwed on ever seeing any of that data then?
04:59 🔗 tsp_ There's a complicated recovery process that might get some of it back, but I'm honestly not that good with a hex editor and byte offsets to pull it off
04:59 🔗 tsp_ your best bet is to simply scrape ao3 again. Not ideal, but better than nothing
05:20 🔗 mutoso has quit IRC (Quit: leaving)
05:33 🔗 tsp_ guest9000: These things all end with "End file."?
05:45 🔗 DFJustin there are some pretty crackerjack nerds in here, stick around and someone might come to the rescue
05:55 🔗 guest9000 tsp_: youve got some valid text output?!
05:56 🔗 guest9000 tsp_: and yes that was the default end for the scraper and i just left it.
05:56 🔗 Lord_Nigh ok, the famitracker old forums definitely are closing
05:56 🔗 tsp_ Yeah, I got a bunch of fanfics. Problem, they're all in one giant file, and I have to run it again because my dummy file was too small.
05:57 🔗 guest9000 damn, i was afraid of that. are the names of the files preserved somewhere?
05:57 🔗 Lord_Nigh http://famitracker.com/forum/ is closing
05:57 🔗 tsp_ as the recovery page said, we can't really fix the giant file issue, but let's see what I can get out of it first with a 20gb file.
05:57 🔗 Lord_Nigh the new forums are forums.famitracker.com
05:58 🔗 guest9000 its all there?! *manly squee*
05:58 🔗 tsp_ nope, they're not. But you have the titles, authors, categories, status, and the End file marker
05:58 🔗 tsp_ I'm trying a 20gb file. I doubt I'll get 20gb of data
05:58 🔗 tsp_ because there's no way 18gb can fit into a 900mb archive
05:58 🔗 tsp_ Well, ok, maybe, but my guess is not
05:59 🔗 tsp_ it'll take a few hours for this to compress.
05:59 🔗 guest9000 honestly , its just a crapload of text, i figured that was just how good ultra was
05:59 🔗 tsp_ rar beat 7z last I checked on text
06:00 🔗 guest9000 in hindsight 900mb is only 5% of 18gb, so that ratio would be pretty amazing
06:01 🔗 tsp_ Ah well, I'll send you the big text file once I get it, you can write a script to parse the stuff out of it you want
06:02 🔗 guest9000 tsp_: where the hell are you going to upload that to?
06:03 🔗 xmc archive.org ? :)
06:03 🔗 guest9000 duh
06:03 🔗 tsp_ my website, dropbox, wherever I can squeeze it in
06:03 🔗 tsp_ noone wants a big text stream in the form it's going to come out as
06:03 🔗 guest9000 its 2 am local time, i should probably seriously consider going to bed
06:07 🔗 Lord_Nigh sleep is for the weak
06:08 🔗 guest9000 has quit IRC (http://www.mibbit.com ajax IRC Client)
06:08 🔗 xmc sleep is for the tired
06:09 🔗 guest9000 has joined #archiveteam
06:10 🔗 bsmith093 has joined #archiveteam
06:11 🔗 bsmith093 has quit IRC (Client Quit)
06:11 🔗 bsmith093 has joined #archiveteam
06:13 🔗 guest9000 SketchCow Lord_Nigh xmc chfoo tsp_ working on a huge a03 archive recovery any ideas on parsing?
06:13 🔗 xmc parsing what?
06:13 🔗 Lord_Nigh a03? what's that?
06:13 🔗 tsp_ AO3, archiveofourown
06:14 🔗 guest9000 the 20gb file my ao3 crawl turned into from a bad compression apparently
06:14 🔗 tsp_ it didn't turn into 20gb, yet
06:14 🔗 tsp_ basically the 7zip recovery page says: make an archive bigger than the existing one, split it, put yours in place of that, extract it, and see what you get
06:14 🔗 xmc huh
06:15 🔗 guest9000 tsp_: still might as well get ideas flowing, any eta, also seriously tsp_ THANK YOU SOOOO MUCH! :)
06:15 🔗 tsp_ 1gb gave me 1gb of text, so 20gb should give me... however much text this 947mb archive holds
06:16 🔗 tsp_ 14% compressing. I need to compress a 20gb file of /dev/urandom first, then split it, put the block of data in, extract
06:16 🔗 tsp_ what a silly way to do it, you can't just hex edit the expected size in?
06:16 🔗 bsmith093 tsp_, would /dev/zero be faster?
06:17 🔗 tsp_ That wouldn't get the desired effect. "We must create new "good" 7z archive with same method as in bad.7z, and new archive must be much larger than bad.7z"
06:18 🔗 tsp_ I take that to mean I can't just cheat and use /dev/zero
06:19 🔗 primus104 has joined #archiveteam
06:19 🔗 bsmith093 the really sad part is there's an inventory file of every single file that was supposed to be in this grab, so i'll know what's missing
06:21 🔗 tsp_ Just grab everything not deleted again. You have the work urls
06:22 🔗 guest9000 yeah but i know stuff been deleted between now and then, and i'm one of those anal retentive nerds. oh well, youre right better than nothing
06:28 🔗 MMovie2 has joined #archiveteam
06:28 🔗 SketchCow What
06:30 🔗 MMovie has quit IRC (Ping timeout: 306 seconds)
06:37 🔗 guest9000 has quit IRC (http://www.mibbit.com ajax IRC Client)
06:41 🔗 bsmith093 tsp_, restarted the grab with fanficfare, and the old config file, should be done in a month or two, they have 4.9million stories now
06:41 🔗 bsmith093 going to bed
06:43 🔗 SketchCow bsmith093: boop
06:50 🔗 primus104 has quit IRC (Leaving.)
06:51 🔗 garyrh has quit IRC (http://bnc4free.com/)
06:51 🔗 garyrh has joined #archiveteam
07:01 🔗 MMovie2 has quit IRC (Read error: Connection reset by peer)
07:04 🔗 MMovie has joined #archiveteam
07:04 🔗 mistym has quit IRC (Remote host closed the connection)
07:09 🔗 MMovie has quit IRC (Ping timeout: 306 seconds)
07:16 🔗 MMovie has joined #archiveteam
07:19 🔗 atomotic has joined #archiveteam
07:20 🔗 schbirid has joined #archiveteam
07:22 🔗 MMovie has quit IRC (Ping timeout: 306 seconds)
07:33 🔗 primus104 has joined #archiveteam
07:34 🔗 MMovie has joined #archiveteam
08:05 🔗 mistym has joined #archiveteam
08:09 🔗 rolf has joined #archiveteam
08:10 🔗 rejon has quit IRC (Read error: Operation timed out)
08:11 🔗 mistym has quit IRC (Read error: Operation timed out)
08:25 🔗 primus104 has quit IRC (Leaving.)
08:26 🔗 rejon has joined #archiveteam
08:44 🔗 rejon has quit IRC (Ping timeout: 362 seconds)
08:48 🔗 DopefishJ has joined #archiveteam
08:56 🔗 DFJustin has quit IRC (Ping timeout: 740 seconds)
08:58 🔗 vOYtEC_ has joined #archiveteam
09:00 🔗 vOYtEC has quit IRC (Read error: Connection reset by peer)
09:00 🔗 rejon has joined #archiveteam
09:11 🔗 rolf has quit IRC (Leaving...)
09:49 🔗 primus104 has joined #archiveteam
10:00 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:07 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
10:07 🔗 mistym has joined #archiveteam
10:16 🔗 mistym has quit IRC (Read error: Operation timed out)
10:28 🔗 rolf has joined #archiveteam
10:37 🔗 Ymgve has joined #archiveteam
10:42 🔗 scyther has joined #archiveteam
10:58 🔗 signius has quit IRC (Ping timeout: 252 seconds)
11:09 🔗 dan_ has quit IRC (Ping timeout: 252 seconds)
11:11 🔗 signius has joined #archiveteam
11:28 🔗 dashcloud from the #aohell channel: https://twitter.com/AP/status/598081874146238465 (Verizon is buying AOL)
11:38 🔗 antomati_ has joined #archiveteam
11:40 🔗 antomatic has quit IRC (Read error: Operation timed out)
12:08 🔗 mistym has joined #archiveteam
12:12 🔗 dan_ has joined #archiveteam
12:17 🔗 mistym has quit IRC (Ping timeout: 512 seconds)
12:33 🔗 midas great, so now ill get some verizon cd's with 24 hours of internet
13:01 🔗 sankin has joined #archiveteam
13:12 🔗 atomotic has joined #archiveteam
13:23 🔗 primus104 has quit IRC (Leaving.)
13:26 🔗 sankin has quit IRC (Leaving.)
13:48 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
13:52 🔗 sankin has joined #archiveteam
14:03 🔗 nertzy has joined #archiveteam
14:06 🔗 scyther has quit IRC (Read error: Connection reset by peer)
14:10 🔗 mistym has joined #archiveteam
14:15 🔗 mistym has quit IRC (Ping timeout: 252 seconds)
14:18 🔗 DopefishJ is now known as DFJustin
14:19 🔗 rolf has quit IRC (Leaving...)
14:40 🔗 mistym has joined #archiveteam
14:54 🔗 Mayonaise has quit IRC (Ping timeout: 362 seconds)
15:05 🔗 Mayonaise has joined #archiveteam
15:08 🔗 Start has quit IRC (Disconnected.)
15:21 🔗 phillipsj dashcloud I have some Commodore 5??? drives (they tend to go out of alignment -- they came with an article explaining a quick and dirty "fix") I also have many 5¼ inch floppy drives and 3½ inch floppy drives (which I may want to put into active use for "secure boot" purposes)
15:23 🔗 phillipsj My rarest drive is probably a 270MB 3.5" disk cartridge drive (syquest)? I may have damaged rather than fixed the heads by trying to clean them with a cotton swab though.
15:36 🔗 dan_ has quit IRC (Ping timeout: 252 seconds)
15:44 🔗 dan_ has joined #archiveteam
15:44 🔗 sunny256_ has quit IRC (Read error: Connection reset by peer)
15:45 🔗 nertzy has quit IRC (Read error: Operation timed out)
15:47 🔗 nertzy has joined #archiveteam
15:51 🔗 mistym has quit IRC (Remote host closed the connection)
15:51 🔗 primus104 has joined #archiveteam
15:55 🔗 DFJustin has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 Jonimus has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 xtr-201 has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 Smiley has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 SketchCow has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 RedType has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 twrist has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 SadDM has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 sivoais has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 useretail has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 dx- has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 mr-b has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 thefinn93 has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 dugo_ has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 jk[SVP] has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 chfoo- has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 offby1 has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 Selanda has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 rduser has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 underscor has quit IRC (ircd.shaw.ca irc.shaw.ca)
15:55 🔗 NotGLaDOS has joined #archiveteam
15:57 🔗 _vOYtEC has joined #archiveteam
15:57 🔗 RedType_ has joined #archiveteam
15:58 🔗 dugo has joined #archiveteam
15:58 🔗 primus104 has quit IRC (Leaving.)
15:59 🔗 SmileyG has joined #archiveteam
16:02 🔗 vOYtEC_ has quit IRC (Read error: Connection reset by peer)
16:03 🔗 PepsiMax has quit IRC (Ping timeout: 265 seconds)
16:03 🔗 Deewiant has quit IRC (Ping timeout: 265 seconds)
16:03 🔗 DFJustin has joined #archiveteam
16:03 🔗 Jonimus has joined #archiveteam
16:03 🔗 wp494 has joined #archiveteam
16:03 🔗 SadDM has joined #archiveteam
16:03 🔗 useretail has joined #archiveteam
16:03 🔗 dx- has joined #archiveteam
16:03 🔗 mr-b has joined #archiveteam
16:03 🔗 thefinn93 has joined #archiveteam
16:03 🔗 jk[SVP] has joined #archiveteam
16:03 🔗 chfoo- has joined #archiveteam
16:03 🔗 offby1 has joined #archiveteam
16:03 🔗 Selanda has joined #archiveteam
16:03 🔗 matthusby has joined #archiveteam
16:03 🔗 rduser has joined #archiveteam
16:03 🔗 underscor has joined #archiveteam
16:03 🔗 irc.shaw.ca sets mode: +oo SadDM underscor
16:03 🔗 Deewiant has joined #archiveteam
16:03 🔗 nico_32 has quit IRC (Ping timeout: 265 seconds)
16:04 🔗 PepsiMax has joined #archiveteam
16:05 🔗 mistym has joined #archiveteam
16:06 🔗 dan_ has quit IRC (Ping timeout: 252 seconds)
16:07 🔗 dan_ has joined #archiveteam
16:09 🔗 nico_32 has joined #archiveteam
16:13 🔗 Start has joined #archiveteam
16:17 🔗 sivoais has joined #archiveteam
16:31 🔗 SimpBrain has joined #archiveteam
16:45 🔗 Start has quit IRC (Disconnected.)
16:48 🔗 philpem has joined #archiveteam
16:50 🔗 Start has joined #archiveteam
16:51 🔗 xmc midas: but, with verizon math, it'll actually only be 24 minutes
16:59 🔗 SketchCow has joined #archiveteam
16:59 🔗 GLaDOS sets mode: +o SketchCow
17:02 🔗 scyther has joined #archiveteam
17:06 🔗 nertzy has quit IRC (Quit: This computer has gone to sleep)
17:29 🔗 xmc sets mode: +o swebb
17:29 🔗 swebb sets mode: +o DFJustin
17:42 🔗 Start has quit IRC (Disconnected.)
17:45 🔗 SketchCow bsmith093: Hey there. So I've been packing up the fan fiction collection.
17:45 🔗 pwnsrv has joined #archiveteam
17:45 🔗 SketchCow It's big. I understand if another comes down the line.
17:51 🔗 aaaaaaaaa has joined #archiveteam
17:58 🔗 mistym has quit IRC (Remote host closed the connection)
17:59 🔗 mistym has joined #archiveteam
18:35 🔗 rolf has joined #archiveteam
18:36 🔗 habi has joined #archiveteam
18:37 🔗 caber has quit IRC (Quit: Kids: talk with your parents about ad-blockers, and, at some point; social media. But fundamentals first!)
18:44 🔗 Start has joined #archiveteam
18:44 🔗 caber has joined #archiveteam
18:44 🔗 Start has quit IRC (Client Quit)
18:45 🔗 Start has joined #archiveteam
18:50 🔗 habi has quit IRC (Quit: Leaving.)
18:50 🔗 habi has joined #archiveteam
18:52 🔗 Start has quit IRC (Ping timeout: 370 seconds)
18:54 🔗 rolf has quit IRC (Leaving...)
18:56 🔗 habi has left
19:03 🔗 rolf has joined #archiveteam
19:03 🔗 rolf has quit IRC (Client Quit)
19:08 🔗 Nystrom has quit IRC (- nbs-irc 2.39 - www.nbs-irc.net -)
19:27 🔗 aaaaaaaaa has quit IRC (Leaving)
19:34 🔗 rolf has joined #archiveteam
19:36 🔗 aaaaaaaaa has joined #archiveteam
19:40 🔗 rolf has quit IRC (Leaving...)
19:40 🔗 wm_ has joined #archiveteam
19:41 🔗 primus104 has joined #archiveteam
19:54 🔗 SN4T14__ has joined #archiveteam
20:00 🔗 SN4T14_ has quit IRC (Ping timeout: 369 seconds)
20:02 🔗 bsmith094 has joined #archiveteam
20:03 🔗 scyther has quit IRC (Leaving)
20:13 🔗 bsmith094 SketchCow: i'm still running that actually, i havent sent anything up in a while, but its almost caught up, 600k ids to go
20:20 🔗 bsmith094 tsp_: restarted the ao3 scraper about 10 hours ago, rough ETC ~4.5 months
20:20 🔗 bsmith094 tsp_: theyve been very busy
20:21 🔗 tsp_ bsmith094: I sent you a pm, well, 93 a pm
20:24 🔗 bsmith093 got it, downloading
20:27 🔗 tsp_ I don't think my script screwed up, things seem to be where they're supposed to be
20:28 🔗 bsmith094 its opening, so all one file or split?
20:32 🔗 bsmith094 tsp_: oh i see, its the same as i compressed it! awesome, how much is there?
20:33 🔗 tsp_ 4gb or so. I used the inventory file to reconstruct the filenames based on the work ids, and split at End file.
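
A rough sketch of the split-and-rename step tsp_ describes, not his actual script. Two things are assumptions: that each record contains a work URL of the form ".../works/<id>" from which the ID can be read, and that the inventory has already been loaded into a dict mapping work ID to the original filename. The "End file." marker is the one confirmed earlier in the log.

```python
import re
from typing import Dict, Iterable, List, Tuple

END_MARKER = "End file."
WORK_ID_RE = re.compile(r"/works/(\d+)")  # assumed URL shape inside each record


def split_records(stream_text: str) -> List[str]:
    """Cut the recovered text stream into complete records at each end marker."""
    pieces = stream_text.split(END_MARKER)
    # The last piece follows the final marker: it is empty or a truncated
    # record, so it is dropped rather than passed off as complete.
    return [p.strip() + "\n" + END_MARKER + "\n" for p in pieces[:-1] if p.strip()]


def name_records(records: Iterable[str],
                 inventory: Dict[int, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map each record to its inventory filename; collect the ones that cannot be named."""
    named, unnamed = {}, []
    for record in records:
        match = WORK_ID_RE.search(record)
        filename = inventory.get(int(match.group(1))) if match else None
        if filename is None:
            unnamed.append(record)
        else:
            named[filename] = record
    return named, unnamed
```

A story whose body happens to contain the text "End file." would be split in the wrong place; matching the marker only when it stands alone on a line would be safer if the scraper wrote it that way.
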
20:34 🔗 bsmith094 i knew that inventory was a good idea, go past-me!
20:34 🔗 bsmith094 i figured anyone who grabbed it , would like to have a list of what was there
20:35 🔗 bsmith094 how many files?
20:37 🔗 tsp_ 140179
20:40 🔗 bsmith094 140179/700000 is about 20.02% so better than nothing! thanks :)
20:40 🔗 tsp_ np
20:42 🔗 bsmith094 merging now, 48 minutes to go
20:42 🔗 tsp_ merging? With your current download? You should do that after you're done
20:43 🔗 tsp_ you only want to merge if the story doesn't exist already on the site
20:45 🔗 bsmith094 this way at least , i have something, a more complete collection
20:53 🔗 sankin has quit IRC (Leaving.)
20:54 🔗 tsp_ If you merge now and download any story you don't have, you won't be able to update them easily. If you merge later, you'll be able to only merge what doesn't already exist in what you downloaded, which is better IMO
20:59 🔗 bsmith094 crap, it just finished the merge. ah well redo only lost 12 hours
21:11 🔗 bsmith094 has quit IRC (http://www.mibbit.com ajax IRC Client)
21:12 🔗 rolf has joined #archiveteam
21:16 🔗 BlueMaxim has joined #archiveteam
21:17 🔗 SketchCow bsmith093: So you're saying I need to stop packing.
21:17 🔗 SketchCow And will have to pack it later.
21:17 🔗 SketchCow Doing so.
21:18 🔗 SketchCow That's the problem with the FTP. Some people are working on things for months and others are doing it then walking away, then annoyed I don't mind-meld know they're finished.
21:26 🔗 rolf has quit IRC (Leaving...)
21:27 🔗 rolf has joined #archiveteam
21:36 🔗 rolf has quit IRC (Leaving...)
21:45 🔗 phillipsj BTW I thought of other possibly rare hardware I have: 8 track tape player and Record player capable of playing 78s. Archiving commercial music (or video) sounds like a pain though.
22:00 🔗 phillipsj has quit IRC (Read error: Operation timed out)
22:02 🔗 bsmith093 SketchCow, sorry, you can finish if you want, call it volume 1 or something, just thought i should tell you its not yet complete.
22:03 🔗 bsmith093 SketchCow, its your space, and i appreciate you letting me use it :)
22:03 🔗 SketchCow I have killed it and so let me know when you're done.
22:18 🔗 nwf has quit IRC (Read error: Operation timed out)
22:20 🔗 josephroo has quit IRC (Read error: Operation timed out)
22:20 🔗 vegbrasil has quit IRC (Read error: Operation timed out)
22:21 🔗 marvinw has quit IRC (Read error: Operation timed out)
22:21 🔗 Froggypwn has quit IRC (Read error: Operation timed out)
22:22 🔗 mistym_ has joined #archiveteam
22:23 🔗 sep332 has quit IRC (Read error: Operation timed out)
22:23 🔗 S[h]O[r]T has quit IRC (Read error: Operation timed out)
22:24 🔗 mistym has quit IRC (Read error: Connection reset by peer)
22:24 🔗 Ctrl-S has quit IRC (Read error: Operation timed out)
22:24 🔗 Ctrl-S_ is now known as Ctrl-S
22:24 🔗 S[h]O[r]T has joined #archiveteam
22:24 🔗 Control-S has joined #archiveteam
22:24 🔗 aaaaaaaaa has quit IRC (Read error: Operation timed out)
22:24 🔗 bsmith095 has joined #archiveteam
22:24 🔗 aMunster has quit IRC (Read error: Connection reset by peer)
22:25 🔗 aMunster has joined #archiveteam
22:25 🔗 Froggypwn has joined #archiveteam
22:25 🔗 josephroo has joined #archiveteam
22:25 🔗 lrkj has quit IRC (Remote host closed the connection)
22:25 🔗 vegbrasil has joined #archiveteam
22:25 🔗 aaaaaaaaa has joined #archiveteam
22:25 🔗 sep332 has joined #archiveteam
22:26 🔗 marvinw has joined #archiveteam
22:26 🔗 bsmith095 SketchCow: should be mostly done in a few weeks or so
22:27 🔗 SketchCow Great
22:27 🔗 nwf has joined #archiveteam
22:27 🔗 lrkj has joined #archiveteam
22:34 🔗 xtr-201 has joined #archiveteam
22:36 🔗 Emcy_ has joined #archiveteam
22:38 🔗 NotGLaDOS has quit IRC (Ping timeout: 240 seconds)
22:38 🔗 twrist has joined #archiveteam
22:39 🔗 Emcy has quit IRC (Ping timeout: 240 seconds)
22:42 🔗 Control-S has quit IRC (Read error: Connection reset by peer)
22:42 🔗 achip has quit IRC (Read error: Operation timed out)
22:43 🔗 achip has joined #archiveteam
22:43 🔗 Control-S has joined #archiveteam
22:46 🔗 Froggypwn has quit IRC (Read error: Operation timed out)
22:46 🔗 lrkj has quit IRC (Read error: Connection reset by peer)
22:46 🔗 godane has quit IRC (Quit: Leaving.)
22:47 🔗 Control-S has quit IRC (Read error: Connection reset by peer)
22:49 🔗 Froggypwn has joined #archiveteam
22:49 🔗 nwf has quit IRC (Ping timeout: 600 seconds)
22:49 🔗 aaaaaaaaa has quit IRC (Read error: Operation timed out)
22:50 🔗 yotta has quit IRC (Ping timeout: 600 seconds)
22:51 🔗 josephroo has quit IRC (Ping timeout: 600 seconds)
22:52 🔗 aaaaaaaaa has joined #archiveteam
22:53 🔗 sep332 has quit IRC (Ping timeout: 600 seconds)
22:53 🔗 aMunster has quit IRC (Ping timeout: 600 seconds)
22:55 🔗 lrkj has joined #archiveteam
22:55 🔗 S[h]O[r]T has quit IRC (Ping timeout: 600 seconds)
22:56 🔗 Control-S has joined #archiveteam
22:56 🔗 aMunster has joined #archiveteam
22:56 🔗 josephroo has joined #archiveteam
22:56 🔗 nwf has joined #archiveteam
22:56 🔗 yotta has joined #archiveteam
22:56 🔗 S[h]O[r]T has joined #archiveteam
22:59 🔗 sep332 has joined #archiveteam
23:04 🔗 Start has joined #archiveteam
23:06 🔗 Froggypwn has quit IRC (Ping timeout: 265 seconds)
23:07 🔗 Froggypwn has joined #archiveteam
23:15 🔗 xtr-201 has quit IRC (Ping timeout: 370 seconds)
23:19 🔗 philpem has quit IRC (Ping timeout: 252 seconds)
23:57 🔗 Ymgve has quit IRC ()