| Time | Nickname | Message | 
    
        | 00:51
            
                π | SketchCow | So, unfortunately, it looks like Myspace is now doing a small transition and killing pages | 
    
        | 02:42
            
                π | bsmith093 | you know how some really advanced bulk renamers can add the parent folder(s) to the name of a file? well, i need to remove that part. the files are organized like this: stuff/blah/status/blah - authorname - filename.txt. the only matching part will be the "blah", and it is guaranteed to be part of the filename | 
    
        | 02:56
            
                π | instence_ | uh | 
    
        | 02:56
            
                π | instence_ | whats your before and after? | 
    
        | 02:58
            
                π | instence_ | you say matching parts, are you trying to regex match those? and only modify those files partially? or? | 
    
        | 02:59
            
                π | instence_ | an app for windows I used to rename stuff is called "ReNamer" works great | 
    
        | 03:14
            
                π | bsmith093 | instence_: before= stuff/blah/status/blah - authorname - filename.txt | 
    
        | 03:14
            
                π | bsmith093 | instence_:  after= stuff/blah/status/authorname - filename.txt | 
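The before/after pair above can be sketched in Python. This is a minimal illustration, not something anyone in the channel posted: the function name `strip_folder_prefix` and the `" - "` separator are assumptions taken from the example path, and the sketch only computes the new name without touching the disk.

```python
from pathlib import PurePosixPath


def strip_folder_prefix(path: str, sep: str = " - ") -> str:
    """Drop a leading '<ancestor folder><sep>' from the file name, if present."""
    p = PurePosixPath(path)
    # Check the nearest ancestor folder first, then work outwards.
    for folder in reversed(p.parent.parts):
        prefix = folder + sep
        rest = p.name[len(prefix):]
        # Only rename when the prefix matches and something is left over.
        if p.name.startswith(prefix) and rest:
            return str(p.with_name(rest))
    # No ancestor folder name leads the file name: leave the path alone.
    return path


print(strip_folder_prefix("stuff/blah/status/blah - authorname - filename.txt"))
# prints: stuff/blah/status/authorname - filename.txt
```

To apply it to real files, one option is to walk the tree with `os.walk` and call `os.rename(old, strip_folder_prefix(old))` whenever the two paths differ, ideally after a dry run that only prints the planned renames.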
    
        | 03:28
            
                π | instence_ | with ReNamer that would be quite easy, but I think its a windows only app | 
    
        | 03:29
            
                π | instence_ | http://www.den4b.com/?x=downloads&product=renamer | 
    
        | 03:30
            
                π | instence_ | http://www.den4b.com/?x=screenshots&product=renamer | 
    
        | 03:31
            
                π | instence_ | you can stack rules as well | 
    
        | 03:46
            
                π | dashcloud | so, is there still a channel for fileformat wiki efforts, or it just goes here or -bs? | 
    
        | 04:45
            
                π | tuankiet | Hello everybody! | 
    
        | 04:48
            
                π | bsmith093 | instence_: how would i do that in renamer, it runs fine in wine, so im using that for now | 
    
        | 04:55
            
                π | tuankiet | @alard: are there any projects? | 
    
        | 06:03
            
                π | Nemo_bis | SketchCow: thanks! | 
    
        | 06:15
            
                π | SketchCow | No problem. Sorry there's still a lag with me this year. | 
    
        | 06:16
            
                π | SketchCow | I'd hoped to be more archiveteam responsive, but this DEFCON documentary is kicking my aaaaaassssss | 
    
        | 08:41
            
                π | chronomex | godane: ftp.download.packardbell.com: Downloaded: 2679 files, 28G in 1d 17h 47m 55s (194 KB/s) | 
    
        | 08:42
            
                π | chronomex | now: time nice ionice -c 3 zip -vr ftp.download.packardbell.com.zip ftp.download.packardbell.com | 
    
        | 09:07
            
                π | godane | chronomex: thanks for getting it | 
    
        | 09:07
            
                π | godane | i know that would have taken me forever to get | 
    
        | 09:08
            
                π | chronomex | :) | 
    
        | 09:08
            
                π | godane | and to also upload | 
    
        | 09:19
            
                π | chronomex | yeah, might take a while | 
    
        | 09:19
            
                π | chronomex | I downloaded a terabyte of ftp last month :P | 
    
        | 09:20
            
                π | Nemo_bis | chronomex: ah, 200 KB/s, lucky you :) | 
    
        | 09:21
            
                π | Nemo_bis | NATO still at 40 KB/s | 
    
        | 09:21
            
                π | Nemo_bis | 42 GiB so far | 
    
        | 09:22
            
                π | chronomex | o_O | 
    
        | 09:22
            
                π | chronomex | ftp.3gpp.org is huge | 
    
        | 09:22
            
                π | chronomex | btw. | 
    
        | 09:23
            
                π | chronomex | 350g, iirc | 
    
        | 09:23
            
                π | Nemo_bis | everything has recent timestamps there | 
    
        | 13:54
            
                π | hiker1 | To what extent does heritrix discover JavaScript and CSS? | 
    
        | 14:15
            
                π | alard | tuankiet: Well, it's time to start downloading the Yahoo blogs. | 
    
        | 14:16
            
                π | alard | hiker1: It probably downloads things referenced with <script> or <link rel="stylesheet"> tags, and I think it even has some rules to find images etc. in the actual CSS and JavaScript files. | 
    
        | 14:16
            
                π | hiker1 | How easy is it to set up? | 
    
        | 14:17
            
                π | alard | It isn't that hard, but it's unwieldy. | 
    
        | 14:17
            
                π | hiker1 | I wanted to test it on a single site. | 
    
        | 14:18
            
                π | hiker1 | I suppose it's probably not worth the hassle | 
    
        | 14:18
            
                π | ersi | Neither Heritrix nor Wayback is easy to set up | 
    
        | 14:22
            
                π | hiker1 | sigh. | 
    
        | 14:22
            
                π | hiker1 | Maybe someone that knows how could release a VirtualBox image with it already installed and ready to accept a warc file? | 
    
        | 14:24
            
                π | hiker1 | alard: There is a python library called mitmproxy. Might be useful to proxy the HTTPS records: http://mitmproxy.org/ | 
    
        | 14:24
            
                π | hiker1 | Right now I am using a simple rewrite modification to warc-proxy to get them sent. | 
    
        | 14:24
            
                π | hiker1 | very, very rudimentary. | 
    
        | 14:24
            
                π | ersi | I've fiddled a little with it, and plan to maybe continue - but we'll see (RE: wayback, heritrix) | 
    
        | 14:24
            
                π | godane | so i just found a very good copy of the screen savers episode from 2003 | 
    
        | 14:25
            
                π | godane | Kevin Rose uploaded it too :-D | 
    
        | 14:25
            
                π | ersi | OH MY GOD! | 
    
        | 14:29
            
                π | godane | https://www.youtube.com/user/kevinrose | 
    
        | 14:29
            
                π | godane | i found it on his youtube channel | 
    
        | 14:29
            
                π | godane | i may have to email him so i get more episodes of tss | 
    
        | 14:31
            
                π | godane | he has about 50 episodes of the screen savers in mp4 | 
    
        | 14:31
            
                π | godane | :-D | 
    
        | 14:40
            
                π | hiker1 | WARC doesn't replay the actual browser sessions, only the traffic. Some JavaScript scripts I found appear to append a callback handle to the url that is generated at runtime based on a live JS object. WARC can not replay this behavior. | 
    
        | 14:42
            
                π | hiker1 | Technically it does archive all the information that a website outputs, but some of the information is impractical to use or view without extensive modifications to the JavaScript. | 
    
        | 14:44
            
                π | hiker1 | It makes me think of an HTML5 game http://wordsquared.com/. You can download all the traffic, but you will never be able to see the game properly I think. | 
    
        | 14:44
            
                π | alard | hiker1: And? Or are you just thinking aloud? :) | 
    
        | 14:44
            
                π | alard | You could fix individual sites, but there's no general solution, I think. | 
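The replay problem hiker1 describes can be illustrated with a small Python sketch. Everything in it is hypothetical: the `replay_key` helper, the `callback` and `_` parameter names, and the example URLs are assumptions for illustration, not part of warc-proxy or any other tool mentioned here. It shows why an exact-URL lookup misses when a script appends a runtime-generated callback name, and how a per-site rule could normalize the URL before lookup.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that a page generates at runtime.
VOLATILE_PARAMS = frozenset({"callback", "_"})


def replay_key(url: str, volatile=VOLATILE_PARAMS) -> str:
    """Return the URL with runtime-generated query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in volatile]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# URL recorded at crawl time versus the one the page requests on replay.
recorded = "http://example.com/api?q=tiles&callback=cb_1357"
replayed = "http://example.com/api?q=tiles&callback=cb_2468"

index = {replay_key(recorded): b"cb_1357({})"}

assert recorded != replayed                 # exact-URL lookup would miss
assert replay_key(replayed) in index        # normalized lookup finds the record
```

Even with a hit, the stored body still calls `cb_1357`, so the response would also need rewriting for the page to accept it. That is consistent with alard's point that individual sites can be fixed but there is no general solution.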
    
        | 14:53
            
                π | hiker1 | thinking aloud :) | 
    
        | 14:53
            
                π | hiker1 | I noticed this while attempting to archive a website just now. | 
    
        | 14:54
            
                π | tuankiet | @alard: Tracker rate limiting is in effect. Retrying after 30 seconds... :(( | 
    
        | 14:57
            
                π | alard | tuankiet: Yes, there was something wrong yesterday. I'm now gathering some files to debug with. (Until I got distracted by wordsquared just now. :) | 
    
        | 14:57
            
                π | hiker1 | hah xD | 
    
        | 15:00
            
                π | tuankiet | @alard: Oh, running again. I've just restarted VMs to update the code :)) | 
    
        | 15:11
            
                π | alard | Good. Found the problem: HTTP/1.1 999 Unable to process request at this time -- error 999 | 
    
        | 15:12
            
                π | alard | What's the best way to handle those? Wait and retry? | 
    
        | 15:13
            
                π | Nemo_bis | ah, as it was feared | 
    
        | 15:14
            
                π | balrog- | that means you are being throttled | 
    
        | 15:15
            
                π | balrog- | http://www.murraymoffatt.com/software-problem-0011.html | 
    
        | 15:16
            
                π | alard | It's Nemo_bis, in this case. | 
    
        | 15:17
            
                π | Nemo_bis | alard: I got that error? but I just started | 
    
        | 15:17
            
                π | balrog- | wow MS is killing messenger | 
    
        | 15:17
            
                π | Nemo_bis | I have lots of "Project code is out of date and needs to be upgraded. Retrying after 30 seconds..." | 
    
        | 15:18
            
                π | alard | Yes, I've paused the thing again. | 
    
        | 15:18
            
                π | twrist | Messenger is being integrated into skype, though. | 
    
        | 15:18
            
                π | twrist | So yeah. | 
    
        | 15:18
            
                π | alard | Nemo_bis: In the last few minutes there were 999-warcs from grue, tuankiet, and you. | 
    
        | 15:18
            
                π | Nemo_bis | hm | 
    
        | 15:19
            
                π | balrog- | twrist: yeah but the protocol, etc are going away | 
    
        | 15:19
            
                π | twrist | Ah, right. | 
    
        | 15:19
            
                π | ersi | Super old. | 
    
        | 15:19
            
                π | balrog- | alard: need to detect 999s and throttle | 
    
        | 15:19
            
                π | twrist | So, what's currently being archived? | 
    
        | 15:19
            
                π | Nemo_bis | alard: I've switched the warrior to tinyback | 
    
        | 15:19
            
                π | alard | balrog-: How long to wait? (And does saying you're from Google still work?) | 
    
        | 15:20
            
                π | balrog- | alard: I don't know, I haven't tested - info online says 2-24 hours, but I don't know | 
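A minimal Python sketch of the "detect 999s and throttle" idea follows. It is not the logic of the actual yahooblog-grab script: the `fetch_with_backoff` name, the 30-second starting delay, the doubling, and the one-hour cap are all assumptions, and the log itself only offers an untested "2-24 hours" figure for how long the throttling lasts. The `fetch` callable stands in for whatever performs the request and returns the HTTP status code.

```python
import time
from typing import Callable

THROTTLED = 999  # non-standard status code seen in the log above


def fetch_with_backoff(fetch: Callable[[], int],
                       max_tries: int = 5,
                       base_delay: float = 30.0,
                       max_delay: float = 3600.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() until it returns something other than 999 or tries run out."""
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        status = fetch()
        if status != THROTTLED:
            return status
        if attempt < max_tries:
            # Wait before retrying, doubling the pause up to max_delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    # Still throttled: the caller should discard the item rather than upload it.
    return THROTTLED
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Returning 999 after the last attempt lets the caller skip saving a throttled response, which would avoid collecting the "999-warcs" alard mentions.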
    
        | 15:20
            
                π | Nemo_bis | can it be that Yahoo is suspicious because it sees activity from my IP on flickr etc. as logged in user? | 
    
        | 15:20
            
                π | Nemo_bis | it definitely can't be bandwidth in my case | 
    
        | 15:21
            
                π | tuankiet | Bad thing now | 
    
        | 15:22
            
                π | alard | Nemo_bis: Perhaps you're normally less active on Asian blogs. | 
    
        | 15:23
            
                π | twrist | Give me a git URL to clone, guys. | 
    
        | 15:23
            
                π | ersi | At what project are you guys getting HTTP 999's? | 
    
        | 15:24
            
                π | twrist | I'm itching to join in. | 
    
        | 15:24
            
                π | ersi | twrist: http://github.com/archiveteam/ | 
    
        | 15:24
            
                π | twrist | Need to be a bit more precise, I'm using ubuntu server and IRSSI | 
    
        | 15:24
            
                π | twrist | I only just started as well | 
    
        | 15:24
            
                π | * | twrist is GLaDOS, FYI | 
    
        | 15:25
            
                π | ersi | I think they're doing yahooblogs-grab right now | 
    
        | 15:25
            
                π | twrist | ah | 
    
        | 15:25
            
                π | twrist | so https://github.com/archiveteam/yahooblogs-grab.git? | 
    
        | 15:25
            
                π | ersi | yeah.. | 
    
        | 15:25
            
                π | alard | twrist: There's not much sense starting right now, we need to update the script. | 
    
        | 15:26
            
                π | alard | ersi: blog.yahoo.com | 
    
        | 15:26
            
                π | twrist | ah | 
    
        | 15:28
            
                π | tuankiet | Or using Tor so we won't have 999 again. But the speed is super low :)) | 
    
        | 15:29
            
                π | twrist | The URL I typed out isn't working. | 
    
        | 15:29
            
                π | twrist | Anyone else able to paste it in here? | 
    
        | 15:29
            
                π | Deewiant | https://github.com/ArchiveTeam/yahooblog-grab.git | 
    
        | 15:30
            
                π | twrist | ah, no s | 
    
        | 15:37
            
                π | twrist | so the arguments were --downloader=name --concurrent=6? | 
    
        | 15:43
            
                π | alard | Yes. There's a new version that should handle the 999 error better. | 
    
        | 15:54
            
                π | goekesmi | ls | 
    
        | 15:54
            
                π | * | goekesmi sighs. | 
    
        | 15:54
            
                π | hiker1 | xD | 
    
        | 16:04
            
                π | chazchaz | Is there a channel for yahooblog-grab? | 
    
        | 16:42
            
                π | SketchCow | I suggest #yahooblah | 
    
        | 16:46
            
                π | Coderjoe | O_O yahoo blog is from yahoo korea? | 
    
        | 18:13
            
                π | alard | I think the current version of the script works better. (There are fewer 0MB items, and it's much slower.) | 
    
        | 19:09
            
                π | hiker1 | Is anyone archiving stuff from Tor? | 
    
        | 19:24
            
                π | swebb | I used tor once to auto-change my IP when grabbing some stuff from google, but it was way slow. | 
    
        | 19:25
            
                π | hiker1 | well, yeah. But there are some websites which are tor only. | 
    
        | 19:28
            
                π | * | ats raw-images an extremely dodgy floppy four times using two different Amiga drives, converts using disk-analyser, merges the resulting partial images back together giving a full image, and peers happily at the first bits of email he ever sent :) | 
    
        | 19:28
            
                π | balrog- | what are you using to merge? | 
    
        | 19:30
            
                π | ats | rawadf off aminet, patched to not complain about the number of tracks in the .eadf files disk-analyser produces | 
    
        | 19:30
            
                π | ats | I also had to patch disk-analyser to not write junk into the EADF track header structure... | 
    
        | 19:32
            
                π | ats | then disk-analyser again to turn (raw-track) EADF into (AmigaDOS-track) ADF, adfread to extract the files from the filesystem, and unar to extract the .lzx archives on the floppy | 
    
        | 19:52
            
                π | hiker1 | If anyone is bored of archiving with wget, please try my WarcMiddleware. I'd be glad to assist in setting it up. https://github.com/iramari/WarcMiddleware | 
    
        | 20:34
            
                π | Nemo_bis | alard: how do I know if I'm still collecting mostly useless 999 crap, in case I work on Yahoo? | 
    
        | 21:02
            
                π | alard | Nemo_bis: Hard to say. It shouldn't, it should retry (and print a message). | 
    
        | 21:04
            
                π | Nemo_bis | ok | 
    
        | 21:05
            
                π | Nemo_bis | TinyBack was getting ratelimited anyway | 
    
        | 21:38
            
                π | SketchCow | Nemo_bis: http://archive.org/details/magazine_rack | 
    
        | 21:38
            
                π | Nemo_bis | SketchCow: Pretty!!! | 
    
        | 21:39
            
                π | Nemo_bis | Are you going to make some of those dark? | 
    
        | 21:39
            
                π | SketchCow | Ostensibly | 
    
        | 21:40
            
                π | Nemo_bis | :) | 
    
        | 21:45
            
                π | SketchCow | Like, Wood Magazine will probably disappear. | 
    
        | 21:50
            
                π | Nemo_bis | But... children in Africa will DIE if we don't let them know how to build life-saving wood stuff, in English, on a website! | 
    
        | 21:53
            
                π | Nemo_bis | On eMule and eMule only there's also another 5 GiB archive of another woodworking magazine. Surely the same woodworking geek scanner. | 
    
        | 21:54
            
                π | chronomex | haha | 
    
        | 21:55
            
                π | SketchCow | Which one? | 
    
        | 21:55
            
                π | SketchCow | You have so many here. | 
    
        | 21:56
            
                π | SketchCow | http://archive.org/details/general_magazine | 
    
        | 21:56
            
                π | SketchCow | http://archive.org/details/woodsmith_magazin | 
    
        | 21:57
            
                π | SketchCow | http://archive.org/details/woodsmith_magazine I mean | 
    
        | 21:58
            
                π | SketchCow | How long was this uploading, Nemo_bis? | 
    
        | 21:58
            
                π | Nemo_bis | SketchCow: I don't know, a few days of work for the CSV maybe. | 
    
        | 21:58
            
                π | Nemo_bis | I didn't measure the time for download and upload in itself. | 
    
        | 22:00
            
                π | Nemo_bis | Also a few hours of trackers browsing and other searches. | 
    
        | 22:01
            
                π | Nemo_bis | http://p.defau.lt/?YTRaoQFxExjw8T612Pl_XQ | 
    
        | 22:03
            
                π | SketchCow | In the future, like godane, I can just browse your uploads and see what you haven't had pushed into a collection and make it happen. | 
    
        | 22:03
            
                π | SketchCow | Your activities also get the attention of the devs, who see it come by | 
    
        | 22:08
            
                π | * | Nemo_bis hopes not to get too many curses | 
    
        | 22:08
            
                π | Nemo_bis | I thought sending you a nice list at the end of the job was going to be helpful? | 
    
        | 22:09
            
                π | SketchCow | No. | 
    
        | 22:09
            
                π | SketchCow | Doesn't help and it actually gets caught in the spam filter | 
    
        | 22:10
            
                π | SketchCow | Because someone from italy is mailing me piles of URLs | 
    
        | 22:10
            
                π | Nemo_bis | Oh, even. | 
    
        | 22:10
            
                π | chronomex | :P | 
    
        | 22:12
            
                π | SketchCow | Also, the vokrugsveta collection didn't make it through the fun | 
    
        | 22:12
            
                π | SketchCow | I'm going to make it a collection for you, but it needs more love | 
    
        | 22:12
            
                π | Nemo_bis | Yes, I noticed. | 
    
        | 22:13
            
                π | Nemo_bis | I didn't look those zips carefully enough, sorry. | 
    
        | 22:13
            
                π | SketchCow | Yeah, those things are buuuuuuuunk | 
    
        | 22:13
            
                π | SketchCow | How about I dark them all with a note to delete them? | 
    
        | 22:13
            
                π | Nemo_bis | Suggestions on how to get something useful out of a FictionBook? | 
    
        | 22:13
            
                π | Nemo_bis | I'm ok with it. | 
    
        | 22:14
            
                π | SketchCow | No, wait, this thing is valid. | 
    
        | 22:14
            
                π | SketchCow | Just not playing with our system | 
    
        | 22:14
            
                π | SketchCow | FICTIONBOOOOOOOOOK | 
    
        | 22:14
            
                π | SketchCow | Thanks, Russia | 
    
        | 22:14
            
                π | Nemo_bis | heh | 
    
        | 22:14
            
                π | Nemo_bis | It's not even well seeded, by the way. | 
    
        | 22:28
            
                π | hiker1 | Nemo_bis: What did you mean when you said make some of those dark? | 
    
        | 22:28
            
                π | SketchCow | http://archive.org/details/vokrugsveta | 
    
        | 22:28
            
                π | SketchCow | we'll see when the gods arise on that one | 
    
        | 22:29
            
                π | mistym | Nemo_bis: Wikipedia suggests Calibre can convert FictionBook to smth more conventional. | 
    
        | 22:30
            
                π | SketchCow | https://twitter.com/jefferson_bail/status/289096186420400128 | 
    
        | 22:49
            
                π | Nemo_bis | SketchCow: thanks for fixing it. I liked that tweet too, wondered what syllabus exactly. | 
    
        | 22:50
            
                π | SketchCow | I'm sure it's related to computer programming, and realizing what was done | 
    
        | 22:50
            
                π | SketchCow | I asked him to send it along. | 
    
        | 22:50
            
                π | Nemo_bis | Nice | 
    
        | 22:52
            
                π | SketchCow | By the way, the guy who wrote the wikipedia entry also wrote a scathing e-mail to archive.org about how we were the pit of evil | 
    
        | 22:52
            
                π | SketchCow | Good thing I helped bring in so much fundraising last year | 
    
        | 22:53
            
                π | SketchCow | Also: Ares Magazine is as sexy as sexy gets | 
    
        | 22:58
            
                π | SketchCow | http://archive.org/details/ares_magazine | 
    
        | 23:03
            
                π | Nemo_bis | Should still be usable, shouldn't it? With some printing perhaps. | 
    
        | 23:05
            
                π | godane | stupid question | 
    
        | 23:05
            
                π | godane | i don't know how to submit a comment on youtube | 
    
        | 23:07
            
                π | SketchCow | Goood | 
    
        | 23:08
            
                π | godane | why is that? | 
    
        | 23:08
            
                π | godane | trying to help kevin rose upload the 50 episodes of the screen savers he has | 
    
        | 23:10
            
                π | godane | this is the episode in question: https://www.youtube.com/watch?v=ZglwVT5NIJw | 
    
        | 23:10
            
                π | godane | it's an episode from july 14 2003 | 
    
        | 23:11
            
                π | godane | there are next to no caps for episodes in 2003 | 
    
        | 23:26
            
                π | SketchCow | Example of "I'm just gonna dark it" | 
    
        | 23:26
            
                π | SketchCow | http://www.woodworkersjournal.com/Main/Store/5_Disc_Annual_Collection_CD_Bundle_20052009_257.aspx | 
    
        | 23:30
            
                π | dashcloud | here's something interesting I came across today: http://www.emsps.com/oldtools/ They buy and sell old to very old software | 
    
        | 23:31
            
                π | Nemo_bis | SketchCow: some computer magazines like Pc Open here use the PDFs of their past issues as fillers for DVDs when they don't find enough stuff, it seems. | 
    
        | 23:32
            
                π | Nemo_bis | Something like 10 % of their CD/DVDs contains either some or all past issues in PDF... | 
    
        | 23:32
            
                π | dashcloud | Linux Journal definitely does that | 
    
        | 23:38
            
                π | chronomex | nice | 
    
        | 23:46
            
                π | SketchCow | So, I don't mind being the guy making these collections, BUT | 
    
        | 23:47
            
                π | SketchCow | I'd really appreciate it if you do-gooder motherfuckers would walk the collection and find doubles and cases where we have something really shitty when there's known better versions. | 
    
        | 23:52
            
                π | Nemo_bis | SketchCow: are there more duplicates than those I told you? | 
    
        | 23:52
            
                π | Nemo_bis | (Question is pointless if email really went to spam.) | 
    
        | 23:55
            
                π | SketchCow | It did go to spam. | 
    
        | 23:58
            
                π | Nemo_bis | http://p.defau.lt/?2fxIiFNmvwaO2FBSJdn7fA | 
    
        | 23:58
            
                π | Nemo_bis | <https://archive.org/search.php?query=%22Toronto%20PET%20User%27s%20Group%22> (duplicate of <https://archive.org/details/tpug-newsletter> I'm afraid) | 
    
        | 23:58
            
                π | Nemo_bis | and YourComputer which you had already spotted (and deleted, unless it was someone else) | 
    
        | 23:59
            
                π | Nemo_bis | I didn't find more in public items. |