[01:21] should I set the warrior to yahoo or leave it on archiveteam's choice?
[01:22] yahoo would get you banned almost instantly the last time I checked
[01:40] yep, rate limited in ~5min
[01:40] the question is, is it worth it to keep it on yahoo given the short deadline?
[01:41] Yahoo is going slowly - the more people on it, the more we'll save before it's over
[01:42] in that case I'll leave it going
[04:25] http://adfoc.us/13353922321031
[04:26] yep, sounds exactly like a spambot
[05:13] Can I ask what the icon by some people's nicknames on the tracker means?
[05:15] downloading with the warrior as opposed to installing the scripts yourself
[05:16] Ah. Cool, thanks.
[05:17] And can I ask how bad the yahoo situation is? Are all the items on the tracker?
[05:22] 19gb of 109gb of the 4chandata downloaded. This is going to take a few days
[05:24] tyn: the last I checked, pretty bad. the last time I ran the YM project, I got banned quite quickly
[05:24] and I heard there were ~12.7M threads
[05:24] About cleaning up the wiki. Are we just going to lock the old pages? Maybe we should move them into a static site on github so the data is still available but not in the wiki directly
[05:25] I only see ~11.6K threads
[05:33] the units on the yahoo tracker are whole subforums
[05:33] or at least the forums- ones anyway
[06:17] today I found out I was overlooking a header in s3 efforts called "size-hint"
[06:17] And that it's best to throw something in there, because then it won't shove my 50gb file into a 40gb partition
[06:22] The CEO of a company that does blog and posting recording asked me to support their open access coalition.
[06:22] The coalition is them and others making sure places get access to all this crunchy SEO data.
[06:22] So I asked him instead what it'll take to get copies of all THEIR data to archive.org, with a time-shift
[06:22] Let's see what happens!
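The "size-hint" header mentioned at 06:17 refers to the Internet Archive's S3-like upload API, where the full header name is `x-archive-size-hint`: it tells IA roughly how big the finished item will be, so it can be placed on storage with enough free space. A minimal sketch of building such an upload request in Python — the item name, filename, and credentials are placeholders, not from the log:

```python
import urllib.request

# Hypothetical item and file, for illustration only.
item = "example-item"
filename = "big-dump.tar"
size = 50 * 1024**3  # ~50 GB: the size we expect the finished item to reach

req = urllib.request.Request(
    f"https://s3.us.archive.org/{item}/{filename}",
    method="PUT",
    headers={
        # Hints the eventual item size so IA doesn't shove a 50gb
        # upload onto a 40gb partition, as described at 06:17.
        "x-archive-size-hint": str(size),
        "authorization": "LOW accesskey:secret",  # placeholder credentials
    },
)
# urllib capitalizes stored header names, hence the lookup key below.
print(req.get_header("X-archive-size-hint"))
```

Actually sending the request (e.g. via `urllib.request.urlopen(req, data=...)`) would of course need real credentials and data.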
[06:23] http://spinn3r.com/ is the company
[06:24] That would be a nice end run around Google, now wouldn't it.
[06:27] ......and he said yes.
[06:27] How big are we talking
[06:27] they have been around since 2005
[06:27] So while other people are handwringing over Google Reader's feed loss, I do believe I just got quite a bit of data.
[06:27] They claim 200gb a day
[06:27] Yeah that is sweet
[06:28] I had never heard of that company before and I cannot find pricing for them on the site
[06:28] http://www.spinn3r.com/savings
[06:29] Spinn3r indexes 150GB of content per month. We maintain 18 months of archives.
[06:30] http://en.linuxreviews.org/Spinn3r
[06:31] nice
[06:32] The savings page has no real data. Just some numbers they threw up as "costs"
[06:32] Oh, I know.
[06:37] The real test is when they start putting data into the IA, and how well kept it is
[06:41] "Anyway. to gain access you would have to use our client. It's in Java but pretty easy to setup. It doesn't write ARC format. It uses our own proprietary format."
[06:41] From a letter he just sent.
[06:41] Obviously, I will ask for assistance from you maniacs to split his format apart
[06:41] I bet it is just fucking text csv or retard xml
[06:42] I hate when companies create formats for no good reason
[06:45] omf_: it's PROPRIETARY
[06:45] hence, advanced
[06:45] :)
[06:45] duh
[06:45] that got a good laugh out of me. I have seen sales people imply that before
[06:46] every time I see companies advertise with the word proprietary, patented, etc, it's just like.... this is supposed to be a plus?
[06:46] before it was
[06:46] now people want access
[06:50] We still have plenty of sites to go for #ispygames. I am handing out copy and paste wget commands to make it easier to contribute
[06:50] ooh yes, gimme
[06:59] Oh, wow.
[06:59] They only have the last 60 days of blogs.
[06:59] They deleted the rest.
[06:59] Now that's a shame.
[07:03] How the fuck is that useful for long term analytics?
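The copy-and-paste wget commands handed out for #ispygames at 06:50 aren't shown in the log. As a sketch of what such a handout might look like, one could generate per-site commands like this — the site list and WARC names are hypothetical, and the flags (`--mirror`, `--page-requisites`, `--warc-file`) are standard wget options:

```python
# Hypothetical target list; the real handout's sites aren't in the log.
sites = [
    ("examplegames", "http://games.example.com/"),
    ("retroarcade", "http://arcade.example.org/"),
]

commands = [
    # --mirror follows links recursively, --page-requisites grabs
    # images/CSS/JS, and --warc-file records everything into a WARC
    # that can later be uploaded.
    f"wget --mirror --page-requisites --warc-file={name} {url}"
    for name, url in sites
]

for cmd in commands:
    print(cmd)
```

Each volunteer can then paste one line into a shell and contribute a site.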
[07:03] That is one of their boasting points
[07:10] adhdlytics
[07:13] Oh I keep forgetting about that field.
[07:13] omf_: the internet moves fast, and if your company doesn't give me money
[07:14] YOU'LL BE LEFT BEHIND
[07:14] look a chart
[07:14] http://nooooooooooooooo.com/
[07:29] 454447.8 / 583096.0 MB Rate: 25062.7 / 2416.1 KB Uploaded: 2478394.0 MB [77%] 0d 15:08 [ R: 5.45]
[07:29] InternetCensus2012
[07:29] Only 15 hours left!
[07:29] hey SketchCow
[07:29] hey
[07:29] i got december's episodes of wilkow uploaded
[07:30] Excellent
[07:30] you guys are not gonna lose that
[07:30] trying to upload before backing up to bluray
[07:33] most of jan 2013 episodes of wilkow are going up too
[07:33] i'm only uploading up to jan 25 cause that's all i could get up to with this bluray backup
[07:34] also we really need to get people working on 400TB like bluray
[07:35] i say that cause it could last as long as cds did if they can make it onto the market in the next 3 to 5 years
[08:43] SketchCow: Will you put the internet census up on IA?
[09:58] Yes
[09:58] I will need to split it up into a few items.
[10:05] The fun continues.
[10:05] 478098.4 / 583096.0 MB Rate: 24641.8 / 3219.7 KB Uploaded: 2656925.3 MB [81%] 0d 9:16 [ R: 5.56]
[10:05] InternetCensus2012
[10:06] that thing is awesome
[10:11] Quick, everyone join #archiveteam
[10:11] Erm, #archivist
[10:55] p.s. on the side, I'm backing up the CD-ROMs
[11:34] 76gb left on the 4data
[20:18] ha, this torrent has uploaded 3.4 terabytes to others.
[20:18] I think I'll leave it running for a while afterwards. It obviously needs the seeds.
[20:18] SketchCow: Are you finished with punchfork?
[20:19] SketchCow: the geocities seeds apparently disappeared
[20:19] GLaDOS: was trying to get a copy to keep it seeded and then he said the last one disappeared.
[20:20] I assume it would be possible to add the IA as webseed for the torrent
[20:21] we assumed they were the last seed.
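Adding the IA as a web seed, as suggested at 20:20, amounts to adding a BEP 19 `url-list` key to the .torrent dictionary and re-bencoding it; clients that support web seeding then fall back to plain HTTP when no peers are left. A minimal sketch with a hand-rolled bencoder — the torrent dictionary and archive.org URL below are toy stand-ins, not the real geocities torrent:

```python
def bencode(obj):
    """Minimal bencoder for ints, bytes/str, lists, and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Bencoded dict keys must be byte strings in sorted order.
        items = sorted(
            (k.encode() if isinstance(k, str) else k, v) for k, v in obj.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj)}")

# Toy torrent dict; a real one would come from decoding the original .torrent.
torrent = {
    "announce": "http://tracker.example/announce",
    "info": {"name": "geocities", "piece length": 262144},
}
# BEP 19 web seed pointing at an HTTP mirror (hypothetical URL).
torrent["url-list"] = ["https://archive.org/download/example-geocities/"]

data = bencode(torrent)
```

Loading the rewritten .torrent into a web-seed-capable client is then, as the log says at 20:25, "just a matter of loading an updated torrent file".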
[20:21] :/
[20:24] problem now is there are no seeders to get a copy from
[20:25] Well assuming your client supports web seeds it's just a matter of loading an updated torrent file
[20:30] alard: NEARLY done with punchfork.
[20:31] SketchCow: Ah. That's good to know, thanks. I'll wait with the index-making.
[20:31] The yahoo-blogs done?
[20:32] root@teamarchive-1:/1/ALARD/warrior# du -sh punch*
[20:32] 130G punchfork-user
[20:32] 49G punchfork-date
[20:33] So, the user and date ones are not done.
[20:33] Wait, date is done.
[20:41] hi, is there a possibility to resume my archiveteam-warrior where it left off? (I would like to turn off my computer but the current job isn't finished yet)
[20:41] you can usually suspend a virtual machine, what are you running it in?
[20:42] in virtualbox, will the resuming work?
[20:44] i.e. is the "WgetDownload" part fault-resilient enough to resume even if I have another IP by then
[20:45] adding: punchfork-userpages-1/ (stored 0%)
[20:45] adding: punchfork-userpages-1/punchfork.com-user-Astorga-20130217-084553.zip (deflated 21%)
[20:45] adding: punchfork-userpages-1/punchfork.com-user-amanda467-20130220-070146.zip (deflated 22%)
[20:45] adding: punchfork-userpages-1/punchfork.com-user-KrystlF-20130303-171612.zip (deflated 20%)
[20:45] adding: punchfork-userpages-1/punchfork.com-user-Trubby-20130222-231046.zip (deflated 18%)
[20:45] That's a little odd, alard: I'm zipping the zips and getting 20% reduction?
[20:47] urgato: yeah it won't know.
[20:47] So it *should* work, don't worry if it doesn't though
[20:47] we keep track of which users have completed.
[20:47] SketchCow: If I am not mistaken, you did a talk on digital footprint (via Twitter & Facebook) at HOPE a few years ago?
[20:47] or was this someone else?
[20:49] Smiley: okay thanks, I wouldn't worry, it just would be a bit of a waste to just discard 6000 items/~400MB, that's why I want to resume
[20:49] i know that feeling.
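The `(stored 0%)` and the ~20% reduction in the 20:45 output have a likely explanation that is easy to demonstrate: Python's `zipfile` defaults to `ZIP_STORED` (no compression), so a zip written with default settings still contains raw data that a second, deflating pass can shrink. A self-contained sketch with made-up content:

```python
import io
import zipfile

payload = b"All work and no play makes Jack a dull boy. " * 200

# Inner zip written the default way: zipfile's default compression
# is ZIP_STORED, i.e. the payload goes in uncompressed.
stored = io.BytesIO()
with zipfile.ZipFile(stored, "w") as zf:  # default compression=ZIP_STORED
    zf.writestr("user.html", payload)

# Re-zip that zip with deflate, as in the 20:45 output: since the
# inner data was never compressed, the outer pass shrinks it a lot.
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("inner.zip", stored.getvalue())

print(len(stored.getvalue()), "->", len(outer.getvalue()))
```

Passing `compression=zipfile.ZIP_DEFLATED` when writing the inner zips in the first place would avoid the surprise.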
[20:50] urgato: just important to pause/suspend the warrior.
[20:50] not reboot it.
[20:50] yes, thanks
[20:54] SketchCow: That's not strange, I think. ZIP compresses each file separately, so if you compress it again (as one file) there'll be more duplication.
[21:00] It's also possible that the Python zipfile library didn't compress anything.
[21:59] Hey how do I see the archive for http://repo.opensolaris.org/ ?
[22:25] I have a question and maybe want to start a discussion. HD-DVD had a feature similar to Blu-ray's BD-Live, known as HDi Advanced Content, and HD-DVD lost the 'format war.' I was wondering if there is a way to wget/download the sites/content and preserve it, so one day when the HDi content for every movie is gone, a user can still somehow...
[22:25] access it.
[22:26] If the HDi content is still up in the first place.
[22:26] The project/task sounds ambitious probably, but just a thought/idea
[22:27] Here are some links: http://en.wikipedia.org/wiki/Advanced_Content
[22:27] http://en.wikipedia.org/wiki/HDi_(interactivity)
[22:27] But what do you think?
[22:31] I have a Toshiba player along with an Xbox 360 HD-DVD drive that supports HDi Advanced Content. And will eventually buy an HD-DVD drive for my PC to rip the discs. Right now though, I am ripping all my Dad's vinyl.
[22:32] you should be able to just open wireshark and see the requests
[22:32] sounds like the static content would be easy to pull
[22:33] arkhive: what process are you using for vinyl?
[22:34] My Dad and I are using his really nice record player and a computer with a good soundcard, hooked up
[22:35] We have thousands to go.
[22:35] He is into digitizing music and video, too.
[22:36] balrog_: is there a better method/way to do it/get better results/higher quality
[22:36] arkhive: I hope you have a preamplifier in between.
[22:36] Yep.
[22:36] Don't remember brand. I'm sure it's a good one.
[22:37] hold on
[22:37] good cartridge too? Then you're set.
[22:37] Many people recommend recording at 24-bit/96kHz
[22:37] also clean the records first, unless they're "new"
[22:38] the nitty gritty is not bad, but it's not too cheap
[22:38] Yeah to both. And is there an easy, automated way to clean up the static/artifacts or whatever it's called that you get when recording them?
[22:38] iZotope RX Advanced is the best software for that.
[22:39] oh, what software are you using to record?
[22:39] static is a sign of old / heavily-played records, or a bad / improperly balanced cartridge.
[22:40] No I mean like the extra small sounds in the background.
[22:40] hm like what?
[22:41] (I don't even listen to music, my dad is showing me this stuff, so i'm still learning)
[22:41] Like the sound when you first put the needle on the record
[22:42] (I'm waiting for the final version of discferret to dump, dump, dump :P )
[22:46] Oh, Audacity
[22:47] on Windows? be careful, it doesn't necessarily record all 24 bits
[22:47] Should I switch to iZotope RX then?
[22:47] Yeah windows.
[22:47] no, iZotope is a sound cleaner, not a recording program
[22:47] Oh
[22:47] here's a nightly build of Audacity with ASIO support you can use: http://blankw.cerise.feralhosting.com/Audacity-ASIO/audacity-win-2.0.4-alpha-Feb-22-2013.exe
[22:48] be sure to use ASIO
[22:55] k
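For scale, the 24-bit/96kHz stereo recordings recommended at 22:37 produce a fair amount of raw PCM data. A quick back-of-the-envelope calculation (the ~22-minute LP side is an assumed typical length, not from the log):

```python
bits_per_sample = 24
sample_rate = 96_000  # samples per second per channel
channels = 2          # stereo

# Raw PCM data rate before any compression or container overhead.
bytes_per_second = sample_rate * channels * bits_per_sample // 8
per_minute_mb = bytes_per_second * 60 / 1e6
per_lp_side_mb = per_minute_mb * 22  # assume a typical LP side of ~22 minutes

print(f"{bytes_per_second} B/s, ~{per_minute_mb:.1f} MB/min, "
      f"~{per_lp_side_mb:.0f} MB per LP side")
```

So digitizing thousands of records at this quality runs to hundreds of megabytes per side, which makes the Blu-ray backup plan mentioned earlier in the log understandable.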