#archiveteam 2013-03-20,Wed


Time Nickname Message
01:21 🔗 nebopolis should I set the warrior to yahoo or leave it on archiveteam's choice?
01:22 🔗 wp494 yahoo would get you banned almost instantly the last time I checked
01:40 🔗 nebopolis yep, rate limited in ~5min
01:40 🔗 nebopolis the question is, is it worth it to keep it on yahoo given the short deadline?
01:41 🔗 adamcaudi Yahoo is going slowly - the more people on it, the more we'll save before it's over
01:42 🔗 nebopolis in that case I'll leave it going
04:25 🔗 Gurl46 http://adfoc.us/13353922321031
04:26 🔗 wp494 yep, sounds exactly like a spambot
05:13 🔗 tyn Can I ask what the icon by some people's nicknames on the tracker means?
05:15 🔗 DFJustin downloading with the warrior as opposed to installing the scripts yourself
05:16 🔗 tyn Ah. Cool, thanks.
05:17 🔗 tyn And can I ask how bad the yahoo situation is? Are all the items on the tracker?
05:22 🔗 omf_ 19gb of the 109gb of 4chan data downloaded. This is going to take a few days
05:24 🔗 wp494 tyn: the last I checked, pretty bad. the last time I ran the YM project, I got banned quite quickly
05:24 🔗 wp494 and I heard there were ~12.7M threads
05:24 🔗 omf_ About cleaning up the wiki. Are we just going to lock the old pages? Maybe we should move them into a static site on github so the data is still available but not in the wiki directly
05:25 🔗 wp494 I only see ~11.6K threads
05:33 🔗 DFJustin the units on the yahoo tracker are whole subforums
05:33 🔗 DFJustin or at least the "forums-" ones anyway
06:17 🔗 SketchCow today I found out I was overlooking a header in s3 efforts called "size-hint"
06:17 🔗 SketchCow And that it's best to throw something in there, because then it won't shove my 50gb file into a 40gb partition
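(A minimal sketch of setting that header on an ias3 upload, for reference; the item name, file, and keys below are placeholders, and the hint value is the expected size in bytes.)
    curl --location \
         --header "authorization: LOW ACCESSKEY:SECRET" \
         --header "x-archive-size-hint: 53687091200" \
         --upload-file bigfile.tar \
         http://s3.us.archive.org/example-item/bigfile.tar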
06:22 🔗 SketchCow The CEO of a company that records blogs and posts asked me to support their open access coalition.
06:22 🔗 SketchCow The coalition is them and others making sure places get access to all this crunchy SEO data.
06:22 🔗 SketchCow So I asked him instead what it'll take to get copies of all THEIR data to archive.org, with a time-shift
06:22 🔗 SketchCow Let's see what happens!
06:23 🔗 SketchCow http://spinn3r.com/ is the company
06:24 🔗 SketchCow That would be a nice end run around Google, now wouldn't it.
06:27 🔗 SketchCow ......and he said yes.
06:27 🔗 omf_ How big are we talking
06:27 🔗 omf_ they have been around since 2005
06:27 🔗 SketchCow So while other people are handwringing over Google Reader's feed loss, I do believe I just got quite a bit of data.
06:27 🔗 SketchCow They claim 200gb a day
06:27 🔗 omf_ Yeah that is sweet
06:28 🔗 omf_ I had never heard of that company before and I cannot find pricing for them on the site
06:28 🔗 SketchCow http://www.spinn3r.com/savings
06:29 🔗 SketchCow Spinn3r indexes 150GB of content per month. We maintain 18 months of archives.
06:30 🔗 SketchCow http://en.linuxreviews.org/Spinn3r
06:31 🔗 hdevalenc nice
06:32 🔗 omf_ The savings page has no real data. Just some numbers they threw up as "costs"
06:32 🔗 SketchCow Oh, I know.
06:37 🔗 omf_ The real test is when they start putting data into the IA, and how well kept it is
06:41 🔗 SketchCow "Anyway. to gain access you would have to use our client. It's in Java but pretty easy to setup. It doesn't write ARC format. It uses our own proprietary format."
06:41 🔗 SketchCow From a letter he just sent.
06:41 🔗 SketchCow Obviously, I will ask for assistance from you maniacs to split his format apart
06:41 🔗 omf_ I bet it is just fucking text csv or retard xml
06:42 🔗 omf_ I hate when companies create formats for no good reason
06:45 🔗 hdevalenc omf_: it's PROPRIETARY
06:45 🔗 hdevalenc hence, advanced
06:45 🔗 omf_ :)
06:45 🔗 hdevalenc duh
06:45 🔗 omf_ that got a good laugh out of me. I have seen sales people imply that before
06:46 🔗 hdevalenc every time I see companies advertise with the word proprietary, patented, etc, it's just like.... this is supposed to be a plus?
06:46 🔗 omf_ before it was
06:46 🔗 omf_ now people want access
06:50 🔗 omf_ We still have plenty of sites to go for #ispygames. I am handing out copy and paste wget commands to make it easier to contribute
06:50 🔗 Samuel_Mi ooh yes, gimme
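(A hedged example of the sort of copy-and-paste wget command being handed out here; the site URL and WARC name are placeholders, not the actual #ispygames assignments.)
    wget --mirror --page-requisites --no-parent \
         --adjust-extension --wait=1 -e robots=off \
         --warc-file=example-games-site \
         http://example-games-site.com/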
06:59 🔗 SketchCow Oh, wow.
06:59 🔗 SketchCow They only have the last 60 days of blogs.
06:59 🔗 SketchCow They deleted the rest.
06:59 🔗 SketchCow Now that's a shame.
07:03 🔗 omf_ How the fuck is that useful for long term analytics?
07:03 🔗 omf_ That is one of their boasting points
07:10 🔗 hdevalenc adhdlytics
07:13 🔗 omf_ Oh I keep forgetting about that field.
07:13 🔗 hdevalenc omf_: the internet moves fast, and if your company doesn't give me money
07:14 🔗 hdevalenc YOU'LL BE LEFT BEHIND
07:14 🔗 hdevalenc look a chart
07:14 🔗 omf_ http://nooooooooooooooo.com/
07:29 🔗 SketchCow 454447.8 / 583096.0 MB Rate: 25062.7 / 2416.1 KB Uploaded: 2478394.0 MB [77%] 0d 15:08 [ R: 5.45]
07:29 🔗 SketchCow InternetCensus2012
07:29 🔗 SketchCow Only 15 hours left!
07:29 🔗 godane hey SketchCow
07:29 🔗 SketchCow hey
07:29 🔗 godane i got december's episodes of wilkow uploaded
07:30 🔗 SketchCow Excellent
07:30 🔗 godane you guys are not going to lose that
07:30 🔗 godane trying to upload it before backing up to bluray
07:33 🔗 godane most of jan 2013 episodes of wilkow are going up too
07:33 🔗 godane i'm only uploading up to jan 25 cause that's all i could get up to with this bluray backup
07:34 🔗 godane also we really need to get people working on a 400TB bluray-like disc
07:35 🔗 godane i say that cause it could last as long as cds did if they can make it onto the market in the next 3 to 5 years
08:43 🔗 soultcer SketchCow: Will you put the internet census up on IA?
09:58 🔗 SketchCow Yes
09:58 🔗 SketchCow I will need to split it up to a few items.
10:05 🔗 SketchCow The fun continues.
10:05 🔗 SketchCow 478098.4 / 583096.0 MB Rate: 24641.8 / 3219.7 KB Uploaded: 2656925.3 MB [81%] 0d 9:16 [ R: 5.56]
10:05 🔗 SketchCow InternetCensus2012
10:06 🔗 C-Keen that thing is awesome
10:11 🔗 GLaDOS Quick, everyone join #archiveteam
10:11 🔗 GLaDOS Erm, #archivist
10:55 🔗 SketchCow p.s. on the side, I'm backing up the CD-ROMs
11:34 🔗 omf_ 76gb left on the 4chan data
20:18 🔗 SketchCow ha, this torrent has uploaded 3.4 terabytes to others.
20:18 🔗 SketchCow I think I'll leave it running for a while afterwards. It obviously needs the seeds.
20:18 🔗 alard SketchCow: Are you finished with punchfork?
20:19 🔗 Smiley SketchCow: the geocities seeds apparently disappeared
20:19 🔗 Smiley GLaDOS was trying to get a copy to keep it seeded, and then he said the last one disappeared.
20:20 🔗 soultcer I assume it would be possible to add the IA as a webseed for the torrent
20:21 🔗 Smiley we assumed they were the last seed.
20:21 🔗 Smiley :/
20:24 🔗 Smiley problem now is there are no seeders to get a copy from
20:25 🔗 soultcer Well assuming your client supports web seeds it's just a matter of loading an updated torrent file
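(One way to do that when rebuilding the .torrent, assuming mktorrent is available; the tracker URL, item URL, and paths are placeholders.)
    mktorrent -a http://tracker.example.org/announce \
              -w http://archive.org/download/example-geocities-item/ \
              -o geocities-webseed.torrent \
              geocities/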
20:30 🔗 SketchCow alard: NEARLY done with punchfork.
20:31 🔗 alard SketchCow: Ah. That's good to know, thanks. I'll hold off on the index-making.
20:31 🔗 alard The yahoo-blogs done?
20:32 🔗 SketchCow root@teamarchive-1:/1/ALARD/warrior# du -sh punch*
20:32 🔗 SketchCow 130G punchfork-user
20:32 🔗 SketchCow 49G punchfork-date
20:33 🔗 SketchCow So, the user and date ones are not done.
20:33 🔗 SketchCow Wait, date is done.
20:41 🔗 urgato hi, is there any possibility to resume my archiveteam-warrior where it left off? (I would like to turn off my computer but the current job isn't finished yet)
20:41 🔗 chronomex you can usually suspend a virtual machine, what are you running it in?
20:42 🔗 urgato in virtualbox, will the resuming work?
20:44 🔗 urgato i.e. is the "WgetDownload" part fault-resilient enough to resume even if I have a different IP by then?
20:45 🔗 SketchCow adding: punchfork-userpages-1/ (stored 0%)
20:45 🔗 SketchCow adding: punchfork-userpages-1/punchfork.com-user-Astorga-20130217-084553.zip (deflated 21%)
20:45 🔗 SketchCow adding: punchfork-userpages-1/punchfork.com-user-amanda467-20130220-070146.zip (deflated 22%)
20:45 🔗 SketchCow adding: punchfork-userpages-1/punchfork.com-user-KrystlF-20130303-171612.zip (deflated 20%)
20:45 🔗 SketchCow adding: punchfork-userpages-1/punchfork.com-user-Trubby-20130222-231046.zip (deflated 18%)
20:45 🔗 SketchCow That's a little odd, alard: I'm zipping the zips and getting 20% reduction?
20:47 🔗 Smiley urgato: yeah it won't know.
20:47 🔗 Smiley So it *should* work, don't worry if it doesn't though
20:47 🔗 Smiley we keep track of which users have completed.
20:47 🔗 no2pencil SketchCow: If I am not mistaken, you did a talk on digital footprint (via Twitter & Facebook) at HOPE a few years ago?
20:47 🔗 no2pencil or was this someone else?
20:49 🔗 urgato Smiley: okay thanks, I won't worry; it would just be a bit of a waste to discard 6000 items / ~400MB, that's why I want to resume
20:49 🔗 Smiley i know that feeling.
20:50 🔗 Smiley urgato: just important to pause/suspend the warrior.
20:50 🔗 Smiley not reboot it.
20:50 🔗 urgato yes, thanks
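(For reference, a minimal sketch of suspending a VirtualBox warrior from the command line, assuming the VM is named "archiveteam-warrior"; a saved state should pick up the same item when started again.)
    VBoxManage controlvm "archiveteam-warrior" savestate
    VBoxManage startvm "archiveteam-warrior" --type headless   # later, to resume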
20:54 🔗 alard SketchCow: That's not strange, I think. ZIP compresses each file separately, so if you compress the whole archive again (as one stream) there's still duplication across files for the compressor to remove.
21:00 🔗 alard It's also possible that the Python zipfile library didn't compress anything.
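(A quick way to check that, assuming unzip is available: the verbose listing shows each member's compression method, and "Stored" would mean the Python zipfile writer never deflated anything.)
    unzip -v punchfork.com-user-Astorga-20130217-084553.zip | head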
21:59 🔗 Alek Hey how do I see the archive for http://repo.opensolaris.org/ ?
22:25 🔗 arkhive I have a question and maybe want to start a discussion. HD-DVD had a feature similar to Blu-ray's BD-Live, known as HDi Advanced Content, and HD-DVD lost the 'format war.' Is there a way to wget/download the sites/content and preserve it, so that one day when the HDi content for every movie is gone, a user can still somehow...
22:25 🔗 arkhive access it.
22:26 🔗 arkhive If the HDi content is still up in the first place.
22:26 🔗 arkhive The project/task sounds ambitious probably, but just a thought/idea
22:27 🔗 arkhive Here are some links: http://en.wikipedia.org/wiki/Advanced_Content
22:27 🔗 arkhive http://en.wikipedia.org/wiki/HDi_(interactivity)
22:27 🔗 arkhive But what do you think?
22:31 🔗 arkhive I have a Toshiba player along with an Xbox 360 HD-DVD drive that supports HDi Advanced Content. And will eventually buy an HD-DVD drive for my PC to rip the discs. Right now though, I am ripping all my Dad's vinyl.
22:32 🔗 S[h]O[r]T you should be able to just open wireshark and see the requests
22:32 🔗 S[h]O[r]T sounds like the static content would be easy to pull
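(A hedged sketch of pulling just the HTTP requests out of such a capture, assuming tshark is installed and the player's traffic crosses eth0.)
    tshark -i eth0 -Y "http.request" \
           -T fields -e http.host -e http.request.uri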
22:33 🔗 balrog_ arkhive: what process are you using for vinyl?
22:34 🔗 arkhive My Dad and I are using his really nice record player hooked up to a computer with a good soundcard.
22:35 🔗 arkhive We have thousands to go.
22:35 🔗 arkhive He is into digitizing music and video, too.
22:36 🔗 arkhive balrog_: is there a better method/way to do it to get better results / higher quality?
22:36 🔗 balrog_ arkhive: I hope you have a preamplifier in between.
22:36 🔗 arkhive Yep.
22:36 🔗 arkhive Don't remember brand. I'm sure it's a good one.
22:37 🔗 arkhive hold on
22:37 🔗 balrog_ good cartridge too? Then you're set. Many people recommend recording at 24-bit/96kHz
22:37 🔗 balrog_ also clean the records first, unless they're "new"
22:38 🔗 balrog_ the Nitty Gritty is not bad, but it's not too cheap
22:38 🔗 arkhive Yeah to both. And is there an easy, automated way to clean up the static/artifacts or whatever it's called that you get when recording them?
22:38 🔗 balrog_ iZotope RX Advanced is the best software for that.
22:39 🔗 balrog_ oh, what software are you using to record?
22:39 🔗 balrog_ static is a sign of old / heavily-played records, or a bad / improperly balanced cartridge.
22:40 🔗 arkhive No, I mean like the extra little sounds in the background.
22:40 🔗 balrog_ hm like what?
22:41 🔗 arkhive (I don't even listen to music, my dad is showing me this stuff, so i'm still learning)
22:41 🔗 arkhive Like the sound when you first put the needle on the record
22:42 🔗 arkhive (I'm waiting for the final version of discferret to dump, dump, dump :P )
22:46 🔗 arkhive Oh, Audacity
22:47 🔗 balrog_ on Windows? be careful, it doesn't necessarily record all 24 bits
22:47 🔗 arkhive Should I switch to iZotope RX then?
22:47 🔗 arkhive Yeah windows.
22:47 🔗 balrog_ no, iZotope is a sound cleaner, not a recording program
22:47 🔗 arkhive Oh
22:47 🔗 balrog_ here's a nightly build of Audacity with ASIO support you can use: http://blankw.cerise.feralhosting.com/Audacity-ASIO/audacity-win-2.0.4-alpha-Feb-22-2013.exe
22:48 🔗 balrog_ be sure to use ASIO
22:55 🔗 arkhive k
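(For anyone recording from a command line instead of Audacity, a hedged sketch using SoX's rec at the recommended 24-bit/96kHz; the filename is a placeholder and the default input device is assumed.)
    rec -b 24 -r 96000 -c 2 side-a.flac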
