Time |
Nickname |
Message |
01:21
🔗
|
nebopolis |
should I set the warrior to yahoo or leave it on archiveteam's choice? |
01:22
🔗
|
wp494 |
yahoo would get you banned almost instantly the last time I checked |
01:40
🔗
|
nebopolis |
yep, rate limited in ~5min |
01:40
🔗
|
nebopolis |
the question is, is it worth it to keep it on yahoo given the short deadline? |
01:41
🔗
|
adamcaudi |
Yahoo is going slowly - the more people on it, the more we'll save before it's over |
01:42
🔗
|
nebopolis |
in that case I'll leave it going |
04:25
🔗
|
Gurl46 |
http://adfoc.us/13353922321031 |
04:26
🔗
|
wp494 |
yep, sounds exactly like a spambot |
05:13
🔗
|
tyn |
Can I ask what the icon by some people's nicknames on the tracker means? |
05:15
🔗
|
DFJustin |
downloading with the warrior as opposed to installing the scripts yourself |
05:16
🔗
|
tyn |
Ah. Cool, thanks. |
05:17
🔗
|
tyn |
And can I ask how bad the yahoo situation is? Are all the items on the tracker? |
05:22
🔗
|
omf_ |
19gb of 109gb of the 4chan data downloaded. This is going to take a few days |
05:24
🔗
|
wp494 |
tyn: the last I checked, pretty bad. the last time I ran the YM project, I got banned quite quickly |
05:24
🔗
|
wp494 |
and I heard there were ~12.7M threads |
05:24
🔗
|
omf_ |
About cleaning up the wiki. Are we just going to lock the old pages? Maybe we should move them into a static site on github so the data is still available but not in the wiki directly |
05:25
🔗
|
wp494 |
I only see ~11.6K threads |
05:33
🔗
|
DFJustin |
the units on the yahoo tracker are whole subforums |
05:33
🔗
|
DFJustin |
or at least the forums- ones anyway |
06:17
🔗
|
SketchCow |
today I found out I was overlooking a header in s3 efforts called "size-hint" |
06:17
🔗
|
SketchCow |
And that it's best to throw something in there, because then it won't shove my 50gb file into a 40gb partition |
06:22
🔗
|
SketchCow |
The CEO of a company that does blog and posting recording asked me to support their open access coalition. |
06:22
🔗
|
SketchCow |
The coalition is them and others making sure places get access to all this crunchy SEO data. |
06:22
🔗
|
SketchCow |
So I asked him instead what it'll take to get copies of all THEIR data to archive.org, with a time-shift |
06:22
🔗
|
SketchCow |
Let's see what happens! |
06:23
🔗
|
SketchCow |
http://spinn3r.com/ is the company |
06:24
🔗
|
SketchCow |
That would be a nice end run around Google, now wouldn't it. |
06:27
🔗
|
SketchCow |
......and he said yes. |
06:27
🔗
|
omf_ |
How big are we talking |
06:27
🔗
|
omf_ |
they have been around since 2005 |
06:27
🔗
|
SketchCow |
So while other people are handwringing over Google Reader's feed loss, I do believe I just got quite a bit of data. |
06:27
🔗
|
SketchCow |
They claim 200gb a day |
06:27
🔗
|
omf_ |
Yeah that is sweet |
06:28
🔗
|
omf_ |
I had never heard of that company before and I cannot find pricing for them on the site |
06:28
🔗
|
SketchCow |
http://www.spinn3r.com/savings |
06:29
🔗
|
SketchCow |
Spinn3r indexes 150GB of content per month. We maintain 18 months of archives. |
06:30
🔗
|
SketchCow |
http://en.linuxreviews.org/Spinn3r |
06:31
🔗
|
hdevalenc |
nice |
06:32
🔗
|
omf_ |
The savings page has no real data. Just some numbers they threw up as "costs" |
06:32
🔗
|
SketchCow |
Oh, I know. |
06:37
🔗
|
omf_ |
The real test is when they start putting data into the IA, and how well kept it is |
06:41
🔗
|
SketchCow |
"Anyway. to gain access you would have to use our client. It's in Java but pretty easy to setup. It doesn't write ARC format. It uses our own proprietary format." |
06:41
🔗
|
SketchCow |
From a letter he just sent. |
06:41
🔗
|
SketchCow |
Obviously, I will ask for assistance from you maniacs to split his format apart |
06:41
🔗
|
omf_ |
I bet it is just fucking text csv or retard xml |
06:42
🔗
|
omf_ |
I hate when companies create formats for no good reason |
06:45
🔗
|
hdevalenc |
omf_: it's PROPRIETARY |
06:45
🔗
|
hdevalenc |
hence, advanced |
06:45
🔗
|
omf_ |
:) |
06:45
🔗
|
hdevalenc |
duh |
06:45
🔗
|
omf_ |
that got a good laugh out of me. I have seen sales people imply that before |
06:46
🔗
|
hdevalenc |
every time I see companies advertise with the word proprietary, patented, etc, it's just like.... this is supposed to be a plus? |
06:46
🔗
|
omf_ |
before it was |
06:46
🔗
|
omf_ |
now people want access |
06:50
🔗
|
omf_ |
We still have plenty of sites to go for #ispygames. I am handing out copy and paste wget commands to make it easier to contribute |
06:50
🔗
|
Samuel_Mi |
ooh yes, gimme |
06:59
🔗
|
SketchCow |
Oh, wow. |
06:59
🔗
|
SketchCow |
They only have the last 60 days of blogs. |
06:59
🔗
|
SketchCow |
They deleted the rest. |
06:59
🔗
|
SketchCow |
Now that's a shame. |
07:03
🔗
|
omf_ |
How the fuck is that useful for long term analytics? |
07:03
🔗
|
omf_ |
That is one of their boasting points |
07:10
🔗
|
hdevalenc |
adhdlytics |
07:13
🔗
|
omf_ |
Oh I keep forgetting about that field. |
07:13
🔗
|
hdevalenc |
omf_: the internet moves fast, and if your company doesn't give me money |
07:14
🔗
|
hdevalenc |
YOU'LL BE LEFT BEHIND |
07:14
🔗
|
hdevalenc |
look a chart |
07:14
🔗
|
omf_ |
http://nooooooooooooooo.com/ |
07:29
🔗
|
SketchCow |
454447.8 / 583096.0 MB Rate: 25062.7 / 2416.1 KB Uploaded: 2478394.0 MB [77%] 0d 15:08 [ R: 5.45] |
07:29
🔗
|
SketchCow |
InternetCensus2012 |
07:29
🔗
|
SketchCow |
Only 15 hours left! |
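
Note: the figures in the status line above agree with each other if the second rate value is read as the download rate in KB/s and 1 MB is taken as 1024 KB. Both readings are assumptions, and `transfer_stats` is a hypothetical helper name. A minimal sketch of the arithmetic:

```python
def transfer_stats(done_mb, total_mb, down_kb_s, uploaded_mb):
    """Recompute percent done, time remaining and share ratio from a status line.

    Assumes 1 MB = 1024 KB and that the percentage is truncated, not rounded.
    """
    percent = int(100 * done_mb / total_mb)
    remaining_s = (total_mb - done_mb) * 1024 / down_kb_s
    hours, leftover = divmod(int(remaining_s), 3600)
    minutes = leftover // 60
    ratio = round(uploaded_mb / done_mb, 2)
    return percent, hours, minutes, ratio


# Values copied from the 07:29 status line.
print(transfer_stats(454447.8, 583096.0, 2416.1, 2478394.0))
# → (77, 15, 8, 5.45)
```

Fed the values from the 10:05 status line, the same function gives 81%, 9:16 remaining and a ratio of 5.56, which also matches what the client printed.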
07:29
🔗
|
godane |
hey SketchCow |
07:29
🔗
|
SketchCow |
hey |
07:29
🔗
|
godane |
i got december's episodes of wilkow uploaded |
07:30
🔗
|
SketchCow |
Excellent |
07:30
🔗
|
godane |
you guys are not going to lose that |
07:30
🔗
|
godane |
trying to upload before backing up to bluray |
07:33
🔗
|
godane |
most of jan 2013 episodes of wilkow are going up too |
07:33
🔗
|
godane |
i'm only uploading up to jan 25 cause that's all i could get up to with this bluray backup |
07:34
🔗
|
godane |
also we really need to get people working on 400TB like bluray |
07:35
🔗
|
godane |
i say that cause it could last as long as cds did if they can make it onto the market in the next 3 to 5 years |
08:43
🔗
|
soultcer |
SketchCow: Will you put the internet census up on IA? |
09:58
🔗
|
SketchCow |
Yes |
09:58
🔗
|
SketchCow |
I will need to split it up to a few items. |
10:05
🔗
|
SketchCow |
The fun continues. |
10:05
🔗
|
SketchCow |
478098.4 / 583096.0 MB Rate: 24641.8 / 3219.7 KB Uploaded: 2656925.3 MB [81%] 0d 9:16 [ R: 5.56] |
10:05
🔗
|
SketchCow |
InternetCensus2012 |
10:06
🔗
|
C-Keen |
that thing is awesome |
10:11
🔗
|
GLaDOS |
Quick, everyone join #archiveteam |
10:11
🔗
|
GLaDOS |
Erm, #archivist |
10:55
🔗
|
SketchCow |
p.s. on the side, I'm backing up the CD-ROMs |
11:34
🔗
|
omf_ |
76gb left on the 4data |
20:18
🔗
|
SketchCow |
ha, this torrent has uploaded 3.4 terabytes to others. |
20:18
🔗
|
SketchCow |
I think I'll leave it running for a while afterwards. It obviously needs the seeds. |
20:18
🔗
|
alard |
SketchCow: Are you finished with punchfork? |
20:19
🔗
|
Smiley |
SketchCow: the geocities seeds apparently disappeared |
20:19
🔗
|
Smiley |
GLaDOS: was trying to get a copy to keep it seeded and then he said the last one disappeared. |
20:20
🔗
|
soultcer |
I assume it would be possible to add the IA as webseed for the torrent |
20:21
🔗
|
Smiley |
we assumed they were the last seed. |
20:21
🔗
|
Smiley |
:/ |
20:24
🔗
|
Smiley |
problem now is there are no seeders to get a copy from |
20:25
🔗
|
soultcer |
Well assuming your client supports web seeds it's just a matter of loading an updated torrent file |
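
Note: one way to produce such an updated torrent file is to add a `url-list` entry (the web seed key from BEP 19) to the top level of the metainfo dictionary and leave the `info` dictionary alone, so the info-hash stays the same. The sketch below is not a tool anyone in the channel used. The bencode helpers are hand-rolled, `add_webseed` is a hypothetical name, and the URL is a placeholder rather than a real archive.org path. It also assumes the original file is canonically bencoded (dictionary keys sorted).

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    colon = data.index(b":", i)        # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def bencode(value):
    """Encode ints, bytes, lists and dicts (with bytes keys) as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        pairs = sorted(value.items())  # bencode requires sorted keys
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def add_webseed(torrent_bytes, url):
    """Return new .torrent bytes with `url` set as the only web seed."""
    meta, _ = bdecode(torrent_bytes)
    meta[b"url-list"] = [url.encode()]
    return bencode(meta)


# Tiny made-up torrent, only to show the round trip.
original = bencode({
    b"announce": b"http://tracker.example/announce",
    b"info": {b"length": 5, b"name": b"a.txt",
              b"piece length": 16384, b"pieces": b"x" * 20},
})
updated = add_webseed(original, "http://mirror.example/files/")
```

Because only the outer dictionary changes, a client that already has the torrent loaded should treat the updated file as the same torrent with one more source.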
20:30
🔗
|
SketchCow |
alard: NEARLY done with punchfork. |
20:31
🔗
|
alard |
SketchCow: Ah. That's good to know, thanks. I'll wait with the index-making. |
20:31
🔗
|
alard |
The yahoo-blogs done? |
20:32
🔗
|
SketchCow |
root@teamarchive-1:/1/ALARD/warrior# du -sh punch* |
20:32
🔗
|
SketchCow |
130G punchfork-user |
20:32
🔗
|
SketchCow |
49G punchfork-date |
20:33
🔗
|
SketchCow |
So, the user and date ones are not done. |
20:33
🔗
|
SketchCow |
Wait, date is done. |
20:41
🔗
|
urgato |
hi, there doesn't happen to be a possibility to resume my archiveteam-warrior where it left off (I would like to turn off my computer but the current job isn't finished yet) |
20:41
🔗
|
chronomex |
you can usually suspend a virtual machine, what are you running it in? |
20:42
🔗
|
urgato |
in virtualbox, will the resuming work? |
20:44
🔗
|
urgato |
i.e. is the "WgetDownload" part fault resilient enough, to resume even if I will have another ip then |
20:45
🔗
|
SketchCow |
adding: punchfork-userpages-1/ (stored 0%) |
20:45
🔗
|
SketchCow |
adding: punchfork-userpages-1/punchfork.com-user-Astorga-20130217-084553.zip (deflated 21%) |
20:45
🔗
|
SketchCow |
adding: punchfork-userpages-1/punchfork.com-user-amanda467-20130220-070146.zip (deflated 22%) |
20:45
🔗
|
SketchCow |
adding: punchfork-userpages-1/punchfork.com-user-KrystlF-20130303-171612.zip (deflated 20%) |
20:45
🔗
|
SketchCow |
adding: punchfork-userpages-1/punchfork.com-user-Trubby-20130222-231046.zip (deflated 18%) |
20:45
🔗
|
SketchCow |
That's a little odd, alard: I'm zipping the zips and getting 20% reduction? |
20:47
🔗
|
Smiley |
urgato: yeah it won't know. |
20:47
🔗
|
Smiley |
So it *should* work, don't worry if it doesn't though |
20:47
🔗
|
Smiley |
we have a track of which users have completed. |
20:47
🔗
|
no2pencil |
SketchCow: If I am not mistaken, you did a talk on digital foot print (via Twitter & Facebook) at HOPE a few years ago? |
20:47
🔗
|
no2pencil |
or was this someone else? |
20:49
🔗
|
urgato |
Smiley: okay thanks, I wouldn't worry, it just would be a bit of a waste to just discard 6000 items/~400MB, that's why I want to resume |
20:49
🔗
|
Smiley |
i know that feeling. |
20:50
🔗
|
Smiley |
urgato: just important to pause/suspend the warrior. |
20:50
🔗
|
Smiley |
not reboot it. |
20:50
🔗
|
urgato |
yes, thanks |
20:54
🔗
|
alard |
SketchCow: That's not strange, I think. ZIP compresses each file separately, so if you compress it again (as one file) there'll be more duplication. |
21:00
🔗
|
alard |
It's also possible that the Python zipfile library didn't compress anything. |
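
Note: both explanations can be checked with Python's standard `zipfile` module, whose default compression mode is `ZIP_STORED` (no compression). The sketch below uses made-up page data and a hypothetical `build_zip` helper, not the actual punchfork scripts. It builds one archive with the default and one with `ZIP_DEFLATED`, then compresses each whole archive as a single stream, roughly what an outer zip does to it.

```python
import io
import zipfile
import zlib


def build_zip(members, compression=zipfile.ZIP_STORED):
    """Return the bytes of a ZIP archive holding the given {name: data} members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical stand-in for one user's pages: many small, similar files.
pages = {
    f"page-{i}.html": b"<html><body>recipe %d</body></html>" % i
    for i in range(200)
}

stored = build_zip(pages)                          # zipfile default: nothing compressed
deflated = build_zip(pages, zipfile.ZIP_DEFLATED)  # each member deflated on its own

# Compress each whole archive as one stream, as the outer zip does per file.
stored_again = zlib.compress(stored)
deflated_again = zlib.compress(deflated)

print("stored:  ", len(stored), "->", len(stored_again))
print("deflated:", len(deflated), "->", len(deflated_again))
```

In both cases the second pass shrinks the archive, because the per-file headers and the similarities between members are only visible when the archive is compressed as one stream.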
21:59
🔗
|
Alek |
Hey how do I see the archive for http://repo.opensolaris.org/ ? |
22:25
🔗
|
arkhive |
I have a question and maybe want to start a discussion. I was thinking that since HD-DVDs had a feature similar to Blu-ray's BD-Live known as HDi Advanced Content. And that HD-DVD lost the 'format war.' if there was a way to wget/download the sites/content and preserve it so one day when the HDi content for every movie is gone, a user can still somehow... |
22:25
🔗
|
arkhive |
access it. |
22:26
🔗
|
arkhive |
If the HDi content is still up in the first place. |
22:26
🔗
|
arkhive |
The project/task sounds ambitious probably, but just a thought/idea |
22:27
🔗
|
arkhive |
Here are some links: http://en.wikipedia.org/wiki/Advanced_Content |
22:27
🔗
|
arkhive |
http://en.wikipedia.org/wiki/HDi_(interactivity) |
22:27
🔗
|
arkhive |
But what do you think? |
22:31
🔗
|
arkhive |
I have a Toshiba Player along with an Xbox 360 HD-DVD that supports HDi Advanced Content. And will eventually buy a HD-DVD drive for my PC to rip the discs. Right now though, I am ripping all my Dad's Vinyl. |
22:32
🔗
|
S[h]O[r]T |
you should be able to just open wireshark and see the requests |
22:32
🔗
|
S[h]O[r]T |
sounds like the static content would be easy to pull |
22:33
🔗
|
balrog_ |
arkhive: what process are you using for vinyl? |
22:34
🔗
|
arkhive |
My Dad and I are using his really nice record player and computer with a good soundcard. Hooked up |
22:35
🔗
|
arkhive |
We have thousands to go. |
22:35
🔗
|
arkhive |
He is into digitizing music and video, too. |
22:36
🔗
|
arkhive |
balrog_: is there a better method/way to do it/get better results/higher quality |
22:36
🔗
|
balrog_ |
arkhive: I hope you have a preamplifier in between. |
22:36
🔗
|
arkhive |
Yep. |
22:36
🔗
|
arkhive |
Don't remember brand. I'm sure it's a good one. |
22:37
🔗
|
arkhive |
hold on |
22:37
🔗
|
balrog_ |
good cartridge too? Then you're set. Many people recommend recording at 24bit/96khz |
22:37
🔗
|
balrog_ |
also clean the records first, unless they're "new" |
22:38
🔗
|
balrog_ |
the nitty gritty is not bad, but it's not too cheap |
22:38
🔗
|
arkhive |
Yeah to both. And is there an easy, automated way to clean up the static/artifacts or whatever it's called that you get when recording them? |
22:38
🔗
|
balrog_ |
iZotope RX Advanced is the best software for that. |
22:39
🔗
|
balrog_ |
oh, what software are you using to record? |
22:39
🔗
|
balrog_ |
static is a sign of old / heavily-played records, or a bad / improperly balanced cartridge. |
22:40
🔗
|
arkhive |
No I mean like the extra small sounds in the background. |
22:40
🔗
|
balrog_ |
hm like what? |
22:41
🔗
|
arkhive |
(I don't even listen to music, my dad is showing me this stuff, so i'm still learning) |
22:41
🔗
|
arkhive |
Like the sound when you first put the needle on the record |
22:42
🔗
|
arkhive |
(I'm waiting for the final version of discferret to dump, dump, dump :P ) |
22:46
🔗
|
arkhive |
Oh, Audacity |
22:47
🔗
|
balrog_ |
on Windows? be careful, it doesn't necessarily record all 24-bits |
22:47
🔗
|
arkhive |
Should I switch to iZotope RX then? |
22:47
🔗
|
arkhive |
Yeah windows. |
22:47
🔗
|
balrog_ |
no, iZotope is a sound cleaner, not a recording program |
22:47
🔗
|
arkhive |
Oh |
22:47
🔗
|
balrog_ |
here's a nightly build of Audacity with ASIO support you can use: http://blankw.cerise.feralhosting.com/Audacity-ASIO/audacity-win-2.0.4-alpha-Feb-22-2013.exe |
22:48
🔗
|
balrog_ |
be sure to use ASIO |
22:55
🔗
|
arkhive |
k |