#archiveteam 2011-10-17,Mon


01:20 🔗 human39 neat, I found an undeveloped roll of film
01:20 🔗 human39 (in this box, not mine)
01:20 🔗 Coderjoe uh
01:21 🔗 Coderjoe I hope it was not exposed to light or anything
01:21 🔗 human39 well, it's mine.
01:21 🔗 human39 now
01:21 🔗 human39 na, it's been in the container
01:21 🔗 Coderjoe (aside from actually taking the pictures)
01:21 🔗 Coderjoe oh. you said roll not reel
01:22 🔗 Coderjoe easier to tell with rolls
01:22 🔗 human39 yeah
01:22 🔗 human39 I wonder if it's worth getting developed. Hope this guy wasn't into weird stuff.
01:27 🔗 underscor Yeah, EFNet
01:27 🔗 underscor Fuck you too
01:27 🔗 underscor alard: Absolutely
01:30 🔗 Coderjoe mmm
01:30 🔗 Coderjoe DCI... talking around 1.5TB for a single 100-minute movie, with only one 8-channel soundtrack (at 96k)
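[A back-of-the-envelope check of that figure, assuming uncompressed 12-bit 4:4:4 2K DCDM frames at 24 fps; exact DCI mastering parameters vary:
    2048 × 1080 px × 3 ch × 12 bit ≈ 9.95 MB per frame
    9.95 MB × 24 fps × 6,000 s (100 min) ≈ 1.43 TB of picture
    96 kHz × 24 bit × 8 ch ≈ 2.3 MB/s; × 6,000 s ≈ 14 GB of audio
which together land close to the quoted 1.5 TB.]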
02:43 🔗 Coderjoe lachlan mirror still chugging. at 3.7G
02:46 🔗 chronomex that's quite the website
03:24 🔗 SketchCow Back
03:42 🔗 underscor wb
03:42 🔗 underscor :>
05:24 🔗 chronomex today my work as an archivist involves simulating a tape read circuit to decode bits off a data tape image recorded with audio gear
05:24 🔗 chronomex just in case you guys thought I was slacking :)
05:26 🔗 balrog ooh, wow. what's this for?
05:29 🔗 chronomex http://xrtc.net/f/phreak/3ess.shtml <-- this machine, a 1973 computer welded to a telephone switch, has bad tape carts.
05:29 🔗 chronomex solution: replace tape drive with something solid-state
05:29 🔗 chronomex tape drive is in center above teletype, the thing with the round sticker on it
05:30 🔗 chronomex have to replace tape drive to run diagnostics
05:31 🔗 chronomex have to run diagnostics to figure out what's wrong with the offline processor
05:31 🔗 chronomex have to fix the offline processor to run code on the machine safely
05:31 🔗 chronomex have to run code on the machine to do a backup
05:31 🔗 chronomex have to do a backup before rebooting
05:31 🔗 chronomex have to reboot because that will probably clear some stuck trouble that's been plaguing it since 1998 at least
05:32 🔗 chronomex yeah ... it was last booted in 1992
05:33 🔗 chronomex that view is the operator console side; the machine is two of those lineups - the second is the switching network and stuff
05:35 🔗 chronomex I want to strangle the fucker that decided that 1/4" tape cartridges are better than open-reel tape
05:36 🔗 chronomex STRANGLE you hear me
05:52 🔗 SketchCow Yeah
05:52 🔗 SketchCow batcave went south, can't get anyone to reset.
05:53 🔗 SketchCow So heartbroken, I know
05:53 🔗 chronomex D:
06:19 🔗 SketchCow http://www.freshdv.com/wp-content/uploads/2011/10/hurlbut-letus-41.jpg
06:19 🔗 SketchCow What a way to jizz up a perfectly fine DSLR
06:20 🔗 chronomex wow that's a lot of shit to bolt onto a dslr
06:21 🔗 bbot_ wow
06:21 🔗 bbot_ I count... four different handles?
06:55 🔗 SketchCow http://www.archive.org/search.php?query=collection%3Aarchiveteam-yahoovideo&sort=-publicdate
06:55 🔗 SketchCow Back in business.
06:55 🔗 chronomex speaking of video: http://ia700209.us.archive.org/6/items/dicksonfilmtwo/DicksonFilm_High_512kb.mp4
06:55 🔗 chronomex cool shit
07:02 🔗 SketchCow Yeah, going to let those go
07:02 🔗 SketchCow And get some rest, then back up
07:02 🔗 SketchCow There's so much stuff uploading now, the machine's finally emptying out
07:07 🔗 SketchCow Oh, and I found the artist for the archiveteam t-shirt and poster
07:10 🔗 chronomex oh?
07:34 🔗 Ymgve Dicks On Film?
07:34 🔗 Ymgve documentary about chatroulette?
07:34 🔗 Coderjoe ah. that explains the rsync troubles
07:45 🔗 Ymgve daamn: http://popc64.blogspot.com/
07:48 🔗 Coderjoe lachlan mirror still underway, at 4.2G
11:10 🔗 underscor chronomex: http://www.myspace.com/pagefault D:
11:10 🔗 underscor hahahaha
11:18 🔗 SketchCow Morning, probably need to sleep a tad
11:18 🔗 SketchCow But the batcave now has 12tb free
11:19 🔗 SketchCow So we have a lot of room again.
11:36 🔗 alard SketchCow: The scripts for me.com/mac.com are more or less working now, so that would be a way to get new things to fill it with.
11:36 🔗 SketchCow Excellent.
11:36 🔗 SketchCow So, we should talk about that.
11:37 🔗 SketchCow The number one thing besides making stuff be in a way the wayback machine can accept, when possible, is to have ways to package this crap up into units I can use to upload again.
11:37 🔗 alard Yes, probably have a look at the results as well.
11:37 🔗 SketchCow I'm starting down the google groups stuff, and oh man, this is going to take forever.
11:38 🔗 ersi Did wayback successfully swallow the earlier warc-files btw?
11:38 🔗 SketchCow They've been doing lots of runs against them.
11:38 🔗 SketchCow I don't know how many are fully in but that work is being done.
11:38 🔗 alard MobileMe works with usernames, so there's not an easy way to group it into numbered chunks. (And the full list of usernames is not yet available.)
11:38 🔗 ersi So that's a yes?
11:39 🔗 SketchCow I am pretty sure it's a yes.
11:39 🔗 ersi Awesome, to 11
11:39 🔗 alard Even the wget-warc ones? That's good news.
11:41 🔗 SketchCow So, I asked archive team to back up a site.
11:41 🔗 SketchCow Someone came out and said he was doing it, but he got me nervous because he basically said "their robots.txt is blocking the images!"
11:42 🔗 SketchCow Which is like a private detective saying "and then they walked into a building that said no trespassers!"
11:42 🔗 SketchCow 11:31 <bearh> I have the backup of csoon.com
11:42 🔗 SketchCow 11:45 <bearh> And i'm kinda unsure where to upload it.
11:42 🔗 SketchCow So, I'd like someone else to do it.
11:42 🔗 SketchCow It's not that large.
11:42 🔗 SketchCow But it's fucking hilarious.
11:42 🔗 SketchCow Died in 2000.
11:42 🔗 alard Heh. (Already did it, yesterday. Look in batcave. :)
11:42 🔗 SketchCow Been there ever since.
11:42 🔗 SketchCow Good deal, thanks.
11:43 🔗 SketchCow They're right, that's like finding an untouched dinosaur fossil
11:44 🔗 SketchCow I found another amazing site
11:44 🔗 SketchCow Collections of old department stores
11:45 🔗 SketchCow http://departmentstoremuseum.blogspot.com/
11:46 🔗 SketchCow http://departmentstoremuseum.blogspot.com/2010/06/may-co-cleveland-ohio.html
11:46 🔗 SketchCow That is a lot of crazy work
11:46 🔗 SketchCow I also had a nice long chat with the head of the CULINARY CURATION GROUP OF THE NEW YORK PUBLIC LIBRARY
11:46 🔗 SketchCow Try THAT for crazy
11:46 🔗 SketchCow http://legacy.www.nypl.org/research/chss/grd/resguides/menus/
11:57 🔗 SketchCow http://batcave.textfiles.com/ocrcount/ <--- You can see how long batcave was in the shitter
12:00 🔗 ersi was that, ocr jobs that were running on batcave? :o
12:08 🔗 SketchCow No.
12:09 🔗 SketchCow This was me tracking a limit imposed on my ingestion.
12:09 🔗 ersi Ah, alrighty
12:09 🔗 SketchCow I was using a method that worked fine but was hard on the structure
12:09 🔗 SketchCow And got into a fight over that
12:09 🔗 SketchCow Part of it was "you shouldn't use that method if there's more than 200 jobs in queue"
12:09 🔗 SketchCow Now, over time, that's not going to matter, i.e., a queue will be made that DOESN'T hold the job in queue on the machine, but just generally.
12:10 🔗 SketchCow But this was me seeing "So, does it EVER go below 200 or should I even watch"
12:10 🔗 SketchCow Answer: Yes
12:10 🔗 ersi And bam, you started filling it up gradually instead of appending to an ever increasing derive queue? :)
12:10 🔗 SketchCow Fuck no
12:11 🔗 SketchCow I slammed that shit up to max
12:11 🔗 ersi Then what was the point of that tracking?
12:11 🔗 SketchCow To know if I was being lied to
12:11 🔗 SketchCow I was not specifically being lied to
12:11 🔗 ersi ah
12:12 🔗 SketchCow Any time you see me mention interacting with other human beings, ask yourself "So, what's the most hostile interpretation as to why Jason is doing this"
12:12 🔗 SketchCow It'll save you time
12:12 🔗 SketchCow "Hey, guys, I went out to eat"
12:12 🔗 SketchCow Meaning: I got banned from a new diner
12:13 🔗 ersi Already known for.. long :)
12:13 🔗 SketchCow Apparently you forgot, twerp!
12:13 🔗 ersi Zing!
12:13 🔗 SketchCow The brutal thing coming up with yahoo video is I will be writing something that pulls down an item, does huge stats on it, then uploads again.
12:14 🔗 ersi hm, I should get going on instructables again
12:14 🔗 ersi that thing is fuckin' huge though
12:15 🔗 SketchCow It's funny for me that I now go into a directory on batcave, see it's 35gb, go "oh."
12:15 🔗 SketchCow I've put up 400gb items
12:15 🔗 SketchCow This is going to be hilarious
12:17 🔗 SketchCow http://googleblog.blogspot.com/2011/10/fall-sweep.html
12:18 🔗 SketchCow Shutting down: Code Search, Google Buzz, Jaiku, Google Labs (Immediately), University Research Program for Google Search
12:18 🔗 ersi Yeah
12:18 🔗 SketchCow Boutiques.com and like.com gone
12:19 🔗 SketchCow Code Search was critical
12:47 🔗 alard What would you like to get from the me.com/mac.com downloaders? At the moment, they produce:
12:48 🔗 alard 1. a warc.gz for web.me.com (plus xml index and log file)
12:48 🔗 alard 2. a warc.gz for homepage.mac.com (plus a log file)
12:48 🔗 alard 3. the xml feed for public.me.com, plus a copy of the file structure + the headers for each file (not warc)
12:49 🔗 alard 4. the xml feed for gallery.me.com, plus a zip file for each gallery
13:37 🔗 SketchCow Hmmm.
13:37 🔗 SketchCow I'd like all of it - what's the size differential?
13:42 🔗 alard You do get all of the content, it's just a question of in what form you'd like to get it.
13:42 🔗 alard Just a WARC or also separate files, that sort of thing.
13:45 🔗 alard Here's an example listing of what it produces now: http://pastebin.com/raw.php?i=438zhmSR
13:46 🔗 SketchCow http://vimeo.com/28173775
13:46 🔗 alard The files can get quite large (up to 2 GB for the users I've tried so far), so I don't think it's useful to have the data in more than one form.
13:46 🔗 SketchCow I think it could be.
13:47 🔗 SketchCow WARC is so forward looking, but you can't use it for anything BUT wayback.
13:47 🔗 alard Or you have to run a WARC extractor to create the structure wget would create otherwise.
13:48 🔗 SketchCow Hmmm.
13:48 🔗 alard So you'd like to have the wget copy as well?
13:48 🔗 SketchCow Well, you know, I could see that.
13:48 🔗 alard With or without link conversion?
13:48 🔗 SketchCow Massive post-processing.
13:48 🔗 SketchCow I am fine with massive post-processing.
13:48 🔗 SketchCow So WARC might make the most sense.
13:48 🔗 SketchCow I'd like to run that against your warcs we've added already to archive.org, see how that looks.
13:48 🔗 alard It does save a lot of duplicate uploading.
13:49 🔗 SketchCow Agreed.
13:49 🔗 SketchCow And the thing with these machines I have, they suck down data at 40-80MB a second.
13:49 🔗 SketchCow So it can yank it down, rejigger, upload
13:50 🔗 alard (As a reference: the four users I have now have 3.6GB of data together. But maybe I chose the wrong examples.)
13:50 🔗 SketchCow Wow, what the hell.
13:50 🔗 SketchCow Can you link me to them?
13:50 🔗 alard http://web.me.com/sleemason/
13:51 🔗 SketchCow WARC is the way.
13:51 🔗 alard http://homepage.mac.com/ueda_daisuke/
13:52 🔗 alard http://gallery.me.com/amurnieks
13:52 🔗 balrog yeah, those.
13:52 🔗 alard (each user has something on homepage, gallery, public, web)
13:53 🔗 balrog hmm, how does WARC do it?
13:53 🔗 alard I currently make WARCs for homepage.mac.com and web.me.com.
13:53 🔗 alard For gallery.me.com I download the zip files that the server offers.
13:53 🔗 balrog ohh, http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
13:53 🔗 alard For public.me.com I download the files.
13:53 🔗 alard balrog: Yup.
13:53 🔗 SketchCow And this is all closing June 2012?
13:54 🔗 SketchCow Are they blocking with robots.txt?
13:54 🔗 balrog SketchCow: yes, as per current info
13:54 🔗 SketchCow Sorry for not paying more attention, been dealing with data
13:54 🔗 balrog SketchCow: last I checked, no, but it's messy to parse because it uses XML and JS
13:54 🔗 balrog basically it uses JS to load the web content on many pages
13:54 🔗 balrog (from an XML file)
13:55 🔗 alard Only gallery.me.com has a robots.txt. public.me.com doesn't, but it is somewhat inaccessible to crawlers.
13:55 🔗 SketchCow Well, Jobs is dead, nobody is watching
13:55 🔗 alard homepage.mac.com has normal sites, can be crawled. web.me.com has some iWeb sites which are hard to crawl (but it's possible if you use webdav).
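[A minimal sketch of the WebDAV route alard mentions, assuming the server answers standard PROPFIND requests; the username is hypothetical:
    curl -i -X PROPFIND -H "Depth: 1" "https://web.me.com/exampleuser/"
A Depth: 1 PROPFIND returns an XML listing of a collection's immediate children, sidestepping the JS-rendered pages.]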
13:56 🔗 balrog alard: homepage.mac.com could have iWeb sites.
13:56 🔗 alard Any examples? The wayback machine doesn't have any.
13:56 🔗 balrog I should dig around, but I thought I saw some.
13:57 🔗 SketchCow But wow, we're talking a fuckton of data, aren't we.
13:57 🔗 alard Not really sure, the gallery/public sections can get large, the web sites are somewhat smaller.
13:57 🔗 SketchCow I'm sure this is some related concept to having such intense integration of the OS and the site
13:57 🔗 balrog I'm pretty sure there are many GB of data on here.
13:57 🔗 SketchCow So people can just blow shit back and forth.
13:58 🔗 alard balrog: TB, probably.
13:58 🔗 balrog alard: I'll send you a list of homepage.mac.com pulled from my webhistory (which unfortunately doesn't go all that far back)
13:58 🔗 balrog SketchCow: what exactly are you referring to?
13:58 🔗 balrog alard: a few hundred TB, if you count all the gallery data
13:58 🔗 SketchCow I mean that the .me stuff Apple did really smoothed the process of handling data and stuff.
13:59 🔗 SketchCow Similar to what we saw with Friendster, when photo albums explode
13:59 🔗 balrog yeah, they did. they improved it with iCloud, but took away the web-facing features :[
14:01 🔗 balrog alard: hold on a moment :)
14:02 🔗 balrog alard: this is not mac.com but may be useful … http://www.wilmut.webspace.virginmedia.com/notes/webpages.html
14:02 🔗 SketchCow http://www.archive.org/details/ARCHIVETEAM-YV-9200002-9299997
14:02 🔗 SketchCow I am going to get in trouble for that one.
14:03 🔗 SketchCow There was major debate what the maximum item size should be.
14:03 🔗 SketchCow Most people agreed 100gb
14:03 🔗 balrog ooooh.
14:03 🔗 SketchCow That's 408gb
14:03 🔗 balrog why not break it up then?
14:03 🔗 SketchCow I meant to but it was in the wrong directory when an uploader script ran
14:03 🔗 SketchCow I misread it as 40gb
14:03 🔗 SketchCow I may have to yank it down and split it
14:03 🔗 balrog urgh. can you take it down?
14:03 🔗 SketchCow I am really good at yanking it, ask around
14:04 🔗 SketchCow Nothing's breaking, it just becomes harder for it to be moved around.
14:04 🔗 balrog alard: you there?
14:04 🔗 alard Yes.
14:04 🔗 balrog http://pastie.org/private/gi3mrystmzx5ogyeocapg
14:04 🔗 balrog that came out of my history
14:04 🔗 balrog not all may work though
14:04 🔗 balrog and it's short
14:04 🔗 balrog there's another db I have which I have to go through
14:05 🔗 balrog (raw sql)
14:07 🔗 SketchCow http://www.archive.org/details/ARCHIVETEAM-YV-3900000-3999999&reCache=1
14:07 🔗 SketchCow Really, 200gb is not bad for the videos from 100,000 potential userspaces
14:07 🔗 balrog isn't that a little large too?
14:07 🔗 SketchCow I am fine with 200gb
14:08 🔗 balrog alard: I'll grep this db for mac.com/me.com :p
14:08 🔗 balrog however, do you know of a regex that can be used?
14:08 🔗 alard balrog: I downloaded your list. (Though most of the users were already on my list, it seems.)
14:08 🔗 alard grep (homepage|web)\.(me|mac)\.com ?
14:09 🔗 balrog I'll get another bigger list, I just need a regex that will get the proper results
14:09 🔗 balrog yeah but this is sql
14:09 🔗 balrog it's likely to be in the middle of a line
14:09 🔗 balrog like, a forum post
14:09 🔗 alard I see. Dump all the content, feed it to grep?
14:09 🔗 balrog well yeah, I'd be working from a sql dump
14:09 🔗 balrog but there's stuff in the middle of lines
14:10 🔗 alard In that case, I repeat the previous regexp.
14:10 🔗 balrog ok ...
14:10 🔗 balrog we'll see if it works.
14:10 🔗 alard SketchCow: So I should keep it as WARCs?
14:11 🔗 SketchCow Yeah
14:11 🔗 alard What about the files public.me.com?
14:11 🔗 SketchCow As we discussed, we can make more contemporary extractions.
14:11 🔗 SketchCow All of them
14:11 🔗 SketchCow archive.org can sustain two copies, one generated from the others.
14:11 🔗 alard So don't download them separately, but download to a WARC.
14:11 🔗 SketchCow WARC ensures long-term sustaining
14:11 🔗 SketchCow This is the tradeoff, which I am fine with
14:12 🔗 SketchCow (archive.org prefers we always do WARCs, in return a fuck they do not give how much we waterfall into their serverspace)
14:12 🔗 SketchCow This from on-high
14:12 🔗 balrog alard: you mean each user in his own WARC?
14:12 🔗 alard What about the images on gallery.me.com? I currently ask Apple to produce zip files, which is really handy, but isn't WARC.
14:12 🔗 SketchCow If that's the best we can do, that's fine.
14:12 🔗 alard balrog: Yes, each user results in four WARCs.
14:12 🔗 balrog aha.
14:12 🔗 alard SketchCow: You can download the images, it just takes a little longer.
14:13 🔗 alard So if WARC is nicer, we should do WARC.
14:13 🔗 SketchCow Yes
14:13 🔗 * balrog copies over the latest .sql
14:13 🔗 SketchCow Also a mess: Our star wars forum thing
14:13 🔗 alard (Although I should look at what happens to the album structure if we do that.)
14:13 🔗 SketchCow That's what's not up
14:13 🔗 SketchCow I trust your judgement, alard.
14:14 🔗 SketchCow Now you know big daddy's preferences.
14:14 🔗 alard Heh.
14:14 🔗 SketchCow I just didn't like us shutting out the potential for contemporary users, and if post-facto conversions to items that are easier to regard are possible then I'm on board.
14:14 🔗 SketchCow Where possible, WARC is what the "legit" sites like
14:14 🔗 balrog alard: what's used to dump sites as WARC?
14:15 🔗 alard wget-warc.
14:15 🔗 balrog also does that deal with when you have to use phantomjs?
14:15 🔗 SketchCow What's the status on those fucks accepting wget-warc
14:15 🔗 balrog or are those special-case?
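[For reference, a hedged sketch of a wget-warc invocation; the WARC options shown here are the ones that later shipped in GNU wget 1.14, and the URL is hypothetical:
    wget --mirror --page-requisites \
         --warc-file=homepage.mac.com-exampleuser \
         --warc-header "operator: Archive Team" \
         "http://homepage.mac.com/exampleuser/"
This writes the normal mirror tree plus homepage.mac.com-exampleuser.warc.gz holding the full request/response records. Pages that need JS execution (the phantomjs cases) stay special-cased.]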
14:16 🔗 alard SketchCow: The last response was 'wow, that diff is huge', and he was inclined not to include it, but offer it as a separate extension (as in: you'd have to enable it before compiling).
14:16 🔗 balrog alard: your regex doesn't work :/
14:16 🔗 balrog alard: hmmmm… mailing list?
14:16 🔗 alard But I made the mistake of including the whole warctools library, which includes things like the curl-extension etc.
14:16 🔗 SketchCow Well optimize and get that in
14:16 🔗 SketchCow That's a huge win
14:17 🔗 SketchCow It'll change everything out there
14:17 🔗 * balrog reads up on regex
14:17 🔗 alard Yeah, well, I replied that the files that the wget extension uses are much smaller. I haven't yet got a reply to that.
14:17 🔗 SketchCow I say just do it.
14:17 🔗 SketchCow It'll make a huge change in the world.
14:17 🔗 alard I'll probably make a smaller diff and send that to them.
14:18 🔗 alard Or two versions: the small one with built-in warc, the other one with warc included.
14:18 🔗 SketchCow I have now discovered I have two .tar files of the same range.
14:18 🔗 ersi Kick ass effort alard. Kick ass
14:18 🔗 SketchCow One is 111gb. One is 206gb
14:18 🔗 balrog huh, why the difference?
14:18 🔗 SketchCow NO IDEA
14:18 🔗 alard balrog: Did you use grep -E ?
14:18 🔗 balrog oops, no :p
14:19 🔗 balrog that worked, but it grabbed full lines
14:19 🔗 balrog I don't want full lines
14:19 🔗 balrog I want to isolate the relevant parts
14:19 🔗 alard Maybe do grep -oE "http://(homepage|web)\.(mac|me)\.com/[^/]+"
14:20 🔗 balrog alard: does that assume lines start with http://? they don't
14:21 🔗 alard Yes, it does. It also assumes that every url ends with a /
14:21 🔗 alard grep -oE "(homepage|web)\.(mac|me)\.com/[^ ]+" stops at the first whitespace character.
14:22 🔗 balrog URLs are formatted http:// … /username. however they may have text in front, or after them, within the same line
14:22 🔗 balrog you could have like "Check out this site: <a href="http://homepage.mac.com/someone">Here!</a>"
14:22 🔗 alard Oh, sorry, it doesn't assume that the *line* starts with http://, just that the *url* starts with http://.
14:22 🔗 alard grep -oE 'http://(homepage|web)\.(mac|me)\.com/[^/"]+'
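[Putting the exchange together, a sketch of the whole extraction, assuming a plain-text SQL dump; the filenames are hypothetical:
    grep -oE 'http://(homepage|web)\.(mac|me)\.com/[^/"]+' dump.sql | sort -u > users.txt
-o prints only the matching part of each line, -E enables the alternation syntax, and sort -u dedupes the usernames.]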
14:26 🔗 balrog much shorter list than I expected.
14:26 🔗 alard Then it's probably good to check the regexp.
14:26 🔗 balrog http://pastie.org/private/l5cjotdi58ttf8bq8g4m8g
14:26 🔗 balrog I did.
14:27 🔗 balrog the incoming HTML filter would put http:// before all urls
14:27 🔗 balrog you have all these?
14:40 🔗 balrog alard: did you have these already?
14:43 🔗 alard balrog: Just checked, most of them, not all.
14:43 🔗 balrog OK
15:01 🔗 alard SketchCow: One more question, if you're still there. It's possible to download the gallery contents to WARC. However, I think it doesn't make sense. It certainly wouldn't be useful with the wayback machine.
15:02 🔗 alard So I'm thinking that downloading the metadata xml/json and zipping the images per album is the best solution.
15:03 🔗 SketchCow I agree, then.
15:02 🔗 alard The problem with the gallery is that it isn't really a web page, but a collection of image files that can be rendered in different formats. So for a wayback-thing, you'd have to get every possible format.
15:14 🔗 alard Well then, I think that the scripts are finished.
15:14 🔗 alard If anyone would like to do a test run, please do! https://github.com/ArchiveTeam/mobileme-grab
15:42 🔗 SketchCow -rw-r--r-- 1 root root 205 2011-10-05 17:14 ballsack
15:42 🔗 SketchCow -rw-r--r-- 1 root root 2425 2011-10-05 16:20 balls
15:42 🔗 SketchCow drwxr-xr-x 2 root root 4096 2011-10-05 17:19 DONE
15:42 🔗 SketchCow That's how you know it was me
15:43 🔗 balrog LOL
15:48 🔗 lowtekk i seem to have acquired an "@", considering I may as well be a stranger, someone should probably take it away
15:49 🔗 balrog "@"?
15:49 🔗 lowtekk i do enjoy lurking, and as much as i love collecting old documents, i haven't contributed a darn thing to this cause
15:49 🔗 lowtekk op status, unless I'm mistaken
15:49 🔗 balrog oh, that
15:50 🔗 balrog yeah I don't know :p
15:50 🔗 balrog I think I was made op here once, though. idk either
15:50 🔗 balrog this is efnet though
15:50 🔗 balrog if you were to part and return, it would go away
15:50 🔗 lowtekk i've grown rather fond of it
16:00 🔗 sp0rus lol, i was made ops once in this chan
16:00 🔗 sp0rus happens sometimes
16:01 🔗 SketchCow It's all on my arbitrary observations, bitches
16:31 🔗 yipdw free-flowing ephemeral op-bit
16:31 🔗 yipdw probably the best way to avoid power clashes
16:56 🔗 jjonas hey friends:)
16:56 🔗 sp0rus hello
16:57 🔗 jjonas it's old news but i think it would make sense to note the closure of labs.google.com somewhere in the archiveteam.org wiki?
16:58 🔗 sp0rus do it
16:59 🔗 sp0rus http://archiveteam.org/index.php?title=Deathwatch
17:01 🔗 jjonas it made me lose thrust in google and google innovation, i miss google sets and google squared
17:02 🔗 jjonas ok im going to add a line there and to the article about google
17:02 🔗 ersi s/thrust/trust
17:02 🔗 ersi I made that spelling error a lot earlier :)
17:03 🔗 jjonas of course...
17:03 🔗 SketchCow I agree.
17:04 🔗 SketchCow Stupid Google
17:04 🔗 SketchCow It's not impressive to turn off Google Labs
17:04 🔗 SketchCow It was inspiring to go there and see crazy projects
17:04 🔗 SketchCow The only (only) justification I can come up with is that people/businesses/entities were monetizing or showing reliance on them
17:05 🔗 ersi Closing down Google Code Search is fucking stupid as well
17:05 🔗 jjonas that was back when you had gaming equipment by thrustmaster?^^
17:05 🔗 ersi their main shit is/was search once upon a time
17:07 🔗 jjonas when did google code search vanish :-O
17:07 🔗 jjonas ?
17:07 🔗 jjonas was it considered part of google labs too?
17:07 🔗 SketchCow It's not gone yet
17:08 🔗 SketchCow It's being killed
17:08 🔗 SketchCow January
17:08 🔗 jjonas *sigh*
17:09 🔗 ersi Also, no, it was a seperate project.
17:36 🔗 Ymgve but is there any content in google code search? or was it just an alternative view of stuff that's already on the web?
17:36 🔗 SketchCow No content
17:36 🔗 SketchCow Just a great tool
17:36 🔗 ersi Which still makes it a fucking shame that they're disbanding it
17:37 🔗 Ymgve someone tell ms to make bing code search
17:37 🔗 ersi I mean, what do you think, when you think Google? Most people think Search.
17:37 🔗 ersi Or did, at least. I think of advertisement these days.. and crappy search
17:42 🔗 sep332 is there a better search engine? I know blekko and duckduckgo have some cool stuff, but for general web stuff?
17:45 🔗 SketchCow grep
17:52 🔗 * Coderjoe grumbles
17:52 🔗 Coderjoe I am beginning to think I should have used wget-warc
17:53 🔗 Coderjoe 5GB and still going. apparently there are some books in there too
17:55 🔗 jjonas what are you archiving?
17:55 🔗 sp0rus Coderjoe: wow, when he popped in talking about the site I expected a few hundred megs tops
17:56 🔗 jjonas possibly for google code search there is some rationale to close it down - that it can be used as a tool for hacking in various ways
17:56 🔗 Coderjoe jjonas: lachlan.bluehaze.com.au
17:57 🔗 Coderjoe australian physicist that died last year. doing an AFK pull
17:57 🔗 Coderjoe I should go bluehaze.com.au as well, as that site belonged to a guy that died in 2006
17:57 🔗 Coderjoe s/go/do
17:58 🔗 Coderjoe argh. can't type
17:58 🔗 jjonas but then, who wrote on top of it that he died in 2010 and that it stays as a memorial?
17:59 🔗 Coderjoe the person keeping bluehaze around as well.
17:59 🔗 jjonas .... but i really have no idea why they dropped/hid google labs completely
17:59 🔗 jjonas i tried to look it up in the waybackmachine
18:00 🔗 jjonas to see all the various nice tools/attempts that i dont even remember
18:00 🔗 jjonas but its not in the waybackmachine
18:03 🔗 jjonas nvm! googlelabs.com is, just the subdomain isn't
18:15 🔗 ersi jjonas: That's a fucking stupid ass rationale
18:15 🔗 jjonas :D
18:15 🔗 jjonas haha
18:15 🔗 ersi I mean seriously, punch you in the face stupid
18:16 🔗 Coderjoe I can stab someone in the eye with a pencil. should we remove all pencils?
18:16 🔗 jjonas i wasn't trying to defend such a rationale
18:17 🔗 ersi I didn't perhaps mean you as in you
18:17 🔗 ersi If you're a sad frightened panda right now, that is
18:17 🔗 jjonas i would just be as surprised about that kind of reasoning
18:17 🔗 jjonas that google might have done before deciding to close it down
18:18 🔗 Coderjoe heh.. it's like someone went "The terrorists crashed planes into buildings. We must outlaw all planes."
18:18 🔗 jjonas than I am about them closing google labs
18:18 🔗 jjonas *NOT be as surprised
18:19 🔗 sep332 Remember Johnny Long's "Google Hacking" books?
18:20 🔗 sp0rus sep332: aye
18:21 🔗 jjonas but if google and other big companies consistently thought the way you do
18:21 🔗 jjonas they would have realized many useful features already
18:22 🔗 jjonas that aren't there yet
18:23 🔗 jjonas if you add this as a firefox bookmark and set keyword "mp3"
18:23 🔗 jjonas http://www.google.de/search?hl=de&safe=off&q=intitle%3A%22index.of%22+(mp*|avi|wma|mov)+%s%2Bparent%2Bdirectory+-inurl%3A(htm|html|cf|jsp|asp|php|js)+-site%3Amp3s.pl+-download+-torrent+-inurl%3A(franceradio|null3d|infoweb|realm|boxxet|openftp|indexofmp3|spider|listen77|karelia|randombase|mp3*)&btnG=Suche&meta=
18:23 🔗 ersi shrug
18:23 🔗 sep332 I think we should remove all CoderJoe's, the world will be safer without their(?) violent imaginations
18:23 🔗 jjonas then you can type in the address bar "mp3 any title/artist"
18:23 🔗 jjonas and find working mp3 links
18:24 🔗 jjonas i changed it the last time like 5 years ago so the excluded spam sites might not be up to date
18:24 🔗 Coderjoe yeah... there is apparently another coderjoe out there, whose name is actually Joe
18:24 🔗 Coderjoe (mine is not)
18:24 🔗 jjonas ...but it works
18:24 🔗 jjonas and you maybe use something similar already
18:24 🔗 jjonas so why does google not have a tab "mp3" next to images,maps,...
18:25 🔗 ersi Let's get back to talking about archiveteam stuff instead of fluff
18:25 🔗 Coderjoe expected record company outrage?
18:26 🔗 sep332 baidu has an mp3 search, mp3.baidu.com
18:26 🔗 jjonas that's a different environment, google china also has a million songs freely downloadable
18:26 🔗 jjonas (with a chinese IP only of course
18:27 🔗 jjonas ......
18:28 🔗 jjonas yeah, let's talk about archiving, since i made my point why they maybe would (sadly) close down google code for such a reason :D
18:29 🔗 jjonas if you don't mind check my grammar about google labs in http://archiveteam.org/index.php?title=Deathwatch#2011
18:34 🔗 jjonas btw, just to finish the mp3 subtopic properly: the russian facebook counterpart vkontakte.ru has a great community directory sharing all mp3s paired with lyrics files among 100+ million users just like there is no copyright :D
18:34 🔗 chronomex is no copyright in soviet russia
18:35 🔗 chronomex nor in capitalist russia
18:35 🔗 jjonas so warez sites are legal there too
18:35 🔗 jjonas ?
18:35 🔗 jjonas even if they have international users
18:35 🔗 jjonas :-O
18:35 🔗 * chronomex shrugs
18:35 🔗 chronomex eez joke
18:36 🔗 ersi Calm the fuck down
18:36 🔗 * ersi brings out the sedatives
18:36 🔗 SketchCow http://yfrog.com/z/obj01nxj
18:36 🔗 ersi SketchCow: Hah, awesome
18:37 🔗 jjonas :) im not nervous, just kidding
20:19 🔗 Coderjoe oh joy
20:19 🔗 Coderjoe I don't know where the link was that caused me to go astray
20:19 🔗 Coderjoe but apparently, the server has no trouble treating html files as directories
20:19 🔗 Coderjoe http://lachlan.bluehaze.com.au/deep.html/books/usa2001/usa2001/usa2001/gnomes.html
20:20 🔗 Coderjoe that gives you the "deep.html" page
20:20 🔗 Frigolit that's called "path info"
20:20 🔗 Coderjoe yes, i know. and I've used it on php, just not html
20:21 🔗 Coderjoe but there is a bad link that led me to an infinite recursion problem
20:21 🔗 Frigolit ah
20:23 🔗 Coderjoe my apache config at home does not appear to allow pathinfo on html, but then I am not parsing html (while the lachlan server is)
20:25 🔗 Coderjoe somewhere on that site is at least one bad link that adds a directory level to the entire site
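[The trap, spelled out: with path info enabled (Apache's AcceptPathInfo), the server serves deep.html for any URL of the form deep.html/<anything>, so a relative link such as usa2001/gnomes.html on that page resolves one level deeper on every pass:
    /deep.html/books/usa2001/gnomes.html
    /deep.html/books/usa2001/usa2001/gnomes.html
    /deep.html/books/usa2001/usa2001/usa2001/gnomes.html ...
and a crawler following relative links never terminates.]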
20:30 🔗 Coderjoe i'm going to terminate that until I have a chance to inspect things a bit more
21:40 🔗 Paradoks http://www.economist.com/node/21529030
21:41 🔗 Paradoks Scanning and destroying books, for a fee. I wonder if this horrifies Sketchcow. Obviously, it's not archiving, though some people might use it that way.
21:44 🔗 Coderjoe scanning good. destruction BAD
21:46 🔗 sep332 related blog post on it http://ascii.textfiles.com/archives/2672
21:52 🔗 goekesmi It's always a hard call when it comes to books.
22:20 🔗 dashcloud if the book needs to be destroyed, I'm expecting perfection in the results - anything less isn't worth it (for the sake of archiving, it's not worth it, but I'm sure many people would be happy to make that choice)
22:21 🔗 yipdw oh, I dunno -- people seem perfectly happy to accept 1080p masters for films these days
22:24 🔗 sp0rus yeah, but people are stupid
22:24 🔗 yipdw at least 1DollarScan/Bookscan seem to be clear that they only do this for mass-market copies
22:24 🔗 yipdw that seems to be a bit more sane
22:24 🔗 yipdw well, I think, I dunno -- it's not spelled out in that article
22:26 🔗 sp0rus if it's mass-market and not hard to find, that's a little different
22:27 🔗 yipdw right
22:27 🔗 yipdw I think that's the intent here
22:30 🔗 dashcloud yipdw: what quality masters should people be asking for?
22:32 🔗 SketchCow Hiiii
22:33 🔗 yipdw dashcloud: the highest available, which for some films is 1080p -- Ultraviolet and new scenes in Star Wars come to mind
22:33 🔗 yipdw dashcloud: but it's more that 1080p is markedly inferior in terms of resolution to earlier production techniques, and what with the availability of digital cameras like the RED ONE system it doesn't have to be that way
22:33 🔗 yipdw so, yeah, more of an offhand snark
22:34 🔗 dashcloud at least some of the Blender Foundation's open movies are available as higher than 1080p films
22:36 🔗 yipdw yeah, and with those it's theoretically better because the film assets are available
22:36 🔗 dashcloud here's an awesome article about gifs : http://motherboard.tv/2010/11/19/the-gif-that-keeps-on-gifing-why-animated-images-are-still-a-defining-part-of-our-internets
22:37 🔗 yipdw I say theoretically because I sure as hell haven't been able to e.g. re-render Big Buck Bunny from the assets directory :P
22:37 🔗 dashcloud I know the 2k frames were/are available from xiph's sample site
22:41 🔗 SketchCow My attitude on 1dollarbookscan is it makes more sense than throwing them out
22:54 🔗 SketchCow Barely
23:20 🔗 chronomex ^
23:45 🔗 underscor BURP
23:45 🔗 underscor Another 300GB into the archive
