[00:42] Is there any kind of archive of product packages out there?
[00:43] Lately (past year), I've been photographing every food/other package I could find
[00:47] tfgbd: mm
[00:47] this may be useful for a planned future project of mine
[00:47] :)
[01:02] wikimedia commons would probably take some of them if they're good quality
[01:08] All angles?
[01:08] I have all sides of the box
[01:08] and the date can be gathered from the image metadata
[01:11] Do you guys or archive.org ever work with sites like: http://www.oldversion.com/ or http://www.oldapps.com/
[01:13] work with, no
[01:13] oldapps was partially crawled with archivebot
[01:14] might be easier if they just gave you access to backups
[01:14] why only partially?
[01:14] job crashed
[01:14] Great idea, you're in charge
[01:14] actually archive.org has gotten sets of data from some software sites in the past
[01:14] tucows circa 2004 https://archive.org/details/tucows
[01:15] yeah, I knew about tucows
[01:15] old browser versions from evolt.org https://archive.org/details/evolt_browser_archive
[01:15] wait, is tucows gone?
[01:15] no it's still around last I checked
[01:15] but the archive wasn't kept up to date
[01:15] They just removed old stuff
[01:16] I know there used to be lots of tucows mirrors around years back
[01:16] they're like one of the few that had tons of mirrors
[01:16] there's an archive team project to crawl all public ftps https://archive.org/details/ftpsites
[01:17] do those use WARC too?
[01:17] no
[01:17] just tar or zip
[01:17] where do you put them then?
[01:18] archive.org file items
[01:18] that sucks
[01:18] wayback machine doesn't do ftp
[01:18] you have to download the whole FTP?
[01:18] well, at least they're there
[01:18] depends on the site, some of them are too big and have to be split into subdirectories
[01:19] maybe they will be able to be mirrored somewhere if they ever start a project for ftp
[01:19] also archive.org lets you browse inside archive files
[01:19] ahh
[01:20] look for the "[contents]" link or add a / to the download link
[02:03] tfgbd: joepie91: DFJustin: if people are seriously interested in jdget then I'm interested in help getting it into a stage where it's maintainable and useful
[02:20] cool
[02:20] how does it deal with captchas, though
[02:31] it doesn't
[02:32] apart from jdownloaders captcha solver
[02:32] which I might even have disabled
[04:02] http://www.ephotobay.com/image/shadowrun-snes-300.jpg
[04:35] Boop
[04:35] beep
[04:56] Working on a Lemmings SNES scan for Psygnosis.org, Jason.
[05:19] -bs, please. :)
[13:01] espes__: does it actually run as native code?
[13:01] or does it still use the JRE
[13:57] joepie91: it's all compiled to native code with GCJ
[13:58] you get one fat 80MB binary
[14:56] You know what we need, a Waterboy parody video that replaces "Water Sucks" with "Yahoo Sucks" and "Gatorade is better" with "ArchiveTeam is better"
[14:56] * Jonimus debates working on that tonight.
[15:09] Jonimus: we getting a theme song now? :D
[15:15] hey, does anyone know what happened to the google video archive?
[15:16] all I can find on archive.org is one item that looks like a captured one by them, not us: https://archive.org/details/GVID-20110417095014-crawl340
[15:21] joepie91: well my GF is a pretty good singer, if I can get something together we just might.
[15:22] :D
[15:22] SketchCow: see above question from qwerty0
[15:22] Good question.
[15:48] SketchCow: I saw this contributor uploading old magazines: https://archive.org/search.php?query=uploader%3A%22paulo%40paulogarcia.com%22
[15:48] Maybe something for the magazine archive?
[15:51] Sadly, they were already in there.
[15:51] All of them, just checked. The exactfiles.
[15:57] Hoped there was something new in there :/
[15:58] Nope, just someone blowing through the same collection I did, 2 years ago.
[15:58] An hero
[15:59] Heh. I've got hundreds of old magazines nobody has scanned, anywhere.
[15:59] AAAAND all the coverdiscs and cover CDs too...
[16:00] Going back about 20 years
[16:00] Oh god, they take up so much room..
[16:00] I have a problem. :(
[16:00] SketchCow: about google video, do you think you could find out where it went?
[16:21] We grabbed metadata.\
[16:22] Oh, I thought I remembered we handed over all the data for them to host.
[16:22] or, store, at least
[16:23] I show just metadata.
[16:24] Damn. So where'd it go? I hope it wasn't discarded when they announced the Youtube migration feature.
[16:24] google did a proper shutdown after we screamed
[16:25] https://archive.org/details/google-video-metadata-dumpage
[16:27] It'd be a shame if it was lost, since a lot of videos never migrated and gv is toast now.
[16:27] Well, good question what happened to the 18gb
[16:27] *TB?
[16:27] Or tb.
[16:28] haha, good
[16:28] I am not quite in the proper mood for this investigation.
[16:28] First, please, do not talk as if this was the fall of rome.
[16:28] We got Google to do a proper migration.
[16:29] Second a lot of videos that didn't migrate basically failed the content filter.
[16:29] Finally, there's no case where I or archive.org deleted data
[16:30] Potentially, it got lost, maybe, but I doubt that.
[16:30] But not if it went on archive.org.
[16:30] 18tb back then would definitely have been a major deal to put on archive.org.
[16:30] We are not perfect.
[16:30] We're better than we were and worse than we will be.
[16:31] archivebot solved a lot.
[16:32] Boy, I better get on top of this energy issue
[16:32] I don't like losing hours
[16:33] I can't even find a record of what we did with the google video.
[16:35] https://archive.org/details/googlevideo2011
[16:35] Looks like IA crawled it off archive team metadata
[16:36] Justice. Justice was served.
[16:37] I still have 20gb worth from my googlegarge folder, I thought it was rsynced to you at some point
[16:37] *googlegargle
[16:38] There is a chance
[16:39] A slight one, this is after all 3 years ago
[16:39] That what I did was work with Kenji to have him crawl the videos out with IA and then I'd delete our copies
[16:40] Video continues to be our weak point
[16:40] One little maniac with a HD cacorder can film himself eating a bowl of captain crunch for 20 inutes and there's 5gb
[16:41] oops, connection problem
[16:41] qwerty0: http://sebsauvage.net/paste/?17aacded4c8c1d77#wzItqU02Q/D1JJ5jL6Wjhva/6z0D4EdwDnfjhVPpRKM=
[16:42] might it be in a noindex collection somewhere? I remember there was some copyright concerns with e.g. stage6
[16:43] Possibly
[16:43] But I am fairly sure, as that was my first year with IA, that Alexis would have told me to make it wayback friendly
[16:43] joepie91: awesome, thanks
[16:43] And possibly, that meant the swapover via Kenji
[16:43] Since this precedes MegaWARC and archivebot
[16:45] SketchCow: okay, yeah, to be clear: I'm not looking for blame or anything, just trying to do some follow-up
[16:45] I wouldn't let files get deleted.
[16:45] But they're likely in wayback as links.
[16:46] SketchCow: cool, yeah, that's the last thing I'd assume you'd do.
[16:49] I'd say "it's somewhere"
[16:50] But you need to know that archive team has a class of materials that are of dubious accessibility
[16:50] One of our intentions was to make it so there were much less of those going forward
[16:50] And I think we did well.
[16:50] Yeah, I figured it was just a matter of surfacing it to the public.
[16:50] We were working on an audit but people got bored/lost
[16:50] It's tedious work and not as sexy for Our Fine Men
[16:50] Right, exactly.
[16:51] I know IA is just a group of people trying to do the best with a whole bunch of efforts.
[16:52] So, easy to believe it could fall through the cracks.
[16:52] #archiveteam.EFNet.20120807.log:[11:04:34] [2:03:22 PM] Kenji Nagahashi (Internet Archive): for stat lovers: % of video migrated from Google Video to YouTube: 11%
[16:53] Is that the final stat?
[16:53] probably, that's about when they were closing
[16:53] Sounds right
[16:53] dunno what % IA grabbed but I would assume a lot if he's making statements like that
[16:54] SO MUCH of Google Video was DVD MPEGs shoved into the system
[16:54] We estimated we got 40%
[16:54] http://archiveteam.org/index.php?title=Google_Video#A_Brief_History
[16:55] the googlevideo2011 collection weighs in at 72.19 TB
[16:55] much more than we got
[16:55] huh..
[16:56] So I'd say, for the moment, relax.
[16:56] I'm not being a bummer or a burnout.
[16:56] Yeah, that's awesome, if they grabbed a ton of video.
[16:57] I don't even remember hearing they had a parallel effort.
[16:59] They're sneaky, you know.
[17:03] In that case, maybe they determined our stuff was just duplicating what they had, and discarded it.
[17:22] anyone here ever heard of Boeing Calc
[18:18] hey guys
[18:19] does anyone know if it's possible to get a warc for a single site out of the Wayback Machine?
[18:21] not possible
[18:23] I was afraid of that
[18:25] I guess I can spider the archive...
[18:27] is it possible to download any of the warcs it serves from?
[18:29] only the archive team ones
[18:39] so i just found something interesting
[18:39] looks like even when robots.txt is blocking a url path images still can be viewed
[18:40] example: https://web.archive.org/web/20000407082332im_/http://www.msnbc.com/modules/tvnews/today/today_left.jpg
[18:46] SketchCow: ouch, the audit didn't really get very far
[18:54] SketchCow: how were we checking that our WARCs have been integrated into the Wayback Machine?
[19:12] oops: https://archive.org/details/archiveteam_yahooblog
[19:26] yes?
[19:33] xmc: compare to https://archive.org/details/archiveteam_yahooblogs
[19:33] oh
[19:39] We just need to get shit right
[19:39] If you want to help
[19:39] Come to #auditteam
[19:43] hey SketchCow
[19:43] ever heard of Boeing Calc?
[20:10] No
[20:48] Hey so, the current default project is in the all-claimed stage, and it seems like most others are too, what project is a good idea to work on?
[20:50] qwiki discovery
[20:50] Alright, because that'll get left open for a day or so
[20:51] Just wanted to make sure it'd get its hour's worth