[00:49] SketchCow: looks some g4tv.com-videos are in g4video
[00:49] there should be in g4video-web
[00:59] alard: is there a reasonable, compatible way to pad a warc file? I want each response body to start at the beginning of a disk block, so zfs block-level dedup can work with it
[01:00] I'm thinking a blank 'metadata' block would be the ticket
[01:00] oh, nevermind, readers are supposed to skip unknown blocks
[01:00] perfect
[01:41] chronomex: interesting idea
[01:52] chronomex: have you seen how much memory that takes though?
[01:53] I have not
[01:54] is it horrendous?
[01:55] yes
[01:56] 320 bytes per block allocated in your filesystem
[02:01] the block size is variable, but assume 64k as a half-way point between the extremes
[02:01] how big do you expect your dataset to be?
[02:01] actually, if it's one giant warc file, then the block size will be 128k
[02:02] http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe is an article I hadn't seen before, but it summarizes the data very well
[02:25] are you guys awake yet
[02:32] Brenry: nope. what's up?
[02:33] did they ever scrape user data for geocities ?
[02:34] or was it just those neighborhoods or commercial sites
[02:34] what do you mean by user data?
[02:35] geocities.com/~username
[02:35] so i can get my fkn .jpg pics
[02:36] none of those were password protected.. user dirs.. just open files like a directory list unless it had an index.html file
[02:37] ah, hrm
[02:40] I don't really know
[02:40] the geocities project wasn't very organized, I'm afraid
[02:41] have you tried one of the mirrors?
[02:42] one of those sites said the .jp geocities was still attached.. but that wouldn't have data from other regions eh ? tried that and said no longer exists
[02:42] yeah spent alot of time like a month after that crap in 2009.. and a year later
[02:42] and trying back
[02:43] db48x: keep at it ok.. i'll be back in like 2 years.. and i would really like my pictures
[02:43] what was your username?
[02:43] oplazzz
[02:43] let me see
[02:43] it was geocities.com/oplazzz or geocities.com/~oplazzz
[02:43] i tried wayback machine.. but doesnt seem to have users
[02:45] hmm. you're not in the username list on oocities.org
[02:47] nor is it available on reocities.com
[02:47] but they'll email you if they come across it
[02:47] k.. l8r
[02:51] it occurs to me that I probably should have said that they _can_ email him, if he goes and puts in his email address
[02:53] welp
[03:04] found another broken video
[03:04] not cause of errors but cause it slowed down for some reason at the 11min mark
[03:11] so the 9200, 9559, and 9717 have some sort of bad encoding
[03:12] videos can have variable frame rates
[03:12] although it's rarely used, so it more often a mistake
[03:12] the frames was move like 1 every 5 seconds
[03:12] does the video run on past the audio?
[03:13] there is no audio when this starts happen
[03:20] got a sample godane?
[03:20] https://archive.org/details/g4tv.com-video9200
[03:36] so i found some g4 underground clips
[03:36] its better then the episodes that i have found
[03:36] the episodes are all croped
[03:37] so top and bottom are cut
[03:46] i found a microsoft key note from tgs 2008
[03:46] tgs = tokyo game show
[03:54] I uploaded this last night https://archive.org/details/osaka-game-show-2009
[08:12] godane - Stop uploading until I tell you to.
[08:15] We need to give you direct access to the g4 collections because you successfully killed out opensource_videos, which is frankly amazing.
[08:36] SetchCow: your joking right?
[08:38] you aways tell me to upload stuff then we will deal with it
[08:38] It won't last more than a day.
[08:38] also your still puting them into g4video
[08:38] not g4video-web
[08:38] But it's midnight at Internet Archive, I need to have the privs modified during the busy day.
[08:38] Dude, one day
[08:38] ok
[08:39] The most recent set wasn't put in there by me.
[08:39] oh
[08:39] It was put in my a desperate jeff trying to stop g4 related video from completely choking our RSS feed
[08:39] And other things
[08:39] oh
[08:39] lol
[08:39] Normally, my scooping up your uploads every once in a while was fine.
[08:40] But you turned up the heat.
[08:40] yes but this 35k+ videos
[08:40] godane: make them beg!
[08:40] So soon you will have the ability to declare g4video and g4video-web as the collection, and upload that way.
[08:40] i hope i will get the twit collections access too
[08:41] But we need a day, it's all timeshifted now. It's 9:40 here and 12:40 in California.
[08:41] I'll get you ALL the collections you need.
[08:41] ok
[08:41] I am surprised you're not aware you're one of the single largest non-institution uploaders
[08:42] wow
[08:43] 35,000 videos is a lot of videos, sir.
[08:43] Anyway, like I said, one day, and we'll get this shored up.
[08:43] thats ok
[08:56] godane: Bwahaha, you're TOO GOOD :)
[08:56] That's awesome
[08:59] :)
[09:04] also its about 255gb now
[09:43] there looks to be alot of first 15 mins previews of games
[09:43] :-D
[10:28] What? How hard is tar? :| tar -xf for extraction, tar -cf file for creation and
[10:28] tar -tf to look at it without extracting ;o
[11:11] i'm back
[11:12] my internet wifi when out
[11:12] *went out
[11:12] SketchCow: dont miss going to http://www.computerspielemuseum.de/ !
[11:13] hm, their english site is incomplete
[12:25] I prefer `tar -xvf ` so you can watch stuff scroll past
[12:46] I do that with tars from unknowns, if I made it myself - I just -xf it
[15:59] uploaded: http://archive.org/details/BBV.Customer.Service.VHSCap-CG
[23:16] wow. I just found the world's first website. In the footer: "There have been [counter] hits to this site since noon GMT, Jan 1st, 4713 BC."
[23:20] lol
[23:20] technically true on all levels
[23:41] hah
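
Note on the [01:56] to [02:01] exchange about ZFS dedup memory: the figures quoted there (320 bytes per allocated block, block sizes of 64k or 128k) can be turned into a rough estimate with a few lines of Python. This is only a back-of-the-envelope sketch. The function name `dedup_table_bytes` and the 1 TiB example size are made up for illustration, and it assumes the worst case where every block is unique and gets its own dedup table entry.

```python
def dedup_table_bytes(dataset_bytes: int,
                      block_bytes: int = 64 * 1024,
                      entry_bytes: int = 320) -> int:
    """Rough dedup table size: one entry per allocated block.

    entry_bytes=320 is the per-block figure quoted in the chat;
    block_bytes defaults to the 64k midpoint suggested there.
    Assumes every block is unique (worst case).
    """
    if dataset_bytes < 0 or block_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    blocks = -(-dataset_bytes // block_bytes)  # ceiling division
    return blocks * entry_bytes


if __name__ == "__main__":
    one_tib = 1024 ** 4  # hypothetical dataset size, not from the chat
    for block in (64 * 1024, 128 * 1024):
        gib = dedup_table_bytes(one_tib, block) / 1024 ** 3
        print(f"{block // 1024}k blocks: about {gib:.1f} GiB of dedup table per TiB")
```

Under these assumptions the table size scales inversely with block size, which is why the 128k case for one giant warc file needs half the memory of the 64k estimate.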