[00:02] Yeah, I gave a good talk.
[00:02] And the con itself was good.
[00:02] I was pretty sick, though.
[00:38] SketchCow: was it recorded?
[00:39] All my talks are recorded.
[00:39] I'll have a copy up at some point.
[00:39] it's not a funny talk
[00:44] I'll catch it when it's up; the site doesn't even have a program anywhere right now
[01:29] are the mobileme grabs being stored on s3?
[01:34] Hard drives, mostly
[01:34] http://archive.org/details/mobileme-hero-1336159314 example
[01:35] ok
[01:45] mobileme hero, 'zat some sort of video game where you smash files coming down 5 uplinks?
[01:48] Yes
[02:19] looks like torrentfreak.com does a lot of 405s
[02:19] even when you can still download them through normal wget
[05:19] shaqfu: wät, it's pretty logically built. /m/username and then /m/username/travel_diary/ ;o There's an "All diaries" list view in all user profiles, and an "All posts in diary" list view linked on all diary detail views
[05:19] going to look into it more today though
[08:13] * SmileyG_ wibbles
[08:13] i left it running all weekend. Hopefully it got quite a bit done.
[08:17] * SmileyG_ wibbles
[08:58] $270/4TB http://www.amazon.com/Hitachi-Deskstar-0S03364-Frustration-Free-Packaging/dp/B005TEU2TQ/
[12:04] fileplanet is done until id 54999, http://www.quaddicted.com/forum/viewtopic.php?pid=251#p251
[12:05] Schbirid, did you get the id list from underscor?
[12:05] if you want to help, just pick a ~5k increment (x0000-x4999 or x5000-x9999).
[12:05] nope
[12:05] not really needed though
[12:06] since my script tries each id. and if it finds one, it immediately has a valid download link available
[12:07] after finishing a part, put the pages_ and files_ logs plus the www.fileplanet.com/ into a directory named after your start and end IDs and tar that. eg 40000-49999.tar
[12:07] ehm, I'm still busy with wikis
[12:07] just posting this for anyone ;)
[12:08] hm, do i need to make an item before i can s3cmd add to it? or can i go full commandline?
[12:08] you can do everything from CL
[12:09] auto-make-bucket or so
[12:09] x-archive-auto-make-bucket:1
[12:15] uploading the first one
[12:15] with 1 megabyte/s :)
[12:18] yeah, sweet. item creation worked. http://archive.org/details/FileplanetFiles_00000-09999 (still uploading)
[12:30] http://archive.org/details/FileplanetFiles_00000-09999 is done
[12:40] :O
[12:40] :D
[13:22] ugh, tar really could use some modernisation (i am a heretic)
[13:23] i seriously cannot make multivolume archives non-interactively (without a script)? :(
[13:26] you what
[13:27] i want to tar some files. i want to have not one big tar, but several smaller (to ease uploading).
[13:29] What supports that? zip? rar?
[13:31] use split
[13:32] Not really the same thing, since that can split in a header or some shizzle like that
[13:32] but it's certainly something to consider using instead
[13:32] tar something | split -b 2MB
[13:32] and then cat * > file.tar
[13:34] it seems like what i really want to use is http://aws.typepad.com/aws/2010/11/amazon-s3-multipart-upload.html
[13:34] http://archive.org/help/abouts3.txt says "We support multipart uploads."
[13:35] what about rsync?
[13:35] wait, i totally did this before, didn't i
[13:36] nah, don't think i can use that for archive.org
[13:40] or not
[13:40] wasn't there something where archive.org would automatically join split files? i vaguely remember something
[14:08] Schbirid, yes that's it
[14:08] but it's a way to upload a single file faster
[14:09] not to join tars or so
[14:23] yeah, noticed that :\
[14:23] guess i will move the big ones to my server first
[14:25] Schbirid, what's the point?
[14:27] that i dont want my poor connection saturated for 6 days
[14:29] heh, i should just redownload that chunk on the server
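For reference, the packaging and upload routine described above (12:07 onwards) comes down to something like the sketch below. The ID range, log filenames, and S3 keys are placeholders rather than the exact ones used for these items, and the split step is only an option for moving the tar around in pieces; as noted just above, archive.org's multipart support speeds up uploading a single file but does not join separate pieces back into one tar.

    # Bundle one finished slice: the pages_/files_ logs plus the
    # www.fileplanet.com/ tree go into a directory named after the ID range.
    RANGE=40000-49999
    mkdir "$RANGE"
    mv pages_* files_* www.fileplanet.com "$RANGE"/
    tar -cf "$RANGE.tar" "$RANGE"

    # Optional: cut the tar into pieces for transfer and rejoin them locally
    # later with cat; archive.org will not reassemble them server-side.
    # split -b 1G "$RANGE.tar" "$RANGE.tar.part_"
    # cat "$RANGE.tar.part_"* > "$RANGE.tar"

    # Upload straight from the command line against the S3-compatible API
    # (http://archive.org/help/abouts3.txt). The x-archive-auto-make-bucket
    # header creates the item on the fly, so no web UI step is needed.
    curl --location \
         --header "authorization: LOW ACCESSKEY:SECRETKEY" \
         --header "x-archive-auto-make-bucket:1" \
         --upload-file "$RANGE.tar" \
         "http://s3.us.archive.org/FileplanetFiles_$RANGE/$RANGE.tar"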
[14:34] next slice is up http://archive.org/details/FileplanetFiles_10000-19999
[14:57] I've been grabbing images from nukes messages (from the warez scene): http://www.mediafire.com/?3br2369bs58td are those that were still online and of which I had a URL (from doopes.com)
[15:09] http://nesdev.parodius.com/bbs/viewtopic.php?p=92701#92701 <- argh is that someone from here? i hope not...
[15:09] you can just ASK koitsu for a site image
[15:18] someone asked him and he got butthurt
[15:46] Schbirid: Wiki page for Fileplanet is up
[15:47] If you'd like, I can let you use my wiki login to keep the file mark updated
[15:47] DFJustin: *sigh* really?
[15:48] balrog_: You weren't around for that episode?
[15:48] no?
[15:49] I tried to wget a list of x.parodius.com domains and was blacklisted
[15:49] that was you that he was complaining about in that post? :/
[15:49] Someone asked the admin for help, and he got "ethically offended" or something
[15:49] balrog_: Yeah :P
[15:50] first of all, did anyone use the wiki downloading tools to properly mirror any wiki content?
[15:50] "such utilities get stuck indefinitely on forums/boards"
[15:50] I can see that happening ...
[15:50] what I'm afraid of is that some of the domain/host maintainers are no longer around
[15:50] and therefore won't move their sites :/
[15:51] I thought people who were downloading that site were pacing themselves
[15:51] My connection is slow enough that it auto-paces :P
[15:51] shaqfu: um no
[15:51] obviously not slow enough for that guy
[15:51] it's not about speed as much as connection rate
[15:51] his graphs show outbound traffic at around 400-600 kbit/s
[15:51] or "hammering"
[15:51] that too
[15:52] use --random-wait
[15:52] Yeah, that's what I didn't use
[15:52] together with --wait
[15:52] It was the sustained connection that did me in
[15:52] yeah
[15:52] are you still blocked?
[15:52] Yep
[15:52] :[
[15:53] and why did he get "ethically offended"?
[15:53] because that's "other people's stuff"?
[15:53] Yep
[15:53] bleh
[15:53] well do we have a list of what's archived and what isn't?
[15:54] IIRC we're going to wait until the domain's about to close, then go through again
[15:54] I wouldn't wait that long
[15:54] since from what it looks, he'll block anyone who causes above-normal usage
[15:55] you'd have to scale it so that the extra usage is undetectable
[15:55] like 1 request per 20-40 seconds
[15:55] which means, painfully slow archiving
[15:55] also some sites are already taking stuff down
[15:56] also, if you haven't been there before, it's probably not worth doing
[15:56] I've been there now and again
[15:56] from that post I gather that that admin is crazy enough to go through his access logs
[15:56] and check for access patterns in that
[15:56] yeah :|
[15:56] in other words, fuck him
[15:56] as I said, if you keep usage down enough so that it doesn't cause a spike in the graphs... he won't see it
[15:57] some wget speed limits and wait times should prevent this
[15:58] limit to 20 KB/s and 20-40 second wait
[15:58] will take weeks to archive
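A throttled wget along the lines just suggested might look like the following. The host is only an example, and the numbers are simply the ones from this discussion (about 20 KB/s and a 20-40 second gap between requests, approximated here by a 30-second base wait that --random-wait spreads out).

    # A deliberately slow, polite mirror so the traffic stays lost in the
    # site's normal graphs. Host and limits are examples, not a recipe.
    wget --mirror --no-parent \
         --limit-rate=20k \
         --wait=30 --random-wait \
         --adjust-extension --convert-links \
         http://somesite.parodius.com/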
[15:58] also hit the smaller sites first
[15:58] the ones that don't have forums
[15:58] leave the worst (ones with forums) for last
[15:59] it'll probably have to be you or someone else who has a pattern of previously visiting that site
[15:59] also, check sites like http://www.foxhack.net
[15:59] "So please don't go around making backups of my site's contents. Doing it without permission isn't cool." — basically, don't release mirrors of sites that are not going away
[15:59] yipdw: I don't think he'll check if he doesn't see a spike in the graphs
[16:00] shaqfu: can you send me the *.parodius.com list?
[16:00] is it what appears as "Hosts" on parodius.com?
[16:05] balrog_: Plus others
[16:05] aggro may have put it on the wiki; let's see...
[16:05] Yeah, that's the list
[16:06] basically, people have to go to each one and see what's there
[16:09] I think SketchCow may have gotten in touch with koitsu; you may want to ask him what the plan is right now
[16:09] imho the most important ones are the static sites whose maintainers have disappeared
[16:12] Anyway, I gotta run for a bit
[16:12] Schbirid: Let me know if there's anything you need updated on the wiki
[16:12] (I think we're going to need a more efficient way to do this, esp. at higher numbers)
[16:55] free comic book kh43.com
[16:56] shaqfu: so far it is just me so i am comfortable using my own site to track it atm
[16:56] but yeah, if we want to get it all, then it will be hell
[17:04] http://www.reddit.com/r/AnythingGoesVideos/comments/tbby1/sabrina_johnson_fucked_by_2000_cocks_for_2_days/
[17:06] I assume he wants us to archive all the cocks
[17:07] All these cocks will be lost in time, like tears in rain
[17:07] Wait, this isn't rain
[17:07] that's not milk
[17:07] Oh goddamnit it isn't rain IT ISN'T RAIN GETITOFFGETITOFF
[17:08] I don't think getting off is the problem here
[17:08] SketchCow: And I'll never find that recipe again
[17:09] I've got to say, these spambots are pretty weak. I really have to wonder who would be convinced by the "free comic book kh43.com" one earlier
[17:10] why was the reddit link spammed?
[17:10] So you'd go to it and click on the link of the story.
[17:15] oh
[17:17] is it calpis?
[17:17] i kinda want to watch that video ¬_¬
[17:19] http://www.youtube.com/watch?v=a4NCnH7RPZY watch that instead
[17:21] nice
[17:21] vocals are super quiet though
[17:37] They're prominent here. It's your fault, somehow
[17:39] what a fantastic video SketchCow
[17:39] i love it :D
[17:45] 55k-60k here i come
[17:46] I dread to think of how large the entire FP set is
[17:46] 500GB, at least
[17:46] ha
[17:47] well, unless there is a gap somewhere, there are ~200k IDs to go. the last 5k were almost 25G, so if that size does not increase it would be 1TB
[17:48] but the previous 10k were <20G, so this might make a big jump
[17:48] Schbirid: How many legit files do you have?
[17:49] http://www.quaddicted.com/forum/viewtopic.php?pid=251#p251
[17:49] Okay, about a quarter done
[17:49] Wasn't sure if it was denser at the end or beginning
[17:54] i wonder if i can manage to mount the tar files from archive.org with fuse
[17:56] that would be stupid, i could just link to the tar viewer instead
[18:51] Sigh; barring a miracle, the Arneson papers will end up split in private collections...
[18:56] So much for effective research into early game design
[18:59] http://archive.org/details/FileplanetFiles_40000-49999 done
[19:07] Schbirid: Do you plan on updating that forum thread regularly?
[19:07] sure
[19:08] I'll link to it then
[19:08] people can post without registering too, just need to know/google some fact about quake (spam protection)
[19:08] where?
[19:09] http://archiveteam.org/index.php?title=Fileplanet
[19:09] cheers
[19:09] 87,190 is underscor's number i guess?
[19:10] Yeah
[19:10] nice
[19:10] Kinda surprised that ~60% of IDs are dead
[19:13] yeah
[19:13] that often happens though
[19:21] Schbirid, add a fileplanet keyword at least :)
[19:35] SAN FRANCISCO (MarketWatch) — Patti Hart, the Yahoo Inc. board member who chaired the committee that led to the hiring of Chief Executive Scott Thompson, is stepping down, AllThingsD reported Tuesday, citing unnamed sources.
[19:40] ?
[21:05] Wow
[21:05] That's crazy
[22:09] This may be problematic; the .html pages being pulled off FP don't contain metadata...
[22:12] It needs to grab /fileinfo/ also
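A rough idea of the fix being called for, extending the per-ID loop described earlier (12:06) so each hit also keeps its /fileinfo/ page. The URL patterns and log name below are guesses for illustration only; the real script works from the links it finds on the pages rather than from a fixed template.

    # Hypothetical URL layout, shown only to illustrate grabbing the
    # /fileinfo/ page alongside the download page for every live ID.
    START=55000; END=59999
    for id in $(seq "$START" "$END"); do
        # the download landing page (what is already being fetched)
        wget --force-directories --adjust-extension \
             --append-output="pages_${START}-${END}" \
             "http://www.fileplanet.com/$id/download/" || continue
        # also keep /fileinfo/ so descriptions and metadata survive
        wget --force-directories --adjust-extension \
             --append-output="pages_${START}-${END}" \
             "http://www.fileplanet.com/$id/fileinfo/"
    done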