[00:57] OK, who wants to wget Hackaday?
[01:19] SketchCow: i did that
[01:20] SketchCow: see here: https://archive.org/search.php?query=hackaday
[01:21] cleaner here: https://archive.org/search.php?query=collection%3A%22archiveteam-fire%22%20hackaday
[01:30] godane: yo, feel like grabbing help.snapjoy.com and blog.snapjoy.com?
[01:35] i'm mirroring snapjoy.com site and all sub dumps linked from there
[01:36] godane: huuuuug
[01:40] winr4r: I'm uploading it now
[01:40] was only 14mb
[01:41] Bravo, godane. I've been getting enquiries.
[01:42] looks like cloudfront.net hosts images of snapjoy users
[01:42] godane: yup, we're on the case
[01:47] uploaded: https://archive.org/details/snapjoy.com-20130715
[01:48] looks like the feedback.snapjoy.com forums are gone
[01:48] it redirects to the main site
[01:55] godane: thanks :D
[01:59] SketchCow: what worries you the most about the hackaday plans?
[01:59] that the archive of posts could disable
[01:59] *disappear
[02:00] i'm up to 2013-06 with hackaday
[02:02] does anyone know how to make grep stop a grep at another pattern line?
[02:02] godane: explain
[02:02] my idea is to grab gbtv/theblaze video key
[02:03] but i'm always going to over grab
[02:03] some of the xml data has a lot of keyworks
[02:03] *keywords
[02:03] so a fix -A20 of something may not work aways
[02:04] *always
[02:12] hm
[02:17] i'm not sure you can with grep
[02:19] it looks like the first 5 links work for me for most of the data
[02:53] winr4r: i got it to work
[02:54] i had to new line variables after find the video key
[02:54] since the video key with everything is one line
[02:55] i will not get any other data
[02:55] ah :)
[03:55] Anyone grabbed ftp.atari.com?
[04:03] I'm grabbing it.
[04:04] * winr4r salutes
[04:05] Man, this Manga collection I'm adding is just so much Yaoi
[04:05] I think it's possibly because I'm in the A's only so far, and that's got words Yaoi tends to use.
[04:19] aaan~
[04:27] two things:
[04:27] 1. it would be appreciated if we get a #76days archivist for when things come up
[04:27] (on freenode)
[04:27] and 2. still looking for some coders in #pushharder
[04:28] (here on EFNet)
[04:29] (#76days is an investigation of recent happenings on the pronounciationbook youtube channel)
[04:34] care to provide some more background for the reprobates among us who don't know what that is?
[04:42] 76days?
[04:42] https://docs.google.com/document/d/1UamrCTSCj7IleTVnxNn2mCGX7AsLC-AlOGAYghsKZA0
[04:42] tldr pronounciationbook (the YT channel) has begun counting down each day from 76 a few days ago
[04:42] currently at 71
[04:42] 4chan, other conspiracy groups investigating
[04:43] the reason I bring it up here is because IIRC they found a vimeo page related to it, but its videos were deleted shortly afterwards
[04:44] oh it's one of those internet game things
[04:45] aka "you're being trolled"
[04:48] [23:45:21.721] aka "you're being trolled"
[04:48] there's some speculation that it's been in the works for 4+ years
[04:48] but only time will tell
[04:49] said wp494 in the voice of einstein in the intro video for red alert 1
[07:30] any word on whether PACER makes an effort to track down people with multiple accts, bringing each one up to just under the limit for not being billed?
[08:31] ´
[09:33] FINISHED --2013-07-16 08:17:54--
[09:33] Total wall clock time: 4h 14m 33s
[09:33] Downloaded: 2036 files, 27G in 4h 3m 1s (1.87 MB/s)
[09:34] wroom
[09:35] zip -9 -r ftp.atari.com.2013.07.zip ftp.atari.com
[09:35] Nice
[09:35] That'll take a while.
[09:35] 1 file left to upload in pouet.com_full_grab
[09:35] 90% done :D
[10:52] more news on the hack-a-day buy/sell thing
[10:52] http://hackaday.com/2013/07/15/were-going-to-buy-hackaday/
[13:47] Any update on the identi.ca deleted stuff being brought to archive.org?
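[Editor's note: godane's 02:02 question — making grep stop at another pattern line, where `grep -A20` over-grabs — is usually answered with a sed address range instead of grep. A minimal sketch; `videoKey` and `</media>` are hypothetical markers standing in for whatever actually delimits a record in the gbtv/theblaze XML:]

```shell
# Print from the first line matching the start pattern through the first
# line matching the stop pattern, then quit so later records are ignored.
# The sample input and both markers are made up for illustration.
printf 'junk\nvideoKey=abc123\ntitle=demo\n</media>\nvideoKey=def456\n' |
  sed -n '/videoKey/,/<\/media>/{p; /<\/media>/q}'
```

[When the whole response is a single line — godane's actual problem — breaking it apart first, e.g. `sed 's/></>\n</g'`, makes the range trick applicable.]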
[16:31] xmc: need your help in #jenga
[19:59] hello world
[20:00] omf_: finally got a NAS for all these repos, 16x harddrive bays
[20:01] :-)
[20:01] WiK: I might write some software that lets you store more repos
[20:01] now i just need to get some harddrives for in it
[20:01] ive got a lic copy of unraid for it as well
[20:02] the git objects need to be stored uncompressed (but still packed) and the whole repo needs to be LZMA2'ed
[20:02] git uses zlib which isn't so great
[20:02] well, dont know how well that would work, since im gonna allow ppl to submit egrep/grep strings to run on the data
[20:03] doing that wouldnt screw that up would it?
[20:03] http://wik-i-pedia.com/gitdigger
[20:04] 16x 4TB drives should give me more space then ill ever need
[20:04] or at least a mixture of 4TB and 2TB drives
[20:04] WiK: I thought you were using git --mirror which stored just the git objects that you can't grep anyway
[20:04] are you going to git-grep?
[20:04] it would take quite a while to grep everything
[20:04] im just doing a git clone
[20:04] and it does take quite awhile
[20:05] unless you multi-thread your grep
[20:05] building a useful code search is much harder than storing as many repos as possible
[20:05] github uses a large ElasticSearch cluster
[20:07] ya, i was gonna give that a shot, but i dont really plan on making this open to the public, so i dont really need 'fast'
[20:07] also doesn't github already let you search all the github repos? ;)
[20:08] well, the ones that haven't been deleted
[20:08] no, not with security related searches
[20:08] ah
[20:10] http://swtch.com/~rsc/regexp/regexp4.html https://code.google.com/p/codesearch/ is basically what Google Code Search did
[20:11] you can build a trigram index for all the source files you have
[20:13] very interesting reading
[20:13] thanks
[20:29] now, to figure out how to make this index and keep it updated
[20:30] ahh i see, codesearch does that for you
[21:45] WiK: What NAS did ya' get?
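[Editor's note: the 20:02 storage idea — keep git objects packed but uncompressed, then LZMA2 the whole repo — can be sketched with stock git and xz. `repo.git` is a hypothetical mirror clone; this is one way to do it, not necessarily what gitdigger ended up using:]

```shell
# Assumes a clone at ./repo.git (hypothetical path).
git -C repo.git config core.compression 0   # zlib level 0: objects stored uncompressed
git -C repo.git repack -a -d -F             # -F (--no-reuse-object) recompresses every object
tar -cf - repo.git | xz -9 -c > repo.git.tar.xz   # LZMA2 over the whole repo
```

[The trade-off discussed above still applies: the packed objects are only greppable after `git grep` or a checkout, so a search layer like codesearch's trigram index sits on top of this rather than replacing it.]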
[21:45] ElasticSearch is pretty nifty by the way
[22:08] one problem down WiK :)
[22:17] ersi: you like it better than Solr?
[22:17] they're currently being considered for Wikimedia projects https://www.mediawiki.org/wiki/Requests_for_comment/CirrusSearch