[07:38] godane: images thread is HUGE
[07:38] did you get that whole thread?
[07:41] i'm getting it
[07:42] i had to redo it cause it was over 4gb
[07:46] o_O
[07:46] I think it'll be suitably massive, I can grab a copy if you got the wget code?
[07:47] i think i can do it
[07:48] it's past 18k urls and it's not even 1gb yet
[07:48] i just had to set it up with --warc-file-size=1G a while back
[09:10] so i'm pushing more tech news today episodes
[10:16] vegetables the breakfast of companions
[10:19] i am not sure why, but i am crawling steam user profiles
[10:19] and steam does not block
[10:19] find anything interesting?
[10:19] yes, usernames :P
[10:19] nah, i thought there would be tools to "easily" create graphs but hey, i stumbled into high processing territory instead
[10:20] but it is fun
[10:22] gnuploit!
[10:22] gnuplot!
[10:22] funny :P
[10:22] ;)
[10:23] try d3 it is pretty user friendly
[10:23] i meant graph as in "user connections"
[10:23] should have said that
[10:24] d3 can do directed graphs which is what you are looking for http://bl.ocks.org/mbostock/4062045
[10:25] yeah but i have a "bit" more nodes and edges
[10:25] 300k+ so far
[10:25] gephi is working but slow
[10:25] I doubt that. I loaded 15 million into d3 for a freebase talk
[10:25] and if you are in the billions you need a proper graph database like titan
[10:26] whoa
[10:26] all a "graph" is, is a directed data structure
[10:26] yeah but the sorting for visual display is hard
[10:26] yep
[10:26] can't believe d3 does 15 million at once
[10:26] there are no good solutions as of yet.
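Editor's sketch of the "user connections" graph discussed above, taking the remark that a graph is just a directed data structure at face value. The usernames and the helper names `build_adjacency` and `in_degrees` are made up for illustration; nothing here reflects the actual Steam crawl or how d3, gephi, or titan store graphs.

```python
def build_adjacency(edges):
    """Turn (source, target) pairs into a dict mapping node -> set of targets."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        # Make sure nodes that only appear as targets are present too.
        adjacency.setdefault(target, set())
    return adjacency


def in_degrees(adjacency):
    """Count how many edges point at each node."""
    counts = {node: 0 for node in adjacency}
    for targets in adjacency.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical sample data: "alice lists bob as a connection", and so on.
sample_edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
graph = build_adjacency(sample_edges)
popularity = in_degrees(graph)
```

Storing the structure this way is cheap; as the log notes, the expensive part is laying out hundreds of thousands of nodes for display, which this sketch does not attempt.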
[10:26] It is limited by memory
[10:27] decades got spent on relational databases and little on graph databases
[10:27] now the best solutions are still pathetic
[10:28] google, facebook, yahoo, ms, twitter all use their own rolled closed solution
[10:32] and yet those same companies minus MS use open source relational databases all the time
[10:34] I feel your pain Schbirid
[10:37] I liken graph databases to lisp, super powerful and not fully understood by most of the programming community
[10:41] so i got 2012 of glenn beck show uploaded now
[10:41] also i'm close to getting 2011-09 of tech news today uploaded
[10:48] now this is very odd: http://web.archive.org/web/*/http://torrentbytes.net
[10:49] looks like IA has been trying to mirror torrentbytes.net a lot
[10:50] looks like my stuff is in wayback machine now
[10:54] looks like the forum_index.php i didn't grab but you get all the forum_viewforum.php here: http://web.archive.org/web/*/http://www.torrentbytes.net/forum_viewforum.php*
[11:22] the wget log of katproxy.com alone is over 60mb
[12:37] SketchCow: defcon docu on archive.org yet?
[12:44] damnit i'm in left korea again ¬_¬
[12:48] i am not sure how i feel about non-public forums ending up in the wayback machine
[12:51] me either, even though I'm of course thinking of the "expect it to become public, if you share it" mindset as well
[12:52] well, things can be shared with a selected group of people, in this case the members of a site
[12:52] i was not thinking it was going to be in wayback this quick
[12:53] of course, but any of the parties that has access can make it public or share it along.. so I guess one should consider it public from the get go
[12:53] then again, like I said - I'm a bit skeptical as well :)
[12:53] that's a post-privacy standpoint i vehemently disagree with
[12:53] it's like saying that any kind of communication should be considered public because the other person can make it public
[12:54] i only did a panic mirror cause the site may be going down
[12:54] just because the technology makes it easy does not make it ok
[12:54] godane: not bashing you, just thinking out loud
[12:55] hell, any archiving we do is complicated but to me making previously non-public stuff public is more complicated than preserving public content data
[12:55] also this is funny: http://web.archive.org/web/20130724114835/http://www.torrentbytes.net/robots.txt
[12:55] Schbirid: I completely agree with you there, though it is always a risk that either the parties that aren't you or aren't the intended audience reveals the data
[12:56] I'm not saying that everyone should consider all communications public as default. I just thought out loud about the inherent 'risk'
[12:56] the robots disallows everything
[12:56] aye :)
[12:56] just for the record, my manlihood is huge
[13:03] also know that torrentbytes.net was getting hit like 5 times a day for some reason by IA
[13:15] uploaded: https://archive.org/details/katproxy.com-community-20130805
[14:14] ok, gephi falls apart with 3 million edges already for me
[14:14] also it updates the graph display when i do stuff in the fucking ui
[14:15] which takes many seconds
[14:20] it's like "private" irc
[14:21] exactly 1Gb godane for katproxy.com?!
[14:21] that's... suspicious.
[14:21] he is cutting at 1gb warcs
[14:22] easier for him to upload
[14:22] So the rest aren't uploaded yet?
[14:22] ok
[14:23] how big did it end up?
[14:24] ^ lol
[14:24] omf_: i'm confused, something funny?
[14:25] my brain is dry rotted by the internet and porn
[14:25] :/
[14:26] i need to know if we have it all
[14:26] if not, I need to fix that.
[14:32] no it's the html of katproxy.com/community/ is about 1gb
[14:37] the images is about 10gb i think
[14:38] a lot of the urls are from yuq.me
[14:40] k
[18:23] looks like there is more g4 archiving fans: http://g4tvarchive.tumblr.com/
[18:59] anyways uploading the katproxy.com community images
[19:09] good good
[19:43] Whew
[19:45] fun times?
[19:46] * SmileyG wants to see the docu. :/
[19:46] Or are you selling DVD SketchCow ?
[19:51] git tip of the day: git ls-files --other --exclude-standard
[19:52] list only the files not tracked by git
[20:50] I am not
[20:50] hackerstickers.com
[20:51] but it's on youtube, piratebay, etc
[20:51] * done 1365798.3 MB Rate: 912.9 / 0.0 KB Uploaded: 2115262.0 MB
[20:51] * MESS 0.149 Software List CHDs
[21:16] Like a boss
[21:30] that site just broke my eyes D:
[22:31] SmileyG, http://imgur.com/gallery/PLiWoj4
[23:46] Anyone got an invite for medium.com
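
Editor's sketch of the 19:51 git tip, for anyone who wants the untracked-file list from a script. It assumes `git` is on the PATH and uses the documented spelling of the flag, `--others`; the names `UNTRACKED_CMD`, `parse_ls_files`, and `list_untracked` are hypothetical helpers, not part of git.

```python
import subprocess
from typing import List

# --others lists untracked files; --exclude-standard additionally skips
# anything matched by the usual ignore rules (.gitignore and friends).
UNTRACKED_CMD = ["git", "ls-files", "--others", "--exclude-standard"]


def parse_ls_files(output: str) -> List[str]:
    """Split the command's stdout into one path per non-empty line."""
    return [line for line in output.splitlines() if line]


def list_untracked(repo_dir: str) -> List[str]:
    """Run the command inside repo_dir and return the untracked paths."""
    result = subprocess.run(
        UNTRACKED_CMD,
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. repo_dir is not a repository
    )
    return parse_ls_files(result.stdout)
```

Calling `list_untracked(".")` from inside a working tree should give the same paths the one-liner prints. Paths containing unusual characters may come back quoted by git, which this sketch does not undo.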