[00:01] If anyone wants to make $50 bucks via paypal, i'm just looking for the files from one author from the fanfiction.net archive. If anyone can get me them in like html or text format (because i'm going to scream if I see another warc) then i'll immediately pay via paypal.
[00:10] graysparr: try: zcat megawarcfile | grep -C 40000 authorname > terrible-dump
[00:11] if your corruption is after the author's files you win
[00:13] The only problem with that solution is that it appears to involve linux, which I don't have and have never used. Yes, i'm a stupid luser.
[00:18] graysparr: cygwin can do that
[00:19] note that you will get ugly output that will have text but possibly other stuff and maybe HTTP chunks
[00:20] the thing you're looking for isn't in the wayback machine, right?
[00:21] Right. All fanfiction.net sites don't show due to their robots.txt
[00:21] * graysparr installing cygwin
[00:21] I see this http://web.archive.org/liveweb/http://www.fanfiction.net/s/6123894/2/Things-Change
[00:22] oh liveweb duh
[00:22] * ivan` checks brain
[00:22] don't worry, i'm still quite sure you have at least 75% more brain than me.
[00:30] ok, i'm trying that command in cygwin (specifically $ zcat fanfiction.tar.megawarc.warc | grep -C 40000 Altezio > terrible-dump) and getting "gzip: fanfiction.tar.megawarc.warc.gz: No such file or directory". The megawarc file is named fanfiction.tar.megawarc.warc.gz and is located in c:\python27 currently.
[00:30] what brain dead mistake have I made?
[00:34] graysparr: you need to do this first: cd /cygdrive/c/python27
[00:35] highlight my nick so I notice
[00:36] is it a warc and not a .warc.gz?
[00:36] nope, it's a .warc.gz. sorry, haven't used IRC in about five years.
[00:36] ok
[00:37] in that case give zcat the right filename
[00:37] after cd'ing
[00:37] well I changed the directory, hurray, and ran that command, and no error message (big hurray!) but nothing seems to be happening..
[00:37] it'll take a while
[00:38] output will be in terrible-dump
[00:38] how will I know when it's done, and where is 'terrible-dump' going to be located? the same directory?
[00:38] yep
[00:38] it will be done when you see a command prompt
[00:39] which is the $, right?
[00:39] yes
[00:40] awesome! You better have paypal 'cause if this works i'll owe you!!
[00:40] I don't know if you'll get any usable results
[00:40] if there's too much crap surrounding the author's texts, you can reduce the -C (it is the number of lines of context)
[00:41] ahhh, cool!
[00:42] * graysparr crosses fingers and waits
[01:08] you guys aware that webcitation might stop accepting new submissions?
[01:08] http://www.webcitation.org/
[01:12] aaand I forgot to check the wiki
[01:12] sorry bout that
[01:15] I was not aware
[01:26] hmm
[01:26] does anyone know how I can get a list of all items in a collection?
[01:39] ah, you can change the results per page on an advanced query
[01:40] Hey ivan`, what type of program can i use to open that terrible-dump file?
[01:41] graysparr: how big is it?
[01:42] 252 MB
[01:43] cat terrible-dump | grep -C 3000 Altezio > terrible-dump-C3000
[01:43] that may make it manageable enough for many normal text editors
[01:43] if not, `less` in cygwin can be used to look at things of any size
[01:44] that is, less terrible-dump
[01:44] * graysparr gives it a whirl
[01:45] / and ? can be used to search forwards and backwards, g for top of file, shift-g for end, q to quit, h for help
[01:45] ok, thanks!
[01:58] Well ivan` my very patient helper, that gave me a 1 KB sized file that only contains "Binary file (standard input) matches"
[02:03] grep has an option that tells it to treat binary files as text
[02:03] (or rather to ignore that they are binary)
[02:05] use grep --help to find it; I don't recall what it is off-hand
[02:05] oh, ahh, sorry. total linux newb here. thank you!
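For reference, the approach worked out above can be wrapped up as a small shell function. This is a minimal sketch, assuming GNU gzip and GNU grep (as installed by Cygwin); the function name extract_author is hypothetical, and the file name, author name and context sizes are only the values used in this conversation. In GNU grep the option that treats binary input as text is -a (long form --text), which avoids the "Binary file (standard input) matches" result; -F is added so the author name is matched literally rather than as a regular expression.

```shell
# Minimal sketch: keep only the text around one author's name in a gzipped megawarc.
# Assumes GNU gzip (zcat) and GNU grep, e.g. from a Cygwin install.
# extract_author is a hypothetical helper, not an existing tool.
extract_author() {
    warc_gz=$1   # gzipped WARC, e.g. fanfiction.tar.megawarc.warc.gz
    author=$2    # name to search for, e.g. Altezio
    context=$3   # lines kept before and after each match (grep -C)
    out=$4       # file that receives the extracted text

    # zcat decompresses to stdout, so the archive is never unpacked on disk.
    # -a (--text): treat binary data as text instead of printing
    #              "Binary file (standard input) matches".
    # -F: match the author name literally, not as a regular expression.
    zcat "$warc_gz" | grep -a -F -C "$context" -- "$author" > "$out"
}
```

For example, after cd /cygdrive/c/python27, running extract_author fanfiction.tar.megawarc.warc.gz Altezio 3000 terrible-dump-C3000 and then less terrible-dump-C3000 should produce readable text in one pass. The output is still raw WARC content, so HTTP headers and neighbouring records will be mixed in with the author's pages.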
[02:05] you're welcome :)
[03:14] 3 days left for greader; if you're interested in particular sites, I can handle as many URLs as you throw at me
[03:15] the ones with a feedfn= here are being handled so far https://github.com/ludios/greader-item-maker/blob/master/url_filter.py
[04:53] FYI the english language wikipedia is currently dead for unknown reasons
[04:53] here's where wikimedia's dumps come in handy
[04:58] at least for chrome users from the looks of it
[05:15] it's back
[05:15] ^
[14:27] posted greader to HN, see https://news.ycombinator.com/newest
[14:33] 7+ TB?!
[14:33] holy shit, you've been busy
[14:33] that is a lot of RSS
[14:37] yes
[14:53] Hola! I'm getting 'No item received. Retrying after 30 seconds…' for the formspring tracker
[14:54] I'm guessing it has to do with a limit of some kind that I've reached
[14:54] somebody a few days ago reset that when the same problem arose
[15:01] pallih: Join up the project channel at #firespring :-)
[15:18] that last comment in https://news.ycombinator.com/item?id=5958119 needs a good answer
[15:26] I think `eli` answered it adequately
[15:26] yeah, just saw
[15:30] http://neocities.org/blog/making-the-web-fun-again might as well start archiving neocities already ;)
[15:32] Indeed :D
[16:20] i think it's a lovely idea
[16:53] that's archiveteam's high-five
[17:51] Baljem: did you find the button?
[18:15] ivan`: the way to do it would be to have it hosted on archive.org immediately, I guess...
[18:26] edsu: uhm. button?
[18:35] https://twitter.com/floatingatoll I don't know where this guy should submit his reader_archive, we grab everything based on the feed URLs
[19:09] HI GANG
[19:10] Well, don't we have a bunch of busy.
[19:10] GLADOS
[19:11] * SmileyG wibbles
[19:48] ping omf_
[19:49] I've got ctp results. I'll be speaking at bsideslv, defcon, as well as derbycon on my gitdigger project
[19:50] which is pretty scary and unexpected (well i kinda thought it would get into bsideslv)
[19:50] Ooo SketchCow user:djsmiley2k@gmail.com (me) has 9 of the last 11 posterous blogs
[19:51] I'm waiting on Balrog for the last two.
[19:56] SketchCow: *salutes*
[19:57] can i recommend a pre-emptive backup for someone?
[19:58] *.port5.com, it was a free web host in the early 2000s that my brother just reminded me of, and it still exists
[19:58] winr4r: always.
[21:20] SmileyG: I uploaded them to that dir!
[21:29] omf_: I stopped my downloading at 10TB for the gitdigger project
[21:29] im outta HD space and figured it was a nice round number for my talk
[21:31] 10tb is a statistically significant sampling size. I think it will be all good
[21:33] i think bsideslv has the abstract posted
[21:34] http://www.bsideslv.com/speakers/breaking-ground/
[22:36] http://yahoo.tumblr.com/post/54125001066/keeping-our-focus-on-whats-next
[22:39] whoa, Alta Vista?
[22:46] SO WOW YAHOO IS DESTROYING A LOT OF SHIT
[22:46] woop, more culture getting thrown into the pit
[22:46] Closest I could find, across these, is that besides Altavista, only Citizen Sports had major profiles, but they were locked in Apps.
[22:49] What a day.
[22:49] What a week!
[22:49] OK, GLaDOS and Smiley
[22:56] SketchCow: i uploaded 25 episodes of HD Nation yesterday
[22:57] is there an (easy) way to run the unattended/pick-most-important thing from the warrior VM, without the VM?
[22:57] bloody ovh custom kernels
[22:59] (I've got all the deps & crap from the posterous-grab already installed, just wondering about the project-selection stuff)
[23:02] Thank you, godane.
[23:03] HD Nation has a better chance of getting fully up than hak5 and tekzilla right at the moment
[23:03] i only say that cause most hd episodes are 200mb or under from 2011 on
[23:04] episode 71 was the last full episodes
[23:04] *episode
[23:21] so so okay, what is yahoo destroying today
[23:21] looks like a little bit of everything
[23:22] did i miss some news?
[23:23] winr4r: http://yahoo.tumblr.com/post/54125001066/keeping-our-focus-on-whats-next
[23:25] AltaVista (July 8, 2013)
[23:25] Please visit Yahoo! Search for all of your searching needs.
[23:25] i'm sad, and not one bit justifiably sad, about that
[23:28] i'm sure the one process that woke up once a day to serve an altavista request is looking up at the moon right now
[23:29] and playing this song http://www.youtube.com/watch?v=o9Pu92St5O0
[23:30] so looking at that, yeah i agree, fuck yahoo
[23:31] I used AltaVista when it came out.
[23:31] First month.
[23:32] at that time in like 1998 i was like DudE How U find STuf?!11
[23:33] ("that time" being "when i found the internet")
[23:35] and someone was like ALTAVISTA "how U speL that bRo"
[23:49] Astalavista, altavista.
[23:49] I think I've used Astalavista more than altavista in mantime *whistle*