#archiveteam 2013-06-28,Fri


Time Nickname Message
00:01 graysparr If anyone wants to make $50 bucks via paypal, i'm just looking for the files from one author from the fanfiction.net archive. If anyone can get me them in like html or text format (because i'm going to scream if I see another warc) then i'll immediatly pay via paypal.
00:10 ivan` graysparr: try: zcat megawarcfile | grep -C 40000 authorname > terrible-dump
00:11 ivan` if your corruption is after the author's files you win
00:13 graysparr The only problem with that solution is that it appears to involve linux, which I don't have and have never used. Yes, i'm a stupid luser.
00:18 ivan` graysparr: cygwin can do that
00:19 ivan` note that you will get ugly output that will have text but possibly other stuff and maybe HTTP chunks
00:20 ivan` the thing you're looking for isn't in the wayback machine, right?
00:21 graysparr Right. All fanfiction.net sites don't show due to their robots.txt
00:21 * graysparr installing cygwin
00:21 ivan` I see this http://web.archive.org/liveweb/http://www.fanfiction.net/s/6123894/2/Things-Change
00:22 ivan` oh liveweb duh
00:22 * ivan` checks brain
00:22 graysparr don't worry, i'm still quite sure you have at least 75% more brain than me.
00:30 graysparr ok, i'm trying that command in cygwin, and getting (specifically $ zcat fanfiction.tar.megawarc.warc | grep -C 40000 Altezio > terrible-dump) and getting "gzip: fanfiction.tar.megawarc.warc.gz: No such file or directory". The megawarc file is named fanfiction.tar.megawarc.warc.gz and is located in c:\python27 currently.
00:30 graysparr what brain dead mistake have I made?
00:34 ivan` graysparr: you need to do this first: cd /cygdrive/c/python27
00:35 ivan` highlight my nick so I notice
00:36 ivan` is it a warc and not a .warc.gz?
00:36 graysparr nope, it's a warc.qz. sorry, haven't used IRC in about five years.
00:36 ivan` ok
00:37 ivan` in that case give zcat the right filename
00:37 ivan` after cd'ing
00:37 graysparr well I changed the directory, hurray, and ran that command, and no error message (big hurray!) but nothing seems to be happening..
00:37 ivan` it'll take a while
00:38 ivan` output will be in terrible-dump
00:38 graysparr how will I know when it's done, and where is 'terrible-dump' going to be located? the same directory?
00:38 ivan` yep
00:38 ivan` it will be done when you see a command prompt
00:39 graysparr which is the $, right?
00:39 ivan` yes
00:40 graysparr awesome! You better have paypal 'cause if this works i'll owe you!!
00:40 ivan` I don't know if you'll get any usable results
00:40 ivan` if there's too much crap surrounding the author's texts, you can reduce the -C (it is the number of lines of context)
00:41 graysparr ahhh, cool!
00:42 * graysparr crosses fingers and waits
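
A small illustration of what ivan` says about -C: grep prints each matching line plus that many lines on either side of it. This is only a sketch. The seq input and the pattern ^10$ are stand-ins chosen for the example and have nothing to do with the megawarc itself.

```shell
# Generate the numbers 1..20, one per line, and match the line that is exactly "10".
# -C 2 asks for 2 lines of context before and after each match.
context=$(seq 1 20 | grep -C 2 '^10$')
echo "$context"
```

Lowering the number after -C shrinks that window, which is why reducing it should trim unrelated material around the author's stories.
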
01:08 Muad-Dib you guys aware that webcitation might stop accepting new submissions?
01:08 Muad-Dib http://www.webcitation.org/
01:12 Muad-Dib aaand I forgot to check the wiki
01:12 Muad-Dib sorry bout that
01:15 ivan` I was not aware
01:26 db48x hmm
01:26 db48x does anyone know how I can get a list of all items in a collection?
01:39 db48x ah, you can change the results per page on an advanced query
01:40 graysparr Hey ivan`, what type of program can i use to open that terrible-dump file?
01:41 ivan` graysparr: how big is it?
01:42 graysparr 252 MB
01:43 ivan` cat terrible-dump | grep -C 3000 Altezio > terrible-dump-C3000
01:43 ivan` that may make it manageable enough for many normal text editors
01:43 ivan` if not, `less` in cygwin can be used to look at things of any size
01:44 ivan` that is, less terrible-dump
01:44 * graysparr gives is a whirl
01:45 ivan` / and ? can be used to search forwards and backwards, g for top of file, shift-g for end, q to quit, h for help
01:45 graysparr ok, thank!
01:58 graysparr Well ivan` my very patient helper, that gave me a 1 KB sized file that only contains "Binary file (standard input) matches"
02:03 db48x grep has an option that tells it to treat binary files as text
02:03 db48x (or rather to ignore that they are binary)
02:05 db48x use grep --help to find it; I don't recall what it is off-hand
02:05 graysparr oh, ahh, sorry. total linux newb here. thank you!
02:05 db48x you're welcome :)
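
The option db48x describes is most likely -a (long form --text) in GNU grep, which makes grep process binary input as if it were text instead of printing only the "Binary file (standard input) matches" notice. A minimal sketch follows, assuming GNU grep and gzip are available (as they are in a typical Cygwin install). The file sample.warc.gz and its contents are hypothetical stand-ins for the real megawarc.

```shell
# Hypothetical stand-in for the megawarc: a few text lines plus one NUL byte,
# which is enough for grep to consider the stream binary.
workdir=$(mktemp -d)
printf 'header\000\nrecord one\nstory by Altezio\nrecord three\ntrailer\n' \
  | gzip > "$workdir/sample.warc.gz"

# gzip -dc decompresses to stdout (the same job zcat does in the log).
# -a forces text mode; -C 1 keeps one line of context on each side of a match.
matches=$(gzip -dc "$workdir/sample.warc.gz" | grep -a -C 1 Altezio)
echo "$matches"

# Clean up the temporary directory.
rm -r "$workdir"
```

Applied to the commands in the log, this would mean adding -a to the grep invocation, for example grep -a -C 3000 Altezio.
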
03:14 ivan` 3 days left for greader, if you have interests in particular sites I can handle as many URLs as you throw at me
03:15 ivan` the ones with a feedfn= here are being handled so far https://github.com/ludios/greader-item-maker/blob/master/url_filter.py
04:53 wp494 FYI the english language wikipedia is currently dead for unknown reasons
04:53 wp494 here's where wikimedia's dumps come in handy
04:58 wp494 at least for chrome users from the looks of it
05:15 winr4r it's back
05:15 wp494 ^
14:27 ivan` posted greader to HN, see https://news.ycombinator.com/newest
14:33 winr4r 7+ TB?!
14:33 winr4r holy shit, you've been busy
14:33 winr4r that is a lot of RSS
14:37 ivan` yes
14:53 pallih Hola! I'm getting ' No item received. Retrying after 30 seconds…' for the formspring tracker
14:54 pallih I'm guessing it has to do with a limit of some kind that I've reached
14:54 pallih somebody a few days ago reset that when the same problem arose
15:01 ersi pallih: Join up the project channel at #firespring :-)
15:18 ivan` that last comment in https://news.ycombinator.com/item?id=5958119 needs a good answer
15:26 ersi I think `eli` answered it adequatly
15:26 ivan` yeah, just saw
15:30 ivan` http://neocities.org/blog/making-the-web-fun-again might as well start archiving neocities already ;)
15:32 ersi Indeed :D
16:20 winr4r i think it's a lovely idea
16:53 Aranje that's archiveteam's high-five
17:51 edsu Baljem: did you find the button?
18:15 ats ivan`: the way to do it would be to have it hosted on archive.org immediately, I guess...
18:26 Baljem edsu: uhm. button?
18:35 ivan` https://twitter.com/floatingatoll I don't know where this guy should submit his reader_archive, we grab everything based on the feed URLs
19:09 SketchCow HI GANG
19:10 SketchCow Well, don't we have a bunch of busy.
19:10 SketchCow GLADOS
19:11 * SmileyG wibbles
19:48 WiK ping omf_
19:49 WiK I've got ctp results. I'll be speaking at bsideslv,defcon, as well as derbycon on my gitdigger project
19:50 WiK which is pretty scary and unexpected (well i kinda thought it would get into bsideslv)
19:50 SmileyG Ooo SketchCow user:djsmiley2k@gmail.com (me) has 9 of the last 11 posterous blogs
19:51 SmileyG I'm waiting on Balrog for the last two.
19:56 winr4r SketchCow: *salutes*
19:57 winr4r can i recommend a pre-emptive backup for someone?
19:58 winr4r *.port5.com, it was a free web host in the early 2000s that my brother just reminded me of, and it still exists
19:58 SmileyG winr4r: always.
21:20 balrog SmileyG: I uploaded them to that dir!
21:29 WiK omf_: I stopped my downloading at 10TB for the gitdigger project
21:29 WiK im outta HD space and figured it was a nice round number for my talk
21:31 omf_ 10tb is a statistically significant sampling size. I think it will be all good
21:33 WiK i think bsideslv has the abstract posted
21:34 WiK http://www.bsideslv.com/speakers/breaking-ground/
22:36 ivan` http://yahoo.tumblr.com/post/54125001066/keeping-our-focus-on-whats-next
22:39 Baljem whoa, Alta Vista?
22:46 SketchCow SO WOW YAHOO IS DESTROYING A LOT OF SHIT
22:46 Aranje woop, more culture getting thrown into the pit
22:46 SketchCow Closest I could find, across these, is that besides Altavista, only Citizen Sports had major profiles, but they were locked in Apps.
22:49 SketchCow What a day.
22:49 SketchCow What a week!
22:49 SketchCow OK, GLaDOS and Smiley
22:56 godane SketchCow: i uploaded 25 episodes of HD Nation yesterday
22:57 shabble is there an (easy) way to run the unattended/pick-most-important thing from the warrior VM, without the VM?
22:57 shabble bloody ovh custom kernels
22:59 shabble (I've got all the deps & crap from the posterous-grab already installed, just wondering about the project-selection stuff)
23:02 SketchCow Thank you, godane.
23:03 godane HD Nation has a better chance of getting fully up then hak5 and tekzilla right at the moment
23:03 godane i only say that cause most hd episodes are 200mb or under from 2011 on
23:04 godane episode 71 was the last full episodes
23:04 godane *episode
23:21 winr4r so so okay, what is yahoo destroying today
23:21 godane looks like a little bit of everything
23:22 winr4r did i miss some news?
23:23 Deewiant winr4r: http://yahoo.tumblr.com/post/54125001066/keeping-our-focus-on-whats-next
23:25 winr4r AltaVista (July 8, 2013)
23:25 winr4r Please visit Yahoo! Search for all of your searching needs.
23:25 winr4r i'm sad, and not one bit justifiably sad, about that
23:28 winr4r i'm sure the one process that woke up once a day to serve an altavista request is looking up at the moon right now
23:29 winr4r and playing this song http://www.youtube.com/watch?v=o9Pu92St5O0
23:30 winr4r so looking at that, yeah i agree, fuck yahoo
23:31 SketchCow I used ALtavista when it came out.
23:31 SketchCow First month.
23:32 winr4r at that time in like 1998 i was like DudE How U find STuf?!11
23:33 winr4r ("that time" being "when i found the internet")
23:35 winr4r and someone was like ALTAVISTA "how U speL that bRo"
23:49 ersi Astalavista, altavista.
23:49 ersi I think I've used Astalavista more than altavista in mantime *whistle*
