#archiveteam 2012-06-06,Wed


Time Nickname Message
00:02 🔗 dashcloud so any more news on the possible archiveteam get together?
00:02 🔗 SketchCow I need to discuss it over the next few weeks.
00:02 🔗 SketchCow Assume mid January
00:02 🔗 SketchCow Huge bandwidth visual and audio connections
00:03 🔗 chronomex save up terabytes of shit to shove into IA then, yes?
00:03 🔗 SketchCow That works too
00:04 🔗 chronomex or I guess I could walk to the nearby university and use one of the unmetered 100mbit ports
02:06 🔗 SketchCow Warning, bandwidth of archive.org about to disappear as of 10pm PST
02:06 🔗 SketchCow I am thinking we might want to slow down the warriors
02:10 🔗 chronomex that's about 3h from now for all not on pacific time
02:12 🔗 dashcloud is archive.org moving to even more ridiculous levels of bandwidth then?
02:17 🔗 SketchCow No, this is a city thing
02:17 🔗 SketchCow They already blew the bandwidth up
02:34 🔗 Aranje SketchCow: Is it part of ipv6 launch or some other thing?
02:34 🔗 SketchCow SF outage
02:35 🔗 SketchCow The neighborhood's getting fucked
02:35 🔗 aggro Sounds like it's time for some devotion to duty: http://xkcd.com/705/
02:42 🔗 Famicoman anyone suggest a batch youtube channel that I can just put a profile into and hit go?
02:45 🔗 SketchCow They're working hard to try and keep it up, but they don't want to overpromise.
02:45 🔗 SketchCow It's likely to slow down, they ran a satellite link from the top of the building
02:46 🔗 SketchCow And a microwave link to another site
03:00 🔗 jonas__ the me.com crawlers are running out of usernames
03:01 🔗 jonas__ and i would like to add some
03:03 🔗 chronomex http://memac.heroku.com/rescue-me
03:03 🔗 Aranje So does that mean we should all touch stop tonight? Or are we just going to forge on and hope it works?
03:04 🔗 chronomex Aranje: best to stop, I suppose
03:04 🔗 Aranje hmm
03:04 🔗 Aranje You can have the trackers (herokus) stop handing out usernames
03:04 🔗 Moonlit Famicoman - to download all the episodes?
03:04 🔗 Moonlit er... videos?
03:05 🔗 Famicoman nm, found youtube-dl
03:05 🔗 Famicoman cool little cli python tool
03:05 🔗 Aranje youtube-dl is godlike
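For reference, a minimal sketch of pointing youtube-dl at a whole channel; the user URL and output template below are placeholders, not anything taken from the log:

    # grab every video on a channel/user page, skip failures, resume partial downloads
    youtube-dl -i -c -o '%(title)s-%(id)s.%(ext)s' 'http://www.youtube.com/user/EXAMPLEUSER'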
03:06 🔗 Moonlit handy
03:06 🔗 Moonlit I was gonna say, I had a FF plugin for that, but I haven't used it for a while
03:06 🔗 Moonlit not since I had to borrow all those Robot Wars episodes, anyway...
03:07 🔗 Famicoman *borrow*
03:07 🔗 chronomex yes, borrowing.
03:07 🔗 chronomex like in a library
03:07 🔗 Moonlit yes, absolutely
03:07 🔗 chronomex my favorite part is how the sarcasm patrol jumped right on that
03:07 🔗 Moonlit lol
03:07 🔗 Famicoman hey, he has every intention of returning them to... everybody
03:07 🔗 Moonlit :D
03:08 🔗 Famicoman I feel like you hand fed those to me like 3 years ago
03:08 🔗 Moonlit pretty sure I had some less consistent rips before, which I might've given you
03:08 🔗 Moonlit before I found that channel
03:09 🔗 Famicoman I feel like I found them all on thebox
03:09 🔗 Famicoman or most of them at least
03:09 🔗 Moonlit yeah, their collection was incomplete though
03:09 🔗 Moonlit missing at least a couple of series worth
03:10 🔗 Moonlit and the early series had gaps
03:11 🔗 Moonlit but either way, yay for the internet
03:11 🔗 Moonlit I wish TV archives weren't so tightly locked away in vaults
03:12 🔗 Moonlit I mean I get a lot of the reasons why on a legal level, but man, it seems a shame to let all that content rot
03:12 🔗 chronomex you should start recording then
03:12 🔗 Moonlit I'd still love to see a crowdsourced historical TV guide, where everyone would contribute media to fill up TV schedules from a day in history
03:12 🔗 chronomex :P
03:13 🔗 Moonlit but I know it'd get shut down quicker than you can say "copyright"
03:13 🔗 Moonlit maybe IA could get away with doing it, I dunno
03:13 🔗 Moonlit but it'd definitely be awesome
03:13 🔗 chronomex um, they already are
03:13 🔗 chronomex just isn't very public
03:14 🔗 jonas__ .... http://memac.heroku.com/rescue-me is complicated because it just allows ~50 at a time
03:14 🔗 Moonlit well then, I've learnt something ne
03:14 🔗 Moonlit new, even
03:14 🔗 chronomex jonas__: if you have a pile, you should talk to alard
03:15 🔗 SketchCow Moonlit: Archive.org will have a TV archive soon
03:15 🔗 Moonlit shweet
03:15 🔗 SketchCow Goes back 15 years
03:15 🔗 Moonlit kickass
03:15 🔗 Moonlit international, or just US?
03:16 🔗 SketchCow Just US
03:16 🔗 Moonlit aw
03:16 🔗 SketchCow But via satellite
03:16 🔗 Moonlit still, better than nothing
03:16 🔗 SketchCow Right
03:16 🔗 Aranje You could whip up a script to submit 50 at a time. POST is a wonderful interface.
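A rough sketch of the batching Aranje describes, assuming the rescue-me form accepts a block of newline-separated usernames; the field name "usernames" is a guess and would need checking against the page source:

    split -l 50 usernames.txt chunk_                 # 50 usernames per request
    for f in chunk_*; do
        curl -s --data-urlencode "usernames@$f" http://memac.heroku.com/rescue-me
        sleep 2                                      # go easy on the tracker
    done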
03:16 🔗 SketchCow http://archive.org/details/tv
03:16 🔗 SketchCow Try it out a little
03:16 🔗 Moonlit oh, hello
03:16 🔗 Moonlit cheers
03:17 🔗 SketchCow http://archive.org/details/tv?q=vibrator
03:17 🔗 Moonlit hah
03:18 🔗 Moonlit that seems quite specific
03:19 🔗 Moonlit but I'm not gonna argue with anything which archives telly
03:19 🔗 tev It's comforting to know I'm not the only one who sometimes felt that old TV listings would be useful
03:19 🔗 Moonlit well, I'd love to have a site where you could, for example, pick your birthday and watch that entire day's TV
03:20 🔗 Moonlit impractical, perhaps, but for recent years it might not be such a crazy idea
03:20 🔗 chronomex that may be exactly what /details/tv is for
03:20 🔗 chronomex who knows
03:20 🔗 jonas___ alard is sleeping i guess,
03:20 🔗 chronomex jonas___: that would not be a surprise, given his timezone
03:20 🔗 jonas___ no one else involved in the tracker?
03:20 🔗 Moonlit timezones? pah!
03:21 🔗 * Moonlit looks down at the clock
03:21 🔗 Moonlit 4:22am
03:21 🔗 Moonlit >_>
03:21 🔗 chronomex I expect he'll be online in 4 to 6 hours
03:21 🔗 chronomex Moonlit: you're approaching that from the good side, right?
03:21 🔗 Moonlit the night side, of course
03:22 🔗 chronomex yes
03:29 🔗 chronomex are other people able to access archive.org as well?
03:29 🔗 chronomex it sure doesn't seem down to me
03:29 🔗 chronomex or is it not that time yet
03:29 🔗 Moonlit not time yet
03:29 🔗 chronomex d'oh, two more hours
03:29 🔗 Moonlit 1h30 left I believe
03:29 🔗 chronomex right
05:56 🔗 * closure wonders if it'll come back with ipv6 for ipv6 day, that'd be cool
05:56 🔗 chronomex indeed
07:32 🔗 alard Hi. I read something about pausing trackers/warriors, is that still necessary or have I slept through the outage?
07:42 🔗 SketchCow The outage came and went.
07:46 🔗 chronomex whoa, there's about ten zillion queued derives
07:49 🔗 chronomex wonder what's up.
11:55 🔗 underscor chronomex: the outage affected ia6* datacenter, which is where catalogd's hosted; because of that, a.o "froze" the cluster for the duration, and efficiency still isn't back up to normal
11:57 🔗 jjonas ....#ovh giveaway - did you guys also get this all the time after typing the code from the twitter DM and trying to log in? :"pease login: An error occurred, saving could not be completed, please check your information."
11:57 🔗 underscor nope
11:57 🔗 underscor I just entered my nichandle and it said "Thanks for participating" or something
11:57 🔗 underscor then they dm'd me in frech
11:57 🔗 underscor french*
12:37 🔗 oli https://fbcdn-sphotos-a.akamaihd.net/hphotos-ak-ash4/403466_10150939185809588_574221299_n.jpg
12:37 🔗 oli haha
12:40 🔗 SmileyG <3 grimm brothers
12:41 🔗 SmileyG they wrote some funny stuff :D
12:52 🔗 SmileyG hmmmm
12:52 🔗 SmileyG just started up warrior, selected picplz - sitting on 0% [waiting for headers] when trying to update liblua5.1-0
12:53 🔗 SmileyG mobileme seems fine tho. carrying on with that for now
12:57 🔗 alard Experiment: http://warctozip.herokuapp.com/
13:01 🔗 SmileyG alard: and that reutrns the thing zipped for you?
13:01 🔗 SmileyG returns*
13:04 🔗 alard Yes.
13:04 🔗 underscor that's awesome
13:05 🔗 * underscor feeds it one of the youporn warcs
13:06 🔗 void_ alard: https://gist.github.com/8a3c5fb1f64d8bd18a4e
13:07 🔗 void_ with Python 2.6.1
13:07 🔗 underscor void_: You need to pass a zip filename
13:07 🔗 underscor after the warc filename
13:07 🔗 underscor example.zip or whatever you want it called
13:07 🔗 void_ same
13:08 🔗 underscor oh, weird
13:08 🔗 SmileyG interestingly it seems VirtualBox may soon have some network speed control stuff.
13:08 🔗 void_ https://gist.github.com/f5086e6188f9f6ea3e2a
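Going by underscor's hint, the local invocation void_ appears to be trying would look roughly like this; the script name is an assumption based on the project name, not confirmed from the code:

    # assumed usage: the warc file first, then the zip to write
    python warctozip.py example.warc.gz example.zip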
13:08 🔗 underscor SmileyG: it does, iirc
13:08 🔗 SmileyG underscor: already?
13:09 🔗 SmileyG I see stuff for a linux host, I'm a linux host but I've not yet attempted to test.
13:09 🔗 SmileyG it said for windows users "soon" from what I've read.
13:09 🔗 underscor oh, I'm linux
13:09 🔗 underscor sorry
13:10 🔗 SmileyG me too :P
13:10 🔗 SmileyG Have you tried it?
13:11 🔗 underscor http://www.slashgear.com/6-5m-linkedin-passwords-reportedly-leak-hackers-crowdsourcing-encryption-crack-06232454/
13:11 🔗 underscor no, because two clients doesn't saturate my pipe
13:11 🔗 underscor ;P
13:12 🔗 SmileyG mine neither, but I'm at work and if I did then there would be stabbing.
13:12 🔗 underscor hahaha
13:13 🔗 SmileyG as in there's very little stopping me other than me.
13:21 🔗 void_ http://bugs.python.org/issue5511
13:22 🔗 void_ maybe py2.6 is the problem
13:27 🔗 floppywar Hello archiveteam. I'm interested in preserving certain aspects (in particular the forums) of some private torrent trackers. These community forums oftentimes have high-quality discussion of various topics and provide an insight into the world of filesharing, but they are also at risk of sudden, unannounced seizure. I'm a noob, currently fiddling with wget without much success. Is anybody interested and/or willing to help me out?
14:18 🔗 S[h]O[r]T floppywar, supposedly if you use wget with --mirror and --load-cookies <file> to load your cookie from when you're logged in, that should work
14:19 🔗 floppywar tried that, didn't work. maybe the robots.txt is preventing me from accessing /forums.php
14:21 🔗 floppywar only /login.php was downloaded and that file says that neither javascript nor cookies are enabled (a necessity for logging into this website).
14:21 🔗 floppywar I've tried HTTrack too, without success.
14:21 🔗 floppywar <S[h]O[r]T>
14:28 🔗 SmileyG javascript... :S
14:28 🔗 SmileyG S[h]O[r]T: can you get it to wget any other page than the login page?
14:33 🔗 S[h]O[r]T why are u asking me? :P floppywar is the one trying to wget hehe
14:33 🔗 floppywar SmileyG: what do you mean?
14:33 🔗 floppywar ah I see
14:33 🔗 floppywar index.php too I think, but the content is the same as login.php
14:34 🔗 floppywar SmileyG : so, no, nothing besides the login page.
14:37 🔗 floppywar SmileyG: Correction, Javascript is not necessary. Cookies are though, obviously.
14:37 🔗 SmileyG hmmm
14:37 🔗 S[h]O[r]T https://chrome.google.com/webstore/detail/lopabhfecdfhgogdbojmaicoicjekelh
14:38 🔗 S[h]O[r]T that seems like a promising easy way to get your login cookie over to wget
14:38 🔗 SmileyG don't you just login on your local machine
14:38 🔗 SmileyG save hte cookie.txt
14:38 🔗 SmileyG job done.
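Put together, the workflow SmileyG describes is roughly the following (a sketch, not a tested recipe; [site] stays a placeholder as in the log, and -e robots=off addresses the robots.txt suspicion from earlier):

    # 1. log in with the browser, export cookies to Netscape-format cookies.txt
    #    (e.g. with the Chrome extension linked above)
    # 2. hand the cookie jar to wget for the mirror
    wget --load-cookies cookies.txt \
         --mirror --adjust-extension --page-requisites --convert-links \
         -e robots=off "https://[site]/forums.php"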
14:39 🔗 floppywar I do
14:39 🔗 floppywar I'll try exporting the cookie using the add-on you just linked to.
14:40 🔗 SketchCow floppywar: Talk with underscor - he's the guy to help.
14:40 🔗 SketchCow http://www.youtube.com/watch?v=rxjHbe7TrjY
14:42 🔗 floppywar SmileyG: It seems to be working!
14:50 🔗 floppywar SmileyG: .. or so I thought. wget doesn't seem to descend into anything; it just grabs (properly) forums.php but not the actual individual forums and threads.
14:50 🔗 floppywar Individual forums can be reached in this way: /forums.php?action=viewforum&forumid=9
14:53 🔗 S[h]O[r]T i imagine as long as wget is following those links it shouldnt have a problem. i guess i would try to play with the options for host checking
14:54 🔗 floppywar "options for host checking", expand please if you will? </noob>
14:55 🔗 S[h]O[r]T http://www.editcorp.com/Personal/Lars_Appel/wget/wget_4.html
14:56 🔗 S[h]O[r]T i suppose you could try -L and -nh. im not sure if -mirror enables these already i bet it does one of them
14:59 🔗 floppywar "wget -load-cookies cookies.txt -m https://[site]/forums.php" only grabs forums.php
15:03 🔗 floppywar replacing -m with -L makes no difference, replacing -m with -nh returns "illegal option" (already default as hinted at on editcorps.com?)
15:03 🔗 DoubleJ floppywar: Is there a redirct from [site] to www.[site] or vice-versa? The -D option may be useful to you in that case.
15:03 🔗 floppywar as far as I'm aware there isn't
15:06 🔗 DoubleJ Might be worth a try anyway. Your "illegal option" error is because the h needs to be capital: -nH but that's a bad idea anyway since you're one outside link away from downloading the internet.
15:08 🔗 floppywar "wget --load-cookies cookies.txt -m https://[site]/forums.php?action=viewforum&forumid=19" does not differ from trying to grab /forums.php; only the page specified is grabbed, no descend into the actual threads.
15:10 🔗 DoubleJ Dunno then; I usually rely on someone smarter than me to set this stuff up. Like how I forgot just now that in wget you can have multi-letter options with a single dash so -nH didn't mean what I thought it did.
15:10 🔗 DoubleJ I do know from previous conversations in here that downloading forums is far more of a pain than it ought to be.
15:11 🔗 floppywar could it be because the links aren't static?
15:12 🔗 floppywar anyway, replacing -m with -nH does not fix it. "-m -nH" doesn't fix it either.
15:13 🔗 floppywar I've got to go in about 5 minutes. I'll probably be back within a couple of hours.
15:13 🔗 floppywar Thanks for the help guys.
15:13 🔗 aggro wget --no-parent --html-extension --page-requisites --convert-links -e robots=off --exclude-directories=any,directory,you,do,not,want,comma,delimited --reject "*action=print,*any-parameter-you-do-not-want,comma,delimited" -w 5 --random-wait --warc-file=warc-file-name http://thesite.tld
15:13 🔗 aggro without having a login to the site it's difficult to troubleshoot, but I've used the follow before for public grabs of forums. If you've got the cookie working correctly, then this combo might work:
15:15 🔗 floppywar I'll try that aggro, cookie seems to be working correctly.
15:16 🔗 aggro the reject is useful for php-type sites that can have lots of different parameters for a page that are all mostly the same. Without the reject, wget will download lots of different versions of the same page, just with different parameters. You can fiddle with it until you find what url parameters are good to keep and which to ditch.
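As an illustration of that --reject idea (the patterns here are made-up examples to be tuned per site, not what.cd specifics):

    # without a reject list, wget saves near-duplicate pages that differ only in
    # cosmetic parameters (print views, sort orders, and so on)
    wget ... --reject "*action=print*,*sort=*,*order=*" ...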
15:17 🔗 floppywar aggro: will wget ascend into higher directories if I specify https://[site]/forums.php?
15:18 🔗 aggro if it's just somesite.com/forums.php , then that's already at the highest directory.
15:19 🔗 aggro And wget will follow any links within that domain
15:19 🔗 aggro typically though forums are installed in somesite.com/forums/index.php or forums.somesite.com
15:19 🔗 aggro (forums.somesite.com/index.php)
15:23 🔗 floppywar aggro: It grabbed forums.php(.html is the final output) and is grabbing stuff from /static/ now.
15:24 🔗 floppywar aggro: aaaand, it's finished. so no descent into threads again :(
15:25 🔗 floppywar I left out --exclude-directories, --reject and --warc-file
15:25 🔗 aggro that's fine.
15:25 🔗 SmileyG :<
15:25 🔗 aggro Does that "forums.php" page have the typical listing of different forums?
15:25 🔗 aggro (I'm assuming phpbb or vbulletin or one of those major ones)
15:25 🔗 floppywar aggro: no, custom board I think.
15:26 🔗 floppywar It's what.cd
15:27 🔗 floppywar my connection dropped. did I miss anything after "(I'm assuming [...]"?
15:27 🔗 aggro Hmmm... perhaps a cookie or login issue? Try running it with the "-d debug.log" option.
15:27 🔗 aggro Oh: repeat: phpbb or vbulletin or one of those major ones)
15:27 🔗 aggro I was guessing the type of system the forum is running on
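A small correction on that flag: wget's -d takes no argument, so "-d debug.log" would be read as an extra URL; pairing -d with -o captures the debug output in a file instead:

    wget -d -o debug.log --load-cookies cookies.txt "https://[site]/forums.php"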
15:28 🔗 floppywar what.cd runs question forum software, probably included with their in-house though open-source Gazelle private tracker software solution.
15:29 🔗 floppywar what.cd runs homebrew forum software*
15:29 🔗 aggro huh. interesting :D
15:29 🔗 SmileyG what.cd?
15:29 🔗 SmileyG Sounds like one of the magizines
15:29 🔗 aggro The first thing I'd be looking at is the debug output and probably some of the pages to see where specifically content is getting grabbed from.
15:29 🔗 floppywar biggest private music tracker in existence
15:30 🔗 floppywar printscreen incoming
15:32 🔗 floppywar http://image.bayimg.com/mapjeaada.jpg
15:33 🔗 aggro Bingo. You'll want the URL to be "https://ssl.what.cd/"
15:33 🔗 SmileyG :D
15:33 🔗 floppywar aggro: It is
15:33 🔗 SmileyG only thing I notice is the https.... Oh
15:34 🔗 floppywar perhaps I could try the non-https version.
15:34 🔗 floppywar but I've got to go now. I'll be back in a couple of hours.
15:34 🔗 floppywar bye
15:34 🔗 SmileyG wget doesn't trust the ssl cert they using or something bonkers?
15:34 🔗 aggro Same here.
15:34 🔗 aggro And possibly what Smiley said. So look at the debug output from wget :P
15:34 🔗 aggro Back in a few hours all :D
15:34 🔗 floppywar kbai
15:34 🔗 SmileyG o/
16:17 🔗 SketchCow WHERE'S THE HUG
16:17 🔗 SketchCow I hope that floppywar does some greatness.
16:17 🔗 * Schbirid MUGS SketchCow
16:18 🔗 Schbirid i learned that my city's surveying office sits on 2 cabinets full of tapes with historic aerial photos they might just throw away because they dont have the readers any more and dont care
16:18 🔗 Schbirid i made my interest clear :)
16:18 🔗 Schbirid i think
16:18 🔗 Schbirid at least to a colleague who works in the department
16:18 🔗 SketchCow Keep on it.
16:18 🔗 SketchCow Be willing with a trunk.
16:18 🔗 SketchCow What form are they
16:19 🔗 Schbirid i dont have the slightest idea
16:19 🔗 Schbirid but it probably wont be decided in the near future
16:21 🔗 arkhive http://www.engadget.com/2012/06/02/picplz-shutting-down-permanently-on-july-3rd-all-photos-to-be-d/
16:22 🔗 arkhive and http://torrentfreak.com/worlds-oldest-bittorrent-site-shuts-down-120605/ Thought this may be of interest to some.
16:31 🔗 SketchCow We've already grabbed 75% of picplz
16:34 🔗 mistym Wow, that's fast. How many gigs so far?
16:35 🔗 Schbirid http://picplz.heroku.com/
16:51 🔗 arkhive alright. I'm using archiveteam warrior and am now working on picplz
16:53 🔗 SketchCow Excellent.
16:56 🔗 SketchCow For some reason, my archiveteam warrior fails out, can't see anything with the network.
17:06 🔗 DoubleJ What's your host setup? I've had problems with VirtualBox on Win7 not believing the network exists.
17:10 🔗 SketchCow Yeah
17:10 🔗 SketchCow But it worked a while ago.
17:11 🔗 DoubleJ That... actually sounds like the problem I had with VirtualBox. For whatever reason bridged mode decided it didn't want to work any more.
17:18 🔗 yipdw the VirtualBox network drivers may no longer be loaded
17:19 🔗 yipdw I've had that happen on Linux and OS X; unfortunately the only way I found to fix that was to unload and reload the modules/kexts
17:19 🔗 yipdw supposedly restarting VirtualBox fixes it, too, though I've had little luck with that procedure
17:22 🔗 DoubleJ yipdw: Yeah, it was a pain. Restarting VBox didn't work, rebooting the host occasionally did. Then it gave up altogether so now I just deal with the pain that is NAT.
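On a Linux host, the unload/reload yipdw mentions looks roughly like this (a sketch for the VirtualBox 4.x era; shut down all VMs first, and the init script path varies by distribution):

    sudo rmmod vboxnetadp vboxnetflt vboxdrv        # unload network + core modules
    sudo modprobe -a vboxdrv vboxnetflt vboxnetadp  # load them back in
    # or, where the packaged init script exists:
    sudo /etc/init.d/vboxdrv restart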
17:24 🔗 LordNlptp may mean windows is up for a reinstall
17:26 🔗 DoubleJ Meh. I've got something that works now, so that's how it's gonna stay. I don't have a spare weekend to deal with paving over the system.
18:02 🔗 SketchCow I just got my warrior working
18:03 🔗 SketchCow Which is a huge-ass euphemism
18:03 🔗 SketchCow But also true.
18:03 🔗 SmileyG oh crap I left mine running at work
18:03 🔗 SmileyG doh
18:03 🔗 SketchCow Archive Team: Saving History... By Mistake
18:03 🔗 SmileyG ;)
18:19 🔗 Schbirid oh poop, my gamespy forum script is not getting it all
18:37 🔗 Schbirid it's wget's fault :(
18:38 🔗 Aranje Oh yeah, blame your failures on wget
18:38 🔗 Aranje :3
18:41 🔗 Schbirid might have been my tries and timeout limits
18:42 🔗 Schbirid those would be logged as errors even with -nv, right? because no errors here
18:46 🔗 alard SketchCow: Following the twitter feedback, the warrior scripts now show the URL of the tracker stats page.
18:56 🔗 SketchCow Excellent.
18:56 🔗 SketchCow Codinghorror's got ideas aplenty
18:57 🔗 SketchCow They're generally very good, so they're worth considering.
18:58 🔗 SketchCow Love that warrior
18:59 🔗 SketchCow https://twitter.com/codinghorror/status/210446032104980480
19:00 🔗 SketchCow Ha ha, codinghorror is now in on the uploading.
19:06 🔗 SmileyG lol
19:08 🔗 yipdw maybe Jeff Atwood can make Stack Overflow do some warrior work
19:08 🔗 SketchCow http://www.catwholaughed.com/previous/index.html is the artist who will do something for warrior
19:10 🔗 Schbirid what the hell, --level=10 did not do what i thought it would
19:10 🔗 Schbirid "wget -a test.log -m -nv --adjust-extension --convert-links --page-requisites -np --level=10 -X PrivateMessages -X Static http://forums.gamespy.com/fileplanet_command_center/b67434/p1"
19:10 🔗 Schbirid with the level=10 it would not download all the pXXXX, only 533
19:10 🔗 Schbirid ooh
19:11 🔗 Schbirid i thought that would control how many directories it would go down. but it makes it not crawl further than 10 "links" in terms of recursion, right?
19:11 🔗 Schbirid is there something to limit the directory level traversal?
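For what Schbirid is after: --level caps how many link hops wget follows (and -m already implies an unlimited level), not how deep into the directory tree it goes; directory control comes from -np plus -I/-X. A sketch along those lines, reusing his original command minus the level cap:

    # -m implies infinite recursion depth, so the pXXXX pagination chain gets
    # followed to the end, while -np and -X keep the grab inside the forum dirs
    wget -a test.log -m -nv --adjust-extension --convert-links --page-requisites \
         -np -X PrivateMessages -X Static \
         http://forums.gamespy.com/fileplanet_command_center/b67434/p1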
20:22 🔗 Solar_Dra Hello, could someone change the status of ShoutWiki ( http://archiveteam.org/index.php?title=ShoutWiki ) to Online please as we are very much up and running, as you can see here: http://shoutwiki.com/wiki/Main_Page
20:22 🔗 Solar_Dra I would do so myself but account creation appears to not be working
20:25 🔗 Solar_Dra Further evidence is in our blog: http://blog.shoutwiki.com/
20:30 🔗 Solar_Dra Anyone?
20:30 🔗 arkhive It will get done
21:32 🔗 arkhive I've had two errors when using archiveteam warrior
21:34 🔗 arkhive Both my fault. Network connected but internet stops working. Maybe because I'm pumping too much in and out? I'm on a 12Mb/s down 896Kb/s up connection.
21:35 🔗 arkhive Anyway... Thought I'd let you know. I hope I didn't mess anything up or those 'items' get skipped?
21:46 🔗 chronomex it depends
21:46 🔗 chronomex what kind of errors?
21:48 🔗 chronomex items that have been claimed but not marked as complete will eventually be sent to someone else to re-run
21:48 🔗 chronomex if you've uploaded a broken pack, that's a bit different
22:16 🔗 swebb Hey, any Google Reader users out there, I'm building a site to filter/crawl RSS articles, so for Feeds like Gizmodo and others that only provide a snippet of the content in their RSS feed, I crawl the page, use the Readability algorithm to pull the full content and then replace their content in the RSS feed. If you guys want, feel free to test out the service and let me know how you like it.
22:18 🔗 swebb http://rsshose.com
22:19 🔗 chronomex neat
22:20 🔗 swebb Also, if you read Hacker News, it'll provide a feed of all of the actual articles linked to, instead of "comments" as the body of each article in the feed.
22:20 🔗 swebb I'm adding de-duping to it soon too, so if/when the next apple event happens, you won't have to re-read the same damn article 100 times.
22:21 🔗 chronomex I gave up HN recently
23:39 🔗 DrainLbry moral quandary: is a website dedicated to acts of gore threatened with shutdown a worthy item of archiving? http://www.cbc.ca/news/politics/story/2012/06/05/pol-magnotta-best-gore-police.html?cmp=rss
23:40 🔗 chronomex yes
23:40 🔗 DrainLbry well, have at that... that site is NSFL.
23:40 🔗 DrainLbry click the puppy, trust me.
23:40 🔗 chronomex it will help to provide cultural context to future historians
23:41 🔗 DrainLbry yeah i'd agree. i might think it's way fucked, but it's newsworthy, and noteworthy as a result
23:41 🔗 chronomex anything that would cause someone 100 years from now to say "huh, I didn't know ..." is definitely worth it
23:41 🔗 chronomex newsworthy and noteworthy is not the same as what I'm talking about
23:42 🔗 DrainLbry right, i get your take on it though, it is historically significant to future sociologists and the like
23:42 🔗 chronomex everyone archives for their own reasons
23:42 🔗 chronomex these are mine
23:44 🔗 balrog_ DrainLbry: welcome back
23:44 🔗 balrog_ don't see you on freenode... :p
23:51 🔗 DrainLbry attempting a magi.com mirror with wget-warc - if anyone eyeballs anything weird i'd have to do off the top of your head let me know, cause i just --mirror'd that bitch and hope it works - http://www.insidesocialgames.com/2012/06/06/magi-com-shutting-down/
23:51 🔗 DrainLbry meh yeah, and that totally did not work.
23:52 🔗 DrainLbry or maybe it did... site's not as epic as i thought
23:55 🔗 DrainLbry yeah not so sure, anyone want to school me with some knowledge? 12 megs doesnt seem right
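For comparison, a wget-warc mirror along the lines of the earlier forum command might look like this (a sketch; the exact magi.com hostname is unchecked, and content hosted on other domains would be skipped, which could also explain a small grab):

    wget --mirror --page-requisites --adjust-extension --convert-links \
         -e robots=off --wait 1 --random-wait \
         --warc-file=magi.com \
         http://www.magi.com/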
