#archiveteam 2013-05-27,Mon


Time Nickname Message
00:40 πŸ”— ivan` if anyone wants to discover blog URLs/usernames, or even the blog platforms themselves, that would be an enormous help: http://www.archiveteam.org/index.php?title=Google_Reader
00:41 πŸ”— ivan` especially some foreign ones that we generally ignore
00:45 πŸ”— ivan` I'm going to fix the pipeline script and set up the database that will generate the work items
00:50 πŸ”— ivan` also, if anyone is friends with Jeff Barr or Bill Kearney of http://www.syndic8.com/ maybe you can bug them for the data
00:50 πŸ”— ivan` their site does not respond to requests beyond the homepage
01:30 πŸ”— ivan` did anyone grab all the opml files from opmlmanager before it went down sometime in 2012?
01:30 πŸ”— ivan` IA doesn't seem to have much, http://web.archive.org/web/20120210125326/http://www.opmlmanager.com/user_list.php
02:01 πŸ”— zenguy_pc i did the google takeout thing for reader
02:01 πŸ”— zenguy_pc extracted the xml
02:02 πŸ”— zenguy_pc it should be 3-4 years old but i don't see the stored posts
02:02 πŸ”— zenguy_pc you guys need reddit rss?
02:02 πŸ”— zenguy_pc i only have gonewild though for about a year
02:02 πŸ”— zenguy_pc nevermind .. only individual posts
02:04 πŸ”— zenguy_pc does giving you the google reader subscriptions.xml allow you to get all the posts they have for those urls via the google api, even ones like 3 years old?
02:20 πŸ”— ivan` zenguy_pc: yes, Reader serves data even for URLs that don't exist anymore
02:20 πŸ”— ivan` and as far as I can tell, Reader does keep all posts
02:20 πŸ”— zenguy_pc ok that's good
02:20 πŸ”— ivan` maybe there's some really high limit so they don't store a million spam posts
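
A rough sketch of what grabbing one feed's cached history could look like, assuming Reader's unofficial stream/contents endpoint with its n (page size) and c (continuation token) parameters as they were commonly documented at the time; the function and variable names are illustrative, not the project's actual pipeline code:

    import json
    import time
    import urllib.parse
    import urllib.request

    API = "https://www.google.com/reader/api/0/stream/contents/feed/"

    def fetch_history(feed_url, per_page=1000):
        """Page through Reader's cached copy of one feed, dead or alive."""
        items, continuation = [], None
        while True:
            params = {"n": per_page}
            if continuation:
                params["c"] = continuation  # opaque token from the previous page
            url = (API + urllib.parse.quote(feed_url, safe="")
                   + "?" + urllib.parse.urlencode(params))
            with urllib.request.urlopen(url) as resp:
                page = json.loads(resp.read().decode("utf-8"))
            items.extend(page.get("items", []))
            continuation = page.get("continuation")
            if not continuation:
                return items
            time.sleep(1)  # assumed politeness delay
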
02:20 πŸ”— zenguy_pc i wish i had more urls.. i've had 30 tops
02:20 πŸ”— zenguy_pc i will check my other accounts later
02:21 πŸ”— zenguy_pc i got my subscriptions. i thought they would have given me all the posts i had.. i see starred posts
02:21 πŸ”— ivan` right
02:21 πŸ”— zenguy_pc i hadn't touched it in a year and i have some music blogs i wanted to go through
02:22 πŸ”— zenguy_pc i can't do that in a month
02:22 πŸ”— ivan` which music blogs?
02:25 πŸ”— ivan` Feed API is supposed to stay up after July 1, but the eternal spring cleaning will probably kill it soon too
02:26 πŸ”— zenguy_pc 2dopeboyz
02:26 πŸ”— zenguy_pc http://pastie.org/7966029
02:27 πŸ”— BlueMax http://tracker.archiveteam.org/posterous/ 10,000 items left
02:27 πŸ”— BlueMax wow
02:29 πŸ”— zenguy_pc i had the Overheard in NY, ATL Office.. sites in rss, then they seem to have died in dec 2012
02:29 πŸ”— zenguy_pc will you guys retrieve the urls from it, or can you get cached info from the google api?
02:30 πŸ”— zenguy_pc do the sites need to be live?
02:34 πŸ”— ivan` for Reader the sites can be dead; just need the feed URL
02:35 πŸ”— ivan` I have a "Deleted" folder in my Reader that has a lot of stuff
02:44 πŸ”— zenguy_pc what about random urls?
02:44 πŸ”— zenguy_pc if you're after google-cached stuff, isn't there a way to get random rss urls and see if google has them in their db?
02:45 πŸ”— ivan` zenguy_pc: you can do searches with https://www.google.com/reader/directory/search?q=keyword-here
02:45 πŸ”— ivan` you can also use their recommendations feature in Reader
03:04 πŸ”— zenguy_pc how do you intend to deal with reddit's rate limit.. i've been banned several times just updating feeds every hour.. 1000+
03:05 πŸ”— ivan` we grab the data from Google, reddit probably doesn't rate-limit Google
03:05 πŸ”— zenguy_pc i wanted to get posts before users deleted them which some gonewild posters were apt to do
03:06 πŸ”— zenguy_pc can i use the same google reader upload for reddit upload
03:06 πŸ”— ivan` I don't know what that means
03:06 πŸ”— zenguy_pc nevermind .. you answered the question
03:07 πŸ”— zenguy_pc you'll index reddit through google
03:08 πŸ”— ivan` I'm not even really interested in new data, just old data, but Reader is going to hit the site if it doesn't have the feed in its cache already
03:09 πŸ”— zenguy_pc ah good idea.
03:09 πŸ”— zenguy_pc i saw reddit's own backup blog post and it was a lot of data
03:10 πŸ”— zenguy_pc wish they had proper search.. so many content gems are lost in all that data without proper search
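
To make the reddit-through-Reader idea concrete: hand Reader the ordinary subreddit feed URLs and let Google's cache answer instead of reddit. A tiny illustration reusing the fetch_history sketch above (the subreddit names are just examples):

    # Illustrative only; fetch_history is the Reader-paging sketch above.
    subreddits = ["gonewild", "pics", "wtf"]

    for name in subreddits:
        feed = "http://www.reddit.com/r/%s/.rss" % name
        posts = fetch_history(feed)  # answered from Google's cache, not reddit
        print(feed, len(posts))
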
03:15 πŸ”— zenguy_pc has anyone ever used this http://buzz.sourceforge.net/
03:18 πŸ”— zenguy_pc i could never get it to work
03:55 πŸ”— SketchCow http://www.flickr.com/photos/textfiles/sets/72157633722203885/
03:59 πŸ”— BlueMax "The Lucky Byte" huh
04:10 πŸ”— S[h]O[r]T that's awesome
04:13 πŸ”— S[h]O[r]T i didn't realize IA was a christian organization
04:18 πŸ”— S[h]O[r]T what are the tv tuners recording, anything special 24/7?
05:44 πŸ”— ivan` what the heck is the password for http://areca.co/8/Feed-Item-Dataset-TUDCS5
05:45 πŸ”— ivan` I have tried quite a few
05:59 πŸ”— ivan` hopefully the author knows and will reply
08:40 πŸ”— Smiley S[h]O[r]T: It's not afaik, it's in an old Christian Science church tho from what that movie said yesterday.
08:40 πŸ”— Smiley Maaan, yahoo have basically killed flickr already, it's so slow :/
08:41 πŸ”— BlueMax it wasn't dead when Yahoo took control of it in the first place?
08:42 πŸ”— Smiley No.
08:42 πŸ”— Smiley It was quick at least.
08:44 πŸ”— ivan` http://www.archiveteam.org/index.php?title=Google_Reader please edit if you know of more blog platforms
08:46 πŸ”— ivan` hm, I should go through my feed URLs and discover some myself
08:49 πŸ”— godane so i'm finding more techtv stuff
08:50 πŸ”— godane i didn't do a full scan of the 21000s yet
08:50 πŸ”— godane i also only did the 21800s video ids
09:14 πŸ”— godane i'm uploading the 2nd season of secret life of machines
09:15 πŸ”— antomatic S[h]O[r]T: I think they archive TV News and stuff.
09:15 πŸ”— antomatic (indexed via closed captioning data - very cool)
09:17 πŸ”— Smiley "It's reported that Yahoo has formally put in a bid to buy Hulu only a week after adding Tumblr to the family.
09:17 πŸ”— Smiley Really yahoo? REALLY?
09:17 πŸ”— Smiley BUY ALL THE THINGS!
09:24 πŸ”— BlueMax christ they don't have the cash to buy Hulu surely
09:30 πŸ”— godane so i got 45k videos uploaded to my g4video-web collection
10:22 πŸ”— godane so is anyone willing to give me money to buy a new hard drive?
11:55 πŸ”— ivan` The following text is what triggered our spam filter: http://USERNAME.tumblr.com
11:55 πŸ”— ivan` The text you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site.
12:53 πŸ”— edoc jux.com is closing.
12:59 πŸ”— ivan` google finds 30 blogs on jux.com
12:59 πŸ”— ivan` okay actually a lot more when I click that redundant link
15:05 πŸ”— omf_ Is anyone else building applications using the Internet Archive's APIs?
15:05 πŸ”— omf_ I wrote up a bunch of documentation but I could be missing something since the IA lacks developer documentation
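
For anyone wondering what building against those APIs looks like, a minimal sketch using the two read-only endpoints most applications start with, item metadata and advanced search; the query string is only an example:

    import json
    import urllib.parse
    import urllib.request

    def ia_metadata(identifier):
        # Metadata for a single item, e.g. ia_metadata("archiveteam")
        url = "https://archive.org/metadata/" + urllib.parse.quote(identifier)
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def ia_search(query, rows=10):
        # Advanced search with JSON output, returning just identifiers
        qs = urllib.parse.urlencode(
            {"q": query, "rows": rows, "output": "json", "fl[]": "identifier"})
        url = "https://archive.org/advancedsearch.php?" + qs
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    print(ia_search("collection:archiveteam")["response"]["numFound"])
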
15:36 πŸ”— Smiley ivan`: # of results?
15:43 πŸ”— WiK afternoon
15:48 πŸ”— omf_ WiK, will you take a pull request that expands your language coverage?
15:51 πŸ”— WiK do what?
15:53 πŸ”— WiK i understood the pull request part
15:55 πŸ”— omf_ I added a few lines to pullfromdb.sh to track more programming language files
16:04 πŸ”— WiK nopaste your changes, let me take a look
16:05 πŸ”— WiK meh, just do a pull request
16:05 πŸ”— WiK i'll take a look at it
20:48 πŸ”— soultcer ivan`: tumblr is on the spam blacklist, since they are bad at filtering spam and the only way to report spam to them is by first signing up on tumblr - http://www.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist
21:27 πŸ”— ivan` Smiley: 760,000ish
21:27 πŸ”— ivan` that was just with a site:
22:11 πŸ”— ivan` soultcer: thanks, too bad
22:30 πŸ”— citruspi Hey, could an admin PM me?
22:51 πŸ”— balrog site:http://www3.telus.net/ stuff needs to be archived... dunno how much longer that will survive
23:10 πŸ”— citruspi Hey, I'm looking to help with the Archive Team.
23:10 πŸ”— citruspi I'm primarily a python programmer
23:10 πŸ”— citruspi Is there anything I could do to help right now? (I'll hang around in the channel in the future)
23:11 πŸ”— BlueMax citruspi, I'm sure someone can find some use for you :P
23:12 πŸ”— citruspi Sweet, thanks :)
23:13 πŸ”— ivan` citruspi: http://www.archiveteam.org/index.php?title=Google_Reader needs your help
23:13 πŸ”— ivan` you can write crawlers to discover usernames on the services mentioned, or if you want to do some C, add gzip support to wget
23:14 πŸ”— ivan` also a crawler to hit https://www.google.com/reader/directory/search with every keyword imaginable
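
A sketch of that keyword crawler; the q parameter comes straight from the URL above, but the response is an HTML page whose markup is guessed at here, so the feed-extracting regex and the keywords file are placeholders to adapt:

    import re
    import time
    import urllib.parse
    import urllib.request

    SEARCH = "https://www.google.com/reader/directory/search?"
    FEED_RE = re.compile(r'feed/(http[^"&]+)')  # assumed result markup

    def search_feeds(keyword):
        url = SEARCH + urllib.parse.urlencode({"q": keyword})
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req) as resp:
            html = resp.read().decode("utf-8", "replace")
        return {urllib.parse.unquote(m) for m in FEED_RE.findall(html)}

    found = set()
    for word in open("keywords.txt"):  # "every keyword imaginable"
        found |= search_feeds(word.strip())
        time.sleep(2)  # stay polite; real limits unknown
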
23:14 πŸ”— citruspi Thanks ivan`, I'll join the channel
23:14 πŸ”— ivan` thanks!
23:14 πŸ”— * citruspi join #donereading
23:15 πŸ”— citruspi yeah, didn't mean to add the /me…
