| Time |
Nickname |
Message |
|
00:40
|
ivan` |
if anyone wants to discover blog URLs/usernames, or even the blog platforms themselves, that would be an enormous help: http://www.archiveteam.org/index.php?title=Google_Reader |
|
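Discovering usernames on a hosting platform mostly means pulling the subdomain out of blog URLs found elsewhere (search results, link lists, feeds). A minimal sketch, assuming the platform puts each blog at `USERNAME.<platform domain>` the way tumblr does; the function name and sample URLs are illustrative only:

```python
import re
from urllib.parse import urlparse


def extract_usernames(urls, platform_domain):
    """Return sorted, de-duplicated usernames for blogs hosted as
    USERNAME.<platform_domain>, e.g. 'alice' from 'http://alice.tumblr.com/post/1'."""
    suffix = "." + platform_domain.lower()
    # Assumed username charset: a single hostname label (letters, digits, hyphens).
    label = re.compile(r"^[a-z0-9-]+$")
    found = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not host.endswith(suffix):
            continue
        name = host[: -len(suffix)]
        # Skip the platform's own www host and nested subdomains like a.b.<domain>.
        if name and name != "www" and label.match(name):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "http://alice.tumblr.com/post/123",
        "https://www.tumblr.com/explore",
        "http://Bob.tumblr.com/",
        "http://example.org/alice.tumblr.com",
    ]
    print(extract_usernames(sample, "tumblr.com"))  # ['alice', 'bob']
```

Platforms that put usernames in the path instead of the subdomain would need a different rule; this only covers the subdomain layout.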
00:41
|
ivan` |
especially some foreign ones that we generally ignore |
|
00:45
|
ivan` |
I'm going to fix the pipeline script and set up the database that will generate the work items |
|
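The main job of a database that generates work items is to hand out each feed URL exactly once. One way to do that, assuming a single SQLite file and a hypothetical `work_items` table (not the actual tracker's schema), is to let a primary-key constraint drop duplicates and mark items as claimed when they are handed out:

```python
import sqlite3


def open_queue(path=":memory:"):
    """Create (if needed) a hypothetical work_items table keyed by feed URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_items ("
        " feed_url TEXT PRIMARY KEY,"
        " claimed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_feeds(conn, urls):
    """Queue feed URLs, silently skipping ones already present.
    Returns the number of rows actually added."""
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO work_items (feed_url) VALUES (?)",
            ((u.strip(),) for u in urls if u.strip()),
        )
    return conn.total_changes - before


def claim_batch(conn, size):
    """Hand out up to `size` unclaimed URLs and mark them as claimed."""
    with conn:
        rows = conn.execute(
            "SELECT feed_url FROM work_items WHERE claimed = 0"
            " ORDER BY feed_url LIMIT ?",
            (size,),
        ).fetchall()
        conn.executemany(
            "UPDATE work_items SET claimed = 1 WHERE feed_url = ?", rows
        )
    return [r[0] for r in rows]
```

A real tracker would also need to re-queue claims that never come back; that part is left out of this sketch.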
00:50
|
ivan` |
also, if anyone is friends with Jeff Barr or Bill Kearney of http://www.syndic8.com/ maybe you can bug them for the data |
|
00:50
|
ivan` |
their site does not respond to requests beyond the homepage |
|
01:30
|
ivan` |
did anyone grab all the opml files from opmlmanager before it went down sometime in 2012? |
|
01:30
|
ivan` |
IA doesn't seem to have much, http://web.archive.org/web/20120210125326/http://www.opmlmanager.com/user_list.php |
|
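To see what the Wayback Machine holds beyond that single capture, its CDX server can list every capture under a URL prefix. A small sketch that only builds the query URL and parses the JSON reply (header row first, then data rows); the chosen fields, filter and collapse settings are assumptions made here, and the HTTP fetch is left out:

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(url_pattern, match_type="prefix", limit=1000):
    """Build a Wayback CDX query listing captures under url_pattern."""
    params = {
        "url": url_pattern,
        "matchType": match_type,      # exact | prefix | host | domain
        "output": "json",
        "fl": "timestamp,original",   # only the fields needed below
        "filter": "statuscode:200",   # skip redirects and errors
        "collapse": "urlkey",         # one row per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body):
    """Turn CDX JSON output (header row + data rows) into replay URLs."""
    rows = json.loads(body)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return ["http://web.archive.org/web/%s/%s" % (r[ts], r[orig]) for r in data]
```

For example, `cdx_query_url("opmlmanager.com/")` would list the captured pages of the site, which shows quickly whether any OPML files were saved.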
02:01
|
zenguy_pc |
i did the google takeout thing for reader |
|
02:01
|
zenguy_pc |
extracted the xml |
|
02:02
|
zenguy_pc |
it should be 3-4 years old but i don't see the stored posts |
|
02:02
|
zenguy_pc |
you guys need reddit rss? |
|
02:02
|
zenguy_pc |
i only have gonewild though for about a year |
|
02:02
|
zenguy_pc |
nevermind .. only individual posts |
|
02:04
|
zenguy_pc |
does giving you the google reader subscriptions.xml allow you to get all the posts google has from those urls via the api, even ones 3 years old? |
|
02:20
|
ivan` |
zenguy_pc: yes, Reader serves data even for URLs that don't exist anymore |
|
02:20
|
ivan` |
and as far as I can tell, Reader does keep all posts |
|
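The subscriptions file from Takeout is OPML: each feed is an `<outline>` element whose `xmlUrl` attribute holds the feed URL, and folders are outlines wrapping other outlines. Since Reader only needs the feed URL, a short standard-library sketch can pull those out for submission, assuming that usual OPML layout; how the list is then handed over is not covered here:

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Collect unique xmlUrl values from every <outline>, including those
    nested inside folder outlines, in document order."""
    root = ET.fromstring(opml_text)
    seen, urls = set(), []
    for outline in root.iter("outline"):  # depth-first over all outlines
        url = outline.get("xmlUrl")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing the bytes lets the parser honor the file's own encoding declaration.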
02:20
|
zenguy_pc |
ok thats good |
|
02:20
|
ivan` |
maybe there's some really high limit so they don't store a million spam posts |
|
02:20
|
zenguy_pc |
i wish i had more urls.. i've had 30 tops |
|
02:20
|
zenguy_pc |
i will check my other accounts later |
|
02:21
|
zenguy_pc |
i got my subscriptions. i thought they would have given me all the posts i had.. i see starred posts |
|
02:21
|
ivan` |
right |
|
02:21
|
zenguy_pc |
i hadn't touched it in a year and i have some music blogs i wanted to go through |
|
02:22
|
zenguy_pc |
i can't do that in a month |
|
02:22
|
ivan` |
which music blogs? |
|
02:25
|
ivan` |
Feed API is supposed to stay up after July 1, but the eternal spring cleaning will probably kill it soon too |
|
02:26
|
zenguy_pc |
2dopeboyz |
|
02:26
|
zenguy_pc |
http://pastie.org/7966029 |
|
02:27
|
BlueMax |
http://tracker.archiveteam.org/posterous/ 10,000 items left |
|
02:27
|
BlueMax |
wow |
|
02:29
|
zenguy_pc |
i had the overheard in NY, ATL Office.. sites in rss, then they seem to have died in dec 2012 |
|
02:29
|
zenguy_pc |
will you guys retrieve the urls from it, or can you get cached info from the google api? |
|
02:30
|
zenguy_pc |
do the sites need to be live? |
|
02:34
|
ivan` |
for Reader the sites can be dead; just need the feed URL |
|
02:35
|
ivan` |
I have a "Deleted" folder in my Reader that has a lot of stuff |
|
02:44
|
zenguy_pc |
what about random urls? |
|
02:44
|
zenguy_pc |
if you're after google cached stuff, isn't there a way to get random rss urls and see if google has them in their db? |
|
02:45
|
ivan` |
zenguy_pc: you can do searches with https://www.google.com/reader/directory/search?q=keyword-here |
|
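Building those search URLs by hand breaks on spaces and symbols, so the keyword should be percent-encoded into the `q` parameter. A minimal sketch; only the endpoint and the `q` parameter come from the message above, and the helper name is made up:

```python
from urllib.parse import urlencode

# Endpoint quoted in the log; q is the only parameter known from it.
DIRECTORY_SEARCH = "https://www.google.com/reader/directory/search"


def directory_search_url(keyword):
    """Build a Reader directory search URL with the keyword safely encoded."""
    return DIRECTORY_SEARCH + "?" + urlencode({"q": keyword})


if __name__ == "__main__":
    print(directory_search_url("music blogs"))
    # https://www.google.com/reader/directory/search?q=music+blogs
```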
02:45
|
ivan` |
you can also use their recommendations feature in Reader |
|
03:04
|
zenguy_pc |
how do you intend to deal with reddit's rate limit.. i've been banned several times just updating feeds every hour.. 1000+ |
|
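The arithmetic behind those bans is simple: refreshing 1000+ feeds every hour is at least one request every 3.6 seconds on average, and polling them all at once turns that into a burst. Spreading the polls evenly over the hour is one mitigation, sketched below; it makes no claim about reddit's actual limits, which the log doesn't state:

```python
def poll_schedule(feed_urls, period_seconds=3600.0):
    """Spread one poll per feed evenly over period_seconds.
    Returns (offset_seconds, url) pairs; with 1000 feeds per hour
    that is one request every 3.6 s instead of a single burst."""
    n = len(feed_urls)
    if n == 0:
        return []
    gap = period_seconds / n
    return [(i * gap, url) for i, url in enumerate(feed_urls)]
```

A poller would sleep until each offset before fetching; if a site still objects, the period has to grow rather than the gap shrink.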
03:05
|
ivan` |
we grab the data from Google, reddit probably doesn't rate-limit Google |
|
03:05
|
zenguy_pc |
i wanted to get posts before users deleted them which some gonewild posters were apt to do |
|
03:06
|
zenguy_pc |
can i use the same google reader upload for reddit upload |
|
03:06
|
ivan` |
I don't know what that means |
|
03:06
|
zenguy_pc |
nevermind .. you answered the question |
|
03:07
|
zenguy_pc |
you'll index reddit through google |
|
03:08
|
ivan` |
I'm not even really interested in new data, just old data, but Reader is going to hit the site if it doesn't have the feed in its cache already |
|
03:09
|
zenguy_pc |
ah good idea. |
|
03:09
|
zenguy_pc |
i saw reddit's own backup blog post and it was a lot of data |
|
03:10
|
zenguy_pc |
wish they had proper search.. so much content/gems is lost in all that data without proper search |
|
03:15
|
zenguy_pc |
has anyone ever used this http://buzz.sourceforge.net/ |
|
03:18
|
zenguy_pc |
i could never get it to work |
|
03:55
|
SketchCow |
http://www.flickr.com/photos/textfiles/sets/72157633722203885/ |
|
03:59
|
BlueMax |
"The Lucky Byte" huh |
|
04:10
|
S[h]O[r]T |
thats awesome |
|
04:13
|
S[h]O[r]T |
i didnt realize IA was a christian organization |
|
04:18
|
S[h]O[r]T |
what are the tv tuners recording, anything special 24/7? |
|
05:44
|
ivan` |
what the heck is the password for http://areca.co/8/Feed-Item-Dataset-TUDCS5 |
|
05:45
|
ivan` |
I have tried quite a few |
|
05:59
|
ivan` |
hopefully the author knows and will reply |
|
08:40
|
Smiley |
S[h]O[r]T: It's not afaik, it's in an old "scientific christian" church tho from what that movie said yesterday. |
|
08:40
|
Smiley |
Maaan, yahoo have basically killed flickr already, it's so slow :/ |
|
08:41
|
BlueMax |
it wasn't dead when Yahoo took control of it in the first place? |
|
08:42
|
Smiley |
No. |
|
08:42
|
Smiley |
It was quick at least. |
|
08:44
|
ivan` |
http://www.archiveteam.org/index.php?title=Google_Reader please edit if you know of more blog platforms |
|
08:46
|
ivan` |
hm, I should go through my feed URLs and discover some myself |
|
08:49
|
godane |
so i'm finding more techtv stuff |
|
08:50
|
godane |
i didn't do a full scan of the 21000s yet |
|
08:50
|
godane |
i also only the 21800s video ids |
|
08:50
|
godane |
*only did |
|
09:14
|
godane |
i'm uploading the 2nd season of secret life of machines |
|
09:15
|
antomatic |
S[h]O[r]T: I think they archive TV News and stuff. |
|
09:15
|
antomatic |
(indexed via closed captioning data - very cool) |
|
09:17
|
Smiley |
"It's reported that Yahoo has formally put in a bid to buy Hulu only a week after adding Tumblr to the family." |
|
09:17
|
Smiley |
Really yahoo? REALLY? |
|
09:17
|
Smiley |
BUY ALL THE THINGS! |
|
09:24
|
BlueMax |
christ they don't have the cash to buy Hulu surely |
|
09:30
|
godane |
so i got 45k videos uploaded to my g4video-web collection |
|
10:22
|
godane |
so is anyone willing to give me money to buy a new hard drive? |
|
11:55
|
ivan` |
The following text is what triggered our spam filter: http://USERNAME.tumblr.com |
|
11:55
|
ivan` |
The text you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site. |
|
12:53
|
edoc |
jux.com is closing. |
|
12:59
|
ivan` |
google finds 30 blogs on jux.com |
|
12:59
|
ivan` |
okay actually a lot more when I click that redundant link |
|
15:05
|
omf_ |
Is anyone else building applications using the Internet Archive's APIs? |
|
15:05
|
omf_ |
I wrote up a bunch of documentation but I could be missing something since the IA lacks developer documentation |
|
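One IA endpoint with a stable, well-known shape is the metadata API, which returns JSON for an item identifier. A sketch that builds the URL and lists an item's files from a response body, assuming the `files`, `name` and `size` fields of that JSON (sizes arrive as strings); the HTTP fetch is left out so the parsing can be tested offline:

```python
import json

METADATA_ENDPOINT = "https://archive.org/metadata/"


def metadata_url(identifier):
    """URL of the Internet Archive metadata API for one item."""
    return METADATA_ENDPOINT + identifier


def list_files(metadata_json):
    """Return (name, size) for each file in a metadata API response.
    Some files carry no size field, so size defaults to None."""
    doc = json.loads(metadata_json)
    out = []
    for f in doc.get("files", []):
        size = f.get("size")
        out.append((f["name"], int(size) if size is not None else None))
    return out
```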
15:36
|
Smiley |
ivan`: # of results? |
|
15:43
|
WiK |
afternoon |
|
15:48
|
omf_ |
WiK, will you take a pull request that expands your language coverage? |
|
15:51
|
WiK |
do what? |
|
15:53
|
WiK |
i understood the pull request part |
|
15:55
|
omf_ |
I added a few lines to pullfromdb.sh to track more programming language files |
|
16:04
|
WiK |
nopaste your changes, let me take a look |
|
16:05
|
WiK |
meh, just do a pull request |
|
16:05
|
WiK |
i'll take a look at it |
|
20:48
|
soultcer |
ivan`: tumblr is on the spam blacklist, since they are bad at filtering spam and the only way to report spam to them is by first signing up on tumblr - http://www.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist |
|
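In broad terms, each line of the MediaWiki:Spam-blacklist page is a regular-expression fragment, and a save is refused when a link being added matches one of them. The sketch below is a simplified model of that check (the SpamBlacklist extension's exact pattern and comment handling may differ), mainly to show why even the bare `http://USERNAME.tumblr.com` placeholder tripped it:

```python
import re


def build_blacklist(lines):
    """Compile blacklist lines into one regex. Simplified model:
    '#' starts a comment, each remaining line is a regex fragment
    that must match within the host part of a link."""
    fragments = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            fragments.append(line)
    if not fragments:
        return None
    return re.compile(
        r"https?://+[a-z0-9_\-.]*(?:%s)" % "|".join(fragments), re.IGNORECASE
    )


def blocked_links(text, pattern):
    """Return the blacklisted links found in text (empty if none)."""
    if pattern is None:
        return []
    return [m.group(0) for m in pattern.finditer(text)]
```

Because the match is on the host, any tumblr link is caught regardless of the username in front of it.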
21:27
|
ivan` |
Smiley: 760,000ish |
|
21:27
|
ivan` |
that was just with a site: |
|
22:11
|
ivan` |
soultcer: thanks, too bad |
|
22:30
|
citruspi |
Hey, could an admin PM me? |
|
22:51
|
balrog |
site:http://www3.telus.net/ stuff needs to be archived... dunno how much longer that will survive |
|
23:10
|
citruspi |
Hey, I'm looking to help with the Archive Team. |
|
23:10
|
citruspi |
I'm primarily a python programmer |
|
23:10
|
citruspi |
Is there anything I could do to help right now? (I'll hang around in the channel in the future) |
|
23:11
|
BlueMax |
citruspi, I'm sure someone can find some use for you :P |
|
23:12
|
citruspi |
Sweet, thanks :) |
|
23:13
|
ivan` |
citruspi: http://www.archiveteam.org/index.php?title=Google_Reader needs your help |
|
23:13
|
ivan` |
you can write crawlers to discover usernames on the services mentioned, or if you want to do some C, add gzip support to wget |
|
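On the gzip point: a client that supports it sends `Accept-Encoding: gzip` and must then decompress the body whenever the response carries `Content-Encoding: gzip` (`x-gzip` is an old alias for the same coding). The actual work requested is in wget's C code; this Python sketch only illustrates the decoding rule, with response headers simplified to a plain dict:

```python
import gzip


def decode_body(headers, body):
    """Return the response body with gzip content-coding removed.
    headers: dict of response headers, names matched case-insensitively."""
    encoding = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encoding = value.strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding in ("", "identity"):
        return body
    # Anything else (deflate, br, ...) is out of scope for this sketch.
    raise ValueError("unsupported content-encoding: %s" % encoding)
```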
23:14
|
ivan` |
also a crawler to hit https://www.google.com/reader/directory/search with every keyword imaginable |
|
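A crawler that hits the directory search "with every keyword imaginable" first needs the keywords. One simple approach, assumed here rather than taken from the project, is a list of seed words (for example topics from existing feed titles) followed by every short letter/digit combination; each keyword would then go into the `q` parameter and the results would be de-duplicated. This generator covers only the keyword part:

```python
import itertools
import string


def keyword_space(seed_words=(), max_len=2):
    """Yield search keywords: each seed word (lowercased), then every
    lowercase letter/digit combination up to max_len characters, no repeats."""
    seen = set()
    for word in seed_words:
        word = word.strip().lower()
        if word and word not in seen:
            seen.add(word)
            yield word
    alphabet = string.ascii_lowercase + string.digits
    for length in range(1, max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            kw = "".join(combo)
            if kw not in seen:
                seen.add(kw)
                yield kw
```

With 36 symbols, `max_len=2` gives 36 + 1,296 = 1,332 generated keywords, and each extra character multiplies the last tier by 36 (46,656 three-character keywords), so the length limit is the main cost knob.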
23:14
|
citruspi |
Thanks ivan`, I'll join the channel |
|
23:14
|
ivan` |
thanks! |
|
23:14
|
* |
citruspi join #donereading |
|
23:15
|
citruspi |
yeah, didn't mean to add the /me… |