#archiveteam 2013-03-04,Mon

Time Nickname Message
00:05 godane i can upload my g4tv.com crap again :-D
00:17 omf_ godane, how many gigs you got left to upload?
00:23 godane alot
00:23 godane its in the 100s i know
04:41 kennethr- spinning up some boxes :)
05:03 kennethr- wonder if i should increase concurrency
05:03 kennethr- gr, wrong room
06:18 jonas_ wow, what is yahoo trying to prove this time
06:19 Cameron_D they are good at destroying data?
06:22 Lord_Nigh whats happening now
06:28 jonas_ To help focus our efforts on core Yahoo! product experiences, we will discontinue Yahoo! Message Boards by 31st March 2013
06:29 jonas_ hm, what might the core experience...?
06:29 jonas_ be?
06:35 chronomex ads and shutting down products
06:37 Lord_Nigh wait... *WHAT????*
06:37 Lord_Nigh are they fucking serious?
06:38 GLaDOS What service is this now?
06:38 Lord_Nigh there goes about 5 groups i'm subscribed to as well as the old gameboy group which i no longer can access because the admin is awol
06:38 Lord_Nigh GLaDOS: yahoo is killing their groups/messageboards
06:38 Lord_Nigh i.e. yahoo mailing lists etc
06:39 jonas__ it is 60 million urls in google
06:39 GLaDOS NOT THE MAILING LISTS
06:39 Lord_Nigh yes, those
06:40 jonas__ but not groups.yahoo.com
06:40 Lord_Nigh jonas__: wait... hold on
06:40 Lord_Nigh theyre NOT killing groups.yahoo.com?
06:40 jonas__ only messages.yahoo.com
06:41 chronomex correct
06:41 chronomex groups is going next quarter
06:41 * chronomex ducks
06:41 GLaDOS So, in other words, not as much panic for us?
06:41 jonas__ yahoo.com has 700 million urls according to google site:...
06:42 jonas__ groups is 400 of that
06:42 kennethr- talk about surface area
06:42 jonas__ and messages is 60
06:42 Lord_Nigh ok. most of my interest personally is in groups.yahoo.com
06:42 brayden wow. yahoo will have no content by the end of this year anymore except a home page full of ads
06:42 GLaDOS Well, time to start scraping groups URLs..
06:42 Lord_Nigh chronomex: i hope you're not serious about groups being killed next quarter
06:43 chronomex I have no inside line
06:43 chronomex yahoo is yahoo
06:43 Lord_Nigh however planning for the inevitable fall of groups is not a bad idea
06:43 chronomex right
06:44 chronomex I've spent some time wandering around groups with an eye towards archiving
06:45 jonas__ by amount of content answers.yahoo comes first
06:46 jonas__ geocities, before, had 30 million urls answers has 120 ^_^
06:46 brayden good thing answers isn't going down
06:47 brayden ..yet
06:47 jonas__ :D
06:47 jonas__ i wonder why messages.yahoo isnt listed here http://www.alexa.com/siteinfo/yahoo.com
06:47 DFJustin oh thank god I still need to find out how babby is formed
06:49 brayden lol just looking at the posts on there. so much spam :( http://messages.yahoo.com/Computers_%26_Internet/threadview?m=tm&bn=17887109&tid=3437&mid=3437&tof=6&frt=2 but this.. lol
06:49 GLaDOS swebb: vincentchu: @archiveteam Check this out: https://t.co/IRBwrM6QN5 [less than one minute ago]
06:49 GLaDOS They're watching us...
06:49 jonas__ haha
06:50 jonas__ guess they took it off from by request but is there a way to find a snapshot of http://www.alexa.com/siteinfo/yahoo.com ?
06:51 chronomex that's really interesitng GLaDOS
06:51 jonas__ site:answers.yahoo.com viagra is only 0.1million urls, while site:answers.yahoo.com science is 10million so thats a good sign ;)
06:56 jonas__ (while recent users may be spam more often) # http://www.youtube.com/watch?v=_f9oJikx0-I
06:58 ersi https://twitter.com/vincentchu/status/308467595416334336/photo/1
06:58 ersi that's nice :D
07:04 omf_ Yahoo is taking the google approach to products. The will do a once a year sweep in the spring and that will be it. From a business standpoint if a product is costing money, you should get rid of it. Would we be as upset if as soon as they announced it they had uploaded a full copy to IA, or dropbox or on bittorrent
07:04 omf_ I know that sites like stackoverflow and stackexchange spoil us but it took years for ideas like that to evolve.
07:05 omf_ stackexchange == gamified, more bragging rights, moderated usenet
07:05 omf_ Also I have a spider poking around in messages.yahoo.com to see if I can get a full list
07:08 jonas__ instead of shuting it down or even uploading it somewhere they could just keep it in place but terminate new messages
07:08 jonas__ *the ability to post new messages
07:08 jonas__ only
07:08 ersi Yes, but seemingly it doesn't work that way.
07:22 jonas__ or they could permanently forward the urls to archive.org after makeing sure they exist there
07:25 jonas__ in favor of the people who still click links to them in every second...
07:43 Lord_Nigh i think for legal liability reasons they can't just GIVE us the raw databases
07:43 Lord_Nigh what if god forbid some cartilage-brained moron posted their bank info there
07:44 Lord_Nigh or social security number
07:44 Lord_Nigh etc
08:11 jonas__ even when its already public and the only differnce that action would make, would be to help archive.org archive the site more efficient/complete?
08:12 chronomex more complete
08:12 jonas__ :D
08:19 jonas__ i mean i can only imagine that they cant provide the data as a whole, when the service beeing shutdown had TOS that would only allow the messages to be published on its website exclusively and when archive.org crawlers would be locked out from it
08:20 jonas__ but apparently they agree with archival / archive.org
08:50 ersi This is getting more and more into philosophical fanatsy magic land.. Put it in #archiveteam-bs
08:50 chronomex no, not really bs material
08:50 chronomex -bs is for random off topic side chats
08:50 chronomex not philosophy of archivism
08:52 ersi I'd say it is, when it's not actively contributes to archiving things. Oh well.
08:53 chronomex I should really figure out how to wire up the off topic siren in weechat
08:56 jonas__ i just questioned that there a really legal reasons that they can provide whole data dumps like Lord_Nigh assumed
08:57 jonas__ if not you have to assume its only that they dont care or lack a relationship with archive.org waybackmachine to just get it down easily along with their yearly shutdown acttivites.
09:02 jonas__ i assumed not to bother anyone with that wondering, since the ongoing work seems to happen in #preposterus
09:05 ersi You're not being a bother in particular, it's just that this conversation is such a empitome of boringness and has been had countless amounts of time. The next topic you can talk about is "Why is IA not fixing the robots issue?" which would be equally boring
09:07 SketchCow Let's discuss it more.
09:07 SketchCow Whichever archive team members don't die of boredom, get to reproduce
09:08 ersi We can also talk about copyright and trademarks!
09:08 ersi And filing standards
09:09 chronomex SketchCow: or say "forget irc I'm going to have sex for a while"
09:09 chronomex which I suppose is the same thing
13:40 norbert79 Ehm, guys, Mimi12 is an automatic bot
13:41 norbert79 would you be so kind get rid of it?
13:41 norbert79 thank you
13:42 norbert79 ironically it worked pretty dumb
13:42 norbert79 eh, damn, forgat to change a setting, brb
13:43 omf_ Anyone have the bitly url list already downloaded
13:44 omf_ I was thinking you could run a search on that using the NYT green blog base url and have a list of article to pull down since it is going offline
13:44 omf_ I was going to download it but 33GB is going to take a while
13:44 ersi omf_: There's the URLTeam dataset. Available over torrent
13:44 ersi Oh, hm.
13:45 omf_ I am going to set it to download later today
13:45 omf_ I am just throwing the idea out incase anyone else wanted to try something like it
13:45 GLaDOS I might download it and store it on my VPS..
13:46 GLaDOS So people can access the dataset over HTTP, and grab only chunks of it
13:48 omf_ You can already do file selection via the torrent
23:18 SketchCow So, everyone.
23:18 SketchCow I'm uploading most of the TOSEC I found
23:18 SketchCow Everything except late Nintendo is going up
23:39 wehidden is there anything we can do to help?
23:43 SketchCow Yes.
23:43 SketchCow OK, so shortly I'll have them ALL up.
23:43 SketchCow All that was in this set.
23:43 SketchCow But I'd like help to find if I'm missing some sets.
23:43 SketchCow They claim they're at 3.8t
23:43 SketchCow I had 100g
23:43 SketchCow Granted, maybe they mean unpacked.
23:44 robbiet48 twitter killed tweetdeck
23:44 robbiet48 they annonunced it by posting on their posterous blog
23:44 robbiet48 which they are also killing
23:44 chronomex hahahahaha
23:44 wehidden lololol
23:44 chronomex really
23:44 robbiet48 https://tweetdeck.posterous.com/an-update-on-tweetdeck
23:44 chronomex perfect.
23:50 S[h]O[r]T omf_
23:50 S[h]O[r]T i have the torrent downloaded
23:51 S[h]O[r]T what do you want me to run against it?
23:52 SketchCow http://archive.org/details/tosec
23:52 SketchCow So if people want to see stuff as I'm adding it, there you go
23:52 SketchCow New stuff showing up every few minutes
23:54 SketchCow This is relatively big news in the software world, as far as I'm concerned.