#archiveteam 2013-10-08,Tue


Time Nickname Message
00:30 kyan I've downloaded all of the Insurgency Wiki that is on the porfusion website. Is there something I should do with the data?
00:32 arkhive Might be widely known.. I just found out though.. Bebo's old Bebo pages are now under .archive.bebo.com/
00:33 arkhive and http://archive.bebo.com/Profile.jsp?MemberId=4134490647
00:33 arkhive example
00:33 arkhive http://www.bebo.com/#faq Your old photos and blog posts are safe. They will be available for download in a couple of months. Other things (skins, quizzes, wall posts, games etc…) unfortunately will all be retired.
00:34 arkhive We're just as sentimental as you are (ok, probably more), and have left all public profiles visible for now. Private profiles are also saved, but not visible at the moment.
00:35 arkhive It might be time to put in the AT Warrior and start a tracker and save/grab what we can
00:44 arkhive Lol. but i am not that experienced to do so, yet. If I can help in any way with my bandwidth I'd love to. I had a Bebo years ago. Heh.
03:47 chfoo i made the wretch wiki page: http://archiveteam.org/index.php?title=Wretch
04:15 chfoo i made the bebo wiki page: http://archiveteam.org/index.php?title=Bebo
04:39 SketchCow Wheeee
04:39 SketchCow Did zapd die?
04:43 SketchCow Nope, still there.
04:43 yipdw hasn't been zapd out of existence yet
04:44 link343 So an old music Blog / Aggregator is shutting down
04:44 link343 http://pitchfork.com/news/52578-music-blog-aggregator-elbows-shuts-down/
04:44 link343 End of November. Do you think it's worth archiving?
04:50 bsmith093 omf_: at some point the rsync died, and now i get "sending incremental file list". it hung on that for **at least** 3 hours after i tried to re-run the rsync
05:09 chfoo link343: since no one else has answered yet, i'll say yes. i'll keep an eye on it.
05:10 link343 ok
06:39 Nemo_bis SketchCow: could you move all wikimediacommons* items in wikimedia-other collection to wikimediacommons collection?
06:39 Nemo_bis (the collection was somehow broken but seems to work now)
06:42 Nemo_bis are these on archive.org? http://www.bl.uk/bibliographic/download.html
07:25 SketchCow http://archive.org/details/BritishLibraryRdf earlier one
07:37 SketchCow I'm going to make a new one.
07:37 SketchCow yay
07:38 Nemo_bis :)
07:38 Nemo_bis and the new book deriver got rid of some of my redrows, sweet
08:14 ersi God damn it, Yahoo! - what the fuck is your problem.
08:14 ersi I might have some friends who can read Traditional Chinese (used in, for example, Taiwan/Republic of China) - haven't talked to them in ages though.
08:25 Nemo_bis ersi: should be easy to find some if it's a quick task
08:29 ersi It's for wretch.cc. We'll need it for finding important structure and for content verification
08:38 Nemo_bis hmm, not so quick, I have only 1 option then
08:39 SketchCow guy: I have Amiga files not in TOSEC. Would like to add them.
08:39 SketchCow me: cool! here's where to go to TOSEC to contribute
08:39 SketchCow guy: Oh, that's requiring to make a login, I'm not gonna do that
08:40 SketchCow So I think he wants me to put the spoon in AND move his chin so he chews it
08:59 godane SketchCow: Move these to a tekzilla-daily collection when you can: http://archive.org/search.php?query=collection%3Atekzilla%20AND%20subject%3A%22tekzilla%20daily%20tip%22&sort=-date
08:59 godane this is more to keep the tekzilla collection just for full episodes of tekzilla
09:02 godane since there are 1500+ tekzilla daily episodes
09:09 SketchCow tekzilla-daily now created and you own it
09:10 godane thanks
09:18 SketchCow http://archive.org/details/BritishLibraryRdf-2013-09
09:22 Nemo_bis SketchCow: https://twitter.com/BLMetadata/status/387145272951701504 if you want to announce it to them
09:23 godane looks like something is wrong with this item: http://archive.org/details/Tekzilla_Daily_36
09:30 SketchCow It's set dark.
09:30 SketchCow I'll find out why tomorrow.
09:36 godane it's patrick talking about pricewatch.com
09:36 godane so unless they think that one was spam i don't see why it would go dark
09:40 godane also looks like episodes 41 and 42 of the oneoff episodes may be in the revision3 bestof collection
09:41 godane there were 3 episodes of e3 2009 live streaming
09:43 godane SketchCow: thought i'd point out that i have 33 more episodes of geekbeat.tv to be added to the collection: http://archive.org/search.php?query=subject%3A%22GeekBeat.TV%22%20AND%20collection%3Aopensource_movies&sort=-date
10:30 Schbirid grab it while you can http://www-users.cs.umn.edu/~sarwat/foursquaredata/
10:44 GLaDOS And anarchive has a copy.
12:23 Keni hi
12:24 Schbirid hello3
12:26 Schbirid and the dataset is gone :D
12:26 Schbirid for the better
12:27 Keni oh
12:28 Keni I feel 24hour like 150~200hour
12:34 joepie91 Schbirid: lol
12:34 * joepie91 has a copy
12:34 Schbirid it looked like quite the gross privacy violation so it's for the better that it's gone
12:34 joepie91 Schbirid: privacy violation? this is data that users have put on foursquare themselves publicly, no?
12:35 kyan Hi y'all, I'm working on downloading elbo.ws by the way
12:35 Schbirid there is a difference between putting data on fq and mass aggregating it
12:35 joepie91 Schbirid: hardly
12:35 ersi Take that discussion somewhere else
12:35 ersi In this channel: Grabbing is GO and OK for whatever reason.
12:35 Schbirid but let's not get into that discussion again, last time people turned out to have a different understanding of privacy than me
12:35 Schbirid aye
12:36 ersi Feel free to talk about privacy/downloading morals in #archiveteam-bs though
12:39 Keni okay, sry both of you. Really sorry.
12:39 ersi Keni: No worries, I'm saying it because everyone needs a reminder occasionally. And we're many people, so if we drift off-topic in this channel, something important might get lost.
12:40 norbert79 ersi: Use simple English... He is Japanese.
12:40 norbert79 ersi: He is having a hard time understanding...
12:40 Keni thx but it's all right
12:41 Keni This is so GAP than learn to school.
12:42 Keni So don't mind that thx.
12:42 norbert79 sure
12:42 norbert79 :)
15:02 SketchCow I want that dataset.
15:13 SketchCow I wish I was awake sooner.
15:14 SketchCow The SECOND datasets like that appear, grab.
15:24 fz I am not sure if Archivebot is still working on Silk Road Forums, but the site is still squirming
15:24 fz was just able to load it a few minutes ago.
15:26 omf_ SketchCow, GLaDOS grabbed a copy of that foursquare data
16:10 godane SketchCow: i failed you on that one
16:11 godane but i found another smaller dataset
16:39 joepie91 SketchCow: check your PM
17:06 yipdw it works well enough at this point
17:06 yipdw !status
17:06 ATBot yipdw: Job status: 5039 completed, 14 aborted, 3 in progress, 0 pending
17:06 yipdw yep
19:27 Nemo_bis https://www.mediawiki.org/w/index.php?title=Language_portal&diff=prev&oldid=797446
19:40 Schbirid anyone got some good wget --reject-regex for blogspot sites to reduce duplicates and search result shite?
19:46 omf_ Schbirid, let us know if you find anything, that sounds really useful
19:56 Nemo_bis Schbirid: and remember to update http://archiveteam.org/index.php?title=Blogger
19:57 Schbirid nice, thanks
19:58 omf_ Do you have url lists from a few sites?
20:00 Schbirid nope
20:04 Schbirid any idea what the "*\\?*,*@*" is supposed to reject?
21:15 Nemo_bis He went away, but I guess any URL with parameters?
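A minimal Python sketch of how that reject list would be evaluated, under stated assumptions: that wget actually received `*\?*,*@*` (what bash leaves of `"*\\?*,*@*"` inside double quotes), and that each comma-separated piece goes to C fnmatch() without FNM_NOESCAPE, so `\?` is a literal question mark rather than the one-character wildcard. Under those assumptions the first pattern rejects any name containing "?" and the second any name containing "@". The helper names `glob_to_regex` and `rejected` and the sample names are hypothetical, and bracket expressions like `[abc]` are deliberately not handled.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one glob into a regex, treating backslash as an escape
    (as C fnmatch() does when FNM_NOESCAPE is not set).
    Only '*', '?' and backslash escapes are supported in this sketch."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if c == "*":
            parts.append(".*")  # any run of characters, including none
        elif c == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(c))
        i += 1
    # match() anchors at the start; \Z anchors at the end, so the whole name must match
    return re.compile("".join(parts) + r"\Z", re.DOTALL)


def rejected(name: str, reject_list: str) -> bool:
    """True if any comma-separated pattern in reject_list matches the whole name."""
    return any(glob_to_regex(p).match(name) for p in reject_list.split(","))


# Assumed string wget receives after shell unquoting of "*\\?*,*@*".
REJECT = r"*\?*,*@*"

for name in ["index.html", "search?q=foo", "feed@example", "about.html"]:
    print(f"{name!r:20} rejected={rejected(name, REJECT)}")
```

One caveat worth checking against the wget version in use: recent GNU Wget manuals state that query strings are not included in the file name used for `-A`/`-R` matching, while `--reject-regex` is matched against the complete URL, so `*\?*` may not catch parameterized URLs the way this sketch suggests.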
21:29 Nemo_bis Funny http://oami.europa.eu/robots.txt
22:02 diffalot CAT SIGNAL ACTIVATED: blip.tv is deleting years of vloggers' videos, can you help?
22:03 diffalot i'm checking that archive.org is willing to ingest it all
22:05 omf_ diffalot, start by giving us a link to the announcement page
22:06 diffalot no page, this is something blip is quietly doing, see tweets from https://twitter.com/schlomo, quirk, and trine
22:07 diffalot here's a news story: http://www.zennie62blog.com/2013/10/08/blip-tv-er-blip-networks-sacks-ceo-kelly-day-shortest-exec-career-since-john-paul-i-24113/
22:08 diffalot archive.org says, "hell yes" https://twitter.com/tracey_pooh/status/387700340176351233
22:12 omf_ Well this is an interesting problem. How to find the vloggers they are going to erase
22:15 omf_ and I already want to shit on the heads of the developers of blip.tv
22:16 omf_ good we already have a page http://archiveteam.org/index.php?title=Blip.tv
22:19 SketchCow It's 30 days.
22:19 SketchCow We have 30 days.
22:20 diffalot perhaps blip would provide a list? or we create an opt-in form? i'm not seeing any mediaRSS feeds on the user profile pages in question
22:22 diffalot i'm ok with phantomJS and jsdom, so i'll see what i can do
22:24 joepie91 I guess that the last-ditch effort would be
22:25 joepie91 "just archive all of blip and we'll figure out what's gone later"
22:28 diffalot i'm looking for an example of a past scraper the team has used, any recommendations?
22:29 diffalot iirc, y'all have some sophisticated turnkey solutions ;)
22:33 omf_ While I am adding info I find to the wiki about blip.tv I would like to remind everyone we had serious server problems while backing up zapd and frankly blip.tv is going to require bigger metal to suck down that much data
22:33 SketchCow what sort of server problems.
22:34 omf_ we went down, we ran out of space, the usual
22:34 SketchCow Well, that's because the same people aren't using the tracker - we'll have the use of FOS for a dump.
22:34 omf_ the hosting company randomly turns off or reboots the server
22:39 SketchCow That's because people take over the project and use their own central servers instead of internet archive.
23:04 omf_ I am searching the commoncrawl index for urls
23:07 diffalot ah ha: http://blip.tv/schlomo/rss/
23:07 diffalot (must be turned on by the producer?)
23:15 diffalot WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
23:17 omf_ THY SECRET WORD is "yahoosucks" GO FORTH AND IMPART THY KNOWLEDGE
23:17 * diffalot kneels and accepts the mantle
23:17 omf_ SketchCow, we need a cool name to make an irc channel
23:19 bsmith093 bloop
23:25 kyan Has anyone here looked into working with the Majestic-12 project to find usernames for websites and such? It seems like it could be a really valuable source of data (they have ~2.7 trillion URLs in their databases)…
23:27 omf_ kyan, url?
23:27 kyan omf_: this is their "real" website: http://www.majestic12.co.uk/ This is their commercial website: https://www.majesticseo.com/
23:39 diffalot contacting the public relations team at blip (http://annieisms.com/about/), good idea or bad idea?
23:40 Cameron_D kyan: I do a lot of MJ12 crawling, I've considered asking them in the past, but due to the commercial nature of what they do I doubt they'd work with us.
23:41 diffalot the question would be: can we get a list of the shows that are being deleted?
23:41 kyan Cameron_D, that would be understandable
23:41 omf_ and yet they use the public to do the bulk of the work
23:43 kyan AT wouldn't be using the data for profit… might be worth asking
23:43 Cameron_D yeah, maybe
