#archiveteam 2013-10-08,Tue


Time	Nickname	Message
00:30 ^🔗	kyan	I've downloaded all of the Insurgency Wiki that is on the porfusion website. Is there something I should do with the data?
00:32 ^🔗	arkhive	Might be widely know.. I just found out though.. Bebo's old Bebo pages are now under the .archive.bebo.com/
00:33 ^🔗	arkhive	and http://archive.bebo.com/Profile.jsp?MemberId=4134490647
00:33 ^🔗	arkhive	example
00:33 ^🔗	arkhive	http://www.bebo.com/#faq Your old photos and blog posts are safe. They will be available for download in a couple of months. Other things (skins, quizzes, wall posts, games etc…) unfortunately will all be retired.
00:34 ^🔗	arkhive	We're just as sentimental as you are (ok, probably more), and have left all public profiles visible for now. Private profiles are also saved, but not visible at the moment.
00:35 ^🔗	arkhive	It might be time to put in the AT Warrior and start a tracker and save/grab what we can
00:44 ^🔗	arkhive	Lol. but i am not that experienced to do so, yet. If I can help any way along with my bandwidth I'd love to. I had a Bebo years ago. Heh.
03:47 ^🔗	chfoo	i made the wretch wiki page: http://archiveteam.org/index.php?title=Wretch
04:15 ^🔗	chfoo	i made the bebo wiki page: http://archiveteam.org/index.php?title=Bebo
04:39 ^🔗	SketchCow	Wheeee
04:39 ^🔗	SketchCow	Did zapd die?
04:43 ^🔗	SketchCow	Nope, still there.
04:43 ^🔗	yipdw	hasn't been zapd out of existence yet
04:44 ^🔗	link343	So an old music Blog / Aggregator is shutting down
04:44 ^🔗	link343	http://pitchfork.com/news/52578-music-blog-aggregator-elbows-shuts-down/
04:44 ^🔗	link343	End of November. Do you think it's worth archiving?
04:50 ^🔗	bsmith093	omf_: at some point the rsync died and now i get this sending incremental file list it hung on this for at least 3 hours, after i tried to re run the rsync
05:09 ^🔗	chfoo	link343: since no one else has answered yet, i'll say yes. i'll keep an eye on it.
05:10 ^🔗	link343	ok
06:39 ^🔗	Nemo_bis	SketchCow: could you move all wikimediacommons* items in wikimedia-other collection to wikimediacommons collection?
06:39 ^🔗	Nemo_bis	(the collection was somehow broken but seems to work now)
06:42 ^🔗	Nemo_bis	are these on archive.org? http://www.bl.uk/bibliographic/download.html
07:25 ^🔗	SketchCow	http://archive.org/details/BritishLibraryRdf earlier one
07:37 ^🔗	SketchCow	I'm going to make a new one.
07:37 ^🔗	SketchCow	yay
07:38 ^🔗	Nemo_bis	:)
07:38 ^🔗	Nemo_bis	and the new book deriver got rid of some of my redrows, sweet
08:14 ^🔗	ersi	God damn it, Yahoo! - what the fuck is your problem.
08:14 ^🔗	ersi	I might have some friends who can read Traditional Chinese (Used in for example Taiwan/Republic of China) - havn't talked to them in ages though.
08:25 ^🔗	Nemo_bis	ersi: should be easy to find some if it's a quick task
08:29 ^🔗	ersi	It's for wretch.cc. We'll need it for finding important structure and for content verification
08:38 ^🔗	Nemo_bis	hmm, not so quick, I have only 1 option then
08:39 ^🔗	SketchCow	guy: I have Amiga files not in TOSEC. Would like to add them.
08:39 ^🔗	SketchCow	me: cool! here's where to go to TOSEC to contribute
08:39 ^🔗	SketchCow	guy: Oh, that's requiring to make a login, I'm not gonna do that
08:40 ^🔗	SketchCow	So I think he wants me to put the spoon in AND move his chin so he chews it
08:59 ^🔗	godane	SketcchCow: Move these to a tekzilla-daily collection when you can: http://archive.org/search.php?query=collection%3Atekzilla%20AND%20subject%3A%22tekzilla%20daily%20tip%22&sort=-date
08:59 ^🔗	godane	this is more to keep the tekzilla collection just for full episodes of tekzilla
09:02 ^🔗	godane	since there is 1500+ tekzilla daily episodes
09:09 ^🔗	SketchCow	tekzilla-daily now created and you own it
09:10 ^🔗	godane	thanks
09:18 ^🔗	SketchCow	http://archive.org/details/BritishLibraryRdf-2013-09
09:22 ^🔗	Nemo_bis	SketchCow: https://twitter.com/BLMetadata/status/387145272951701504 if you want to announce it to them
09:23 ^🔗	godane	looks like something is wrong with this item: http://archive.org/details/Tekzilla_Daily_36
09:30 ^🔗	SketchCow	It's set dark.
09:30 ^🔗	SketchCow	I'll find out why tomorrow.
09:36 ^🔗	godane	it patrick talking about pricewatch.com
09:36 ^🔗	godane	so in less they think that one was spam i don't see why it would go dark
09:40 ^🔗	godane	also looks like episode 41 and 42 of oneoff epsiodes maybe in revision3 bestof collection
09:41 ^🔗	godane	there was 3 episodes of e3 2009 live streaming
09:43 ^🔗	godane	SketchCow: thought i point out that i have 33 more episodes of geekbeat.tv to be add the collection: http://archive.org/search.php?query=subject%3A%22GeekBeat.TV%22%20AND%20collection%3Aopensource_movies&sort=-date
10:30 ^🔗	Schbirid	grab it while you can http://www-users.cs.umn.edu/~sarwat/foursquaredata/
10:44 ^🔗	GLaDOS	And anarchive has a copy.
12:23 ^🔗	Keni	hi
12:24 ^🔗	Schbirid	hello3
12:26 ^🔗	Schbirid	and the dataset is gone :D
12:26 ^🔗	Schbirid	for the better
12:27 ^🔗	Keni	oh
12:28 ^🔗	Keni	I feel 24hour like 150~200hour
12:34 ^🔗	joepie91	Schbirid: lol
12:34 ^🔗	*	joepie91 has a copy
12:34 ^🔗	Schbirid	it looked like quite the gross privacy violation so its for the better to be gone
12:34 ^🔗	joepie91	Schbirid: privacy violation? this is data that users have put on foursquare themselves publicly, no?
12:35 ^🔗	kyan	Hi yall, I'm working on downloading elbo.ws by the way
12:35 ^🔗	Schbirid	there is a difference between putting data on fq and mass aggregating it
12:35 ^🔗	joepie91	Schbirid: hardly
12:35 ^🔗	ersi	Take that discussion somewhere else
12:35 ^🔗	ersi	In this channel: Grabbing is GO and OK for whatever reason.
12:35 ^🔗	Schbirid	but lets not get into that discussion again, last time people showed to have a different understandnig of privacy than me
12:35 ^🔗	Schbirid	aye
12:36 ^🔗	ersi	Feel free to talk about privacy/downloading moral in #archiveteam-bs though
12:39 ^🔗	Keni	okay, sry both of you. Really sorry.
12:39 ^🔗	ersi	Keni: No worries, I'm saying it because everyone needs a reminder occationally. And we're many people, so if we drift off-topic in this channel, something important might get lost.
12:40 ^🔗	norbert79	ersi: Use simple English... He is Japanese.
12:40 ^🔗	norbert79	ersi: He is having hard time understanding...
12:40 ^🔗	Keni	thx but I'ts allright
12:41 ^🔗	Keni	This is so GAP than learn to school.
12:42 ^🔗	Keni	So don't mind that thx.
12:42 ^🔗	norbert79	sure
12:42 ^🔗	norbert79	:)
15:02 ^🔗	SketchCow	I want that dataset.
15:13 ^🔗	SketchCow	I wish I was awake sooner.
15:14 ^🔗	SketchCow	The SECOND datasets like that appear, grab.
15:24 ^🔗	fz	I am not sure if Archivebot is still working on Silk Road Forums, but the site is still squirming
15:24 ^🔗	fz	was just able to load it a few minutes ago.
15:26 ^🔗	omf_	SketchCow, GLaDOS grabbed a copy of that foursquare data
16:10 ^🔗	godane	SketchCow: i failed you on that one
16:11 ^🔗	godane	but i found another smaller dataset
16:39 ^🔗	joepie91	SketchCow: check your PM
17:06 ^🔗	yipdw	it works well enough at this point
17:06 ^🔗	yipdw	!status
17:06 ^🔗	ATBot	yipdw: Job status: 5039 completed, 14 aborted, 3 in progress, 0 pending
17:06 ^🔗	yipdw	yep
19:27 ^🔗	Nemo_bis	https://www.mediawiki.org/w/index.php?title=Language_portal&diff=prev&oldid=797446
19:40 ^🔗	Schbirid	anyone got some good wget --reject-regexp for blogspot sites to reduce duplicates and search result shite?
19:46 ^🔗	omf_	Schbirid, let us know if you find anything, that sounds really useful
19:56 ^🔗	Nemo_bis	Schbirid: and remember to update http://archiveteam.org/index.php?title=Blogger
19:57 ^🔗	Schbirid	nice, thanks
19:58 ^🔗	omf_	Do you have url lists from a few sites?
20:00 ^🔗	Schbirid	nope
20:04 ^🔗	Schbirid	any idea what the "\\?,@" is supposed to reject?
21:15 ^🔗	Nemo_bis	He went away, but I guess any URL with parameters?
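A minimal sketch of the kind of reject filter Schbirid asked about, written in Python so it can be tested offline. The pattern is an assumption, not a tested ArchiveTeam rule: it skips Blogspot label listing pages and, following Nemo_bis's reading of `\?`, any URL with a query string (which would also catch `?m=1` mobile duplicates and `?showComment=` permalinks). wget 1.14 and later accept a similar expression through `--reject-regex`, which is matched against the full URL.

```python
import re

# Hypothetical reject pattern (an assumption, not an ArchiveTeam rule):
#   /search/label/  -> label listing pages that repeat post content
#   \?              -> any URL with a query string (a literal "?")
REJECT = re.compile(r"/search/label/|\?")

def keep(url: str) -> bool:
    """Return True if the URL should be fetched, False if REJECT matches it."""
    return REJECT.search(url) is None

urls = [
    "http://example.blogspot.com/2013/10/post.html",
    "http://example.blogspot.com/2013/10/post.html?m=1",
    "http://example.blogspot.com/search/label/news",
    "http://example.blogspot.com/search?updated-max=2013-10-01",
]
for u in urls:
    print(keep(u), u)
```

Rejecting every query string is blunt; on a blog that paginates archives only through `?updated-max=`, it could also drop pages that are the only route to older posts.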
21:29 ^🔗	Nemo_bis	Funny http://oami.europa.eu/robots.txt
22:02 ^🔗	diffalot	CAT SIGNAL ACTIVATED: blip.tv is deleting years of vloggers videos, can you help?
22:03 ^🔗	diffalot	i'm checking that archive.org is willing to ingest it all
22:05 ^🔗	omf_	diffalot, start by giving us a link to the announcement page
22:06 ^🔗	diffalot	no page, this is something blip is quietly doing, see tweets from https://twitter.com/schlomo , quirk, and trine
22:07 ^🔗	diffalot	here's a news story: http://www.zennie62blog.com/2013/10/08/blip-tv-er-blip-networks-sacks-ceo-kelly-day-shortest-exec-career-since-john-paul-i-24113/
22:08 ^🔗	diffalot	archive.org says, "hell yes" https://twitter.com/tracey_pooh/status/387700340176351233
22:12 ^🔗	omf_	Well this is an interesting problem. How to find the vloggers they are going to erase
22:15 ^🔗	omf_	and I already want to shit on the heads of the developers of blip.tv
22:16 ^🔗	omf_	good we already have a page http://archiveteam.org/index.php?title=Blip.tv
22:19 ^🔗	SketchCow	It's 30 days.
22:19 ^🔗	SketchCow	We have 30 days.
22:20 ^🔗	diffalot	perhaps blip would provide a list? or we create an opt-in form? i'm not seeing any mediaRSS feeds on the user profile pages in question
22:22 ^🔗	diffalot	i'm ok with phantomJS and jsdom, so i'll see what i can do
22:24 ^🔗	joepie91	I guess that the last ditch effort would be
22:25 ^🔗	joepie91	"just archive all of blip and we'll figure out what's gone later"
22:28 ^🔗	diffalot	i'm looking for an example of a past scraper the team has used, any recommendations?
22:29 ^🔗	diffalot	iirc, y'all have some sophisticated turnkey solutions ;)
22:33 ^🔗	omf_	While I am adding info I find to the wiki about blip.tv I would like to remind everyone we had serious server problems during backing up zapd and frankly blip.tv is going to require bigger metal to suck down that much data
22:33 ^🔗	SketchCow	what sort of server problems.
22:34 ^🔗	omf_	we went down, we ran out of space, the usual
22:34 ^🔗	SketchCow	Well, that's because the same people aren't using the tracker - we'll have the use of FOS for a dump.
22:34 ^🔗	omf_	the hosting company randomly turns off or reboots the server
22:39 ^🔗	SketchCow	That's because people take over the project and use their own central servers instead of internet archive.
23:04 ^🔗	omf_	I am searching the commoncrawl index for urls
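One way to turn a URL list (for example from a Common Crawl index search) into candidate blip.tv usernames is sketched below. It assumes, based only on the feed URL posted at 23:07 (http://blip.tv/schlomo/rss/), that the first path segment names the user. The `NON_USER` set of site sections is hypothetical, and each candidate would still need checking against the live site.

```python
from urllib.parse import urlsplit

# Hypothetical first path segments that are site sections, not user names.
NON_USER = {"", "file", "play", "rss", "search", "users", "dashboard"}

def blip_usernames(urls):
    """Collect distinct first path segments of blip.tv URLs as candidate users."""
    names = set()
    for url in urls:
        parts = urlsplit(url)
        if parts.netloc.lower() not in ("blip.tv", "www.blip.tv"):
            continue  # ignore URLs from other hosts
        first = parts.path.strip("/").split("/", 1)[0]
        if first not in NON_USER:
            names.add(first)
    return sorted(names)

print(blip_usernames([
    "http://blip.tv/schlomo/rss/",
    "http://blip.tv/",
    "http://example.com/schlomo/",
]))
```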
23:07 ^🔗	diffalot	ah ha: http://blip.tv/schlomo/rss/
23:07 ^🔗	diffalot	(must be turned on by the producer?)
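If a producer's RSS feed is available, the video files could be listed by reading the feed's `<enclosure>` elements and any Media RSS `<media:content>` elements. The sketch below does that with the standard library. The sample feed is made up, and it is an assumption that blip.tv feeds use these elements. Fetching the real feed (for example with `urllib.request`) is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

MRSS = "{http://search.yahoo.com/mrss/}"  # Media RSS namespace

# Made-up feed for illustration; not a real blip.tv response.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>example show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.invalid/ep1.m4v" type="video/mp4" length="0"/>
      <media:group>
        <media:content url="http://example.invalid/ep1.m4v" type="video/mp4"/>
        <media:content url="http://example.invalid/ep1.flv" type="video/x-flv"/>
      </media:group>
    </item>
  </channel>
</rss>"""

def media_urls(rss_text):
    """Return enclosure and media:content URLs, deduplicated in feed order."""
    root = ET.fromstring(rss_text)
    urls = []
    for item in root.iter("item"):
        for enc in item.findall("enclosure"):
            if enc.get("url"):
                urls.append(enc.get("url"))
        # iter() also finds media:content nested inside media:group
        for mc in item.iter(MRSS + "content"):
            if mc.get("url"):
                urls.append(mc.get("url"))
    return list(dict.fromkeys(urls))

print(media_urls(SAMPLE))
```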
23:15 ^🔗	diffalot	WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
23:17 ^🔗	omf_	THY SECRET WORD is "yahoosucks" GO FORTH AND IMPART THY KNOWLEDGE
23:17 ^🔗	*	diffalot kneels and accepts the mantle
23:17 ^🔗	omf_	SketchCow, we need a cool name to make an irc channel
23:19 ^🔗	bsmith093	bloop
23:25 ^🔗	kyan	Has anyone here looked into working with the Majestic-12 project to find usernames for websites and such? It seems like it could be a really valuable source of data (they have ~2.7 trillion URLs in their databases)…
23:27 ^🔗	omf_	kyan, url?
23:27 ^🔗	kyan	omf_: this is their "real" website: http://www.majestic12.co.uk/ This is their commercial website: https://www.majesticseo.com/
23:39 ^🔗	diffalot	contacting the public relations team at blip (http://annieisms.com/about/), good idea or bad idea?
23:40 ^🔗	Cameron_D	kyan: I do a lot of MJ12 crawling, I've considered asking them in the past, but due to the commercial nature of what they do I doubt they'd work with us.
23:41 ^🔗	diffalot	the question would be: can we get a list of the shows that are being deleted?
23:41 ^🔗	kyan	Cameron_D, that would be understandable
23:41 ^🔗	omf_	and yet they use the public to do the bulk of the world
23:43 ^🔗	kyan	AT wouldn't be using the data for profit… might be worth asking
23:43 ^🔗	Cameron_D	yeah, maybe
