#archiveteam 2014-11-05,Wed

Time	Nickname	Message
05:28 ^🔗	wp494	we should probably do some research into looking at grabbing the win10 feedback stuff
06:45 ^🔗	wp494	HEY
06:45 ^🔗	wp494	THINGS ARE SHUTTING DOWN
06:45 ^🔗	wp494	over at dawngate
06:45 ^🔗	wp494	https://www.dawngate.com/news/detail/an-important-announcement-about-the_32964
07:16 ^🔗	joepie91	wp494: asked them to release source yet?
07:33 ^🔗	arkiver	wp494: I'll take a good look at dawngate later today
17:32 ^🔗	chfoo	viddy.com is shutting down december 15
17:33 ^🔗	arkiver	woooh
17:33 ^🔗	arkiver	new project
17:35 ^🔗	balrog	http://tidbits.com/article/15212
17:35 ^🔗	balrog	http://robservatory.com/rip-mac-os-x-hints-nov-4-2000-nov-4-2014/
17:39 ^🔗	arkiver	chfoo: I can create the scripts to download the items from viddy
17:39 ^🔗	arkiver	chfoo: but getting a list of items to be downloaded is going to be a problem
17:40 ^🔗	Smiley	crawl job?
17:40 ^🔗	chfoo	you need to iterate the cdx api to get a list of things already fetched and then scrape off google and bing using the existing url keywords
17:41 ^🔗	arkiver	yeah
17:41 ^🔗	arkiver	with cdx api you mean wayback machine's api? or from viddy?
17:42 ^🔗	*	arkiver is back in an hour
17:42 ^🔗	chfoo	archive.org's cdx api or wayback machine api. whichever helps avoid dedup and discovers more urls
17:44 ^🔗	Smiley	hmmm viddy.com/username/v/video_id
17:54 ^🔗	espes___	also viddy.com/media/video_guid
17:55 ^🔗	espes___	hmm http://developer.viddy.com/docs/media
17:56 ^🔗	espes___	the app and api has search
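
The discovery step chfoo describes (list what the Wayback Machine already holds, then keep only video URLs) might look roughly like the sketch below. It assumes the public Wayback CDX server endpoint at web.archive.org is the "cdx api" meant here, and it takes the two URL shapes from Smiley's and espes___'s messages. The function names and the `(kind, id)` item format are made up for illustration and are not the project's actual scripts.

```python
import re
from urllib.parse import urlencode

# Assumption: the public Wayback CDX server is the API being discussed.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

def build_cdx_query(domain, limit=None):
    """Build a CDX query URL listing one original URL per unique urlkey."""
    params = {
        "url": domain,
        "matchType": "domain",   # include subdomains such as www.
        "fl": "original",        # return only the original URL column
        "collapse": "urlkey",    # skip repeated captures of the same URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)

# URL shapes mentioned in the channel:
#   viddy.com/username/v/video_id  and  viddy.com/media/video_guid
MEDIA_RE = re.compile(r"^https?://(?:www\.)?viddy\.com/media/([^/?#]+)", re.I)
VIDEO_RE = re.compile(r"^https?://(?:www\.)?viddy\.com/([^/?#]+)/v/([^/?#]+)", re.I)

def classify(url):
    """Return a hypothetical (kind, id) item for a video URL, else None."""
    match = MEDIA_RE.match(url)
    if match:
        return ("media", match.group(1))
    match = VIDEO_RE.match(url)
    if match:
        return ("video", match.group(1) + "/" + match.group(2))
    return None

def extract_items(urls):
    """Keep unique video items, preserving the order of first appearance."""
    seen = set()
    items = []
    for url in urls:
        item = classify(url.strip())
        if item is not None and item not in seen:
            seen.add(item)
            items.append(item)
    return items
```

The response body of the query URL (one original URL per line) could be fed straight into `extract_items`. URLs scraped from search engines would go through the same filter, so already-archived and newly discovered items end up in one deduplicated list.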
18:07 ^🔗	antomatic	Bieber located!: www.viddy.com/JustinBieber/v/soundcheck-LDS3p9
18:08 ^🔗	antomatic	Further Bieber: http://www.viddy.com/ilybiebercrew/v/baab-W47bSY
18:08 ^🔗	antomatic	That young man will be a star one day, I just know it.
18:09 ^🔗	arkiver	yeah
18:09 ^🔗	arkiver	saw those api's
18:09 ^🔗	arkiver	will be working on this
18:10 ^🔗	antomatic	Save The Bieber.
18:40 ^🔗	Smiley	lol i found bieber right away
18:40 ^🔗	Smiley	antomatic: thats the one i found
18:41 ^🔗	antomatic	Clearly a case of bieber fever.
18:41 ^🔗	Smiley	made me reconsider this whole project :D
18:44 ^🔗	antomatic	Any website frequented by key thought leaders such as Justin Bieber and MTV are clearly on the rise
18:46 ^🔗	Smiley	then again re:bieber, those who forget history are doomed to repeat it
18:46 ^🔗	Smiley	THINK OF THE CHILDREN!.
18:49 ^🔗	db48x	heh
18:50 ^🔗	aaaaaaaaa	https://www.youtube.com/watch?v=RybNI0KB1bg
19:51 ^🔗	balrog	http://features.jsomers.net/how-i-reverse-engineered-google-docs/
19:51 ^🔗	balrog	I can see some uses for that. :)
20:10 ^🔗	*	SketchCow thinks of children. mmmmm
20:10 ^🔗	SketchCow	I'm still packing up FOS stuff into archive.org.
20:10 ^🔗	SketchCow	Handed off twitpic creds for that upload.
20:19 ^🔗	tfgbd	How do you guys usually save Facebook profiles?
20:23 ^🔗	arkiver	SketchCow: I'm looking at the cdx files from halo
20:23 ^🔗	arkiver	there seem to be some things missing?
20:23 ^🔗	arkiver	but first, what exactly is this line:
20:23 ^🔗	arkiver	- 11846 1835689521 archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz
20:23 ^🔗	arkiver	net,bungie,halo)/stats/odstg.aspx?gameguid=967580589871585280 20141104194757 http://halo.bungie.net/Stats/odstg.aspx?gameguid=967580589871585280 text/html 200 7MRD7NYHX2SSCZ7RTM3URW725BI3KGPK - -
20:24 ^🔗	arkiver	I'm not sure how the cdx's are being read
20:24 ^🔗	arkiver	but to me it looks like the last part of a line has been added to the first part of the next line
20:25 ^🔗	arkiver	but that's probably just generated by IA right?
20:25 ^🔗	arkiver	the derivers I mean ^
20:26 ^🔗	arkiver	second, it looks like the screenshots are missing in the cdx file (not sure about the warc)
20:26 ^🔗	arkiver	what I'm thinking is that IA leaves the "ssid" part out of the url when deriving into cdx
20:27 ^🔗	arkiver	so what has to be this url:
20:27 ^🔗	arkiver	http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full&ssid=74B7A3ABDB872A2EE59203CAB3E6A494
20:27 ^🔗	SketchCow	No idea, frankly.
20:27 ^🔗	SketchCow	This is megawarc putting them together.
20:27 ^🔗	arkiver	becomes this url in the wayback machine: http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full
20:27 ^🔗	chfoo	arkiver: see https://github.com/internetarchive/CDX-Writer
20:27 ^🔗	arkiver	I haven't taken a look yet at the warc
20:27 ^🔗	SketchCow	Tell me if I'm stopping
20:28 ^🔗	arkiver	I'll take a look at the warc
20:28 ^🔗	arkiver	... when it is downloaded
20:28 ^🔗	SketchCow	Talking to our team
20:30 ^🔗	arkiver	thanks. so I'm mainly concerned about the cdx not having the ssid from the screenshot url
20:31 ^🔗	arkiver	I'm also seeing some urls in the cdx that need to be excluded
20:34 ^🔗	arkiver	SketchCow: nevermind about that first error, where I thought the last part of a line was being added to the start of the next line.
20:34 ^🔗	arkiver	that was just my text converter
20:34 ^🔗	arkiver	viewer*
20:35 ^🔗	arkiver	ah got it
20:35 ^🔗	arkiver	net,bungie,halo)/stats/halo3/screenshot.ashx?s&size=medium 20141104203046 http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=medium&ssid=46ABD023C467371D0FD6CE14E95C10FC image/jpeg 200 66RFBFMWA3AXC3A54KEQI5CL4LSHON57 - - 32825 16531636453 archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz
20:35 ^🔗	SketchCow	I need you to speak to kenji@archive.org if you have concerns.
20:35 ^🔗	SketchCow	OK?
20:35 ^🔗	arkiver	ok
20:35 ^🔗	SketchCow	Otherwise in they go for the moment.
20:35 ^🔗	SketchCow	I can't hold them in stasis on FOS, it's at 96% full
20:36 ^🔗	arkiver	so that first url doesn't have the ssid in the line, but the second one, after the date, has it
20:36 ^🔗	arkiver	SketchCow: I'll see how it looks when indexed in the wayback machine
20:37 ^🔗	SketchCow	OK.
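
What arkiver noticed can be checked mechanically: split a CDX line into its fields and compare the query parameters in the canonicalized key with those in the original URL. The sketch below assumes the common 11-column CDX layout (key, timestamp, original URL, MIME type, status, digest, redirect, meta tags, compressed length, offset, filename), which matches the line pasted above. The field names and helper functions are chosen for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed column order of the 11-field CDX lines shown in the log.
CDX_FIELDS = [
    "urlkey", "timestamp", "original", "mimetype", "statuscode",
    "digest", "redirect", "metatags", "length", "offset", "filename",
]

# The record arkiver pasted at 20:35.
SAMPLE = (
    "net,bungie,halo)/stats/halo3/screenshot.ashx?s&size=medium "
    "20141104203046 "
    "http://halo.bungie.net/Stats/Halo3/Screenshot.ashx"
    "?size=medium&ssid=46ABD023C467371D0FD6CE14E95C10FC "
    "image/jpeg 200 66RFBFMWA3AXC3A54KEQI5CL4LSHON57 - - 32825 16531636453 "
    "archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz"
)

def parse_cdx_line(line):
    """Split one space-separated CDX line into a dict of named fields."""
    values = line.split()
    if len(values) != len(CDX_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(CDX_FIELDS), len(values)))
    return dict(zip(CDX_FIELDS, values))

def params_missing_from_key(record):
    """List query parameter names present in the original URL but not in the key."""
    key_query = record["urlkey"].partition("?")[2]
    original_query = urlsplit(record["original"]).query
    key_names = set(parse_qs(key_query, keep_blank_values=True))
    original_names = {
        name.lower() for name in parse_qs(original_query, keep_blank_values=True)
    }
    return sorted(original_names - key_names)

record = parse_cdx_line(SAMPLE)
print(record["timestamp"], record["statuscode"], params_missing_from_key(record))
```

For the sample record, the only parameter missing from the key is `ssid`, which is the parameter that distinguishes one screenshot from another.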
20:40 ^🔗	chfoo	arkiver: cdx-writer uses surt which is supposed to strip out sids https://github.com/internetarchive/surt
20:42 ^🔗	arkiver	chfoo: I see, that might be a problem then
20:45 ^🔗	chfoo	i know archivebot grabbed forums where the session id doesn't go away. if you have time, you could check the wayback machine on how it handled the urls.
20:51 ^🔗	arkiver	chfoo: I'll do that
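
To illustrate why session-id stripping is a problem for these screenshot URLs, here is a deliberately simplified canonicalizer. It is not surt's actual code. It assumes a rule that removes `sid=` followed by 32 alphanumeric characters, then lowercases the URL and sorts the query, which would explain the stray `s` in the key arkiver pasted. The regex and the function name are assumptions made for this sketch.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stripping rule: "sid=" followed by exactly 32 alphanumeric
# characters, ending at "&" or the end of the query. Note that it also
# matches the tail of "ssid=...", leaving a lone "s" behind.
SID_RE = re.compile(r"sid=[0-9a-z]{32}(?=&|$)", re.IGNORECASE)

def simplified_key(url):
    """Rough SURT-like key: reversed host, lowercased path, sorted query."""
    parts = urlsplit(url.lower())
    host = ",".join(reversed(parts.hostname.split(".")))
    query = SID_RE.sub("", parts.query)
    params = sorted(p for p in query.split("&") if p)
    key = host + ")" + parts.path
    if params:
        key += "?" + "&".join(params)
    return key

# Two different screenshots requested at the same size.
first = "http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full&ssid=74B7A3ABDB872A2EE59203CAB3E6A494"
second = "http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full&ssid=46ABD023C467371D0FD6CE14E95C10FC"

print(simplified_key(first))
print(simplified_key(first) == simplified_key(second))
```

Under this assumed rule, every screenshot of a given size maps to the same key, so an index keyed this way could not tell them apart even though the WARC holds each response separately.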
21:30 ^🔗	zgrant	Hi Archive Team.
21:30 ^🔗	zgrant	I just received an email stating Jux was shutting down.
21:30 ^🔗	zgrant	I'm guessing you all know this, but just in case I thought I'd contact you about it.
21:31 ^🔗	joepie91	zgrant: jux.com?
21:31 ^🔗	zgrant	yes.
21:31 ^🔗	Void_	sites down atm lel
21:31 ^🔗	joepie91	I get a black page, and it's infinitely loading
21:31 ^🔗	joepie91	and I wasn't aware of it yet :P
21:32 ^🔗	zgrant	From their email:
21:32 ^🔗	joepie91	okay, so a kind of non-ephemeral instagram?
21:32 ^🔗	zgrant	We realize that some of you have media stored on the Jux servers that you would like to preserve, so we have a link for you that will enable you to retrieve and download all your stored media files with one simple step. Visit farewell.jux.com and sign in with your Jux credentials to export your media files.
21:32 ^🔗	zgrant	So, I guess farewell.jux.com
21:32 ^🔗	zgrant	I should have read the email first. \
21:32 ^🔗	Void_	ah farewell loads
21:33 ^🔗	zgrant	@joepie91: Yes, I think so.
21:33 ^🔗	Void_	well you have to sign in to grab files
21:33 ^🔗	joepie91	looks like they've been threatening to shut down before
21:33 ^🔗	joepie91	http://thenextweb.com/insider/2013/05/25/jux-closing-down-citing-slow-growth/
21:33 ^🔗	zgrant	@joepie91 I looked at it about a year ago to see if I could use it for instruction.
21:33 ^🔗	Void_	so im not sure how that would work if archiveteam were to try and grab
21:34 ^🔗	joepie91	Void_: nope, public pages
21:34 ^🔗	zgrant	@joepie91 yes, I remember that, but someone came along and helped them stay afloat.
21:34 ^🔗	joepie91	Void_: eg. random thing I ran across when googling "jux.com farewell": https://brisbane-down-under.jux.com/1953333
21:34 ^🔗	Void_	doesnt load for me
21:35 ^🔗	joepie91	does for me :P
21:35 ^🔗	Void_	hmm
21:35 ^🔗	joepie91	guess their infra is a bit screwed right now
21:35 ^🔗	Void_	ye
21:36 ^🔗	schbirid	midas: i've been thinking of crawling the boerse sites while logged in to harvest precious links
21:36 ^🔗	Kazzy	time for a nice google/bing/twitter scrape
22:25 ^🔗	joepie91	SketchCow: http://motherboard.vice.com/read/two-decades-ago-a-cyber-hippie-collective-launched-a-ddos-attack-to-save-rave?utm_source=mbfb
22:25 ^🔗	joepie91	"Except that this history is largely unwritten, and much of it is no longer online. The early act of online protest was organised through newsletters, bulletin board systems, and student networks, as well as leaflets and zines. Few traces remain: sites have been shut down, information scrubbed from the internet as the community grew up."
22:25 ^🔗	joepie91	this seems like something relevant to your interests :)