#archiveteam 2014-11-05,Wed

Time Nickname Message
05:28 🔗 wp494 we should probably do some research into looking at grabbing the win10 feedback stuff
06:45 🔗 wp494 HEY
06:45 🔗 wp494 THINGS ARE SHUTTING DOWN
06:45 🔗 wp494 over at dawngate
06:45 🔗 wp494 https://www.dawngate.com/news/detail/an-important-announcement-about-the_32964
07:16 🔗 joepie91 wp494: asked them to release source yet?
07:33 🔗 arkiver wp494: I'll take a good look at dawngate later today
17:32 🔗 chfoo viddy.com is shutting down december 15
17:33 🔗 arkiver woooh
17:33 🔗 arkiver new project
17:35 🔗 balrog http://tidbits.com/article/15212
17:35 🔗 balrog http://robservatory.com/rip-mac-os-x-hints-nov-4-2000-nov-4-2014/
17:39 🔗 arkiver chfoo: I can create the scripts to download the items from viddy
17:39 🔗 arkiver chfoo: but getting a list of items to be downloaded is going to be a problem
17:40 🔗 Smiley crawl job?
17:40 🔗 chfoo you need to iterate the cdx api to get a list of things already fetched and then scrape off google and bing using the existing url keywords
17:41 🔗 arkiver yeah
17:41 🔗 arkiver with cdx api you mean wayback machine's api? or from viddy?
17:42 🔗 * arkiver is back in an hour
17:42 🔗 chfoo archive.org's cdx api or wayback machine api. whichever helps avoid dedup and discovers more urls
17:44 🔗 Smiley hmmm viddy.com/username/v/video_id
17:54 🔗 espes___ also viddy.com/media/video_guid
17:55 🔗 espes___ hmm http://developer.viddy.com/docs/media
17:56 🔗 espes___ the app and api has search
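Editor's sketch of the discovery step chfoo describes: ask the Wayback Machine CDX API which viddy.com URLs are already captured, then pull out the `/username/v/video_id` pages Smiley mentions. The endpoint and the `url`, `matchType`, `fl`, `collapse`, `output` and `limit` parameters are those of the public Wayback CDX server; the helper names, the regular expression, and the sample response are assumptions made for illustration. No request is sent here, the code only builds the query and parses a response body.

```python
import json
import re
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

# Hypothetical pattern for the viddy.com/username/v/video_id layout from the log.
VIDEO_URL = re.compile(
    r"^https?://(?:www\.)?viddy\.com/([^/]+)/v/([^/?#]+)", re.IGNORECASE
)


def build_cdx_query(domain, limit=None):
    """Return a CDX query URL listing every distinct captured URL on a domain."""
    params = {
        "url": domain,
        "matchType": "domain",   # include subdomains
        "fl": "original",        # only return the original URL column
        "collapse": "urlkey",    # one row per distinct canonical URL
        "output": "json",
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def extract_video_pages(cdx_json_text):
    """Return unique (username, video_id) pairs from a JSON CDX response body."""
    rows = json.loads(cdx_json_text)
    seen = set()
    found = []
    for row in rows[1:]:  # with output=json the first row holds the field names
        match = VIDEO_URL.match(row[0])
        if match and match.groups() not in seen:
            seen.add(match.groups())
            found.append(match.groups())
    return found


# Made-up response body, shaped like the API's JSON output.
SAMPLE_RESPONSE = json.dumps([
    ["original"],
    ["http://www.viddy.com/JustinBieber/v/soundcheck-LDS3p9"],
    ["http://www.viddy.com/media/some-guid"],
    ["http://www.viddy.com/JustinBieber/v/soundcheck-LDS3p9"],
])

if __name__ == "__main__":
    print(build_cdx_query("viddy.com"))
    print(extract_video_pages(SAMPLE_RESPONSE))
```

The usernames and keywords found this way could then seed the Google and Bing scrape chfoo suggests.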
18:07 🔗 antomatic Bieber located!: www.viddy.com/JustinBieber/v/soundcheck-LDS3p9
18:08 🔗 antomatic Further Bieber: http://www.viddy.com/ilybiebercrew/v/baab-W47bSY
18:08 🔗 antomatic That young man will be a star one day, I just know it.
18:09 🔗 arkiver yeah
18:09 🔗 arkiver saw those api's
18:09 🔗 arkiver will be working on this
18:10 🔗 antomatic Save The Bieber.
18:40 🔗 Smiley lol i found bieber right away
18:40 🔗 Smiley antomatic: thats the one i found
18:41 🔗 antomatic Clearly a case of bieber fever.
18:41 🔗 Smiley made me reconsider this whole project :D
18:44 🔗 antomatic Any website frequented by key thought leaders such as Justin Bieber and MTV is clearly on the rise
18:46 🔗 Smiley then again re:bieber, those who forget history are doomed to repeat it
18:46 🔗 Smiley THINK OF THE CHILDREN!.
18:49 🔗 db48x heh
18:50 🔗 aaaaaaaaa https://www.youtube.com/watch?v=RybNI0KB1bg
19:51 🔗 balrog http://features.jsomers.net/how-i-reverse-engineered-google-docs/
19:51 🔗 balrog I can see some uses for that. :)
20:10 🔗 * SketchCow thinks of children. mmmmm
20:10 🔗 SketchCow I'm still packing up FOS stuff into archive.org.
20:10 🔗 SketchCow Handed off twitpic creds for that upload.
20:19 🔗 tfgbd How do you guys usually save Facebook profiles?
20:23 🔗 arkiver SketchCow: I'm looking at the cdx files from halo
20:23 🔗 arkiver there seem to be some things missing?
20:23 🔗 arkiver but first, what exactly is this line:
20:23 🔗 arkiver - 11846 1835689521 archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz
20:23 🔗 arkiver net,bungie,halo)/stats/odstg.aspx?gameguid=967580589871585280 20141104194757 http://halo.bungie.net/Stats/odstg.aspx?gameguid=967580589871585280 text/html 200 7MRD7NYHX2SSCZ7RTM3URW725BI3KGPK - -
20:24 🔗 arkiver I'm not sure how the cdx's are being read
20:24 🔗 arkiver but to me it looks like the last part of a line has been added to the first part of the next line
20:25 🔗 arkiver but that's probably just generated by IA right?
20:25 🔗 arkiver the derivers I mean ^
20:26 🔗 arkiver second, it looks like the screenshots are missing in the cdx file (not sure about the warc)
20:26 🔗 arkiver what I'm thinking is that IA leaves the "ssid" part out of the url when deriving into cdx
20:27 🔗 arkiver so what has to be this url:
20:27 🔗 arkiver http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full&ssid=74B7A3ABDB872A2EE59203CAB3E6A494
20:27 🔗 SketchCow No idea, frankly.
20:27 🔗 SketchCow This is megawarc putting them together.
20:27 🔗 arkiver becomes this url in the wayback machine: http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full
20:27 🔗 chfoo arkiver: see https://github.com/internetarchive/CDX-Writer
20:27 🔗 arkiver I haven't taken a look yet at the warc
20:27 🔗 SketchCow Tell me if I'm stopping
20:28 🔗 arkiver I'll take a look at the warc
20:28 🔗 arkiver ... when it is downloaded
20:28 🔗 SketchCow Talking to our team
20:30 🔗 arkiver thanks. so I'm mainly concerned about the cdx not having the ssid from the screenshot url
20:31 🔗 arkiver I'm also seeing some urls in the cdx that need to be excluded
20:34 🔗 arkiver SketchCow: nevermind about that first error, where I thought the last part of a line was being added to the start of the next line.
20:34 🔗 arkiver that was just my text converter
20:34 🔗 arkiver viewer*
20:35 🔗 arkiver ah got it
20:35 🔗 arkiver net,bungie,halo)/stats/halo3/screenshot.ashx?s&size=medium 20141104203046 http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=medium&ssid=46ABD023C467371D0FD6CE14E95C10FC image/jpeg 200 66RFBFMWA3AXC3A54KEQI5CL4LSHON57 - - 32825 16531636453 archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz
20:35 🔗 SketchCow I need you to speak to kenji@archive.org if you have concerns.
20:35 🔗 SketchCow OK?
20:35 🔗 arkiver ok
20:35 🔗 SketchCow Otherwise in they go for the moment.
20:35 🔗 SketchCow I can't hold them in stasis on FOS, it's at 96% full
20:36 🔗 arkiver so that first url doesn't have the ssid in the line, but the second one, after the date, has it
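Editor's sketch of what arkiver is reading off that CDX line. The line has 11 space-separated columns; the column names below are descriptive labels chosen for this example (an assumption, not names taken from the log). Splitting the line shows that the canonicalized key in the first column has lost the `ssid` value while the original URL in the third column still carries it.

```python
from urllib.parse import urlsplit, parse_qs

# Descriptive labels for the 11 columns; the names are this example's own.
CDX_FIELDS = [
    "urlkey", "timestamp", "original", "mimetype", "status",
    "digest", "redirect", "meta", "length", "offset", "filename",
]


def parse_cdx_line(line):
    """Split one space-separated CDX line into a dict keyed by CDX_FIELDS."""
    values = line.split()
    if len(values) != len(CDX_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(CDX_FIELDS), len(values)))
    return dict(zip(CDX_FIELDS, values))


# The line arkiver pasted at 20:35.
LINE = (
    "net,bungie,halo)/stats/halo3/screenshot.ashx?s&size=medium 20141104203046 "
    "http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=medium"
    "&ssid=46ABD023C467371D0FD6CE14E95C10FC image/jpeg 200 "
    "66RFBFMWA3AXC3A54KEQI5CL4LSHON57 - - 32825 16531636453 "
    "archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz"
)

record = parse_cdx_line(LINE)
original_query = parse_qs(urlsplit(record["original"]).query)

if __name__ == "__main__":
    print("key:     ", record["urlkey"])
    print("original:", record["original"])
    print("ssid kept in original URL:", original_query["ssid"][0])
    print("ssid value present in key:",
          original_query["ssid"][0].lower() in record["urlkey"])
```

Because lookups go through the key column, two screenshots that differ only in `ssid` would end up with the same key, which is the concern arkiver raises.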
20:36 🔗 arkiver SketchCow: I'll see how it looks when indexed in the wayback machine
20:37 🔗 SketchCow OK.
20:40 🔗 chfoo arkiver: cdx-writer uses surt which is supposed to strip out sids https://github.com/internetarchive/surt
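Editor's sketch of the effect chfoo describes. This is a simplified illustration, not surt's actual code: it assumes the canonicalizer lowercases the query, removes anything that looks like `sid=` followed by 32 alphanumeric characters, and sorts the remaining arguments. Under those assumptions the `ssid=...` parameter collapses to a bare `s`, which matches the `?s&size=medium` key in the CDX line arkiver pasted.

```python
import re

# Assumed session-id pattern: "sid=" followed by 32 alphanumeric characters.
# It also matches inside "ssid=", which is what bites the screenshot URLs.
SESSION_ID = re.compile(r"sid=[0-9a-z]{32}")


def simplified_query_key(query):
    """Lowercase a query string, strip session-id-like tokens, sort the arguments."""
    query = query.lower()
    query = SESSION_ID.sub("", query)
    parts = [part for part in query.split("&") if part]
    return "&".join(sorted(parts))


if __name__ == "__main__":
    print(simplified_query_key("size=medium&ssid=46ABD023C467371D0FD6CE14E95C10FC"))
```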
20:42 🔗 arkiver chfoo: I see, that might be a problem then
20:45 🔗 chfoo i know archivebot grabbed forums where the session id doesn't go away. if you have time, you could check the wayback machine on how it handled the urls.
20:51 🔗 arkiver chfoo: I'll do that
21:30 🔗 zgrant Hi Archive Team.
21:30 🔗 zgrant I just received an email stating Jux was shutting down.
21:30 🔗 zgrant I'm guessing you all know this, but just in case I thought I'd contact you about it.
21:31 🔗 joepie91 zgrant: jux.com?
21:31 🔗 zgrant yes.
21:31 🔗 Void_ sites down atm lel
21:31 🔗 joepie91 I get a black page, and it's infinitely loading
21:31 🔗 joepie91 and I wasn't aware of it yet :P
21:32 🔗 zgrant From their email:
21:32 🔗 joepie91 okay, so a kind of non-ephemeral instagram?
21:32 🔗 zgrant We realize that some of you have media stored on the Jux servers that you would like to preserve, so we have a link for you that will enable you to retrieve and download all your stored media files with one simple step. Visit farewell.jux.com and sign in with your Jux credentials to export your media files.
21:32 🔗 zgrant So, I guess farewell.jux.com
21:32 🔗 zgrant I should have read the email first. \
21:32 🔗 Void_ ah farewell loads
21:33 🔗 zgrant @joepie91: Yes, I think so.
21:33 🔗 Void_ well you have to sign in to grab files
21:33 🔗 joepie91 looks like they've been threatening to shut down before
21:33 🔗 joepie91 http://thenextweb.com/insider/2013/05/25/jux-closing-down-citing-slow-growth/
21:33 🔗 zgrant @joepie91 I looked at it about a year ago to see if I could use it for instruction.
21:33 🔗 Void_ so im not sure how that would work if archiveteam were to try and grab
21:34 🔗 joepie91 Void_: nope, public pages
21:34 🔗 zgrant @joepie91 yes, I remember that, but someone came along and helped them stay afloat.
21:34 🔗 joepie91 Void_: eg. random thing I ran across when googling "jux.com farewell": https://brisbane-down-under.jux.com/1953333
21:34 🔗 Void_ doesnt load for me
21:35 🔗 joepie91 does for me :P
21:35 🔗 Void_ hmm
21:35 🔗 joepie91 guess their infra is a bit screwed right now
21:35 🔗 Void_ ye
21:36 🔗 schbirid midas: i've been thinking of crawling the boerse sites while logged in to harvest precious links
21:36 🔗 Kazzy time for a nice google/bing/twitter scrape
22:25 🔗 joepie91 SketchCow: http://motherboard.vice.com/read/two-decades-ago-a-cyber-hippie-collective-launched-a-ddos-attack-to-save-rave?utm_source=mbfb
22:25 🔗 joepie91 "Except that this history is largely unwritten, and much of it is no longer online. The early act of online protest was organised through newsletters, bulletin board systems, and student networks, as well as leaflets and zines. Few traces remain: sites have been shut down, information scrubbed from the internet as the community grew up."
22:25 🔗 joepie91 this seems like something relevant to your interests :)