[05:28] we should probably do some research into looking at grabbing the win10 feedback stuff
[06:45] HEY
[06:45] THINGS ARE SHUTTING DOWN
[06:45] over at dawngate
[06:45] https://www.dawngate.com/news/detail/an-important-announcement-about-the_32964
[07:16] wp494: asked them to release source yet?
[07:33] wp494: I'll take a good look at dawngate later today
[17:32] viddy.com is shutting down december 15
[17:33] woooh
[17:33] new project
[17:35] http://tidbits.com/article/15212
[17:35] http://robservatory.com/rip-mac-os-x-hints-nov-4-2000-nov-4-2014/
[17:39] chfoo: I can create the scripts to download the items from viddy
[17:39] chfoo: but getting a list of items to be downloaded is going to be a problem
[17:40] crawl job?
[17:40] you need to iterate the cdx api to get a list of things already fetched and then scrape off google and bing using the existing url keywords
[17:41] yeah
[17:41] with cdx api you mean wayback machine's api? or from viddy?
[17:42] * arkiver is back in an hour
[17:42] archive.org's cdx api or wayback machine api. whichever helps avoid dedup and discovers more urls
[17:44] hmmm viddy.com/username/v/video_id
[17:54] also viddy.com/media/video_guid
[17:55] hmm http://developer.viddy.com/docs/media
[17:56] the app and api has search
[18:07] Bieber located!: www.viddy.com/JustinBieber/v/soundcheck-LDS3p9
[18:08] Further Bieber: http://www.viddy.com/ilybiebercrew/v/baab-W47bSY
[18:08] That young man will be a star one day, I just know it.
[18:09] yeah
[18:09] saw those api's
[18:09] will be working on this
[18:10] Save The Bieber.
[18:40] lol i found bieber right away
[18:40] antomatic: thats the one i found
[18:41] Clearly a case of bieber fever.
[18:41] made me reconsider this whole project :D
[18:44] Any website frequented by key thought leaders such as Justin Bieber and MTV is clearly on the rise
[18:46] then again re:bieber, those who forget history are doomed to repeat it
[18:46] THINK OF THE CHILDREN!.
[18:49] heh
[18:50] https://www.youtube.com/watch?v=RybNI0KB1bg
[19:51] http://features.jsomers.net/how-i-reverse-engineered-google-docs/
[19:51] I can see some uses for that. :)
[20:10] * SketchCow thinks of children. mmmmm
[20:10] I'm still packing up FOS stuff into archive.org.
[20:10] Handed off twitpic creds for that upload.
[20:19] How do you guys usually save Facebook profiles?
[20:23] SketchCow: I'm looking at the cdx files from halo
[20:23] there seem to be some things missing?
[20:23] but first, what exactly is this line:
[20:23] - 11846 1835689521 archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz
[20:23] net,bungie,halo)/stats/odstg.aspx?gameguid=967580589871585280 20141104194757 http://halo.bungie.net/Stats/odstg.aspx?gameguid=967580589871585280 text/html 200 7MRD7NYHX2SSCZ7RTM3URW725BI3KGPK - -
[20:24] I'm not sure how the cdx's are being read
[20:24] but to me it looks like the last part of a line has been added to the first part of the next line
[20:25] but that's probably just generated by IA right?
[20:25] the derivers I mean ^
[20:26] second, it looks like the screenshots are missing in the cdx file (not sure about the warc)
[20:26] what I'm thinking is that IA leaves the "ssid" part out of the url when deriving into cdx
[20:27] so what has to be this url:
[20:27] http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full&ssid=74B7A3ABDB872A2EE59203CAB3E6A494
[20:27] No idea, frankly.
[20:27] This is megawarc putting them together.
[20:27] becomes this url in the wayback machine: http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=full
[20:27] arkiver: see https://github.com/internetarchive/CDX-Writer
[20:27] I haven't taken a look yet at the warc
[20:27] Tell me if I'm stopping
[20:28] I'll take a look at the warc
[20:28] ... when it is downloaded
[20:28] Talking to our team
[20:30] thanks.
so I'm mainly concerned about the cdx not having the ssid from the screenshot url
[20:31] I'm also seeing some urls in the cdx that need to be excluded
[20:34] SketchCow: nevermind about that first error, where I thought it was adding the last part of a line to the start of the next line.
[20:34] that was just my text converter
[20:34] viewer*
[20:35] ah got it
[20:35] net,bungie,halo)/stats/halo3/screenshot.ashx?s&size=medium 20141104203046 http://halo.bungie.net/Stats/Halo3/Screenshot.ashx?size=medium&ssid=46ABD023C467371D0FD6CE14E95C10FC image/jpeg 200 66RFBFMWA3AXC3A54KEQI5CL4LSHON57 - - 32825 16531636453 archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz
[20:35] I need you to speak to kenji@archive.org if you have concerns.
[20:35] OK?
[20:35] ok
[20:35] Otherwise in they go for the moment.
[20:35] I can't hold them in stasis on FOS, it's at 96% full
[20:36] so that first url doesn't have the ssid in the line, but the second one, after the date, has it
[20:36] SketchCow: I'll see how it looks when indexed in the wayback machine
[20:37] OK.
[20:40] arkiver: cdx-writer uses surt which is supposed to strip out sids https://github.com/internetarchive/surt
[20:42] chfoo: I see, that might be a problem then
[20:45] i know archivebot grabbed forums where the session id doesn't go away. if you have time, you could check the wayback machine on how it handled the urls.
[20:51] chfoo: I'll do that
[21:30] Hi Archive Team.
[21:30] I just received an email stating Jux was shutting down.
[21:30] I'm guessing you all know this, but just in case I thought I'd contact you about it.
[21:31] zgrant: jux.com?
[21:31] yes.
[21:31] sites down atm lel
[21:31] I get a black page, and it's infinitely loading
[21:31] and I wasn't aware of it yet :P
[21:32] From their email:
[21:32] okay, so a kind of non-ephemeral instagram?
[21:32] We realize that some of you have media stored on the Jux servers that you would like to preserve, so we have a link for you that will enable you to retrieve and download all your stored media files with one simple step. Visit farewell.jux.com and sign in with your Jux credentials to export your media files.
[21:32] So, I guess farewell.jux.com
[21:32] I should have read the email first.
[21:32] ah farewell loads
[21:33] @joepie91: Yes, I think so.
[21:33] well you have to sign in to grab files
[21:33] looks like they've been threatening to shut down before
[21:33] http://thenextweb.com/insider/2013/05/25/jux-closing-down-citing-slow-growth/
[21:33] @joepie91 I looked at it about a year ago to see if I could use it for instruction.
[21:33] so im not sure how that would work if archiveteam were to try and grab
[21:34] Void_: nope, public pages
[21:34] @joepie91 yes, I remember that, but someone came along and helped them stay afloat.
[21:34] Void_: eg. random thing I ran across when googling "jux.com farewell": https://brisbane-down-under.jux.com/1953333
[21:34] doesnt load for me
[21:35] does for me :P
[21:35] hmm
[21:35] guess their infra is a bit screwed right now
[21:35] ye
[21:36] midas: i've been thinking of crawling the boerse sites while logged in to harvest precious links
[21:36] time for a nice google/bing/twitter scrape
[22:25] SketchCow: http://motherboard.vice.com/read/two-decades-ago-a-cyber-hippie-collective-launched-a-ddos-attack-to-save-rave?utm_source=mbfb
[22:25] "Except that this history is largely unwritten, and much of it is no longer online. The early act of online protest was organised through newsletters, bulletin board systems, and student networks, as well as leaflets and zines. Few traces remain: sites have been shut down, information scrubbed from the internet as the community grew up."
[22:25] this seems like something relevant to your interests :)
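
For the CDX lines pasted at [20:23] and [20:35], here is a minimal Python sketch of how such a line could be split into fields and how a query parameter missing from the canonicalized key (such as ssid) could be spotted. It assumes the common 11-column, space-separated CDX layout. The field names and the helper names parse_cdx_line, query_keys and params_missing_from_key are illustrative only and are not taken from CDX-Writer or surt.

```python
from urllib.parse import parse_qsl

# Assumed column order for an 11-field CDX line (illustrative names):
# canonicalized key, capture time, original URL, MIME type, HTTP status,
# digest, redirect, meta tags, compressed size, offset, WARC filename.
CDX_FIELDS = (
    "urlkey", "timestamp", "original", "mimetype", "statuscode",
    "digest", "redirect", "metatags", "compressed_size", "offset", "filename",
)


def parse_cdx_line(line):
    """Split one space-separated CDX line into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CDX_FIELDS, parts))


def query_keys(url_or_key):
    """Return the lowercased query parameter names found after the first '?'."""
    _, _, query = url_or_key.partition("?")
    return {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}


def params_missing_from_key(record):
    """Parameter names present in the original URL but absent from the urlkey."""
    return query_keys(record["original"]) - query_keys(record["urlkey"])


if __name__ == "__main__":
    # The line pasted at [20:35], copied from the log.
    line = (
        "net,bungie,halo)/stats/halo3/screenshot.ashx?s&size=medium "
        "20141104203046 "
        "http://halo.bungie.net/Stats/Halo3/Screenshot.ashx"
        "?size=medium&ssid=46ABD023C467371D0FD6CE14E95C10FC "
        "image/jpeg 200 66RFBFMWA3AXC3A54KEQI5CL4LSHON57 - - 32825 16531636453 "
        "archiveteam_halo_20141105063141/halo_20141105063141.megawarc.warc.gz"
    )
    record = parse_cdx_line(line)
    print(record["original"])
    print(params_missing_from_key(record))  # {'ssid'}
```

Run over a whole .cdx file, a check like this would list every capture whose key lost a parameter, which is one way to see how many screenshot URLs collapse onto the same key.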