[05:03] so what the fuck, via.me
[05:03] three DAYS warning before deleting everything?
[05:03] THREE FUCKING DAYS?
[05:04] potential backup: http://www.pspminis.com/
[05:05] won't be deleted but nothing more will be posted, might be a good idea to grab it now
[05:17] BlueMax: i'm backing it up right now
[05:25] uploaded: https://archive.org/details/polytroncorporation.com-20130727
[05:41] looks like search on archive.org is not updating
[05:41] i say that cause i'm around episode 90 of labrats and i'm still stuck at episode 55 in search
[06:23] how's pspminis going, godane
[06:24] its still going
[06:24] 86+mb
[06:25] i'm not mirroring the forums right now cause i just want to focus on the main site
[14:13] BACK
[14:13] OK, let's catch up.
[14:19] SketchCow: ok.
[14:20] winr4r: Until I can see proof, I don't think it was 3 days warning for via.me.
[14:20] * SmileyG ponders where to start.
[14:20] I've been reading
[14:20] K, short version
[14:21] GLaDOS/antomic were thinking about/working on generating a list of users already in wayback for xanga, and putting them to the back of the queue
[14:21] snapjoy is ready to go into its own subcollection.
[14:21] * SmileyG can't think of anything else
[14:24] SketchCow: well, we heard about it yesterday, so i went to IA, and their most recent crawl which had the notice was yesterday
[14:25] http://web.archive.org/web/20130721062456/http://via.me/
[14:25] I'm SURE it was at LEAST a month.
[14:25] the one before that, on the 21st, did not, so that means they might have given 10 days notice
[14:25] unless they told folks by email before that
[14:27] https://twitter.com/izayoi1616/status/359514629963128836
[14:27] consider me corrected
[14:27] i did check the usual suspects like techcrunch first
[14:28] Well, not corrected, it's 10 days.
[14:29] http://via.me/help#retirement
[14:30] which isn't a whole lot better, but i was still wrong
[14:30] "Links to your photos on Via.me will still function until July 30th"
[14:30] "This means photo hosting will go away on August 1st."
[14:30] ?
[14:36] SketchCow, I am downloading all of buzzdata before they close in 2 days
[14:36] winr4r: one day....
[14:36] can you bash something out?
[14:36] SmileyG: that sounds like an indecent proposal
[14:36] but yes
[14:43] http://www.archiveteam.org/index.php?title=File:BDclosed-03.jpg
[14:45] crap, one that got away :(
[14:45] Omg if working on it
[14:45] He just jsaid so.
[14:45] OK, apparently a little woozy typing
[14:46] too early? :)
[14:47] yeah buzzdata requires you to drive a browser to get all the js bullshit to work to access the public datasets
[14:47] downloading the data is easy, discovering it is the time consuming part
[14:48] omf_: and there's no JSON-spewing interface running behind that?
[14:50] the API is how I am downloading the datasets but the api does not give me a list of usernames which is the key for access
[14:50] oh, shit
[14:51] GET `https://:HIVE_NAME.buzzdata.com/api/:USERNAME` where HIVE_NAME is optional (if you leave it out, it just gets the public stuff)
[14:51] the API assumes you know the username but does not provide a username discovery mechanism
[14:51] and getting the data set is /api/:USERNAME/:some_id ?
[14:54] http://buzzdata.com/faq/api/api-methods#download
[14:54] yeah i just looked, should have done that rather than asking questions
[14:54] :)
[14:54] https://:HIVE_NAME.buzzdata.com/api/:USERNAME/:DATASET_SHORT_NAME/:DATAFILE_UUID/download_request
[14:54] yeah their API is alright for the features they have
[14:58] anything i can do to help?
[15:00] mistym: hiiiii
[15:01] winr4r: Morning!
[15:01] winr4r, if it is not done in 4-5 hours I might
[15:02] omf_: is your big problem finding usernames?
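The two BuzzData URL patterns quoted at 14:51 and 14:54 can be sketched as a small URL builder. This is only an illustration under assumptions: the function names are hypothetical, the bare `buzzdata.com` host for the no-hive case is a guess at what "leave it out" means, and the HTTP method for `download_request` is not stated in the log, so the sketch builds URLs and sends nothing.

```python
from typing import Optional
from urllib.parse import quote


def api_base(hive_name: Optional[str] = None) -> str:
    """Return the API root; the hive subdomain is optional per the log."""
    # Assumption: omitting HIVE_NAME means using the bare buzzdata.com host.
    host = f"{hive_name}.buzzdata.com" if hive_name else "buzzdata.com"
    return f"https://{host}/api"


def user_url(username: str, hive_name: Optional[str] = None) -> str:
    """Build /api/:USERNAME. The username must already be known,
    since the API offers no username discovery."""
    return f"{api_base(hive_name)}/{quote(username, safe='')}"


def download_request_url(
    username: str,
    dataset_short_name: str,
    datafile_uuid: str,
    hive_name: Optional[str] = None,
) -> str:
    """Build /api/:USERNAME/:DATASET_SHORT_NAME/:DATAFILE_UUID/download_request."""
    return (
        f"{user_url(username, hive_name)}"
        f"/{quote(dataset_short_name, safe='')}"
        f"/{quote(datafile_uuid, safe='')}"
        "/download_request"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(user_url("alice"))
    print(download_request_url("alice", "my-data", "abc-123", hive_name="acme"))
```

Every URL starts from a username, which is why the missing username listing is the bottleneck described in the log.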
[15:03] The problem is the script takes a while since it has to load and run the js bullshit
[15:03] Nothing requires any kind of extra work
[15:04] ah, gotcha :\
[15:04] They only have 2,375 users total
[15:04] and some have no datasets
[15:04] its a very small site
[15:06] mm :\
[15:15] I just remembered the question I had for you winr4r
[15:17] So we got a great talks section and an in the media section. Maybe we should add a section for technical articles about how we do stuff
[15:20] i can only think of one article that goes in there ("Site exploration"), though that's a case for writing more
[15:20] oh, yeah, wget recipes and shit, we have a page on that i think
[15:21] though there's likely lots on the wiki that i do not know about :)
[15:29] I mean more along the lines of a blog post I wrote on how I wrote a script to collect all the mailing archives for opensolaris. I know a few others wrote blog posts about tools they made as well
[15:30] A walk through of how the tool was built versus just how to use the tool
[15:32] omf_: yes, that would be excellent
[18:31] YOU GUISE MY FIRST PANICGRAB UPLOAD TO IA WORKED!
[18:31] http://archive.org/details/jewishgen.org-panicgrab-20130710
[18:31] Just needs to get moved to the ArchiveTeam section, not communitytexts.
[18:33] Asparagir: PROUD OF YOU SON
[18:33] I'M A DAUGHTER, DAAAAD
[18:35] YOU'RE SON IF I SAY SO SON
[18:36] YOU NEVER LET ME HAVE ANY FUN! *cries, runs to room, slams door*
[18:40] http://blog.theoldreader.com/post/56798895350/desperate-times-call-for-desperate-measures "You will have two weeks to export your OPML file regardless of our decision"
[18:41] AAAARGH!
[18:42] Guess those 'you have 3 days left to subscribe to FeedHQ' emails were well-timed then.
[18:50] Hey omf_ -- thank you for getting the BuzzData stuff. I liked that site; too bad it's going away.
[18:52] closure: what the fuuuuuuuuuuuck
[18:53] wow, i actually switched to theoldreader as well
[18:55] I moved to feedhq's 30-day trial and oldreader at the same time, intending to pick one - was going to choose the free option (because I am awfully stingy) but looks like I won't get away with that now. :)
[19:00] 'Last week difficulty level was changed to “hell” in every possible aspect we could imagine, we have been sleep deprived for 10 days and this impacts us way too much.'
[19:00] how about, if you're going to call yourself a google reader alternative, and actually invite people to use you, *be willing to do the fucking work to not let people down*
[19:00] even if it's free
[19:01] It has been an incredible journey..
[19:01] sad though, nevertheless.
[19:03] actually, at this point in my life, i wouldn't mind one less thing to keep up with, so maybe that is a sign
[19:03] in the same way that seeing a distant city nuked is a sign that you should buy more stuff locally, but
[19:03] not something welcome, but heyyyyeyyy i'm your silver lining
[19:04] Asparagir: nice work, poking around the website most of it seems to be behind a login wall so I guess you did a crawl with cookies?
[19:05] No, I just crawled all the non-cookie areas of the site, which include hundreds of town and shtetl pages, photos, family artifacts, etc.
[19:05] definitely worth grabbing anything with a "powered by ancestry.com" logo, they seem to have a habit of buying free resources and then making them only for ancestry subscribers
[19:05] I didn't want to get in trouble by using my login and cookie -- some parts have a strict user agreement.
[19:05] Oh, I know, Believe me I know.
[19:06] I also didn't grab any personal content, like family trees (which are also behind the login). Just the public stuff.
[19:07] Thing I do for fun: build open source database systems for genealogy and historical groups, so they can publish their data *without* handing it over to for-profit groups like Ancestry.
[19:07] http://www.LeafSeek.com/
[19:07] o/\o
[19:08] In use: http://genealogy.org.il/AID/
[19:08] And: search.geshergalicia.org
[19:08] http://search.geshergalicia.org/
[19:09] Asparagir: hey, that's awesome!
[19:09] Thanks! It really bothers me how much public vital records data and historical data is getting swallowed up by for-profit groups.
[19:10] Asparagir: there's a big problem in that field, actually, which is that 1) machine-readable data is tied up in that way 2) even more is tied up in proprietary formats
[19:10] There are lots of cases in the past few years of formerly-public data becoming hidden behind paywalls.
[19:10] which i am sure you know about, i'm saying i've been aware of the problem for a while
[19:10] Yep!
[19:10] Sad genealogy panda.
[19:10] Asparagir: so, i'll sit here and admire you for a bit
[19:10] If you like. :-)
[19:10] k!
[19:10] * winr4r sits, admires
[19:11] * antomatic agrees
[19:15] that ends our coverage of the Asparagir admiration society
[19:17] Well, hold off on the kudos until I start rescuing some more shit. I was surprised to see a lot of major genealogy websites not well-represented in the Wayback Machine.
[19:17] Gotta fix that.
[19:18] Asparagir: yes, you do
[19:18] Some might be robots.txt bs
[19:19] yeah wayback is going to be essential in the future, for example here is great info on one of my ancestors from a website which is now gone thanks to yahoo http://web.archive.org/web/20091027171723/http://www.geocities.com/SouthBeach/Canal/5891/john.html
[19:20] Whoever was rescuing the old webtv stuff a few weeks ago, thank you for remembering to include family history search terms in the stuff you were pulling.
[19:26] Frustrated by only finding long deleted forum posts which supposedly held the golden answer to my, and people long dead's questions: How would somebody with a software engineering background start archiving community sites in the most ideal way?
[19:26] Felt like http://xkcd.com/979/ but only worse since it's often so close
[19:28] ntnd: two sec
[19:28] http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[19:31] SmileyG: This only follows links on the same domain doesn't it?
[19:32] that is where --span-domains comes in
[19:33] Ahh, very nice
[19:35] just grab ancestry.com and be done with it :)
[19:37] i thought that xkcd.com was closing cause of slashdot: http://entertainment.slashdot.org/story/13/07/28/2227246/signs-point-to-xkcds-time-ending?utm_source=rss1.0mainlinkanon&utm_medium=feed
[19:38] title saying 'signs point to xkcds time ending'
[19:38] it was just a very long comic they made
[19:39] i may do a panic crawl later just in case
[19:40] that seems like the kind of thing wayback would have all of
[19:44] s/they/he
[19:44] godane: no chance of it closing afaik
[19:44] I think randall would announce it for archiving purposes far before its time
[19:44] and I think everything is crawled anyway
[20:16] * ivan` grabs all of ftp://ftp.supermicro.com
[20:19] http://blog.theoldreader.com/post/56798895350/desperate-times-call-for-desperate-measures
[20:20] oh
[20:20] was already posted
[20:28] When Scott joined the Internet Archive, the Loon rejoiced; she believed (and still vehemently believes) that the world at large and the library/archives world desperately need Scott to do the work he does.
[20:28] Notwithstanding that belief, the Loon knows full well that Scott would never survive in an ordinary archives or library context. Scott doesn't just break The Rules, you see; Scott stomps The Rules flat and pisses gleefully on them, particularly though not exclusively online for all to see.
[20:28] Given that, not even Scott -- regardless of his hands-on knowledge of digital archiving, regardless of his skill at assembling technical communities for useful ends, regardless of his many and varied accomplishments, regardless of his high public profile -- could stay in a library or archives job with the Rules-enforcers gunning for him, as they inevitably would.
[20:28] http://gavialib.com/2013/07/silencing-librarianship-and-gender-who-can-break-the-rules/
[20:30] hmmmm
[20:30] Translation: I'm going to jail
[20:31] librarian jail
[20:31] I'm probably going to radio jail, for what it's worth
[20:31] We're all going to jail.
[20:31] ALL
[20:32] I'm ok with this.
[20:32] I just figured out who this person is.
[20:33] OH MAN
[20:33] Forgot to mention
[20:33] They did a WARC session at this NDSA thing I went to
[20:33] I got up during the Q&A and said we'd added WARC to WGET.
[20:33] Cheering. Cheering!
[20:34] and then?
[20:35] there *has* to be an "and then"
[20:35] "buxom cross-dressers threw fake gold coins at our feet as we discussed the fate of the revolution."
[20:35] People were happy the end
[20:35] I seem to remember hearing about this
[20:36] ok
[20:36] also good
[20:36] so this Loon didn't piss all over wget
[20:36] great
[22:34] http://thenextweb.com/insider/2013/07/29/the-old-reader-to-close-public-site-in-two-weeks-users-who-joined-before-google-reader-axing-news-can-stay/
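
The WARC-with-wget approach linked at 19:28 can be sketched as a helper that assembles a wget command line. This is a rough sketch, not the wiki's actual recipe: the function name, the example URL, and the choice of flags are assumptions. Note that the `--span-domains` mentioned at 19:32 does not appear to be a wget option; the documented way to let a recursive crawl leave the starting host is `--span-hosts` (`-H`), usually restricted with `--domains`.

```python
import shlex
from typing import List, Sequence


def build_wget_warc_command(
    start_url: str,
    warc_name: str,
    allowed_domains: Sequence[str] = (),
) -> List[str]:
    """Assemble (but do not run) a wget command that mirrors a site into a WARC."""
    cmd = [
        "wget",
        "--mirror",                   # recursive download with timestamping
        "--page-requisites",          # also fetch images, CSS, and scripts
        f"--warc-file={warc_name}",   # wget writes the WARC under this base name
        "--warc-cdx",                 # also write a CDX index alongside it
    ]
    if allowed_domains:
        # By default recursion stays on the starting host; these two options
        # let it follow links to the listed domains only. Include the
        # starting domain in the list.
        cmd.append("--span-hosts")
        cmd.append("--domains=" + ",".join(allowed_domains))
    cmd.append(start_url)
    return cmd


if __name__ == "__main__":
    # Hypothetical site, for illustration only.
    cmd = build_wget_warc_command(
        "http://example.com/",
        "example.com-panicgrab",
        allowed_domains=["example.com", "cdn.example.net"],
    )
    print(shlex.join(cmd))
```

The function returns an argument list rather than a shell string so it can be passed straight to `subprocess.run` without quoting problems.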