#archiveteam 2013-07-29,Mon


Time Nickname Message
05:03 winr4r so what the fuck, via.me
05:03 winr4r three DAYS warning before deleting everything?
05:03 winr4r THREE FUCKING DAYS?
05:04 BlueMax potential backup: http://www.pspminis.com/
05:05 BlueMax won't be deleted but nothing more will be posted, might be a good idea to grab it now
05:17 godane BlueMax: i'm backing it up right now
05:25 godane uploaded: https://archive.org/details/polytroncorporation.com-20130727
05:41 godane looks like search on archive.org is not updating
05:41 godane i say that cause i'm around episode 90 of labrats and i'm still stuck at episode 55 in search
06:23 BlueMax how's pspminis going, godane
06:24 godane its still going
06:24 godane 86+mb
06:25 godane i'm not mirroring the forums right now cause i just want to focus on the main site
14:13 SketchCow BACK
14:13 SketchCow OK, let's catch up.
14:19 SmileyG SketchCow: ok.
14:20 SketchCow winr4r: Until I can see proof, I don't think it was 3 days warning for via.me.
14:20 * SmileyG ponders where to start.
14:20 SketchCow I've been reading
14:20 SmileyG K, short version
14:21 SmileyG GLaDOS/antomic were thinking about/working on generating a list of users already in wayback for xanga, and putting them to the back of the queue
14:21 SmileyG snapjoy is ready to go into its own subcollection.
14:21 * SmileyG can't think of anything else
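The reordering SmileyG describes (users already in the Wayback Machine go to the back of the xanga queue) can be sketched as a stable sort. This is only an illustration, not the script that was being worked on: the function name `deprioritize_archived` and the sample usernames are hypothetical, and how the "already in wayback" list gets built is left out entirely.

```python
def deprioritize_archived(queue, already_archived):
    """Return a new queue with already-archived users moved to the back.

    Hypothetical helper: `queue` is an ordered list of usernames and
    `already_archived` is any iterable of usernames known to be archived.
    """
    archived = set(already_archived)
    # sorted() is stable and False sorts before True, so users that are not
    # yet archived keep their relative order and come first.
    return sorted(queue, key=lambda user: user in archived)


if __name__ == "__main__":
    # Made-up usernames, purely for demonstration.
    print(deprioritize_archived(["a", "b", "c", "d"], {"a", "c"}))
```

Nothing is dropped from the queue here; archived users are only delayed, which matches "putting them to the back of the queue" rather than skipping them.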
14:24 winr4r SketchCow: well, we heard about it yesterday, so i went to IA, and their most recent crawl which had the notice was yesterday
14:25 winr4r http://web.archive.org/web/20130721062456/http://via.me/
14:25 SketchCow I'm SURE it was at LEAST a month.
14:25 winr4r the one before that, on the 21st, did not, so that means they might have given 10 days notice
14:25 winr4r unless they told folks by email before that
14:27 SketchCow https://twitter.com/izayoi1616/status/359514629963128836
14:27 winr4r consider me corrected
14:27 winr4r i did check the usual suspects like techcrunch first
14:28 SketchCow Well, not corrected, it's 10 days.
14:29 SketchCow http://via.me/help#retirement
14:30 winr4r which isn't a whole lot better, but i was still wrong
14:30 winr4r "Links to your photos on Via.me will still function until July 30th"
14:30 winr4r "This means photo hosting will go away on August 1st."
14:30 winr4r ?
14:36 omf_ SketchCow, I am downloading all of buzzdata before they close in 2 days
14:36 SmileyG winr4r: one day....
14:36 SmileyG can you bash something out?
14:36 winr4r SmileyG: that sounds like an indecent proposal
14:36 winr4r but yes
14:43 SketchCow http://www.archiveteam.org/index.php?title=File:BDclosed-03.jpg
14:45 winr4r crap, one that got away :(
14:45 SketchCow Omg if working on it
14:45 SketchCow He just jsaid so.
14:45 SketchCow OK, apparently a little woozy typing
14:46 winr4r too early? :)
14:47 omf_ yeah buzzdata requires you to drive a browser to get all the js bullshit to work to access the public datasets
14:47 omf_ downloading the data is easy, discovering it is the time consuming part
14:48 winr4r omf_: and there's no JSON-spewing interface running behind that?
14:50 omf_ the API is how I am downloading the datasets but the api does not give me a list of usernames which is the key for access
14:50 winr4r oh, shit
14:51 omf_ GET `https://:HIVE_NAME.buzzdata.com/api/:USERNAME` where HIVE_NAME is optional (if you leave it out, it just gets the public stuff)
14:51 omf_ the API assumes you know the username but does not provide a username discovery mechanism
14:51 winr4r and getting the data set is /api/:USERNAME/:some_id ?
14:54 omf_ http://buzzdata.com/faq/api/api-methods#download
14:54 winr4r yeah i just looked, should have done that rather than asking questions
14:54 winr4r :)
14:54 omf_ https://:HIVE_NAME.buzzdata.com/api/:USERNAME/:DATASET_SHORT_NAME/:DATAFILE_UUID/download_request
14:54 omf_ yeah their API is alright for the features they have
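The two URL templates omf_ pasted can be filled in mechanically once a username is known. The sketch below only builds the URLs; it does not contact the service. Assumptions not confirmed by the log: that leaving out HIVE_NAME means using the bare `buzzdata.com` host, and the helper names (`buzzdata_base`, `user_url`, `download_request_url`), which are hypothetical. The HTTP method and any authentication for `download_request` are not stated above, so they are not modelled.

```python
from urllib.parse import quote


def buzzdata_base(hive_name=None):
    """Build the API base URL; the bare host is an assumption for 'no hive'."""
    host = f"{hive_name}.buzzdata.com" if hive_name else "buzzdata.com"
    return f"https://{host}/api"


def user_url(username, hive_name=None):
    """URL for the template https://:HIVE_NAME.buzzdata.com/api/:USERNAME."""
    return f"{buzzdata_base(hive_name)}/{quote(username, safe='')}"


def download_request_url(username, dataset_short_name, datafile_uuid, hive_name=None):
    """URL for .../api/:USERNAME/:DATASET_SHORT_NAME/:DATAFILE_UUID/download_request."""
    parts = (username, dataset_short_name, datafile_uuid)
    # Percent-encode each path segment so odd characters cannot break the path.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{buzzdata_base(hive_name)}/{path}/download_request"


if __name__ == "__main__":
    # Made-up identifiers, purely for demonstration.
    print(user_url("alice"))
    print(download_request_url("alice", "my-dataset", "1234", hive_name="acme"))
```

As the log notes, the missing piece is username discovery: these helpers are only useful after a list of usernames has been collected some other way.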
14:58 winr4r anything i can do to help?
15:00 winr4r mistym: hiiiii
15:01 mistym winr4r: Morning!
15:01 omf_ winr4r, if it is not done in 4-5 hours I might
15:02 winr4r omf_: is your big problem finding usernames?
15:03 omf_ The problem is the script takes a while since it has to load and run the js bullshit
15:03 omf_ Nothing requires any kind of extra work
15:04 winr4r ah, gotcha :\
15:04 omf_ They only have 2,375 users total
15:04 omf_ and some have no datasets
15:04 omf_ its a very small site
15:06 winr4r mm :\
15:15 omf_ I just remembered the question I had for you winr4r
15:17 omf_ So we got a great talks section and an in the media section. Maybe we should add a section for technical articles about how we do stuff
15:20 winr4r i can only think of one article that goes in there ("Site exploration"), though that's a case for writing more
15:20 winr4r oh, yeah, wget recipes and shit, we have a page on that i think
15:21 winr4r though there's likely lots on the wiki that i do not know about :)
15:29 omf_ I mean more along the lines of a blog post I wrote on how I wrote a script to collect all the mailing archives for opensolaris. I know a few others wrote blog posts about tools they made as well
15:30 omf_ A walk through of how the tool was built versus just how to use the tool
15:32 winr4r omf_: yes, that would be excellent
18:31 Asparagir YOU GUISE MY FIRST PANICGRAB UPLOAD TO IA WORKED!
18:31 Asparagir http://archive.org/details/jewishgen.org-panicgrab-20130710
18:31 Asparagir Just needs to get moved to the ArchiveTeam section, not communitytexts.
18:33 winr4r Asparagir: PROUD OF YOU SON
18:33 Asparagir I'M A DAUGHTER, DAAAAD
18:35 winr4r YOU'RE SON IF I SAY SO SON
18:36 Asparagir YOU NEVER LET ME HAVE ANY FUN! *cries, runs to room, slams door*
18:40 closure http://blog.theoldreader.com/post/56798895350/desperate-times-call-for-desperate-measures "You will have two weeks to export your OPML file regardless of our decision"
18:41 antomatic AAAARGH!
18:42 antomatic Guess those 'you have 3 days left to subscribe to FeedHQ' emails were well-timed then.
18:50 Asparagir Hey omf_ -- thank you for getting the BuzzData stuff. I liked that site; too bad it's going away.
18:52 winr4r closure: what the fuuuuuuuuuuuck
18:53 winr4r wow, i actually switched to theoldreader as well
18:55 antomatic I moved to feedhq's 30-day trial and oldreader at the same time, intending to pick one - was going to choose the free option (because I am awfully stingy) but looks like I won't get away with that now. :)
19:00 winr4r 'Last week difficulty level was changed to “hell” in every possible aspect we could imagine, we have been sleep deprived for 10 days and this impacts us way too much.'
19:00 winr4r how about, if you're going to call yourself a google reader alternative, and actually invite people to use you, *be willing to do the fucking work to not let people down*
19:00 winr4r even if it's free
19:01 antomatic It has been an incredible journey..
19:01 antomatic sad though, nevertheless.
19:03 winr4r actually, at this point in my life, i wouldn't mind one less thing to keep up with, so maybe that is a sign
19:03 winr4r in the same way that seeing a distant city nuked is a sign that you should buy more stuff locally, but
19:03 winr4r not something welcome, but heyyyyeyyy i'm your silver lining
19:04 DFJustin Asparagir: nice work, poking around the website most of it seems to be behind a login wall so I guess you did a crawl with cookies?
19:05 Asparagir No, I just crawled all the non-cookie areas of the site, which include hundreds of town and shtetl pages, photos, family artifacts, etc.
19:05 DFJustin definitely worth grabbing anything with a "powered by ancestry.com" logo, they seem to have a habit of buying free resources and then making them only for ancestry subscribers
19:05 Asparagir I didn't want to get in trouble by using my login and cookie -- some parts have a strict user agreement.
19:05 Asparagir Oh, I know, Believe me I know.
19:06 Asparagir I also didn't grab any personal content, like family trees (which are also behind the login). Just the public stuff.
19:07 Asparagir Thing I do for fun: build open source database systems for genealogy and historical groups, so they can publish their data *without* handing it over to for-profit groups like Ancestry.
19:07 Asparagir http://www.LeafSeek.com/
19:07 DFJustin o/\o
19:08 Asparagir In use: http://genealogy.org.il/AID/
19:08 Asparagir And: search.geshergalicia.org
19:08 Asparagir http://search.geshergalicia.org/
19:09 winr4r Asparagir: hey, that's awesome!
19:09 Asparagir Thanks! It really bothers me how much public vital records data and historical data is getting swallowed up by for-profit groups.
19:10 winr4r Asparagir: there's a big problem in that field, actually, which is that 1) machine-readable data is tied up in that way 2) even more is tied up in proprietary formats
19:10 Asparagir There are lots of cases in the past few years of formerly-public data becoming hidden behind paywalls.
19:10 winr4r which i am sure you know about, i'm saying i've been aware of the problem for a while
19:10 Asparagir Yep!
19:10 Asparagir Sad genealogy panda.
19:10 winr4r Asparagir: so, i'll sit here and admire you for a bit
19:10 Asparagir If you like. :-)
19:10 winr4r k!
19:10 * winr4r sits, admires
19:11 * antomatic agrees
19:15 winr4r that ends our coverage of the Asparagir admiration society
19:17 Asparagir Well, hold off on the kudos until I start rescuing some more shit. I was surprised to see a lot of major genealogy websites not well-represented in the Wayback Machine.
19:17 Asparagir Gotta fix that.
19:18 winr4r Asparagir: yes, you do
19:18 omf_2 Some might be robots.txt bs
19:19 DFJustin yeah wayback is going to be essential in the future, for example here is great info on one of my ancestors from a website which is now gone thanks to yahoo http://web.archive.org/web/20091027171723/http://www.geocities.com/SouthBeach/Canal/5891/john.html
19:20 Asparagir Whoever was rescuing the old webtv stuff a few weeks ago, thank you for remembering to include family history search terms in the stuff you were pulling.
19:26 ntnd Frustrated by only finding long deleted forum posts which supposedly held the golden answer to my, and people long dead's questions: How would somebody with a software engineering background start archiving community sites in the most ideal way?
19:26 ntnd Felt like http://xkcd.com/979/ but only worse since it's often so close
19:28 SmileyG ntnd: two sec
19:28 SmileyG http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
19:31 ntnd SmileyG: This only follows links on the same domain doesn't it?
19:32 omf_2 that is where --span-hosts comes in
19:33 ntnd Ahh, very nice
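For reference, a WARC grab that is allowed to follow links onto other hosts can be assembled roughly as below. This is a hedged sketch, not the recipe from the linked wiki page: in GNU wget the host-spanning switch is `--span-hosts` (`-H`), usually paired with `--domains` so the crawl does not wander across the whole web. The helper name `build_wget_warc_command`, the example URL, the WARC name, and the one-second wait are all illustrative assumptions.

```python
import shlex


def build_wget_warc_command(start_url, warc_name, allowed_domains=()):
    """Return a wget argument list for a recursive grab written to a WARC.

    Hypothetical helper; only long-standing GNU wget options are used.
    """
    args = [
        "wget",
        "--mirror",                  # recursive download with timestamping
        "--page-requisites",         # also fetch images, CSS and similar assets
        f"--warc-file={warc_name}",  # write <warc_name>.warc.gz alongside the files
        "--warc-cdx",                # also write a CDX index for the WARC
        "--wait=1",                  # assumed politeness delay between requests
    ]
    if allowed_domains:
        # Without --span-hosts, recursion stays on the starting host.
        args.append("--span-hosts")
        # Restrict host spanning to an explicit list of domains.
        args.append("--domains=" + ",".join(allowed_domains))
    args.append(start_url)
    return args


if __name__ == "__main__":
    cmd = build_wget_warc_command(
        "http://example.com/",
        "example.com-panicgrab",
        allowed_domains=["example.com", "static.example.com"],
    )
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

The starting domain is listed in `allowed_domains` as well, since `--domains` acts as a whitelist for every host the crawl follows.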
19:35 CowerZZZZ just grab ancestry.com and be done with it :)
19:37 godane i thought that xkcd.com was closing cause of slashdot: http://entertainment.slashdot.org/story/13/07/28/2227246/signs-point-to-xkcds-time-ending?utm_source=rss1.0mainlinkanon&utm_medium=feed
19:38 godane title saying 'signs point to xkcds time ending'
19:38 godane it was just a very long comic they made
19:39 godane i may do a panic crawl later just in case
19:40 DFJustin that seems like the kind of thing wayback would have all of
19:44 SmileyG s/they/he
19:44 SmileyG godane: no chance of it closing afaik
19:44 SmileyG I think randall would announce it for archiving purposes far before its time
19:44 SmileyG and I think everything is crawled anyway
20:16 * ivan` grabs all of ftp://ftp.supermicro.com
20:19 joepie91 http://blog.theoldreader.com/post/56798895350/desperate-times-call-for-desperate-measures
20:20 joepie91 oh
20:20 joepie91 was already posted
20:28 SketchCow When Scott joined the Internet Archive, the Loon rejoiced; she believed (and still vehemently believes) that the world at large and the library/archives world desperately need Scott to do the work he does.
20:28 SketchCow Notwithstanding that belief, the Loon knows full well that Scott would never survive in an ordinary archives or library context. Scott doesn't just break The Rules, you see; Scott stomps The Rules flat and pisses gleefully on them, particularly though not exclusively online for all to see.
20:28 SketchCow Given that, not even Scott -- regardless of his hands-on knowledge of digital archiving, regardless of his skill at assembling technical communities for useful ends, regardless of his many and varied accomplishments, regardless of his high public profile -- could stay in a library or archives job with the Rules-enforcers gunning for him, as they inevitably would.
20:28 SketchCow http://gavialib.com/2013/07/silencing-librarianship-and-gender-who-can-break-the-rules/
20:30 xmc hmmmm
20:30 SketchCow Translation: I'm going to jail
20:31 xmc librarian jail
20:31 xmc I'm probably going to radio jail, for what it's worth
20:31 SketchCow We're all going to jail.
20:31 SketchCow ALL
20:32 xmc I'm ok with this.
20:32 SketchCow I just figured out who this person is.
20:33 SketchCow OH MAN
20:33 SketchCow Forgot to mention
20:33 SketchCow They did a WARC session at this NDSA thing I went to
20:33 SketchCow I got up during the Q&A and said we'd added WARC to WGET.
20:33 SketchCow Cheering. Cheering!
20:34 xmc and then?
20:35 xmc there *has* to be an "and then"
20:35 antomatic "buxom cross-dressers threw fake gold coins at our feet as we discussed the fate of the revolution."
20:35 SketchCow People were happy the end
20:35 xmc I seem to remember hearing about this
20:36 xmc ok
20:36 antomatic also good
20:36 xmc so this Loon didn't piss all over wget
20:36 xmc great
22:34 balrog http://thenextweb.com/insider/2013/07/29/the-old-reader-to-close-public-site-in-two-weeks-users-who-joined-before-google-reader-axing-news-can-stay/
