[00:11] !!! [00:11] http://www.theverge.com/2013/3/15/4110196/social-question-answer-site-formspring-shut-down-march-31st [00:12] that's just great [00:17] and it's in ~16 days, so we better get something figured out, and fast [00:19] nm, we have a month [00:19] since the site will still remain alive until april 15th [00:25] holy shit seriously? [00:25] I thought Formspring was alive and well [00:25] Goddamn [00:28] evening [00:32] facks sake [00:32] the inactivity message that showed up at the beginning of the year was probably our early warning to trouble [00:50] wp494: the website will shut down March 31, users can log in and export until April 15. [00:50] We have until March 31 [00:50] if they went into a limited-functionality mode, then that would most certainly hold true [01:20] http://formspring.wordpress.com/2013/03/15/formspring-is-shutting-down/ [01:20] Seen. [01:21] * trs80 nods [01:23] We need to: 1. scrape for the IDs, 2. scrape each site. [01:23] Did Formspring have an API/ [01:28] o noes [01:28] http://formspring.wordpress.com/2013/03/15/formspring-is-shutting-down/ [01:29] ffs [01:30] lol [01:30] for Yaho [01:30] lol [01:30] that sounds like drugs [01:30] Wanna buy some fuckin' Yaho? [01:30] GET SUM OF DAT YAHO @ #BURNTHEMESSENGER [01:31] formspring was nice [01:31] It was. [01:31] now the only option for smth like formspring [01:31] is uhm [01:31] tumblr's questions? [01:31] there's also ask.fm [01:32] 4y7a7h8 8u9 9r3i3t10e [01:35] Channel name for formspring? [01:35] #validforafunnyname [01:35] as a placeholder [01:35] ofc [01:37] #firespring perhaps? 
[01:37] * GLaDOS cannot into joke [01:37] sounds good [01:45] wait, how legitimate is that blog, it only has two posts [01:45] and there is a (no longer accessible) http://blog.formspring.me/ [01:46] also nothing on their twitter account [01:47] as legitimate [01:47] as techcrunch [01:47] confirmingit [01:47] http://techcrunch.com/2013/03/15/formspring-the-pioneering-anonymous-qa-platform-is-shutting-down/ [01:47] oh right, didn't see that [01:48] formsprung [01:48] it's legit, they've got a message on user pages [01:49] just says something along the lines of "important formspring news" [01:54] i like formsprung [01:55] anyone have an account to see how the export works? [01:57] I think I had one [02:00] came across this message [02:00] I do, although I have no questions/answers so I don't know what I'll give me [02:00] As of January 2013, accounts that have not been active in over 18 months may be automatically deleted. If this is your account, you may login within the next 24 hours to stop this account from being permanently deleted. [02:01] Yeah, when I logged in I got a notice saying that my account had been deactivated [02:02] oh wow, ANOTHER site that's dying [02:02] but they just killed all the old users' contents [02:07] so if we get the remainder of formspring archived, would the archiving status on its page state "partially saved" due to the loss of some data because of the january purge of inactive users? [02:07] it would be very partial [02:07] most people haven't used it for a while [02:08] but maybe it's still worth it [02:13] we may as well archive what's left [02:13] better a small archive than nothing [02:13] yeah [02:17] LOL SUP FAGGOT, HOW DOES IT FEEL TO BE AUTISTIC? 
[02:17] I'm actually female, and it feels pretty good to have autism because I can tell good people from assholes like yourself :) [02:17] i love the magic of these anon q&a [02:18] http://ask.fm/HollytheKirlia [02:18] de_stroyed [02:18] good comebacks always bring a smile to me [02:18] that was far [02:19] from being a molishing reply [02:22] I thought we had already taken care of stupid users like that [02:25] trolling and spamming irc is completely pointless. It is not like it has the staying power of comments on an unattended blog. [02:26] I troll with ascii art [02:26] ◑ ◔ [02:26] IMMA CHARGIN MAH LAZER! [02:26] ║▓▒░░░░░░░░░░░░░░░░░░ [02:26] ╔═╗ [02:26] ╚═╝ [02:28] ... [02:28] Has anyone grabbed the Boston Phoenix yet? [02:37] Yeah skipping this robots.txt file http://thephoenix.com/robots.txt [02:38] How rude. [02:45] brb [02:53] hey, whoever has permission to delete pages on the wiki, there are a handful of spam pages that have been blanked but should be deleted. check the top of recent changes [02:56] done [03:33] We've got a Formspring representative in #firespring [04:02] happy Friday evening archivers [04:02] same to you no2pencil [04:19] anybody know about WebCite(http://www.webcitation.org) potentially shutting down. never knew it was a non-profit org. [04:43] sounds more like a fundraiser for now [04:45] oh, WebCite is already on the deathwatch [04:48] it looks like it's might die. They need $25K and they only raised $3.8K since January. [04:48] hi closure [11:52] Hey look, another service closing https://twitter.com/synchtube/status/312752992476602368 [11:52] http://i.imgur.com/hPITxq2.png [11:53] at least there isn't really any data to be saved [12:37] Morning. [12:38] Man, we need to think about how to ramp up new users. [12:38] well, if jacquesm writing about the projects, that'll work ;) [12:39] *keeps [12:57] I have no question we can get new members and continue to. 
[12:57] What we need to do, is engage and encourage people to be doers [12:58] Nah, it's more than that. [12:58] Hm [12:58] Obviously, we have a lot of self-directed people and we abuse non-self-directed people until they either clutch sack or storm off. This is good. [12:59] But there's a moment when a person who IS self directed goes "I want to help" and they begin doing "stuff" and we have to go "wait, no, that actually is the opposite of help" [12:59] Or, even more..... [12:59] Hmmm. [13:00] You know what might be a good idea? A site for Archiveteam where people submit a request to archive it. [13:00] Like 'xxxx going down'. So all the usual pre-formed things are in there. [13:01] That's indeed a good/interesting idea. [13:01] Then we have a channel here that gets those requests. [13:01] batsignal.archiveteam.org [13:01] Pass that around, and then people can use the batsignal. [13:01] Put that on front page of wiki [13:01] Nothing too crazy. [13:01] * ersi nods [13:02] The page should also show the current projects we have so people hopefully do not submit the same thing over and over agai [13:02] Right. [13:02] I've been thinking about that as well. Might as well start weekend hackin' on it. [13:02] Yeah, it doesn't need to be anything crazy. I like it when it looks nice, obviously. [13:04] While having more people reporting sites, keeping an eye out and running warrior there is the other lurking problem. Sites where wget just cannot cut it. Right now it feels like we have more projects that need this than people who can do it [13:05] like everything I am working on [13:06] Well, it doesn't have to be entirely focused on "New things to archive". It could be "Oh, so you're telling us about Google Reader? We know that it's going down. Do you have any extra info we've missed? Our collective thoughts are on the wiki page [link]. 
Either contribute there, or fill out this message box and we'll take care of it" [13:07] Maybe it's time to start up #archiveteam-batsignal. Brain storming galore [13:14] ---------------------------------------------- [13:15] WE'RE BRAINSTORMING AN EARLY ALERT SYSTEM [13:15] #archiveteam-batsignal [13:15] ---------------------------------------------- [13:22] I love that my #1 slowdown in my new project is "now which machine do I put mediawiki on" [13:24] ha [13:24] This just in: having the posterous clumper (makes 50gb warcs) not first move the warc.gz files between two drives means your machine doesn't choke to shit [13:25] Yeah, saves some I/O [13:26] Apparently a pile of it. [13:26] While batsignal goes on, let me go back to original problem. [13:27] Guy shows up in archiveteam, wants to help. [13:27] Either: [13:27] - Technically unastute, just undirectedly angry [13:27] - Somewhat technically astute, has windows box, directedly angry [13:27] - Technically astute, has linux, directedly angry [13:28] So in theory, warrior tends to be an excellent continuing fix for a whole class. We're in very good shape there. [13:28] We're in such good shape, we're able to crush a service to a bloody pulp. [13:29] Sorry, I keep going tangential but thinking makes me think too much [13:30] What is the technical consideration with ... [13:30] OK, here's a task the warrior might be really good for. [13:31] Straight-ass website warcing. You give it a site, or I say we assign it a site, and the warriors download it along the thread. [13:32] Obviously, a warrior can only do so much before it runs into issues. So have it go until the warc is a certain size, then it dumps back what URLs it's done. [13:32] So it does a couple hundred mb or whatever the instance is granted, and then dumps it back going "I got this much" [13:32] This won't work for a lot of strange sites, but for others it would be fine.
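The "go until the warc is a certain size, then dump back what URLs it's done" idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how the warrior actually works: `crawl_chunk`, the `fetch` callback, and the report fields are all hypothetical names, and the byte count stands in for the size of a real WARC file.

```python
from collections import deque


def crawl_chunk(start_urls, fetch, budget_bytes):
    """Fetch pages breadth-first until the byte budget is used up.

    `fetch` is a hypothetical callback returning (body_bytes, links) for a URL.
    Returns a report of what was done and what is still pending, so that
    another worker could pick up where this one stopped.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    done = []
    used = 0
    # The budget is checked before each fetch, so the chunk can overshoot
    # the budget by at most one page.
    while queue and used < budget_bytes:
        url = queue.popleft()
        body, links = fetch(url)
        used += len(body)
        done.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return {"done": done, "pending": list(queue), "bytes": used}


# A made-up four-page site, used in place of real network access.
FAKE_SITE = {
    "/": (b"x" * 100, ["/a", "/b"]),
    "/a": (b"x" * 100, ["/c"]),
    "/b": (b"x" * 100, []),
    "/c": (b"x" * 100, []),
}

report = crawl_chunk(["/"], lambda url: FAKE_SITE[url], budget_bytes=250)
print(report["done"], report["pending"], report["bytes"])
```

The "pending" list is the part a tracker would hand to the next worker; sites that generate links dynamically would not fit this simple queue model, which matches the caveat about strange sites.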
[13:33] Have we used warrior before to just get single large files [13:33] like a list of 500 videos [13:34] I don't know. [13:34] Well, not the warrior - but Yahoo Video was pretty huge. [13:35] But I think a not-that-big-a-deal website set, where one or two people can have it directed to them and it just goes, and we know it's good because the warrior's doing the work, that might be nice. [13:38] And now.... remembering how virtual hosts work on Apache 22 [13:49] http://test.archiveteam.org [13:49] Well, thank goodness for little tiny victories. [13:50] Just tried to grab thephoenix and it craps out pretty quick, even with a different UA and robots.txt ignored [13:50] 30 files and then it says it is done [13:51] https://itunes.apple.com/us/app/internet-archive-companion/id617179709 - Internet Archive Companion [13:51] Not perfect, but a start. [13:53] here is the command I used if anyone is interested in it http://pastebin.com/EUDMSXs0 [14:10] SketchCow, what about different versions of the warrior. [14:11] I have been working on one that can screenshot pages from a tracker list [14:14] Why would there need to be different versions though? Couldn't the necessary software be installed on the current ones? [14:14] There shouldn't be different versions. [14:14] the warrior will get bigger then [14:14] You should bring your forked optional changes into the main warrior. [14:14] The warrior is trivially small [14:14] I need to get it all working [14:15] Yes, that's what development is [14:15] Working on something until it works and then joining it to the new version [14:23] http://metadata.archiveteam.org/index.php/Main_Page yay [14:24] omf_: your wget line- --span-hosts will only go to the domains listed in --domains right? [14:24] yes [14:25] the images are hosted on cache [14:32] SketchCow: so when will i upload to my other collections?
[14:33] i only ask cause its been a few days [16:15] guys [16:15] thq [16:15] i thought their site's had gone, they've gone bust afaik [16:15] http://www.saintsrow.com/community# [16:15] it's still up and running [16:15] grab grab grab [16:38] so, there's a bit of javascript that runs asking you to confirm your language and birthdate [17:57] yeah [18:23] hello, I wanted to help out with archiving posterous, but I would prefer to not give the archiveteam-warrior my full bandwidth, is there any possibility to throttle it? [18:27] I don't think there is one (yet), but because posterous is quite slow, you will probably not use a lot of bandwidth. Uploading the results via rsync will probably be the biggest bandwidth peak [18:29] okay thanks, and yes download bandwidth is no problem, but I only have 768 kbps upload bandwidth [18:30] urgato: you can limit warrior instances [18:31] except I've lost the instructions [18:39] SmileyG: thanks, I looked it up, and http://www.virtualbox.org/manual/ch06.html#network_bandwidth_limit sounds promising, but I couldn't try it yet, because my virtualbox version from ubuntu repos is 4.1.8, I will update and try it later [18:44] yeah, that stuff works [19:14] grawity, Did you upload that opensolaris stuff you downloaded? [19:15] omf_: http://archive.org/details/defect.opensolaris.org [19:18] thanks [19:35] afternoon [19:38] looks good to me [19:38] oops [19:40] it's a date ;* [20:22] Do we have enough time to rescue formspring? The wiki says there are more than 4 billion posts and 30 million registered users. From what I understand AT has until the 31st this month to save it. and after that until april 15th users will only be able to retrieve their own data. [20:23] #firespring [20:23] oh [20:23] heh. thanks ersi :) [20:24] didn't cross my mind to ask there. duh. [20:28] ;) [20:31] Question: This goes for everything the Team works on. Is there a way I can help besides downloading shuttering sites? I have a very slow upload speed.
I'd like to help somehow. Let me know. :) [20:34] so i'm grabing pcper podcast [20:34] just know there is will be twit-pcper podcast and pcper podcast [20:34] arkhive: That certainly helps [20:34] not to confuse the two [20:34] arkhive: But other than that, there's information gathering and developing [20:35] ersi: Ya, I frequently Google keywords like shutter, shuttering, shutdown, closing, close [20:38] ersi: I will have faster internet in three to six month hopefully. That is what a CenturyLink rep told me when faster speeds would be coming to my home. When that comes I will get the fastest which will be 40Mb down 20Mb up [20:41] There's also helping out with the IRC/wiki btw, but that's kinda like the information gathering I mentioned [20:44] At least I still have tons of digitizing to do. My Mom has old records that came from cereal boxes when she was a kid. Would be cool to digitize that. [21:01] arkhive: if you have spare time but not bandwidth there are a lot of metadata jobs to help with [21:04] http://www.archiveteam.org/index.php?title=Metadata_warriors https://twitter.com/textfiles/status/309401046340468736 [22:09] that sounds interesting [22:19] Yeah, the metadata thing is moving forward. [22:19] http://metadata.archiveteam.org/index.php/Main_Page [22:20] Is anyone working on Yahoo Messages? [22:25] * WiK is pulling email addresses outta github repos [22:26] I think NOBODY is, alard. We need to move on it, too. [22:26] MAYBE ersi [22:26] There's the #BurnTheMessenger channel, but that doesn't seem very active. [22:27] No, only one who've done something is omf_ and I think he needs backing up [22:28] omf_ is not in that channel, and there's not much on the wiki. [22:28] I don't think it should be too difficult to archive. Most important question is probably: what do we want? [22:29] There are many different ways to view that site, many urls with slightly different parameters that lead to the same data. 
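One common way to deal with "many urls with slightly different parameters that lead to the same data" is to reduce each URL to a canonical form before queueing it. The sketch below is a guess at the general shape only: the `IGNORED_PARAMS` set and the example.com URLs are made up, and which parameters are actually irrelevant on the real site would have to be worked out by looking at it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed to change only the presentation, not the data.
IGNORED_PARAMS = {"sort", "view", "ref"}


def canonicalize(url):
    """Lowercase the host, drop ignored parameters and the fragment, sort the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def dedupe(urls):
    """Return canonical URLs in first-seen order, one per distinct page."""
    seen = set()
    unique = []
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


candidates = [
    "http://Example.com/msg?id=7&sort=asc",
    "http://example.com/msg?view=flat&id=7#top",
    "http://example.com/msg?id=8",
]
print(dedupe(candidates))
```

Sorting the remaining parameters means `?a=1&b=2` and `?b=2&a=1` collapse to one entry; if a site treats parameter order as significant, that step would need to go.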
[22:32] And a few URLs who've stopped working/redirects to some weird place, from what noted in #BurnTheMessenger [22:33] He's not there now, but he was earlier [22:33] Hm, weird with the parameters [22:58] good news [22:58] on the pcper podcast [22:59] it looks like pcper was made with twit [22:59] its not some other podcast call pcper on twit [23:03] is any one working on formspring? [23:06] #firespring is the project channel [23:09] I created https://github.com/ArchiveTeam/yahoomessages-grab [23:11] ersi: yeah I know, nothing much is happening there at the moment though. I'm not too technical so I just got in contact with a formspring engineer, timbart, last night. He might give us a dump of the usernames to help with the scraping. Was hoping something happened since then. :) [23:14] Everyone will be delighted to know that the posterous uploading process is now smooth as silk. [23:19] pinwale: Neat! [23:24] ersi: yeah, timbart also said that he'll keep the rate limiters from interfering from any scrapping. [23:24] ersi: Just need someone set the job up. other than running warrior, this is as far as I can do. [23:24] i'm going to try to grab all odtv podcasts [23:25] there are some post talks from twit live that you will not find anywhere else [23:37] SketchCow: so, how is the cd-ripping process going? [23:39] Haven't even started [23:39] it's behind a bunch of other priorities [23:39] Hopefully soon, right? [23:39] I just started the process of getting the book scanner online and into the archive.org world [23:39] http://metadata.archiveteam.org/index.php/Main_Page also very important to me right now [23:40] SketchCow: btw, are you interested in writing something for World Backup Day? I'm currently looking for writers to feature. [23:40] to speed up the process, have you considered a multi-CDROM drive computer (my university did it so they didn't need to swap discs for reference materials) [23:40] I know you are. I doubt it'll be me. 
[23:40] I know this day means a lot to you, but every day is goddamned world backup day for me and a lot of this team now. [23:41] But I know some of the guys in here may be jonesing to write something [23:41] Go link to some of my speeches [23:42] dashcloud: Oh, it's less a part of "the process" than literally making room in this room. [23:42] Room's kind of a mess [23:42] :) [23:43] I'm finally getting around to having an easy process to digitize my old VHS tapes- I spent so long trying to get things perfect that only a couple here and there got done [23:44] Archive.org just got big big big into digitized VHS tapes [23:44] SketchCow: yeah, I understand. the offer to write is open to anyone here. the day is just to get people to think about data archiving and preservation. every minute of every day is backup day [23:45] what sort of workflow are they using? [23:46] VHS machine, digitizer, big ol' screen [23:46] Volunteer guy [23:46] neat [23:47] 2.0T . [23:47] root@teamarchive-1:/1/PUNCHFORK/archive# du -sh . [23:47] 2 terabytes of fuckin' punchfork! [23:47] So far we're under budget this year, or I should say, trends are in that direction, which is good. [23:47] there are still 39 punchfork users that aren't done [23:47] well hopefully we won't have something as big as mobileme [23:47] I know it's shocking, but I can live with us missing 39 punchfork users. [23:48] but if trends continue as the past few months, that may not be the case [23:48] I put together a Mitsubishi S-VHS player, an AVT-210(?) 
time-base corrector, and a Hauppauge PVR-250 capturing the video [23:48] Because we're getting 90% of what is punchfork [23:48] Which is more than anyone else was getting [23:48] dashcloud: I have something similar, no time-base corrector though [23:48] Yeah, no time-base corrector here either [23:48] Hence I never dump originals [23:48] time-base corrector is something that looks useful [23:49] I got it mostly because I do basically zero-post processing and that takes out a step for me- crop, chop out dead space/trailers, maybe yadif deinterlace, and then it's onto a DVD [23:50] do you have anything for removing macrovision? [23:50] SketchCow: your going to be getting this week in fun [23:50] the PVR-250 seems to ignore it, and if it didn't, the time-base corrector would [23:50] this is odtv video rips [23:51] pretty much every computer capture card seems to ignore macrovision (either that or I've never run into a macrovision tape in my library of commercial VHS) [23:51] macrovision messes up automatic gain control in the VCR [23:51] ah no, it messes up the TV [23:52] so yeah, some boxes may ignore it [23:52] or the time-base corrector might be ignoring / blanking it out :P [23:52] time-base corrector blanks it [23:53] hey Famicoman [23:53] Famicoman: i'm finding the odtv videos of twit [23:54] coolio [23:54] alot of lost videos are there [23:54] like this week in fun [23:54] and twit fit [23:54] but if you've got a computer video capture card, it probably doesn't support/ignores Macrovision [23:55] alard: is there a list of problematic / not-done punchfork users? [23:56] dashcloud, depends really. I've had some people tell me their sub-$100 cards don't work well with macrovision, while some of the more expensive ones have workarounds or downright ignore it. I believe there were some lawsuits a while back [23:57] all my cards are really old- the PVR-250 I got in 2004 or so, and the HVR-950q I bought at least 3 years ago [23:58] how are you compressing the video? 
[23:59] the PVR-250 has a hardware MPEG-2 encoder, and the HVR-950q is a framegrabber- which means you've got raw video unless you encode it as something else during capture