[04:24] Am I just misremembering how big this channel was or did it grow?
[04:27] maybe a little? hard to tell from my irc logs
[04:28] It's grown a lot over the past few years
[04:28] namespace: if you're really interested in google groups I can provide some tips for getting started
[04:28] ivan`: Sure.
[04:29] (Keep in mind I'm graduating high school right now and the crunch time might be on. So I might not be able to get on it ASAP.)
[04:29] I think it speaks some filthy JavaScript-only protocol and you'd have to get familiar with it using Chrome's browser inspector (F12, network tab)
[04:30] then you can write some Python programs to try to grab a thread, or all threads in a group, or whatever granularity you decide
[04:30] * namespace nods
[04:31] then you'd have to plug that Python into a pipeline script; all of the past ones are available for perusing at https://github.com/ArchiveTeam
[04:32] the way these work involves the pipeline grabbing a job from a tracker, doing something, then uploading the resulting data to an upload target with rsync, where it's later packed into a megawarc and sent off somewhere
[04:32] anyway, getting way ahead of things there; let me or #archiveteam know if you need more help with the reverse-engineering
[04:32] In case you don't remember me, I did several gigs during Posterous. But I showed up from the HN ad, after all the hard stuff was done and it was entirely waiting.
[04:33] So doing this process from the start sounds fairly interesting. (And frustrating.)
[04:33] yes
[04:35] * namespace needs to get his IRC logs working again
[04:35] google groups also has a ton of valuable non-usenet content that is almost certainly not backed up anywhere
[04:35] non-google search engines can't even read it
[04:36] I didn't even think about that.
[04:37] I'm honestly surprised Groups has evaded the guillotine so long already.
[04:40] It produces almost no new content from what I can see, and they already shut down reader, which at least let them know what the technorati are interested in.
[04:41] The channel is full of awesome
[04:43] SketchCow: ?
[04:51] mm, google groups is shutting down?
[04:51] joepie91: Not yet.
[04:51] Just generally. Channel is full of awesome.
[04:56] I haven't made a pithy joke in a while, so it could be awesome-r
[04:56] Wait, there's no mirror of wikileaks?
[05:04] that would be quite unlikely
[05:04] does the archiveteam wiki say that or something
[05:05] ivan`: It talks about it like it doesn't have one.
[05:05] Which like you, struck me as highly unlikely.
[05:05] (Did anybody write down that trick we came up with of detecting spam with grep?)
[05:19] google groups is going?
[06:49] hmm he left but there is a mirror https://archive.org/details/wikileaksarchive
[06:58] ivan`: yes, but it's quite hard to extract that usenet data from google groups
[06:59] it would be nicer to have the datasets they used to feed it :P
[06:59] we have only one of those on archive.org IIRC
[13:21] Greetings
[13:21] Is there anyone here working on recovering USENET from the Google monopoly?
[13:22] I know you've archived some of the proprietary Google Groups stuff
[13:23] But it would be nice to have an alternative to Google when trying to track down 15 year old technical discussions on obscure languages, for use as citations in reference articles on sites like Wikipedia.
[13:24] Such an archive could also add the warning flags which Google seems unwilling to provide...
[13:27] Just leaving an unhelpful "Suppressed for abuse"
[13:27] Rather than an actual indication as to why a posting was abusive
[13:42] there's been some talk about it, but nothing concrete yet (certainly nothing that can be run from the Warrior)
[13:42] There is one specific group alt.sex.fetish.robots that should be recovered
[13:43] for cultural reasons
[13:43] dashcloud: However, in archiving USENET there are some issues
[13:43] spam?
[13:43] That can be flagged
[13:44] Identifying content which would have been removed for libel or hate-speech is more complex
[13:44] Google Groups doesn't identify it separately in the 'suppressed for abuse'
[13:45] Nor would it be easy to identify material for which 'valid' cancels were issued by the original posters
[13:46] I know people that were impersonated as well :(
[13:46] dashcloud; This is going to get lengthy... can we move this chat to -bs?
[13:47] sure
[13:52] Oh and side note... I would suggest someone looks into archiving the BBC BASIC and BB4W programmer groups on Yahoo!
[13:54] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[13:54] yahoo sucks
[13:55] True, but I would hate to see a lot of valuable technical data lost,
[13:55] because Yahoo closed an abandoned group...
[13:56] (Btw what is the secret word for the Wiki? I was creating a new account)
[13:57] it's really "yahoo sucks"
[13:58] Not according to the wiki it isn't :(
[13:59] Ok folks, creating a new account on your wiki, it needs a confirmation code?
[13:59] Hey, just passing by to ask if chanarchive will ever be up again.
[14:00] !seen sketchcow
[14:01] I was wanting to create a new page on the Wiki concerning USENET archiving, but if your MediaWiki is like WMF ones, I can't do that without an account
[14:02] Alright, I take that as a no? I suppose I can still use the wayback machine for some threads.
[14:03] Anon_: SketchCow is probably the only person who can tell you that
[14:04] Ah, thank you.
[14:05] I'm looking for SketchCow because of some concerns I had about material on the archive.org software collection
[14:06] (Namely one title that the emulation community here in the UK were specifically told WAS NOT for archival yet)
[14:06] As I'm not the rights holder, I can't complain formally...
[14:07] (But I can certainly raise the issue about tracing them...)
[14:08] It's also sometimes surprising to find some material archived....
[14:08] At least you aren't the NSA
[14:08] XD
[14:08] And in relation to that I have a nasty feeling, that things will go too far the other way - i.e. a ban on large scale archiving of private communications :(
[14:12] Hmmm
[14:12] Still can't work out the code for the Wiki?
[14:13] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[14:17] http://techcrunch.com/2014/01/21/when-goods-not-good-enough/
[14:17] So Canvas and DrawQuest are closing
[14:17] GLaDOS: do you remember the secret word for the wiki?
[14:18] yahoosucks
[14:18] why would you stick a space in there?
[14:18] Secret word, not secret words
[14:25] Oh
[14:25] * EpsilonRe goes bright pink
[14:25] sorry about that
[14:27] So what you guys do is legal doxing?
[14:27] XD
[14:28] I.e. keeping stuff people forget to back up
[14:29] It's preserving things that should not die, in my opinion.
[14:29] I wouldn't call it doxing- that has a very harsh and personal tint to it- we go after websites, not people or individuals
[14:30] Anon_: OK
[14:30] Apologies ... It was the term that came to mind
[14:30] On something else...entirely
[14:30] Does your archiving get audited?
[14:31] So you know you haven't archived illegal porn for example?
[14:34] no, but I also don't worry about that- archive what you think needs archiving, and if there's a problem, it'll get handled
[14:35] dashcloud: OK Created an account
[14:36] Where's the best place to suggest a project?
[14:36] (I know how things usually (don't) work on WMF projects if it helps
[14:36] )
[14:37] just make a new page, fill in what you know, and mention it in the channel here
[14:37] OK
[14:37] I'm new to your wiki
[14:58] http://www.archiveteam.org/index.php?title=User:Sfan00_IMG/USENET
[14:58] It's a draft, and feedback would be appreciated
[14:59] Oh and not very major - but someone should consider archiving - http://orteil.dashnet.org/cookieclicker/ ;)
[15:00] you can archive small sites using #archivebot
[15:02] if you op access in that channel - I don't
[15:02] *have
[15:03] ill add it
[15:06] BTW midas as you are here, can you look into trying to track down the MSDN notes on Direct3D pre version 3?
[15:06] I note that a LOT of older MSDN content was removed around 2012...
[15:06] Including sadly the material which explained quite a lot about the internals of various portions of pre XP/NT Windows...
[15:07] It's out there SOMEWHERE
[15:07] Speaking of which if ftp.microsoft.com hasn't been archived it should be.... I've found some really ancient support files in there...
[15:09] ftp.microsoft.com, joepie91 is grabbing that
[15:10] msdn stuff, can check in a couple of hours, currently at work :p
[15:11] midas: I also wonder if anyone grabbed Sunsite... which was a mirror for a large number of other archives around 2000 I think
[15:12] not sure, maybe someone else can answer that
[15:16] it used to be something like mirror.sunsite.ic.ac.uk
[15:19] the plan is to eventually grab all ftp sites https://archive.org/details/ftpsites
[15:19] http://ascii.textfiles.com/archives/4199
[15:24] another ftp: ftp://ftp.cs.washington.edu/
[16:05] Was there any archive of bebo?
[16:08] http://www.6581-8580.com/soasc_copyright.php - Anyone recall C64 BBS?
[16:08] or Amiga?- http://www.paula8364.com/
[16:17] Lately I've come to understand that one of my hobby's biggest forums probably doesn't get fully crawled by the wayback machine due to the fact that their custom software doesn't have a full thread index.
[16:17] So here's my question... If I manage to scrape the message board in question into WARC format, can the Archive Team help me find a safe home for the data?
[16:17] I'm a wanderer on the internet... will your tribe accept me?
[16:18] sure
[16:18] create a free account on archive.org and upload the warc
[16:19] For real... simple as that?
[16:19] that will get the file hosted at least, for it to get into the wayback machine an admin has to move it
[16:19] but there are a couple admins in here
[16:21] but yeah you can basically host anything at archive.org for free
[16:24] if you upload like a terabyte of random numbers someone might have a word with you but as long as there's some kind of value to the data
[16:25] Huh... OK.
[16:25] So this whole Archive Team thing is totally loosey-goosey then.
[16:25] Everybody should just come on in and save the stuff that they care about.
[16:26] pretty much, we do have some best practices to follow http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
[16:26] for 'simple' sites we have a bot in #archivebot that can do an automated crawl
[16:27] Thanks, I'm familiar with that page.
[16:27] Unfortunately, the site in question isn't really amenable to a traditional crawl
[16:28] right, but you will likely come across more
[16:28] *evil grin* Oh yes... I dare say I will.
[16:30] but yeah if people give us a heads up about dying websites then we help out but otherwise it's grab what you know about and are interested in
[16:31] people are doing genealogy sites, gardening forums, DOOM levels, glenn beck videos, etc.
[16:32] That's super-cool... thanks.
[16:33] So, is irc the main communication then? What about the wiki, is that more of a community outreach thing?
[16:36] wiki seems to be more of a knowledge dump
[16:36] I wouldn't say we're THAT loosey goosey
[16:36] In fact, we've worked pretty hard to make sure that our larger scale efforts are anything but.
[16:38] yeah there's a lot more organization that goes into the huge projects like MobileMe
[16:47] speaking of which, 555 was asking how to get wikiteam dumping into the warrior: is there documentation anywhere on what are the requirements and how to do it?
[16:52] Dud1: if i recall correctly, not yet. we need to be saving bebo soon.
[16:53] I tried to go onto mine today and I saw this http://www.bebo.com/#faq
[16:55] Dud1: yes the first time it was mentioned it was already shutdown, but archive.bebo.com is still active
[16:57] i guess i should write grab scripts for it now
[16:59] Ah right okay.
[17:45] I do like the name SadDM
[17:46] Yeah well, there have been tears when I click on dead links.
[17:48] Re: loosey-goosieness... yeah, I guess I was surprised about the breadth of organizational levels
[17:49] I was familiar with the team's larger, more organized efforts
[17:49] but I wasn't too sure how a single free-agent played into what you do.
[17:50] And by free-agent, I guess I mean a guy like me with very specific interests.
[17:57] the more the better
[18:01] Yes, the more the better.
[18:02] We have tools that can help you, but in fact you are the advocate for your thing.
[18:02] For example, if there are sites that are out there that are resources, you can add them to archivebot.
[18:03] If you are aware of a larger site shutting down, we can grab them and add them to crawls going to the archive.org machine.
[18:05] DFJustin> that will get the file hosted at least, for it to get into the wayback machine an admin has to move it <-- move where? is this visible for normal users?
[18:06] Which is this
[18:06] getting warc items into a web collection
[18:06] and mediatype
[18:06] Yeah, that's me.
[18:07] My kingdom for a AAA battery
[18:07] DFJustin: any web (sub)collection?
[18:07] I don't actually know how it works under the hood
[19:15] saw schemer.com is going down, it's kinda like pinterest for to-do lists
[19:16] I'm sure y'all already know
[19:21] Stiletto: http://archivebot.at.ninjawedding.org:4567/
[19:23] already done, or WIP?! welp I expected as much
[19:29] wip
[19:30] er damn
[20:31] Do we have a dogster/catster project channel?
[20:32] Otherwise, #rawdogster
[21:03] * SadDM slaps SadDM around a bit with a large fishbot
[21:05] Did the explanation make sense to you, SadDM?
[21:11] ze's gone
[21:24] Aww
[22:04] https://twitter.com/kevingibbon/status/426111003739185152
[22:04] See, this is how you make friends.
[22:10] haha
[22:10] nice idea, but yeah. I think you're safe there.
[22:30] lol
[22:31] Co-founder jumped in
[22:33] rofl
[22:34] their website is unusable in my browser
[22:34] it keeps jumping to the top of the page
[22:34] genious
[22:35] i requested an invite and they told me they were sending me a link to their iphone app
[22:35] problem: i dont have any ios devices
[22:35] see, you're obviously not in the target market
[22:36] everybody knows that anybody who ever wants to send parcels has at least half a dozen iOS devices, right?
[22:37] i think it's a cool concept but they'll really only make it to 2016 if they get bought. all it takes is for one of the big 3 to photocopy their business model and *sad horn*
[22:42] USPS and UPS and Fedex actually already do this
[22:42] They just don't loss-leader
[22:42] Because, you know, they have a profit
[22:59] Shyp: Making MMORPGs A Little More Real Since 2014
[23:00] SketchCow: You're just jealous because you can't be a Shyp hero
[23:01] (I already hate myself for ironically quoting their twee jargon)
[23:02] Shyp Hero
[23:02] Well, who knows what we'll expand into when Archive Team gets that round of VC funding
[23:02] I've mentioned it before, I had a VC come after archive team
[23:02] ha ha, I ruined his day
[23:03] I mean, I really got in there and just noogied the fuck out of him
[23:03] tell us about your traction
[23:04] I am kind of curious how a VC thought archiveteam was a fundable business
[23:04] Do what we do, for money
[23:05] I was a Shyp Hero once
[23:05] in Kerbal Space Program
[23:12] poo, the "be a Shyp hero" page doesn't have an entry for "I am proficient in parkour"
[23:12] when they ask you for your mode of transportation
[23:13] and I'll stop before the off-topic siren goes off
[23:19] I tell you, five years of archiveteam
[23:19] and archivebot is the single funniest thing
[23:20] I go to see how archivebot is going
[23:20] how so
[23:20] aaaaaaand shyp
[23:20] oh, heh
[23:20] Every time I see stuff like that, it's always got me laughing "Well, no matter what, asshole, we snapshotted for you"
[23:20] I use it to often know what's going on in the world.
[23:20] Celebrity website. Oh, what did that fuck say now
[23:21] Financial news. Oh what terrible thing was announced
[23:21] General website with a political bent. Oh, someone's going to wake up tomorrow wondering why their voicemail is full and they have 13,000 e-mails
[23:22] Anyway, far and away my favorite thing
[23:23] hahaha
[23:23] archivebot is great
[23:24] emergent behaviors are awesome