#archiveteam 2014-01-22,Wed

Time Nickname Message
04:24 🔗 namespace Am I just misremembering how big this channel was or did it grow?
04:27 🔗 ivan` maybe a little? hard to tell from my irc logs
04:28 🔗 atg It's grown a lot over the past few years
04:28 🔗 ivan` namespace: if you're really interested in google groups I can provide some tips for getting started
04:28 🔗 namespace ivan`: Sure.
04:29 🔗 namespace (Keep in mind I'm graduating high school right now and the crunch time might be on. So I might not be able to get on it ASAP.)
04:29 🔗 ivan` I think it speaks some filthy JavaScript-only protocol and you'd have to get familiar with it using Chrome's browser inspector (F12, network tab)
04:30 🔗 ivan` then you can write some Python programs to try to grab a thread, or all threads in a group, or whatever granularity you decide
04:30 🔗 * namespace nods
04:31 🔗 ivan` then you'd have to plug that Python into a pipeline script; all of the past ones are available for perusing at https://github.com/ArchiveTeam
04:32 🔗 ivan` the way these work involves the pipeline grabbing a job from a tracker, doing something, then uploading the resulting data to an upload target with rsync, where it's later packed into a megawarc and sent off somewhere
04:32 🔗 ivan` anyway, getting way ahead of things there; let me or #archiveteam know if you need more help with the reverse-engineering
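The loop ivan` describes (grab a job from a tracker, do the work, upload the result with rsync) can be sketched roughly as below. This is only an illustration, not the actual ArchiveTeam pipeline code: `run_pipeline`, `build_rsync_command`, `rsync_upload`, and the callables passed in are hypothetical names, and the tracker protocol is left abstract because the log does not describe it.

```python
import subprocess
from typing import Callable, List, Optional


def build_rsync_command(local_path: str, target: str) -> List[str]:
    """Build an rsync invocation that uploads local_path to target."""
    # -a preserves file attributes, -v is verbose, --partial keeps
    # partially transferred files so an interrupted upload can resume.
    return ["rsync", "-av", "--partial", local_path, target]


def rsync_upload(target: str) -> Callable[[str], None]:
    """Return an upload function bound to one (hypothetical) rsync target."""
    def upload(local_path: str) -> None:
        # check=True raises CalledProcessError if rsync exits non-zero.
        subprocess.run(build_rsync_command(local_path, target), check=True)
    return upload


def run_pipeline(
    get_job: Callable[[], Optional[str]],
    grab: Callable[[str], str],
    upload: Callable[[str], None],
    max_jobs: Optional[int] = None,
) -> int:
    """Repeat: fetch a job, grab its data, upload the result.

    get_job stands in for the tracker and returns None when no work is left.
    grab does the actual download and returns the path of the file it wrote.
    Returns the number of jobs completed.
    """
    done = 0
    while max_jobs is None or done < max_jobs:
        job = get_job()
        if job is None:
            break
        result_path = grab(job)
        upload(result_path)
        done += 1
    return done
```

Passing the three steps in as functions keeps the loop testable without a tracker or an upload server; a real run would pass something like `rsync_upload("rsync://example.invalid/upload/")` as `upload`, where the target address is a placeholder.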
04:32 🔗 namespace In case you don't remember me, I did several gigs during Posterous. But I showed up from the HN ad, after all the hard stuff was done and it was entirely waiting.
04:33 🔗 namespace So doing this process from the start sounds fairly interesting. (And frustrating.)
04:33 🔗 ivan` yes
04:35 🔗 * namespace needs to get his IRC logs working again
04:35 🔗 ivan` google groups also has a ton of valuable non-usenet content that is almost certainly not backed up anywhere
04:35 🔗 ivan` non-google search engines can't even read it
04:36 🔗 namespace I didn't even think about that.
04:37 🔗 namespace I'm honestly surprised Groups has evaded the guillotine so long already.
04:40 🔗 namespace It produces almost no new content from what I can see, and they already shut down Reader, which at least let them know what the technorati are interested in.
04:41 🔗 SketchCow The channel is full of awesome
04:43 🔗 namespace SketchCow: ?
04:51 🔗 joepie91 mm, google groups is shutting down?
04:51 🔗 namespace joepie91: Not yet.
04:51 🔗 SketchCow Just generally. Channel is full of awesome.
04:56 🔗 turnip I haven't made a pithy joke in a while, so it could be awesome-r
04:56 🔗 namespace Wait, there's no mirror of wikileaks?
05:04 🔗 ivan` that would be quite unlikely
05:04 🔗 ivan` does the archiveteam wiki say that or something
05:05 🔗 namespace ivan`: It talks about it like it doesn't have one.
05:05 🔗 namespace Which like you, struck me as highly unlikely.
05:05 🔗 namespace (Did anybody write down that trick we came up with of detecting spam with grep?)
05:19 🔗 trs80 google groups is going?
06:49 🔗 DFJustin hmm he left but there is a mirror https://archive.org/details/wikileaksarchive
06:58 🔗 Nemo_bis ivan`: yes, but it's quite hard to extract that usenet data from google groups
06:59 🔗 Nemo_bis it would be nicer to have the datasets they used to feed it :P
06:59 🔗 Nemo_bis we have only one of those on archive.org IIRC
13:21 🔗 EpsilonRe Greetings
13:21 🔗 EpsilonRe Is there anyone here working on recovering USENET from the Google monopoly?
13:22 🔗 EpsilonRe I know you've archived some of the proprietary Google Groups stuff
13:23 🔗 EpsilonRe But it would be nice to have an alternative to Google when trying to track down 15 year old technical discussions on obscure languages, for use as citations in Reference articles on sites like Wikipedia.
13:24 🔗 EpsilonRe Such an archive could also add the warning flags which Google seems unwilling to provide...
13:27 🔗 EpsilonRe Just leaving an unhelpful "Suppressed for abuse"
13:27 🔗 EpsilonRe Rather than an actual indication as to why a posting was abusive
13:42 🔗 dashcloud there's been some talk about it, but nothing concrete yet (certainly nothing that can be run from the Warrior)
13:42 🔗 EpsilonRe There is one specific group alt.sex.fetish.robots that should be recovered
13:43 🔗 EpsilonRe for cultural reasons
13:43 🔗 EpsilonRe dashcloud: However, in archiving USENET there are some issues
13:43 🔗 dashcloud spam?
13:43 🔗 EpsilonRe That can be flagged
13:44 🔗 EpsilonRe Identifying content which would have been removed for libel or hate-speech is more complex
13:44 🔗 EpsilonRe Google Groups doesn't identify it separately in the 'suppressed for abuse'
13:45 🔗 EpsilonRe Nor would it be easy to identify material for which 'valid' cancels were issued by the original posters
13:46 🔗 EpsilonRe I know people that were impersonated as well :(
13:46 🔗 EpsilonRe dashcloud: This is going to get lengthy... can we move this chat to -bs?
13:47 🔗 dashcloud sure
13:52 🔗 EpsilonRe Oh and side note... I would suggest someone looks into archiving the BBC BASIC and BB4W programmer groups on Yahoo!
13:54 🔗 EpsilonRe WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
13:54 🔗 dashcloud yahoo sucks
13:55 🔗 EpsilonRe True, but I would hate to see a lot of valuable technical data lost,
13:55 🔗 EpsilonRe because Yahoo closed an abandoned group...
13:56 🔗 EpsilonRe (Btw what is the secret word for the Wiki? I was creating a new account)
13:57 🔗 dashcloud it's really "yahoo sucks"
13:58 🔗 EpsilonRe Not according to the wiki it isn't :(
13:59 🔗 EpsilonRe Ok folks, Creating a new account on your wiki, it needs a confirmation code?
13:59 🔗 Anon_ Hey, just passing by to ask if chanarchive will ever be up again.
14:00 🔗 EpsilonRe !seen sketchcow
14:01 🔗 EpsilonRe I was wanting to create new page on the Wiki concerning USENET archiving, but if your MediaWiki is like WMF ones, I can't do that without an account
14:02 🔗 Anon_ Alright, I take that as a no? I suppose I can still use the wayback machine for some threads.
14:03 🔗 dashcloud Anon_: SketchCow is probably the only person who can tell you that
14:04 🔗 Anon_ Ah, thank you.
14:05 🔗 EpsilonRe I'm looking for SketchCow because of some concerns I had about material on the archive.org software collection
14:06 🔗 EpsilonRe (Namely one title that the emulation community here in the UK were specifically told WAS NOT for archival yet)
14:06 🔗 EpsilonRe As I'm not the rights holder, I can't complain formally...
14:07 🔗 EpsilonRe (But I can certainly raise the issue about tracing them...)
14:08 🔗 EpsilonRe It's also sometimes surprising to find some material archived....
14:08 🔗 EpsilonRe At least you aren't the NSA
14:08 🔗 EpsilonRe XD
14:08 🔗 EpsilonRe And in relation to that I have a nasty feeling that things will go too far the other way - i.e. a ban on large scale archiving of private communications :(
14:12 🔗 EpsilonRe Hmmm
14:12 🔗 EpsilonRe Still can't work out the code for the Wiki?
14:13 🔗 EpsilonRe WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
14:17 🔗 GLaDOS http://techcrunch.com/2014/01/21/when-goods-not-good-enough/
14:17 🔗 GLaDOS So Canvas and DrawQuest are closing
14:17 🔗 dashcloud GLaDOS: do you remember the secret word for the wiki?
14:18 🔗 GLaDOS yahoosucks
14:18 🔗 GLaDOS why would you stick a space in there?
14:18 🔗 GLaDOS Secret word, not secret words
14:25 🔗 EpsilonRe Oh
14:25 🔗 * EpsilonRe goes bright pink
14:25 🔗 dashcloud sorry about that
14:27 🔗 EpsilonRe So what you guys do is legal doxing?
14:27 🔗 EpsilonRe XD
14:28 🔗 EpsilonRe I.E Keeping stuff people forget to backup
14:29 🔗 Anon_ It's preserving things that should not die, in my opinion.
14:29 🔗 dashcloud I wouldn't call it doxing- that has a very harsh and personal tint to it- we go after websites, not people or individuals
14:30 🔗 EpsilonRe Anon_: OK
14:30 🔗 EpsilonRe Apologies ... It was the term that came to mind
14:30 🔗 EpsilonRe On something else...entirely
14:30 🔗 EpsilonRe Does your archiving get audited?
14:31 🔗 EpsilonRe So you know you haven't archived illegal porn for example?
14:34 🔗 dashcloud no, but I also don't worry about that- archive what you think needs archiving, and if there's a problem, it'll get handled
14:35 🔗 EpsilonRe dashcloud: OK Created an account
14:36 🔗 EpsilonRe Where's the best place to suggest a project?
14:36 🔗 EpsilonRe (I know how things usually (don't) work on WMF projects if it helps
14:36 🔗 EpsilonRe )
14:37 🔗 dashcloud just make a new page, fill in what you know, and mention it in the channel here
14:37 🔗 EpsilonRe OK
14:37 🔗 EpsilonRe I'm new to your wiki
14:58 🔗 EpsilonRe http://www.archiveteam.org/index.php?title=User:Sfan00_IMG/USENET
14:58 🔗 EpsilonRe It's a draft, and feedback would be appreciated
14:59 🔗 EpsilonRe Oh and not very major - but someone should consider archiving - http://orteil.dashnet.org/cookieclicker/ ;)
15:00 🔗 midas you can archive small sites using #archivebot
15:02 🔗 EpsilonRe if you op access in that channel - I don't
15:02 🔗 EpsilonRe *have
15:03 🔗 midas ill add it
15:06 🔗 EpsilonRe BTW midas as you are here , Can you look into trying to track down the MSDN notes on Direct3D pre version 3?
15:06 🔗 EpsilonRe I note that a LOT of older MSDN content was removed around 2012...
15:06 🔗 EpsilonRe Including sadly the material which explained quite a lot about the internals of various portions of pre XP/NT Windows...
15:07 🔗 EpsilonRe It's out there SOMEWHERE
15:07 🔗 EpsilonRe Speaking of which if ftp.microsoft.com hasn't been archived it should be.... I've found some really ancient support files in there...
15:09 🔗 midas ftp.microsoft.com, joepie91 is grabbing that
15:10 🔗 midas msdn stuff, can check in a couple of hours, currently at work :p
15:11 🔗 EpsilonRe midas: I also wonder if anyone grabbed Sunsite... which was a mirror for a large number of other archives around 2000 I think
15:12 🔗 midas not sure, maybe someone else can answer that
15:16 🔗 EpsilonRe it used to be something like mirror.sunsite.ic.ac.uk
15:19 🔗 DFJustin the plan is to eventually grab all ftp sites https://archive.org/details/ftpsites
15:19 🔗 DFJustin http://ascii.textfiles.com/archives/4199
15:24 🔗 godane another ftp: ftp://ftp.cs.washington.edu/
16:05 🔗 Dud1 Was there any archive of bebo?
16:08 🔗 EpsilonRe http://www.6581-8580.com/soasc_copyright.php - Anyone recall C64 BBS?
16:08 🔗 EpsilonRe or Amiga?- http://www.paula8364.com/
16:17 🔗 SadDM Lately I've come to understand that one of my hobby's biggest forums probably doesn't get fully crawled by the wayback machine due to the fact that their custom software doesn't have a full thread index.
16:17 🔗 SadDM So here's my question... If I manage to scrape the message board in question into WARC format, can the Archive Team help me find a safe home for the data?
16:17 🔗 SadDM I'm a wanderer on the internet... will your tribe accept me?
16:18 🔗 DFJustin sure
16:18 🔗 DFJustin create a free account on archive.org and upload the warc
16:19 🔗 SadDM For real... simple as that?
16:19 🔗 DFJustin that will get the file hosted at least, for it to get into the wayback machine an admin has to move it
16:19 🔗 DFJustin but there are a couple admins in here
16:21 🔗 DFJustin but yeah you can basically host anything at archive.org for free
16:24 🔗 DFJustin if you upload like a terabyte of random numbers someone might have a word with you but as long as there's some kind of value to the data
16:25 🔗 SadDM Huh... OK.
16:25 🔗 SadDM So this whole Archive Team thing is totally loosey-goosey then.
16:25 🔗 SadDM Everybody should just come on in and save the stuff that they care about.
16:26 🔗 DFJustin pretty much, we do have some best practices to follow http://archiveteam.org/index.php?title=Wget#Creating_WARC_with_wget
16:26 🔗 DFJustin for 'simple' sites we have a bot in #archivebot that can do an automated crawl
16:27 🔗 SadDM Thanks, I'm familiar with that page.
16:27 🔗 SadDM Unfortunately, the site in question isn't really amenable to a traditional crawl
16:28 🔗 DFJustin right, but you will likely come across more
16:28 🔗 SadDM *evil grin* Oh yes... I dare say I will.
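For the sites that are amenable to a traditional crawl, the WARC-with-wget practice DFJustin links to above might be scripted along these lines. This is a rough sketch, not the wiki page's exact recipe: `build_wget_warc_command` and `crawl_to_warc` are hypothetical helper names, the URL and output name are placeholders, and it assumes a wget build with WARC support (1.14 or later).

```python
import subprocess
from typing import List


def build_wget_warc_command(url: str, warc_name: str) -> List[str]:
    """Build a wget invocation that mirrors url and records it as a WARC."""
    return [
        "wget",
        "--mirror",                   # recursive download with timestamping
        "--page-requisites",          # also fetch images, CSS and scripts
        "--wait=1",                   # pause one second between requests
        "--warc-file=" + warc_name,   # wget writes <warc_name>.warc.gz
        "--warc-cdx",                 # also write a CDX index of the records
        url,
    ]


def crawl_to_warc(url: str, warc_name: str) -> None:
    """Run the crawl; raises CalledProcessError if wget exits non-zero."""
    subprocess.run(build_wget_warc_command(url, warc_name), check=True)
```

A call such as `crawl_to_warc("http://example.invalid/forum/", "example-forum")` would leave `example-forum.warc.gz` in the working directory, ready to upload to an archive.org item.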
16:30 🔗 DFJustin but yeah if people give us a heads up about dying websites then we help out but otherwise it's grab what you know about and are interested in
16:31 🔗 DFJustin people are doing genealogy sites, gardening forums, DOOM levels, glenn beck videos, etc.
16:32 🔗 SadDM That's super-cool... thanks.
16:33 🔗 SadDM So, is irc the main communication then? What about the wiki, is that more of a community outreach thing?
16:36 🔗 DFJustin wiki seems to be more of a knowledge dump
16:36 🔗 SketchCow I wouldn't say we're THAT loosey goosey
16:36 🔗 SketchCow In fact, we've worked pretty hard to make sure that our larger scale efforts are anything but.
16:38 🔗 DFJustin yeah there's a lot more organization that goes into the huge projects like MobileMe
16:47 🔗 Nemo_bis speaking of which, 555 was asking how to get wikiteam dumping into the warrior: is there documentation anywhere on what the requirements are and how to do it?
16:52 🔗 chfoo Dud1: if i recall correctly, not yet. we need to be saving bebo soon.
16:53 🔗 Dud1 I tried to go onto mine today and I saw this http://www.bebo.com/#faq
16:55 🔗 chfoo Dud1: yes the first time it was mentioned it was already shutdown, but archive.bebo.com is still active
16:57 🔗 chfoo i guess i should write grab scripts for it now
16:59 🔗 Dud1 Ah right okay.
17:45 🔗 SketchCow I do like the name SadDM
17:46 🔗 SadDM Yeah well, there have been tears when I click on dead links.
17:48 🔗 SadDM Re: loosey-goosieness... yeah, I guess I was surprised about the breadth of organizational levels
17:49 🔗 SadDM I was familiar with the team's larger, more organized efforts
17:49 🔗 SadDM but I wasn't too sure how a single free-agent played into what you do.
17:50 🔗 SadDM And by free-agent, I guess I mean a guy like me with very specific interests.
17:57 🔗 yipdw the more the better
18:01 🔗 SketchCow Yes, the more the better.
18:02 🔗 SketchCow We have tools that can help you, but in fact you are the advocate for your thing.
18:02 🔗 SketchCow For example, if there are sites that are out there that are resources, you can add them to archivebot.
18:03 🔗 SketchCow If you are aware of a larger site shutting down, we can grab them and add them to crawls going to the archive.org machine.
18:05 🔗 Nemo_bis DFJustin> that will get the file hosted at least, for it to get into the wayback machine an admin has to move it <-- move where? is this visible for normal users?
18:06 🔗 SketchCow Which is this
18:06 🔗 DFJustin getting warc items into a web collection
18:06 🔗 DFJustin and mediatype
18:06 🔗 SketchCow Yeah, that's me.
18:07 🔗 SketchCow My kingdom for a AAA battery
18:07 🔗 Nemo_bis DFJustin: any web (sub)collection?
18:07 🔗 DFJustin I don't actually know how it works under the hood
19:15 🔗 Stiletto saw schemer.com is going down, it's kinda like pinterest for to-do lists
19:16 🔗 Stiletto I'm sure y'all already know
19:21 🔗 DFJustin Stiletto: http://archivebot.at.ninjawedding.org:4567/
19:23 🔗 Stiletto already done, or WIP?! welp I expected as much
19:29 🔗 DFJustin wip
19:30 🔗 DFJustin er damn
20:31 🔗 SketchCow Do we have a dogster/catster project channel?
20:32 🔗 SketchCow Otherwise, #rawdogster
21:03 🔗 * SadDM slaps SadDM around a bit with a large fishbot
21:05 🔗 SketchCow Did the explanation make sense to you, SadDM?
21:11 🔗 xmc ze's gone
21:24 🔗 SketchCow Aww
22:04 🔗 SketchCow https://twitter.com/kevingibbon/status/426111003739185152
22:04 🔗 SketchCow See, this is how you make friends.
22:10 🔗 Baljem haha
22:10 🔗 Baljem nice idea, but yeah. I think you're safe there.
22:30 🔗 xmc lol
22:31 🔗 SketchCow Co-founder jumped in
22:33 🔗 RedType_ rofl
22:34 🔗 xmc their website is unusable in my browser
22:34 🔗 xmc it keeps jumping to the top of the page
22:34 🔗 xmc genius
22:35 🔗 RedType_ i requested an invite and they told me they were sending me a link to their iphone app
22:35 🔗 RedType_ problem: i dont have any ios devices
22:35 🔗 Baljem see, you're obviously not in the target market
22:36 🔗 Baljem everybody knows that anybody who ever wants to send parcels has at least half a dozen iOS devices, right?
22:37 🔗 RedType_ i think it's a cool concept but they'll really only make it to 2016 if they get bought. all it takes is for one of the big 3 to photo copy their business model and *sad horn*
22:42 🔗 SketchCow USPS and UPS and Fedex actually already do this
22:42 🔗 SketchCow They just don't loss-leader
22:42 🔗 SketchCow Because, you know, they have a profit
22:59 🔗 yipdw Shyp: Making MMORPGs A Little More Real Since 2014
23:00 🔗 mistym SketchCow: You're just jealous because you can't be a Shyp hero
23:01 🔗 mistym (I already hate myself for ironically quoting their twee jargon)
23:02 🔗 SketchCow Shyp Hero
23:02 🔗 SketchCow Well, who knows what we'll expand into when Archive Team gets that round of VC funding
23:02 🔗 SketchCow I've mentioned it before, I had a VC come after archive team
23:02 🔗 SketchCow ha ha, I ruined his day
23:03 🔗 SketchCow I mean, I really got in there and just noogied the fuck out of him
23:03 🔗 ivan` tell us about your traction
23:04 🔗 ivan` I am kind of curious how a VC thought archiveteam was a fundable business
23:04 🔗 SketchCow Do what we do, for money
23:05 🔗 yipdw I was a Shyp Hero once
23:05 🔗 yipdw in Kerbal Space Program
23:12 🔗 yipdw poo, the "be a Shyp hero" page doesn't have an entry for "I am proficient in parkour"
23:12 🔗 yipdw when they ask you for your mode of transportation
23:13 🔗 yipdw and I'll stop before the off-topic siren goes off
23:19 🔗 SketchCow I tell you, five years of archiveteam
23:19 🔗 SketchCow and archivebot is the single funniest thing
23:20 🔗 SketchCow I go to see how archivebot is going
23:20 🔗 yipdw how so
23:20 🔗 SketchCow aaaaaaand shyp
23:20 🔗 yipdw oh, heh
23:20 🔗 SketchCow Every time I see stuff like that, it's always got me laughing "Well, no matter what, asshole, we snapshotted for you"
23:20 🔗 SketchCow I often use it to know what's going on in the world.
23:20 🔗 SketchCow Celebrity website. Oh, what did that fuck say now
23:21 🔗 SketchCow Financial news. Oh what terrible thing was announced
23:21 🔗 SketchCow General website with a political bent. Oh, someone's going to wake up tomorrow wondering why their voicemail is full and they have 13,000 e-mails
23:22 🔗 SketchCow Anyway, far and away my favorite thing
23:23 🔗 xmc hahaha
23:23 🔗 xmc archivebot is great
23:24 🔗 yipdw emergent behaviors are awesome
