[00:13] *** xk_id has quit IRC (Read error: Connection reset by peer)
[00:13] *** xk_id_ has joined #archiveteam
[00:16] *** xk_id_ has quit IRC (Remote host closed the connection)
[00:16] *** K4k has quit IRC (Read error: Operation timed out)
[00:18] *** dashcloud has quit IRC (Ping timeout: 272 seconds)
[00:28] *** dashcloud has joined #archiveteam
[01:26] *** arbin has quit IRC (Read error: Connection reset by peer)
[01:28] *** arbin has joined #archiveteam
[01:30] *** __uu has joined #archiveteam
[01:31] *** xk_id has joined #archiveteam
[01:35] *** mistym_ has joined #archiveteam
[01:40] *** __uu has quit IRC (Ping timeout: 265 seconds)
[01:42] *** mistym has quit IRC (Read error: Operation timed out)
[01:56] *** mistym_ has quit IRC (Remote host closed the connection)
[02:03] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:09] *** __uu has joined #archiveteam
[02:10] *** dashcloud has joined #archiveteam
[02:11] *** __uu_ has joined #archiveteam
[02:13] *** __uu_ has quit IRC (Client Quit)
[02:24] *** philpem has quit IRC (Ping timeout: 272 seconds)
[02:29] *** primus104 has quit IRC (Leaving.)
[03:40] *** kyan_ has joined #archiveteam
[03:40] *** godane has quit IRC (Ping timeout: 272 seconds)
[03:42] *** kyan has quit IRC (Ping timeout: 258 seconds)
[03:56] *** godane has joined #archiveteam
[03:57] *** mib_0n6by has joined #archiveteam
[03:57] Howdy - if I know of a website that is going down relatively soon, who do I talk to to possibly preserve it?
[03:58] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[03:59] us i guess
[03:59] what website is it?
[04:01] *** Lord_Nigh has joined #archiveteam
[04:01] talk in channel, that way more people can get involved if need be
[04:02] also, greetings! and thanks for showing up
[04:04] Sorry
[04:04] kb.berkeley.edu
[04:05] do you think it requires a lot of storage?
[04:05] Mostly text and a few images.
[04:05] looks like mostly text
[04:05] No large files.
[04:06] looks like a job for archiveteambot
[04:06] do you agree Ctrl-S ?
[04:06] I have no idea
[04:06] ah, ok
[04:07] https://kb.berkeley.edu/page.php?id=23247
[04:07] Hmm?
[04:07] https://kb.berkeley.edu/page.php?id=23243
[04:07] might be sequential numbering for articles?
[04:08] It is, but updates to articles are not.
[04:08] And there are a number of subsites.
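A minimal sketch of what walking sequentially numbered article IDs like those could look like, assuming the numbering really is contiguous; the ID range, filenames, and delay below are illustrative guesses, not taken from the site:

```python
import time
import urllib.error
import urllib.request

# Hypothetical ID range; the real bounds would need to be probed first.
BASE = "https://kb.berkeley.edu/page.php?id={}"

for page_id in range(23200, 23300):
    url = BASE.format(page_id)
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
    except urllib.error.URLError:
        continue  # gaps are expected if the numbering isn't fully contiguous
    with open("kb_{}.html".format(page_id), "wb") as f:
        f.write(data)
    time.sleep(2)  # stay gentle on the server
```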
[04:08] I have added the site to a bot used for archiving
[04:09] since it's a university hosting it, might we be able to ask the admins about archiving it locally?
[04:09] mib_0n6by: do you know when it will go offline?
[04:09] Relatively soon.
[04:09] might be able to mail an HDD?
[04:09] Ctrl-S: it is a UCB page hosted by the University of Wisconsin.
[04:10] Easier to simply grab a copy as the entire site shouldn't be that large.
[04:10] I'm pretty clueless about these matters
[04:10] Trust me when I say that the site is small enough to just grab as opposed to waiting on the University to provide a copy.
[04:11] (which would be a low priority and would likely take longer than just wgetting the whole thing.)
[04:11] yeah
[04:11] a local copy is often not the best choice
[04:12] Archiving the site sooner is better than not.
[04:12] I have added the site to an archiving bot
[04:13] Thank you :)
[04:13] and thank you
[04:13] any other sites/subsites you know of that might be in need of archiving?
[04:14] From the University?
[04:14] anywhere really
[04:14] I contacted Jason Scott a while ago about a private torrent site.
[04:15] berkeley.edu is undergoing a complete site redesign soon, which means everything currently there may no longer be available or completely broken in a few months (this is for the main site only; departmental subsites are a different affair.)
[04:16] I guess that means *.berkeley.edu needs archiving
[04:16] Unknown ETA on the site change...
[04:16] *** Silent700 has left
[04:17] How do you guys handle overlap with the Archive.org WayBackMachine?
[04:17] mib_0n6by: Whaddya mean, overlap? When possible the stuff we save gets shoved on there.
[04:18] My understanding is that anything that these guys archive gets shoved onto archive.org if at all possible
[04:19] aren't we the volunteer guerrilla warriors of archive.org? Acting by ourselves, but hoping we do archive.org's bidding
[04:19] we're just a bit more aggressive/proactive at fetching stuff
[04:19] Overlap... They have their own web spiders for I guess more casual site grabs. Guess you guys pull full sites, and if it is a current / recent copy, they wouldn't have the depth nor a record of it at that time anyway.
[04:19] Ya...
[04:19] Forget you guys are a rogue branch of bad asses ;)
[04:19] :D
[04:19] They use outdated systems like robots.txt
[04:20] robots.txt are made for one thing, to be archived
[04:20] exactly
[04:20] also to point out interesting things
[04:20] Eh, robots.txt aren't "outdated". Just completely at odds with archiveteam.
[04:20] Though I suppose it's understandable archive.org listens to them; probably makes their legal standing rather less white-knuckle.
[04:20] although they sometimes point to redirect loops :\
[04:20] it was invented when robots could actually overload sites
[04:21] Robots.txt was always a sign in the road and not even a legally binding one at that.
[04:21] or break networks
[04:21] Yeah, but having an easy "just opt out" thing probably significantly reduces the random crazies.
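For reference, robots.txt is just a plain-text advisory file, and Python's standard library can evaluate one; nothing enforces its answer. A small sketch (the user-agent string and URLs are only examples):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://kb.berkeley.edu/robots.txt")
rp.read()  # fetch and parse the file

# Returns True or False; it is purely advisory, a "sign in the road".
print(rp.can_fetch("ExampleArchiveBot",
                   "https://kb.berkeley.edu/page.php?id=23247"))
```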
[04:21] It doesn't stop you guys :P
[04:21] (no accounting for insanity though.)
[04:22] mib_0n6by: Yeah, but what're they gonna do, sue a bunch of random folks?
[04:22] in a bunch of random countries
[04:22] Who may or may not be identifiable.
[04:22] Does robots.txt have any legal basis? At worst you guys are running a friendly DDoS archive attack.
[04:22] and will probably invoke the Streisand effect if bothered
[04:23] You gotta *really* piss off a big company to get that sort of wide-scatter individual lawsuit going.
[04:23] mib_0n6by: Not really, though I suspect in a court of law you could at least *argue* that a lack of robots.txt is equal to saying "hey, do whatever you want".
[04:23] it'd probably be cheaper to just give us the drives the data is on than to sue us
[04:24] When was any company actually reasonable?
[04:24] never, but they like money a whole lot
[04:24] Now, I suppose there's a chance that Yahoo! does that the next time they bring down a service.
[04:24] They don't care about things such as cultural heritage, memory and understanding history though.
[04:25] Bad PR & have to pay lawyers
[04:25] Ya... Yahoo! is still working through the bad press from shutting down geocities /sarcasm.
[04:26] >Have to pay lawyers. >PAY
[04:27] That assumes that corporations are a thinking beast that has morals, values and cares.
[04:28] Much less ones that align with you.
[04:28] they care about getting more money
[04:28] Which preserving a cultural heritage obviously allows them to collect.
[04:29] i mean there is a financial downside to lawsuits
[04:29] they don't give one shit about culture
[04:30] *** mib_0n6by has left
[04:33] *** kyan_ is now known as kyan
[04:36] !a http://www.reddit.com/r/frc/ --phantomjs
[04:36] oops
[04:41] I was going to say that archiveteam projects can be construed in the US as a violation of the CFAA if a website's ToS has anti-DoS provisions
[04:41] but the CFAA is so broad, fuck it
[04:42] I'm sure there's a way you can construe that law so that you can get arrested for typing
[04:47] I believe we'd probably not be worth suing, and the EFF would be all over the case
[04:48] Police would consider it not worth their time, since we are always careful to not overload the site
[04:49] police? do they get involved when there's a lawsuit?
[04:49] or maybe you were thinking of two different scenarios
[04:50] yes
[04:50] either a lawsuit or contacting the feds over that law
[04:51] I actually have access to a pretty good legal fund and a great lawyer if I were to be targeted... but doubt it very much
[04:58] I usually bring up the lawsuit line in a "psh who cares" fashion
[04:58] it's roughly on the same level of concern as jaywalking, and far less dangerous
[04:58] p. much
[04:59] between getting hit with Stephen Heymann or getting hit with a car I'll take Heymann
[04:59] at least you can damage Heymann
[04:59] oh right I have +o
[05:00] woop woop woop off topic siren
[05:10] *** aaaaaaaaa has quit IRC (Leaving)
[05:26] *** StartAway is now known as Start
[05:29] *** Start is now known as StartAway
[06:07] *** mistym has joined #archiveteam
[06:34] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:34] *** dashcloud has joined #archiveteam
[07:12] YEAH
[07:13] My MS-DOS thing has finished
[07:13] All the booting verified, and the script that hit the Mobygames site now does a great job
[07:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:34] *** brayden_ has joined #archiveteam
[07:37] *** lytv has quit IRC (Read error: Operation timed out)
[07:38] *** lytv has joined #archiveteam
[07:39] *** dashcloud has joined #archiveteam
[07:40] *** brayden has quit IRC (Read error: Operation timed out)
[07:42] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:45] *** dashcloud has joined #archiveteam
[08:26] *** primus104 has joined #archiveteam
[08:39] *** philpem has joined #archiveteam
[08:40] *** kris33 has joined #archiveteam
[09:16] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:19] *** dashcloud has joined #archiveteam
[09:47] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:16] *** mistym has quit IRC (Remote host closed the connection)
[10:24] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[10:27] *** brayden_ has quit IRC (Ping timeout: 606 seconds)
[10:37] *** Swizzle_ has joined #archiveteam
[10:41] *** schbirid has joined #archiveteam
[10:44] *** Swizzle has quit IRC (Read error: Operation timed out)
[10:55] *** Control-S has joined #archiveteam
[11:03] *** Ctrl-S has quit IRC (Read error: Operation timed out)
[11:03] *** Control-S is now known as Ctrl-S
[12:03] *** Ymgve has joined #archiveteam
[12:31] *** brayden has joined #archiveteam
[13:04] *** lbft_ has quit IRC (Ping timeout: 258 seconds)
[13:21] *** lbft has joined #archiveteam
[13:56] *** bauruine has quit IRC (Ping timeout: 265 seconds)
[14:01] *** bauruine has joined #archiveteam
[14:56] *** primus105 has joined #archiveteam
[15:02] *** primus104 has quit IRC (Read error: Operation timed out)
[15:12] *** archvtyp1 has joined #archiveteam
[15:13] *** archvtype has quit IRC (Read error: Operation timed out)
[15:33] *** BiggieJon has joined #archiveteam
[15:37] *** BiggieJo1 has quit IRC (Read error: Operation timed out)
[15:41] *** ohhdemgir has quit IRC (Leaving)
[16:17] *** toad1 has joined #archiveteam
[16:24] *** toad2 has quit IRC (Ping timeout: 600 seconds)
[17:50] *** robv has joined #archiveteam
[18:05] http://vstreamers.com
[18:05] "Website will be shutting down day January 15th."
[18:06] the site looks to be a clone of old youtube
[18:06] looks like they have less than 6000 videos
[18:09] i'll get to work on the site structure
[18:10] got any ideas for an irc channel name?
[18:10] StartAway: ok, I'll start with the scripts for vstreamer
[18:11] *** StartAway is now known as Start
[18:11] 10x409 pages arkiver
[18:11] Yes
[18:11] rather small
[18:11] 21 channel pages
[18:11] midas: yeah, less than 6000 videos
[18:11] maybe we can run it through the bot?
[18:12] those videos are not linked to from the html
[18:13] probably some POST somewhere (haven't checked yet)
[18:14] oh well, it should be easy to grab
[18:14] (size wise that is)
[18:14] yeah
[18:14] I already found the videos
[18:14] should be doable
[18:17] *** intothemo has joined #archiveteam
[18:17] *** intothemo has quit IRC (Client Quit)
[18:20] would #destreamers be a good name for the irc channel?
[18:24] that would do I think
[18:27] ok
[18:40] *** nertzy has joined #archiveteam
[18:52] *** nertzy has quit IRC (This computer has gone to sleep)
[19:00] *** aaaaaaaaa has joined #archiveteam
[19:17] *** BlueMaxim has joined #archiveteam
[19:27] *** mistym has joined #archiveteam
[19:33] with vstreamers shutting down, i'd place zippcast on a watchlist
[19:34] zippcast has shut down multiple times in the past and reappeared without any content that was previously there
[19:35] *** BlueMaxim has quit IRC (Quit: Leaving)
[19:59] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:13] *** dashcloud has joined #archiveteam
[20:56] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[21:01] *** signius has quit IRC (Ping timeout: 258 seconds)
[21:05] *** dashcloud has joined #archiveteam
[21:14] Hi
[21:14] *** signius has joined #archiveteam
[21:15] can anyone help me out? I want to archive this wiki http://c2.com/cgi/wiki?PrinciplesObjectivesAndGoals
[21:16] could I get +v to try the bot on it?
[21:19] anyone have some input, suggestions?
[21:26] you can get an idea of how many links are in the wayback machine by using this link: http://web.archive.org/web/*/http://c2.com/* and there's an index of archivebot's crawls of c2.com: http://archive.fart.website/archivebot/viewer/job/xdufx
[21:28] and you can search the chat logs at http://archive.fart.website/bin/irclogger_logs to see why it was aborted
[21:29] *** ariscop has quit IRC (Ping timeout: 492 seconds)
[21:29] it looks like the log is password protected
[21:30] I'm not too interested in why it stopped the archive anyway
[21:30] I want to make an offline image/mirror of the site
[21:30] archive.org says it has 117,838 urls
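One way to check a count like that yourself is the Wayback Machine's CDX API; a rough sketch, with the query parameters as I understand them (worth double-checking against the CDX documentation):

```python
import urllib.request

# Count unique captured URLs for c2.com via the Wayback CDX API.
# collapse=urlkey should give one line per unique URL.
query = ("http://web.archive.org/cdx/search/cdx"
         "?url=c2.com&matchType=domain&fl=original&collapse=urlkey")

with urllib.request.urlopen(query) as resp:
    unique_urls = sum(1 for _ in resp)  # one response line per URL

print(unique_urls, "unique URLs captured")
```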
[21:31] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:34] oh, if you want a personal archive, you can try setting up and customizing archivebot for yourself, grab it with wget/wpull/httrack/heritrix, or ask someone else to do it
[21:34] *** dashcloud has joined #archiveteam
[21:36] there's 35k pages and it wants a delay of 30 seconds per GET. So if I got 30 people to help me we could do this in 10 hours
[21:36] that defeats the purpose of the 30s wait
[21:36] I tried on my own but the delay time was too low and it stopped giving me the pages after a bit
[21:37] http://c2.com/cgi/wiki?search=* says ~40k pages
[21:38] ah so there's a lot of pages!
[21:41] I'll email him about it again, but he ignored me before
[21:41] maybe I got spam filtered
[21:44] http://c2.com/cgi/wiki?DownloadWiki no I think he ignores me on purpose
[21:45] i'll give it a try
[21:46] > The only person who can tell you why it isn't available is its creator, WardCunningham, and he appears unwilling to do so.
[21:46] lol
[21:46] he's got a new wiki project on, so if it doesn't go well he might do something dodgy with this site to force people onto his new page
[21:46] I think it's unlikely
[21:46] I'm not judging him but I've seen other people do this
[21:48] wget is running
[21:48] schbirid: what delays?
[21:49] I'd also use random wait
[21:49] can you pause and resume wget?
[21:49] 30
[21:49] since it has many pages I was worried about that and wrote my own script
[21:49] you can ctrl-z
[21:49] ah ok cool
[21:49] there's this list of pages if you haven't seen it yet: http://c2.com/cgi/wiki?search=$
[21:50] there is also http://c2.com/cgi/wikiList
[21:50] hopefully these two have the same stuff on them
[21:50] "36855 pages found out of 36857 titles searched"
[21:50] oh nice
[21:50] * schbirid cancels
[21:51] let me see how many lines there are in the second
[21:53] eww, it has google analytics
[21:53] i am doing a wget -i on the urls
[21:53] will forget about it and find the files in 4 days or so
[21:53] good night :)
[21:53] *** schbirid has quit IRC (Leaving)
[21:54] you should grep for 'The WikiWiki Server Can not Process Your Request' every so often
[21:54] if you see this you need to wait a bit and redownload it
[21:54] brook: does it return an appropriate http response code in that case?
[21:55] i don't know
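Pulling those ideas together (the URL list, the ~30 s random wait, and re-fetching pages that come back with the "Can not Process" message), a sketch of the kind of loop involved; the input filename, retry limit, and back-off delay are arbitrary choices:

```python
import random
import time
import urllib.request

ERROR_MARKER = b"The WikiWiki Server Can not Process Your Request"

# urls.txt is assumed to hold one page URL per line, e.g. built from
# http://c2.com/cgi/wikiList.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    body = b""
    for attempt in range(3):  # arbitrary retry limit
        with urllib.request.urlopen(url, timeout=60) as resp:
            body = resp.read()
        if ERROR_MARKER not in body:
            break
        time.sleep(120)  # back off before hitting an overloaded server again
    # Note: after three failed attempts this still saves the error page.
    name = url.split("?")[-1].replace("/", "_") or "index"
    with open(name + ".html", "wb") as f:
        f.write(body)
    # Roughly what wget's --wait=30 --random-wait does (0.5x to 1.5x the wait).
    time.sleep(random.uniform(15, 45))
```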
[22:32] *** __uu has quit IRC (Ping timeout: 265 seconds)
[22:33] *** ariscop has joined #archiveteam
[22:43] *** cadbury__ has quit IRC (Read error: Operation timed out)
[22:44] http://c2.com/cgi/wiki?WikiArchive -- LOL
[22:49] *** __uu has joined #archiveteam
[23:05] SketchCow: all 2006 episodes of Believer's Voice of Victory are uploaded now
[23:11] *** __uu has quit IRC (Ping timeout: 265 seconds)
[23:17] *** __uu has joined #archiveteam
[23:41] *** __uu has quit IRC (Ping timeout: 265 seconds)
[23:43] Did someone use https://pypi.python.org/pypi/wget ?
[23:56] *** __uu has joined #archiveteam
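Regarding that last question: the PyPI wget package is, as far as I recall, a small single-file download helper, not a crawler, so it is not a substitute for GNU wget's mirroring. Typical use, from memory:

```python
import wget  # pip install wget; unrelated to GNU wget

# Downloads one file and returns the local filename.
# No recursion, no waits, no retries.
filename = wget.download("http://c2.com/cgi/wiki?WelcomeVisitors")
print(filename)
```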