[00:00] *** Darkstar has joined #archiveteam-bs
[00:05] *** mr-b has joined #archiveteam-bs
[00:31] *** JesseW has joined #archiveteam-bs
[00:49] *** JesseW has quit IRC (Quit: Leaving.)
[01:10] *** JetBalsa has joined #archiveteam-bs
[01:12] *** MrIdea has quit IRC ()
[01:20] *** JesseW has joined #archiveteam-bs
[01:23] Following the AT wiki, I'm trying to use wget to scrape computerhope.com, but all I get is the index.html page.
[01:25] Even if I use --mirror and --recursive and a host of other options, all I can pull down is index.html and maybe a bunch of other cross-linked trash.
[01:37] acridAxid, it seems to be downloading fine using grab-site
[01:37] have you checked your useragent?
[01:41] *** Start has joined #archiveteam-bs
[01:47] *** JesseW has quit IRC (Quit: Leaving.)
[01:58] *** JesseW has joined #archiveteam-bs
[01:59] *** JesseW has quit IRC (Client Quit)
[02:05] *** schbirid2 has joined #archiveteam-bs
[02:07] *** schbirid has quit IRC (Read error: Operation timed out)
[02:23] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:26] *** dashcloud has joined #archiveteam-bs
[02:58] *** vitzli has joined #archiveteam-bs
[03:13] kyan: thanks, I'll look into grab-site. Is this tool preferred over wget today?
[03:14] Not exactly.
[03:14] It's based on a program called wpull, which *is* preferred over wget
[03:14] kyan: yeah, i had a useragent set. without one the site simply refuses.
[03:14] grab-site is basically a wrapper around wpull
[03:14] Is wpull preferred over wget for archiving, or just in general?
[03:15] Well, I think both
[03:15] grab-site: https://github.com/ludios/grab-site/
[03:15] wpull: http://wpull.readthedocs.org/
[03:15] kyan: thanks! :)
[03:15] np :)
[03:15] good luck!
[03:17] hmm, not packaged for Arch
[03:17] project for this evening :)
[03:27] acridAxid: Easiest/quickest would just be using pip to install grab-site
[03:27] and it has a nice little webserver to view progress
[03:31] *** wyatt8740 has joined #archiveteam-bs
[03:52] prefer having it managed by my package manager
[03:52] i don't need more things to remember to update
[03:52] also, if i package it, it helps others
[04:01] *** vitzli has quit IRC (Leaving)
[04:25] *** Mayonaise has quit IRC (Read error: Operation timed out)
[04:34] *** Mayonaise has joined #archiveteam-bs
[05:18] *** Infreq has quit IRC (Ping timeout: 258 seconds)
[05:19] *** Infreq has joined #archiveteam-bs
[05:27] *** logchfoo4 has quit IRC (Ping timeout: 360 seconds)
[05:29] *** logchfoo1 starts logging #archiveteam-bs at Sat Feb 06 05:29:46 2016
[05:29] *** logchfoo1 has joined #archiveteam-bs
[05:47] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:54] *** Sk1d has joined #archiveteam-bs
[06:00] *** JetBalsa has quit IRC (Read error: Connection reset by peer)
[07:22] *** Kenshin has quit IRC (Ping timeout: 260 seconds)
[07:23] *** CHANFIX has joined #archiveteam-bs
[07:23] *** services.int sets mode: +o CHANFIX
[07:23] *** CHANFIX sets mode: -bbbb *!*67c68b35@103.198.139.* *!*JojoRecv@*.dyn.optonline.net *!*stereo197@*.iusacell.net *!*ae830ded@*.mibbit.com
[07:23] *** CHANFIX sets mode: -bbbb WOOHOO!*@* *!WOOHOO@* *!OOHOOW@* *!*@66.23.235.245
[07:23] *** CHANFIX sets mode: -bbbb *!*archivist@*.static.wa.bigpond.net.au *!*@178.18.16.10 mxncqci79!*@* *!~egyptmosl@41.46.215.*
[07:23] *** CHANFIX sets mode: -bbbb *!*mig2970@190.235.150.* *!*@c-76-108-171-70.hsd1.fl.comcast.net dl-boy-!*@* *!*webchat@71.93.65.*
[07:23] *** CHANFIX sets mode: -bbbb EG!*@* *!*EricJess*@* *!*smuxi@*.hsd1.wa.comcast.net instence!*@*
[07:23] *** CHANFIX sets mode: -bbbb *!~DrShitsAB@50.7.30.34 dec31*!*@* *!*@c-76-22-62-23.hsd1.wa.comcast.net *!*4c163e17@*.mibbit.com
[07:23] *** CHANFIX sets mode: -bbbb
*!~korkite@225.Red-79-153-104.dynamicIP.rima-tde.net Dec*!*@* *!*KaderCavd@213.74.159.* *!*@31-184-242-76.mrhost.biz
[07:23] *** CHANFIX sets mode: -bbbb *!*@213.143.61.87 *!*@114.76.21.95.dynamic.jazztel.es garette!*@* *!~garette@213.143.61.*
[07:23] *** CHANFIX sets mode: -bbbb *!*bkr@*.mindhackers.org *!*sponges@*.Red-2-138-166.dynamicIP.rima-tde.net *!~sponges@98.Red-83-39-251.dynamicIP.rima-tde.net *!*sponges@*.Red-83-55-30.dynamicIP.rima-tde.net
[07:23] *** CHANFIX sets mode: -bbbb *!*sponges@*.Red-81-38-80.dynamicIP.rima-tde.net sponges!*@* *!*tichels@*.Red-2-138-161.dynamicIP.rima-tde.net critics!*@*
[07:23] *** CHANFIX sets mode: -bb xocco!*@* *!*uid118096@*.tooting.irccloud.com
[07:23] I only joined to remove modes.
[07:23] *** CHANFIX has left
[07:27] why
[07:27] oh we lost all ops nice
[07:33] *** CHANFIX has joined #archiveteam-bs
[07:33] *** services.int sets mode: +o CHANFIX
[07:33] *** CHANFIX sets mode: +oo SadDM closure
[07:33] 2 clients should have been opped.
[07:33] *** CHANFIX has left
[07:33] *** Kenshin has joined #archiveteam-bs
[07:43] *** CHANFIX has joined #archiveteam-bs
[07:43] *** services.int sets mode: +o CHANFIX
[07:43] *** CHANFIX sets mode: +oo Kenshin ivan`
[07:43] 2 clients should have been opped.
[07:43] *** CHANFIX has left
[07:53] *** CHANFIX has joined #archiveteam-bs
[07:53] *** services.int sets mode: +o CHANFIX
[07:53] *** CHANFIX sets mode: +o alard
[07:53] 1 client should have been opped.
[07:53] *** CHANFIX has left
[08:34] *** JesseW has joined #archiveteam-bs
[09:04] *** JesseW has quit IRC (Quit: Leaving.)
[09:26] *** vitzli has joined #archiveteam-bs
[10:12] *** arkiver3 has joined #archiveteam-bs
[10:38] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[11:38] Heh. First page of The Register http://www.theregister.co.uk/2016/02/05/malware_museum/
[11:45] *** arkiver3 has joined #archiveteam-bs
[11:52] is friendsreunited currently deactivated?
[12:11] *** arkiver3 has quit IRC (Ping timeout: 252 seconds)
[13:23] did anyone ever get free ppv around jan 1997?
[13:24] i got free ppv by going to tv channel 2 back then
[13:25] this hack happened like every 5 months that
[13:25] must have been feb 1997 then if it was 5 months
[13:25] it lasted like 4 days
[13:25] starting on the 4th thursday of the month
[13:47] currently haphazardly recording AT5 stream of Pegida protest and counter-protest
[13:47] they fucked up their RTMP stream so I'm just dumping .ts segments now
[13:50] *** mismatch has quit IRC (Remote host closed the connection)
[13:50] *** mismatch has joined #archiveteam-bs
[14:42] *** Mayonaise has quit IRC (Read error: Operation timed out)
[14:58] *** username1 has joined #archiveteam-bs
[15:00] *** Mayonaise has joined #archiveteam-bs
[15:00] *** schbirid2 has quit IRC (Read error: Operation timed out)
[15:36] *** username1 is now known as schbirid
[17:12] *** JesseW has joined #archiveteam-bs
[17:15] *** JesseW has quit IRC (Client Quit)
[17:20] *** schbirid has quit IRC (Quit: Leaving)
[17:22] *** MrIdea has joined #archiveteam-bs
[17:29] hey, so i was thinking, although you guys might have already thought of this before, why don't we go out and find the user generated content sites (since those as far as i can tell are the ones we are doing mostly and are the most important) and have the management of that site "sign"/agree to "yes, i promise to preserve the content that i host, even in the event
[17:29] of a merger/shutdown/bankruptcy etc., by either allowing users to export their content via a simple export function, providing a database dump to the internet archive/archiveteam (with sensitive data removed), and/or collaborating with archiveteam as they scrape my site"
[17:30] kinda like a contingency agreement
[17:32] for lack of better words
[17:32] SketchCow: what are your thoughts on this?
[17:38] MrIdea: we could
[17:38] or we could just scrape them
[17:39] well in this case, scraping would likely be a last resort
[17:39] nod
[17:39] we welcome them to offer us an easier way to get the data before they kill it
[17:40] however most of the time they aren't interested
[17:42] i think stuff like this might be better for "smaller" sites, like um... say a forum or such
[18:13] nobody ever responds positively to that
[18:13] never
[18:13] we've tried
[18:13] alard closure SadDM ivan` Kenshin: ops
[18:14] *** Kenshin sets mode: +o xmc
[18:14] thx
[18:14] *** xmc sets mode: +o swebb
[18:14] *** swebb sets mode: +o DFJustin
[18:14] *** swebb sets mode: +o SketchCow
[18:14] *** swebb sets mode: +o antomatic
[18:14] *** swebb sets mode: +o balrog
[18:14] *** swebb sets mode: +o brayden
[18:14] *** swebb sets mode: +o ersi
[18:34] *** VADemon has joined #archiveteam-bs
[19:26] MrIdea: They don't talk to us
[19:35] besides the fact of "b-but i don't want to lose my userbase/money", i don't see why they would really disagree
[19:36] Besides the unstoppable, irrefutable facts, there's no reason
[19:36] See also: Lawyers
[19:36] Every once in a while one of the folks in here negotiates.
[19:36] I'd say that works once out of 80 large-scale projects.
[19:39] well i'm sure doing this for a site whose sole purpose of existence was profit, those guys would very much disagree
[19:40] i'm thinking this would be much more effective for some of the smaller sites like forums/geocities clones/etc.
[19:42] We do that.
[19:42] We do that all the time.
[19:42] We've been here 7 years.
[19:47] haha, might want to update that topic then SketchCow
[19:51] *** SketchCow changes topic to: Archive Team: Oh Yeah, Negotiation is the Answer | Shut Up
[20:11] *** vitzli has quit IRC (Leaving)
[20:14] MrIdea: a few people have, hilariously, come into #archivebot to make a backup of their own site
[20:15] MrIdea: one of the problems with database dumps is that somebody has to kill a few days either setting up the site infrastructure or dumping it into something readable
[20:15] oh my no backup plan
[20:16] might as well leave the server in a smokey state
[20:17] owner or web/database people have a smartass way of creating the database so that its structure only suits the site and can't be used elsewhere
[21:17] so 2013 of The Morning Joe podcast is all uploaded
[21:41] *** Swizzle has quit IRC (Quit: Leaving)
[22:04] MrIdea: You should try that- if it works, great, otherwise they are put on notice that rogue archivists are coming for them
[22:13] MrIdea: Feel free to try to negotiate with sites. There's no ArchiveTeam management. That said, it usually does not work
[22:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:13] if it would, more people would ask for a backup or "sign" people up
[22:14] *** dashcloud has joined #archiveteam-bs
[22:33] anyway, I think GNU Social *has* the potential to become significant.
[22:33] Imagine if Twitter did mess with users' timelines, a few major users made their exodus, and media caught on it
[22:33] ...oh, icedice ain't here.
[22:34] I dunno, decentralization isn't really a trend
[22:34] and it's confusing as fuck
[22:35] *** icedice has joined #archiveteam-bs
[22:35] anyway, I think GNU Social *has* the potential to become significant.
[22:35] Imagine if Twitter did mess with users' timelines, a few major users made their exodus, and media caught on it
[22:35] ...oh, icedice ain't here.
[22:35] <@ersi> I dunno, decentralization isn't really a trend
[22:35] <@ersi> and it's confusing as fuck
[22:35] Techies looove decentralization and the idea/potential
[22:35] it's confusing as fuck for non-aspies though
[22:36] can it not be completely abstracted away?
[22:36] In what way? You need an account *somewhere*
[22:36] with quitter you don't even have to realize there are alternative instances
[22:36] even if it's rolled into your OS
[22:36] I think you're undervaluing a central brand
[22:37] regardless, we probably define 'significant' in different ways.
[22:37] Maybe so!
[22:37] even if it became to Twitter what Duck Duck Go is to Google, it'd be significant in my eyes
[22:37] They would need to market a site - Quitter - instead of the software - GNU Social
[22:37] I mean, best case for GNU Social would be to herd all the techies over from G+
[22:38] Are techies using Google+?
[22:38] Sanqui: that would be pretty significant. DDG isn't small
[22:38] it isn't
[22:38] There's plenty of FOSS people using Google+ - posting updates and conversating
[22:38] I thought most people dumped it after hostile YouTube take-over
[22:39] and I assumed that the techies wouldn't like Google's privacy approach any more than Facebook's
[22:40] Seems like they/some do
[22:40] since both approaches are "collect all data that we can and sell it to the advertisers"
[22:45] I would guess some take Google for being the lesser evil or just don't care
[22:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:00] *** DFJustin has quit IRC (Read error: Connection reset by peer)
[23:00] *** dashcloud has joined #archiveteam-bs
[23:00] *** alard has quit IRC (Read error: No route to host)
[23:02] *** DFJustin has joined #archiveteam-bs
[23:02] *** swebb sets mode: +o DFJustin
[23:19] the latter I would guess
[23:26] *** Swizzle has joined #archiveteam-bs
[23:35] I'm going to work on fotolog soon
[23:35] Website isn't slow, so I hope we can fully get fotolog
[23:36]
*** RichardG has quit IRC (Read error: Connection reset by peer)
[23:37] i may need some help with this: http://www.oldgamemags.com/
[23:40] *** RichardG has joined #archiveteam-bs