[00:00] *** kvieta has joined #archiveteam-bs
[00:23] HCross: well, flickr is currently paused because I need to have a look at a problem with WARCs being too small
[00:25] *** dashcloud has joined #archiveteam-bs
[00:30] *** BlueMaxim has joined #archiveteam-bs
[00:44] *** kvieta has quit IRC (Ping timeout: 370 seconds)
[00:47] *** kvieta has joined #archiveteam-bs
[01:21] *** kvieta has quit IRC (Read error: Operation timed out)
[01:51] *** kvieta has joined #archiveteam-bs
[03:52] *** wp494 has quit IRC (Read error: Connection reset by peer)
[04:04] *** ndiddy has quit IRC (Ping timeout: 244 seconds)
[04:12] *** mutoso_ has quit IRC (Read error: Connection reset by peer)
[04:17] *** mutoso has joined #archiveteam-bs
[04:21] huh neat, you can use colons in sqlite table names, and pretty much every sqlite tool that isn't the sqlite shell breaks in awesome ways
[04:23] deeelightful
[04:43] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:49] *** Sk1d has joined #archiveteam-bs
[04:55] has this been backed up? http://www.therobotsvoice.com/
[04:55] it's a site that has shitty 15-short-paged articles
[04:56] stumbled across it linked from a stackexchange post on swearing from Firefly
[05:05] http://www.therobotsvoice.com/2010/11/fireflys_15_best_uses_of_chinese_profanity.php
[05:05] "I’ve given this site formerly known as Topless Robot three years of my life and hard work, and I wouldn’t trade them. I hoped that covering the subjects and culture that I love would sustain the site. For three years, it has — the three years it took to make The Force Awakens, no less. But all things must end. Today is the The Robot’s Voice’s final day of publication. After
[05:05] years of trying, we couldn’t make this work financially..."
[05:06] Thank you for reading the site, supporting it and creating a community here over the years. I spent more time each day with our regular commenters than I did with my own wife or family, so even though I don’t actually know all your real names, I’ll miss you. Sly, Timely, Abraxas, FakeAss, Gallen, Polk, Mindbender, Zoidberg, Canadian Scott, GrimlockPrime, and everyone else…I’ll
[05:06] never forget you. I stayed up until the early hours of the morning, created social media posts on weekends, ran from dinner tables when news happened, and generally made TR/TRV the focus of my life. You got 100 percent of me, like it or not. And I hope you did..."
[05:06] etc etc
[05:07] not sure if worth backing up
[05:24] I'd argue it's more "worth backing up" than the latest leak of NSA documents or whatever
[05:24] every nerd on the Internet gets off on saving a copy of those and then never reading them
[05:25] lol
[05:25] fanworks though, they don't get much
[05:25] i presume this community had just a small reach
[05:25] so in the long run we end up with thousands of copies of unknown integrity of one thing and significantly incomplete copies of everything else
[05:25] so I threw that site into archivebot since that's what it was made for
[05:26] will try to keep that in mind!
[05:27] also just for full disclosure, yes, I have a copy of the wikileaks insurance file
[05:27] I too get off on that stuff
[05:28] i just don't want to throw EVERYTHING at archivebot
[05:28] still gauging what's worth time etc
[05:28] time, space
[05:28] I might whine about it a lot but really it's better to just throw something in
[05:28] we do have some limits like github/bitbucket links just making it a mess
[05:29] http://archive.fart.website/archivebot/viewer/ <- can incomplete URLs be searched?
[05:29] hostnames only
[05:30] but if you throw in "tumblr" you'll get all hostnames matching tumblr
[05:30] ah, yeah, that's what i was wondering
[05:30] if "digitalocean" would return all domains
[05:30] and subdomains
[05:31] is it not good at searching backups? or is everything backed up not necessarily tracked there?
[05:31] i'd assume digitalocean, linode (for their guides) have been backed up
[05:31] that's just archivebot's catalog
[05:31] there's a ton of other stuff that isn't in there
[05:32] Warrior projects, works from other AT members, everything else in IA, ...
[05:34] how good is archivebot at backing up sites with dynamic "next page/more posts" buttons? https://www.digitalocean.com/community/tutorials
[05:35] at the end of the page is a js button "load more results"
[05:35] it's not going to work
[05:35] damn
[05:35] phantomjs mode just scrolls, there's no "click this button" function
[05:36] if that button is actually an you might have luck with phantomjs
[05:36] I'm not sure
[05:37] Load More Results
[05:41] is there a way to archive site.com/dir2
[05:41] and site.com/dir2/sub1 site.com/dir2/sub2
[05:41] but not traverse back to site.com
[05:41] and not back up site.com/dir1, etc, linked from site.com
[05:42] yes, !a https://site.com/dir2/
[05:43] does !ao only back up site.com/dir2/index.html + images/resources?
[05:43] or does it still spider
[05:43] it's page plus prerequisites
[05:43] https://archivebot.readthedocs.io/en/latest/commands.html#archiveonly
[05:44] it didn't make much sense to me :<
[05:44] * ranma holds onto his butt and feeds archivebot something
[05:47] lol @ krebsonsecurity
[05:47] was just reading about their DDoS
[06:32] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[06:50] *** wp494 has joined #archiveteam-bs
[06:54] *** fie has joined #archiveteam-bs
[07:08] ranma: same few days as OVH got hit with 1.5Tbps
[07:11] wheee
[07:11] and the company i work for is banking on IoT
[07:30] *** ravetcofx has quit IRC (Read error: Operation timed out)
[08:05] *** xmc sets mode: +o yipdw
[08:40] *** GE has joined #archiveteam-bs
[09:00] they like DDoSing?
[09:03] their store salespeople are a bag of dicks, so i don't have much sympathy
[09:05] not implying i'm an aggressor. just don't like babysitting them
[09:41] *** kurt has joined #archiveteam-bs
[10:18] *** GE has quit IRC (Remote host closed the connection)
[11:07] *** GE has joined #archiveteam-bs
[11:47] *** kyounko has quit IRC (Read error: Operation timed out)
[12:08] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:22] *** GE has quit IRC (Remote host closed the connection)
[13:59] *** GE has joined #archiveteam-bs
[14:34] *** Start has quit IRC (Quit: Disconnected.)
[14:34] *** Start has joined #archiveteam-bs
[14:35] *** Start has quit IRC (Client Quit)
[14:45] *** achip has joined #archiveteam-bs
[15:29] *** kurt has quit IRC (Remote host closed the connection)
[16:15] *** VADemon has joined #archiveteam-bs
[17:24] *** Swizzle has quit IRC (Quit: Leaving)
[17:50] *** GE has quit IRC (Quit: zzz)
[17:50] *** GE has joined #archiveteam-bs
[18:03] *** VADemon has quit IRC (Read error: Operation timed out)
[18:04] i'm at 889k items now
[18:11] whoa https://mosh.org/
[18:12] yeah mosh is super nifty
[18:12] I need to try this
[18:12] highly recommended
[18:12] intermittent connectivity is the rule for me now and I'd love something that doesn't broken-pipe on me every time
[18:13] not sure how its prediction works with how fish likes to redraw the command line in various colors
[18:13] but it's worth a shot
[18:13] I need to figure out how to make mosh work with siped
[18:13] er spiped
[18:13] mosh is amazing
[18:14] maybe just tell mosh to connect via spiped and have networking work out the rest
[18:14] the one issue I have with it is that it breaks scrolling, so you'll probably want to use tmux/screen with it
[18:17] i regularly use mosh on airplane wifi. it makes it tolerable.
[18:18] oh nice, I guess mosh uses SSH to establish the initial connection and start mosh-server
[18:18] so my existing spipe ProxyCommands work fine
[18:20] oh my god this is amazing
[18:21] :D
[18:22] my build server runs in online.net's Paris datacenter and you get some noticeable lag on the Chicago -> Paris hop
[18:22] but not here
[18:28] this probably also means I can go back to using irssi
[18:29] I would suggest trying out Weechat
[18:29] or irssi
[18:32] *** ravetcofx has joined #archiveteam-bs
[18:32] over the years I've come to know irssi fairly well so that's why
[18:32] one of these days I'll try a new client
[18:33] I used irssi up until I got a bouncer with multiple networks, and it didn't let connect to the same hostname multiple times. I wonder if they fixed that
[18:33] didn't let me*
[18:34] or I can be really obstinate and reinstall ircii
[18:35] this is kind of neat :p http://tools.suckless.org/ii/
[18:35] that kinda reminds me of trying to use Plan9
[18:35] cool ideas in theory but I couldn't really integrate them comfortably
[18:37] on the other hand, ii would probably make a good bot substrate
[19:21] godane: I'll buy you a cake when you get to one million items
[19:57] *** SketchCow has joined #archiveteam-bs
[19:57] *** midas sets mode: +o SketchCow
[19:57] *** swebb sets mode: +o SketchCow
[20:04] that should be around december sometime
[20:05] based on me pushing for about 50k items a month
[20:32] *** Stiletto has quit IRC ()
[20:38] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[20:40] *** acridAxid has quit IRC (Quit: marauder)
[20:41] *** acridAxid has joined #archiveteam-bs
[20:51] *** Stiletto has joined #archiveteam-bs
[21:03] *** ndiddy has joined #archiveteam-bs
[21:05] *** computerf has quit IRC (Read error: Operation timed out)
[21:08] *** computerf has joined #archiveteam-bs
[21:08] *** RichardG has joined #archiveteam-bs
[21:24] *** computerf has quit IRC (Read error: Operation timed out)
[21:35] *** computerf has joined #archiveteam-bs
[21:41] *** ndiddy has quit IRC (Quit: Leaving)
[21:51] *** yeoldetoa has quit IRC (Remote host closed the connection)
[21:52] *** kristian_ has joined #archiveteam-bs
[22:22] *** BlueMaxim has joined #archiveteam-bs
[22:33] *** Start has joined #archiveteam-bs
[22:35] *** achip has quit IRC (Read error: Operation timed out)
[22:37] *** GE has quit IRC (hub.efnet.us irc.Prison.NET)
[23:07] Hi Jason,
[23:07] I read that your group is archiving Gawker. I'm a documentary producer, have created events and series for History Channel, National Geographic, MTV and others, and produce feature/festival documentaries for Smithsonian Network etc.
[23:07] I am currently in pre-productions on a film about why Gawker and what you're doing are important.
[23:07] May we get on the phone so that I can tell you a bit more about the project?
[23:07] My goal is to interview you and document you and your volunteers saving history.
[23:07] ...
[23:07] My intention is to not respond, unless someone thinks differently.
[23:09] approved ✓
[23:25] yeah, sounds interesting
[23:25] no i mean not responding is meeting with my approval
[23:26] I think it sounds interesting
[23:27] If he's positive about us, it would be nice to have us in a documentary
[23:28] i've been burned enough times
[23:30] Not responding.
[23:30] "Gawker" + "Documentary" = hellscape
[23:31] :( ok
[23:31] if it's only about gawker then not
[23:32] but maybe he wants to do something more about web history in general too
[23:32] from the description it's a gawker documentary
[23:32] No.
[23:32] This will be a gawker documentary
[23:32] oh well
[23:32] I'm off anyway
[23:32] have a good day all!
[23:32] * arkiver zzzzzzz
[23:32] I'd rather watch my eye going through a spaghetti strainer with my remaining eye than be involved in anything glorifying gawker
[23:33] ^ got it
[23:33] Hi Jason,
[23:33] I read that your group is archiving Gawker. I'm a documentary producer, have created events and series for History Channel, National Geographic, MTV and others, and produce feature/festival documentaries for Smithsonian Network etc.
[23:33] <3
[23:33] Added disk check to pipeline, by the way, sleepy
[23:33] yes, saw it!
[23:33] Looks awesome :D
[23:33] thanks
[23:52] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)