[00:02] *** primus104 has quit IRC (Leaving.)
[00:15] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[00:15] *** ruukasu has joined #archiveteam
[00:28] *** cbb has quit IRC (Ping timeout: 633 seconds)
[00:41] *** primus104 has joined #archiveteam
[01:07] *** mistym has quit IRC (Remote host closed the connection)
[01:24] *** mistym has joined #archiveteam
[01:25] *** APerti has joined #archiveteam
[01:28] *** Ymgve has quit IRC ()
[01:34] *** mistym has quit IRC (Remote host closed the connection)
[01:45] *** ohhdemgir has joined #archiveteam
[01:46] https://www.reddit.com/r/DataHoarder/comments/2mlk2f/capturebate_chaturbatecom_automated_data/
[01:50] *** Ravenloft has quit IRC (Ping timeout: 378 seconds)
[02:07] *** SmileyG has joined #archiveteam
[02:10] *** Emcy_ has quit IRC (Ping timeout: 265 seconds)
[02:10] *** Smiley has quit IRC (Read error: Operation timed out)
[02:10] *** Emcy has joined #archiveteam
[02:24] *** primus104 has quit IRC (Leaving.)
[02:25] *** Ravenloft has joined #archiveteam
[03:05] quick notice before I'm going to sleep
[03:06] just got back from squatconf, it was a success
[03:06] with a horribly duct-taped setup we managed to capture most of the bits of some talks
[03:06] so after editing (which will probably take a while, given the messed up source material...) there should be a few new hackercon talks on IA
[03:06] :D
[03:07] (my photo camera in video mode on a tripod taped to the beamer table combined with the main sound board being patched into the mic port on my laptop running Audacity... not the most reliable audio setup :( )
[03:07] s/audio/recording/
[03:08] end of notice
[03:23] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[03:26] *** Lord_Nigh has joined #archiveteam
[03:26] *** balrog sets mode: +o Lord_Nigh
[03:42] *** BlueMaxim has joined #archiveteam
[03:44] *** SmileyG has quit IRC (Ping timeout: 258 seconds)
[03:47] *** LordNigh2 has joined #archiveteam
[03:47] *** balrog sets mode: +o LordNigh2
[03:48] *** Smiley has joined #archiveteam
[03:49] *** tfgbd has joined #archiveteam
[03:50] *** Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
[03:50] *** LordNigh2 is now known as Lord_Nigh
[04:19] *** Danneh__ has joined #archiveteam
[04:20] *** danneh_ has quit IRC (Ping timeout: 633 seconds)
[04:46] *** aaaaaaaaa has quit IRC (Leaving)
[05:50] *** LordNigh2 has joined #archiveteam
[05:56] *** Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
[05:56] *** LordNigh2 is now known as Lord_Nigh
[06:03] *** mistym has joined #archiveteam
[06:12] *** zenguy_pc has quit IRC (Ping timeout: 480 seconds)
[06:19] *** zenguy_pc has joined #archiveteam
[06:25] *** amerrykan has quit IRC (Quit: Quitting)
[06:34] *** amerrykan has joined #archiveteam
[07:01] *** danneh_ has joined #archiveteam
[07:03] *** Danneh__ has quit IRC (Ping timeout: 633 seconds)
[07:40] *** n00b807 has joined #archiveteam
[07:41] *** filippo_ has quit IRC (Ping timeout: 186 seconds)
[07:41] *** n00b807 has quit IRC (Client Quit)
[07:44] *** filippo_ has joined #archiveteam
[07:59] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:02] *** dashcloud has joined #archiveteam
[08:04] *** Sk1d has quit IRC (Read error: Operation timed out)
[08:09] *** Sk1d has joined #archiveteam
[08:12] *** ZorbaBeta has quit IRC (Read error: Connection reset by peer)
[08:12] *** ZorbaBeta has joined #archiveteam
[08:13] *** mistym has quit IRC (Remote host closed the connection)
[08:19] *** Sk1d has quit IRC (Ping timeout: 265 seconds)
[08:20] *** primus104 has joined #archiveteam
[08:22] *** Sk1d has joined #archiveteam
[08:25] *** Sk2d has joined #archiveteam
[08:26] *** Sk1d has quit IRC (Read error: Operation timed out)
[08:26] *** deathy___ has quit IRC (Read error: Connection reset by peer)
[08:26] *** parsons__ has quit IRC (Read error: Connection reset by peer)
[08:28] *** deathy___ has joined #archiveteam
[08:28] *** parsons_ has joined #archiveteam
[08:29] *** Sk2d has quit IRC (Ping timeout: 265 seconds)
[08:30] *** Sk1d has joined #archiveteam
[08:36] *** Sk2d has joined #archiveteam
[08:37] *** Sk1d has quit IRC (Read error: Operation timed out)
[08:37] *** Sk2d is now known as Sk1d
[08:49] *** Sk1d has quit IRC (Ping timeout: 265 seconds)
[08:51] *** Sk1d has joined #archiveteam
[08:59] *** amerrykan has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** Sanqui has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** Coderjoe has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** robink has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** torvik has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** lysobit has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** marc has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** Baljem_ has quit IRC (west.us.hub irc.mzima.net)
[08:59] *** cloudmons has quit IRC (west.us.hub irc.mzima.net)
[09:00] *** Morbus has quit IRC (Quit: http://www.disobey.com/)
[09:07] *** Sk2d has joined #archiveteam
[09:10] *** Morbus has joined #archiveteam
[09:10] *** amerrykan has joined #archiveteam
[09:10] *** Sanqui has joined #archiveteam
[09:10] *** Coderjoe has joined #archiveteam
[09:10] *** robink has joined #archiveteam
[09:10] *** torvik has joined #archiveteam
[09:10] *** lysobit has joined #archiveteam
[09:10] *** marc has joined #archiveteam
[09:10] *** Baljem_ has joined #archiveteam
[09:10] *** cloudmons has joined #archiveteam
[09:10] *** Sk1d has quit IRC (Read error: Operation timed out)
[09:10] *** Sk2d is now known as Sk1d
[09:17] *** Sk1d has quit IRC (Ping timeout: 265 seconds)
[09:20] *** Sk1d has joined #archiveteam
[09:28] *** Sk1d has quit IRC (Read error: Operation timed out)
[09:30] *** Sk1d has joined #archiveteam
[09:30] *** dashcloud has quit IRC (Read error: Operation timed out)
[09:30] *** MMovie has quit IRC (Ping timeout: 335 seconds)
[09:35] *** Sk1d has quit IRC (Ping timeout: 265 seconds)
[09:37] *** Sk1d has joined #archiveteam
[09:38] *** dashcloud has joined #archiveteam
[09:58] joepie91, >most of the bits of some talks
[09:58] *** schbirid has joined #archiveteam
[10:01] *** APerti has quit IRC (Read error: Operation timed out)
[10:17] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[10:20] *** Lord_Nigh has joined #archiveteam
[10:20] *** balrog sets mode: +o Lord_Nigh
[11:10] *** Ymgve has joined #archiveteam
[11:22] *** primus104 has quit IRC (Leaving.)
[11:34] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[11:47] *** ruukasu has joined #archiveteam
[11:49] *** ruukasu has quit IRC (Client Quit)
[11:58] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:21] ohhdemgir: not all speakers wanted to be recorded
[12:22] record first, delete later
[12:22] the archiveteam way
[12:23] midas: doesn't work like that for these kind of confs
[12:23] but I just woke up, not going to argue this now
[12:23] i understand
[12:26] XD
[12:26] joepie91, when you're awake get to wildarc, SN4T14 wanted you / irc server something or other blah
[12:53] *** arbin has quit IRC (Read error: Connection reset by peer)
[13:26] *** Trskl has quit IRC (Ping timeout: 186 seconds)
[13:28] *** human39 has joined #archiveteam
[13:28] *** bsmith093 has quit IRC (Read error: Operation timed out)
[13:40] *** bsmith093 has joined #archiveteam
[13:43] *** sankin has joined #archiveteam
[13:44] *** ruukasu has joined #archiveteam
[14:08] *** K4k has joined #archiveteam
[14:12] *** MMovie has joined #archiveteam
[14:15] *** arbin has joined #archiveteam
[14:37] *** primus104 has joined #archiveteam
[15:14] *** schbirid has quit IRC (Read error: Operation timed out)
[15:19] http://www.americanlibrariesmagazine.org/article/rip-ipl
[15:26] *** schbirid has joined #archiveteam
[15:26] *** K4k has quit IRC (WeeChat 1.0.1)
[15:31] *** T31M has joined #archiveteam
[15:38] *** aaaaaaaaa has joined #archiveteam
[15:59] *** arbin has quit IRC (Read error: Connection reset by peer)
[16:00] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[16:00] *** ruukasu has joined #archiveteam
[16:04] *** ruukasu has quit IRC (Client Quit)
[16:13] *** arbin has joined #archiveteam
[16:14] *** mistym has joined #archiveteam
[16:23] *** primus104 has quit IRC (Leaving.)
[16:23] *** K4k has joined #archiveteam
[16:31] *** APerti has joined #archiveteam
[16:37] *** ruukasu has joined #archiveteam
[17:03] *** mistym has quit IRC (Remote host closed the connection)
[17:06] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[17:06] *** ruukasu has joined #archiveteam
[17:25] *** mistym has joined #archiveteam
[18:13] *** nertzy has quit IRC (Leaving)
[18:20] *** nertzy has joined #archiveteam
[19:02] *** the_fox has joined #archiveteam
[19:14] Hello. There's a large website I feel is in danger of shutting down for good. furaffinity.net, the world's largest furry themed website, has recently taken on an admin that has a history of trashing websites when he's angry/bored with them. There are at least two fairly large projects (website and an MMO) he was an admin of where he wiped the servers and attempted to delete the backups
[19:14] (luckily in both cases, there were additional backups that he didn't know about that allowed them to come back online). I'll spare you the drama filled details, but there have been quite a few other transgressions besides that caused by him. I fear that FA might not be lucky/have a competent enough owner to keep all the backups a secret from this guy, and that FA's days may be numbered.
[19:14] Also it should be noted that FA's robots.txt specifically excludes the WayBack archive, so there's no hope of getting anything from there if everything goes to hell. I know furries aren't exactly the most popular bunch of people, but this website means a whole lot to me and many others. I'd hate to see it disappear for good.
[19:17] the_fox: please hold
[19:17] weird, I thought we had a page on the wiki about FA
[19:18] yipdw: ivan`: schbirid: xmc: any idea on the status of above? I vaguely recall FA being discussed before
[19:18] i thought the same but can't remember
[19:18] Sorry if I'm repeating what's already been said, I just now joined this channel.
[19:18] archivebot did a partial crawl of the forums but had to be aborted because it was too big
[19:18] 15 million images?
[19:19] DFJustin: "too big" being?
[19:19] Yep, 15 million.
[19:19] (idk in what 'era' of archivebot this was)
[19:19] being 15 million images
[19:19] Should note that many images are marked as "mature", and cannot be seen without signing in.
[19:19] there's 1,261,248 threads
[19:19] joepie91: http://archive.fart.website/archivebot/viewer/job/ccl2v
[19:19] 11/17, so recent
[19:20] I support doing a logged-in crawl of FA, but not so much putting it into wayback.
[19:20] also if we need sign-in then archivebot is out anyway
[19:20] the_fox: how big of a hard drive do you have? :)
[19:20] yipdw: is that 462 gigs?
[19:21] 462 MB
[19:21] or M?
[19:21] I got a 1TB sitting around with nothing to do. If need be I can scrounge around and find maybe around another TB.
[19:21] that's not that big, is it?
[19:21] it isn't, I suspect the job was prematurely aborted due to realization of "oh shit that's a lot"
[19:21] ahhh, heh
[19:21] might need to reconsider that now...
[19:21] also, xmc, any particular reason?
[19:21] the_fox: what makes you think that FA is near to shutting down?
[19:22] (he just explained that)
[19:22] I can create a project for it and use someone's rsync, if someone'd like to save them
[19:22] depending on your goals archivebot might not be useful anyway due to sign-in restriction
[19:22] a forum of that size would be better done with a specialized forum script or warrior
[19:22] joepie91: same reason i support doing a preemptive crawl of deviantart
[19:22] joepie91: ok.
[19:22] xmc: I mean the "not so much putting it into wayback"
[19:22] oh
[19:22] uh, because it'd be a cookie-using crawl
[19:22] archivebot maybe could do it but it's not well suited to it
[19:22] monster jobs clog up the works
[19:25] xmc: ok... is there a particular issue with that?
[19:25] *** APerti has quit IRC ()
[19:25] I can create a project for it and use cookies and everything to log in
[19:26] not a technical issue. i think it's a social issue though, and one we haven't really talked about much
[19:26] from a legal perspective, there's no difference between a registration-less crawl and a crawl where an account is needed that can be registered in an automatic review-less fashion
[19:26] to my understanding
[19:26] so that shouldn't be a problem
[19:26] fair enough
[19:26] if accounts are manually reviewed or such, that can change things
[19:26] Accounts are not manually reviewed.
[19:26] but if it's the bog standard "register a forum account, click confirmation link, done" then it should be fine
[19:27] (aside; it's hidden from the wayback anyway)
[19:27] i'm only advocating caution, not restraint
[19:27] :3
[19:28] Also: FA just recovered from a massive DDoS, so the admins are probably still a bit jumpy about lots of unusual traffic.
[19:28] xmc: fair enough
[19:29] the_fox: what OS are you running?
[19:29] and arkiver, can you run a job with cookies then? a full-blown job of the entire site?
[19:29] disk-space wise
[19:29] joepie91: yes, I did it before to get rid of shutdown messages popping up
[19:29] Win7. I can easily load Ubuntu back up if I need to though.
[19:30] the_fox: who is this admin btw?
[19:31] *** primus104 has joined #archiveteam
[19:32] He goes by starrykitten on FA, but he was previously known as Zidonuke.
[19:32] arkiver: alright, can you coordinate with the_fox then?
[19:32] sure
[19:33] the_fox: so you do not have a date before it needs to be finished?
[19:33] also, do you have any contact with the admins
[19:33] ?
[19:34] There's no set date. For all I know he might not try to wreck anything. But with his history, I feel sooner would be better than later. And no, no contact with any admins.
[19:36] ok
[19:36] will you stay around here for some days?
[19:36] I will.
[19:36] ok thank you.
[19:36] I'll keep you informed about any progress
[19:36] we do need an rsync
[19:36] SketchCow ^
[19:37] I'm going to try make an estimate of the size when the scripts are ready
[19:37] *** ex-parrot has quit IRC (Leaving.)
[19:37] *** ruukasu has quit IRC (Quit: WeeChat 1.0.1)
[19:37] *** ruukasu has joined #archiveteam
[19:37] Alright then.
[19:47] wait how did I not make a what does the_fox say joke
[19:47] ok done sorry
[19:47] lol
[19:49] Hah
[19:51] *** BlueMaxim has joined #archiveteam
[20:02] I considered it but refrained
[20:02] lol
[20:03] yipdw: you even evaded xmc's off-topic siren, good job :P
[20:03] *** primus104 has quit IRC (Leaving.)
[20:05] My fault for picking that name without thinking. I'm not really known as the_fox anywhere else, but I'd prefer to not let my true identity out. Only because I don't want that very same admin to know who I am if he gets wind of this (he also has a history of spying on and retaliating against users he doesn't like)
[20:06] fair enough
[20:07] the_fox: no worries :) I recommend not posting any identifiable information in this channel, it's publicly logged (afaik)
[20:07] that said, please see PM
[20:11] *** ruukasu has quit IRC (Ping timeout: 265 seconds)
[20:13] *** the_fox has quit IRC ()
[20:14] *** the_fox has joined #archiveteam
[20:17] the_fox: from fauxgames?
[20:17] er no, that's 'thefox' sorry
[20:17] mixed up two similar nicks
[20:22] very different yes
[20:22] Gotta step away for a while, will be back
[20:25] *** arbin has quit IRC (Read error: Connection reset by peer)
[20:26] *** mistym has quit IRC (Remote host closed the connection)
[20:27] *** chfoo sets mode: +ooo arkiver ivan` yipdw
[20:28] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[20:28] *** dashcloud has joined #archiveteam
[20:29] *** philpem has joined #archiveteam
[20:30] the_fox, you about?
[20:30] the_fox, I did some work on archiving FurAffinity some time ago, wrote python scripts and stuff. I may still have them. Just had a friend pass me your messages and figured I'd call in.
[20:34] philpem: cc arkiver
[20:34] FA has some heavy throttling
[20:35] The problem you'll have now is, the sodding thing is sat behind Cloudflare.
[20:36] *** Coderjoe has quit IRC (Read error: Operation timed out)
[20:36] uff. and how is that part set up?
[20:36] Unless Dragoneer has just changed the DNS entry, in which case the info on wikifur might still include server IPs.
[20:36] curl seems to work just fine
[20:37] so I don't know if it has the "enhanced" protection on
[20:37] Depends how hard you hit it I guess.
[20:37] yeah -- would have to limit
[20:37] Running my six-thread monster grabber will probably trip a bandwidth limit
[20:39] Certainly got me a few IP bans on my old ISP ^^;
[20:39] what often helps is limit with a random factor
[20:39] wget's random limit feature basically
[20:39] philpem: have you seen chfoo's wpull?
[20:39] http://wpull.readthedocs.org/en/master/
[20:40] *** arbin has joined #archiveteam
[20:41] balrog, nope
[20:42] interesting though
[20:42] *** mistym has joined #archiveteam
[20:43] *** Coderjoe has joined #archiveteam
[20:49] *** dashcloud has quit IRC (Remote host closed the connection)
[20:50] *** dashcloud has joined #archiveteam
[20:57] *** tg_ has joined #archiveteam
[20:58] *** gibigiana has quit IRC (Read error: No route to host)
[20:58] *** Deewiant has quit IRC (Read error: No route to host)
[20:58] *** aNthraXx has quit IRC (Read error: No route to host)
[20:59] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[20:59] *** brayden has joined #archiveteam
[21:02] *** brayden_ has quit IRC (Read error: No route to host)
[21:03] *** tfgbd has quit IRC (Read error: No route to host)
[21:03] *** Ravenloft has quit IRC (Read error: No route to host)
[21:06] *** Coderjoe has quit IRC (Ping timeout: 606 seconds)
[21:07] *** brayden_ has joined #archiveteam
[21:07] *** gibigiana has joined #archiveteam
[21:08] *** Deewiant has joined #archiveteam
[21:08] *** schbirid has quit IRC (Leaving)
[21:08] *** brayden has quit IRC (Read error: No route to host)
[21:09] *** K4k has quit IRC (Read error: Operation timed out)
[21:09] *** primus104 has joined #archiveteam
[21:13] *** Coderjoe has joined #archiveteam
[21:14] *** Deewiant has quit IRC (Read error: No route to host)
[21:14] *** Sk1d has quit IRC (Read error: No route to host)
[21:17] *** aNthraXx has joined #archiveteam
[21:17] *** brayden has joined #archiveteam
[21:18] *** Sk1d has joined #archiveteam
[21:19] *** brayden_ has quit IRC (Read error: No route to host)
[21:19] *** Deewiant has joined #archiveteam
[21:27] *** mistym has quit IRC (Remote host closed the connection)
[21:28] *** aNthraXx has quit IRC (Read error: No route to host)
[21:28] *** Deewiant has quit IRC (Read error: No route to host)
[21:29] *** godane has quit IRC (Read error: No route to host)
[21:33] *** brayden has quit IRC (Ping timeout: 606 seconds)
[21:33] *** brayden has joined #archiveteam
[21:33] *** aNthraXx has joined #archiveteam
[21:34] *** godane has joined #archiveteam
[21:35] *** Deewiant has joined #archiveteam
[21:43] *** mistym has joined #archiveteam
[21:52] *** sankin has quit IRC (Leaving.)
[21:54] *** human39 has quit IRC (Leaving)
[21:59] *** archvtype has joined #archiveteam
[22:09] *** ruukasu has joined #archiveteam
[22:13] www.allgame.com is going down 12/12 - I'm doing a mirror+warc
[22:15] Hey, I'm back.
[22:17] For those unaware: FA tends to have lots of outages. If you try to access it and it fails, don't panic, they're probably just having one of their famous downtimes.
[22:18] Famous, infamous, ... :P
[22:18] They've got scheduled downtime soonish too
[22:20] archvtype: just a fyi, both archivebot and internet archive are grabbing that site right now
[22:21] chfoo: ah, thanks; I'll lay off it then
[22:23] Oh, I was wrong. FA's robots.txt doesn't explicitly disallow Internet Archive, it indiscriminately disallows all types of crawlers. Not sure if that makes a difference, just wanted to be clear about it.
[22:24] ok
[22:24] little to no difference
[22:24] the_fox: yep
[22:38] electric cars talked about in 1996 on talk of the nation: https://archive.org/details/npr-talk-of-the-nation-01-19-1996
[22:44] I have no idea exactly what archiving a large site entails, but I'll throw it out there that I have a 100 Mbps down connection and 1TB of free space that I'm willing to put to good use.
[22:46] Realistically it's probably bigger than 1TB. In the last week with my previous ISP (after they tried to screw me over), I decided to turn on the FA grabber.
[22:46] Ran it for a few days then stopped. That was ~20Mbps down.
[22:46] how much data did you collect?
[22:47] also is it mostly photo or is there video content as well?
[22:47] video content is really, really painful
[22:47] photos -- eh, can be dealt with
[22:47] Quite a few gigs. Lots of images (full size), SWFs, text, all sorts, and the HTML to go with them (metadata)
[22:47] but look at the recent AT projects of video sites
[22:47] A few videos here and there, but the vast majority is pictures and text.
[22:47] yeah.
[22:47] Write some good HTML scraping code and you could archive FA and all the public info
[22:48] wpull should be a good place to start
[22:48] there are countless accounts that you have to be logged in to access :/
[22:48] but I guess those stay unarchived for ethical reasons
[22:48] A couple of throwaway accounts will sort that.
[22:48] or, yeah that
[22:48] Do you have an average size of all the files you did manage to collect?
[22:49] Just multiply that by 15,000,000, and we'll have a good idea of how much space will be needed.
[22:49] Sadly not. I reformatted the drive ages ago.
[22:49] I think I did anyway.
[22:49] Ah.
[22:49] I still have the scripts somewhere, probably in a VM.
[22:51] Neer has changed some stuff which will probably break it. I wrote some tools to help people move from FA to/from Weasyl some time ago, OK'd it with Dragoneer (Sean "Princess" Piche / FA owner)... guy then turns around, grabs my code from bitbucket and starts changing stuff to thoroughly break it.
[22:51] So chances are, the grabber may not work any more either.
[22:52] Any idea how much work would be needed to make it work again? (Just FYI, I'll be of no help with code. I never really got past "Hello world!")
[22:54] Guess a couple of hours of bashing it around and fixing regexps.
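(Editorial aside on the "multiply that by 15,000,000" estimate above: a minimal sketch of that arithmetic, assuming someone first measures the sizes of a sample of downloaded files. The function names are hypothetical, and no real average was available in this log, so no actual figure is implied.)

```python
def estimate_total_bytes(sample_sizes_bytes, item_count=15_000_000):
    """Extrapolate a total from a sample: mean sampled size times item count.

    item_count defaults to the 15 million images mentioned in the channel;
    the sample sizes must come from a real measurement.
    """
    if not sample_sizes_bytes:
        raise ValueError("need at least one sampled file size")
    mean_size = sum(sample_sizes_bytes) / len(sample_sizes_bytes)
    return mean_size * item_count


def human_readable(n_bytes):
    """Format a byte count using decimal (1000-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000 or unit == "PB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
```

A plain mean is sensitive to a handful of very large files (SWFs, videos), so a larger random sample would give a steadier estimate than the first few files grabbed.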
[22:54] It might be written around beautifulsoup, in which case it should still work.
[22:54] Realistically there's nothing stopping someone archiving FA except disk space and bandwidth.
[22:55] if furaffinity is the usual web forum then we have tools to handle that
[22:55] anyway this is getting longwinded, should start a separate channel
[22:55] someone come up with a punny name
[22:55] Furchive?
[22:56] does it have to be a "fur" pun? *cringe*
[22:56] it has to be *a* pun
[22:57] fa also has videos and audios
[22:57] Ice King? (FA's main servers are nicknamed Finn and Jake).
[22:58] sure #iceking
[23:04] just to make it confusing
[23:05] strained puns are the best puns
[23:05] if a service named, say, FireQueen goes offline in the next few weeks we are of course fucked
[23:06] #checkflame
[23:06] never mind, garyrh saves
[23:10] furinfinity?
[23:10] :P
[23:11] idk, creative juices low tonight
[23:11] joepie91: that's an actual site
[23:12] aw damnit
[23:14] #furdeficiency
[23:15] yes!
[23:16] if you're not in #urlteam and you are running urlteam in a warrior, remember to check up on them because it may hang
[23:18] so which channel for fa now?
[23:19] for now, we're in 'iceking
[23:19] #iceking
[23:21] the_fox: I haven't read all of the discussion above, but you do still want me to create the warrior scripts right?
[23:27] xmc: we can still use that in the project description which i just did right now
[23:27] k
[23:28] (any leftover puns must not go to waste)
[23:33] punservation
[23:34] punchive team
[23:34] pun pun pun
[23:48] *** Deewiant has quit IRC (Read error: Operation timed out)
[23:50] *** Deewiant has joined #archiveteam
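
(Editorial aside on the "limit with a random factor" suggestion made around 20:39: a minimal sketch of a jittered delay between requests. The 0.5x to 1.5x range follows what wget documents for `--random-wait` combined with `--wait`; the function names are hypothetical and this is not the code of any grabber mentioned in the log.)

```python
import random
import time


def jittered_delay(base_seconds, rng=random):
    """Return a delay drawn uniformly from 0.5x to 1.5x the base wait."""
    return base_seconds * rng.uniform(0.5, 1.5)


def throttled(items, base_seconds, sleep=time.sleep, rng=random):
    """Yield items one at a time, sleeping a jittered delay between them.

    No sleep happens before the first item. `sleep` and `rng` are
    injectable so the pacing can be tested without actually waiting.
    """
    first = True
    for item in items:
        if not first:
            sleep(jittered_delay(base_seconds, rng))
        first = False
        yield item
```

A caller would wrap its list of URLs, for example `for url in throttled(urls, 2.0): ...`, and do the fetch inside the loop. Randomizing the interval makes the traffic look less mechanical, but it does not by itself keep a multi-threaded grabber under a site's bandwidth limit.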