#archiveteam 2014-11-18,Tue


Time Nickname Message
00:02 🔗 primus104 has quit IRC (Leaving.)
00:15 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
00:15 🔗 ruukasu has joined #archiveteam
00:28 🔗 cbb has quit IRC (Ping timeout: 633 seconds)
00:41 🔗 primus104 has joined #archiveteam
01:07 🔗 mistym has quit IRC (Remote host closed the connection)
01:24 🔗 mistym has joined #archiveteam
01:25 🔗 APerti has joined #archiveteam
01:28 🔗 Ymgve has quit IRC ()
01:34 🔗 mistym has quit IRC (Remote host closed the connection)
01:45 🔗 ohhdemgir has joined #archiveteam
01:46 🔗 ohhdemgir https://www.reddit.com/r/DataHoarder/comments/2mlk2f/capturebate_chaturbatecom_automated_data/
01:50 🔗 Ravenloft has quit IRC (Ping timeout: 378 seconds)
02:07 🔗 SmileyG has joined #archiveteam
02:10 🔗 Emcy_ has quit IRC (Ping timeout: 265 seconds)
02:10 🔗 Smiley has quit IRC (Read error: Operation timed out)
02:10 🔗 Emcy has joined #archiveteam
02:24 🔗 primus104 has quit IRC (Leaving.)
02:25 🔗 Ravenloft has joined #archiveteam
03:05 🔗 joepie91 quick notice before I'm going to sleep
03:06 🔗 joepie91 just got back from squatconf, it was a success
03:06 🔗 joepie91 with a horribly duct-taped setup we managed to capture most of the bits of some talks
03:06 🔗 joepie91 so after editing (which will probably take a while, given the messed up source material...) there should be a few new hackercon talks on IA
03:06 🔗 joepie91 :D
03:07 🔗 joepie91 (my photo camera in video mode on a tripod taped to the beamer table combined with the main sound board being patched into the mic port on my laptop running Audacity... not the most reliable audio setup :( )
03:07 🔗 joepie91 s/audio/recording/
03:08 🔗 joepie91 end of notice
03:23 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
03:26 🔗 Lord_Nigh has joined #archiveteam
03:26 🔗 balrog sets mode: +o Lord_Nigh
03:42 🔗 BlueMaxim has joined #archiveteam
03:44 🔗 SmileyG has quit IRC (Ping timeout: 258 seconds)
03:47 🔗 LordNigh2 has joined #archiveteam
03:47 🔗 balrog sets mode: +o LordNigh2
03:48 🔗 Smiley has joined #archiveteam
03:49 🔗 tfgbd has joined #archiveteam
03:50 🔗 Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
03:50 🔗 LordNigh2 is now known as Lord_Nigh
04:19 🔗 Danneh__ has joined #archiveteam
04:20 🔗 danneh_ has quit IRC (Ping timeout: 633 seconds)
04:46 🔗 aaaaaaaaa has quit IRC (Leaving)
05:50 🔗 LordNigh2 has joined #archiveteam
05:56 🔗 Lord_Nigh has quit IRC (Ping timeout: 600 seconds)
05:56 🔗 LordNigh2 is now known as Lord_Nigh
06:03 🔗 mistym has joined #archiveteam
06:12 🔗 zenguy_pc has quit IRC (Ping timeout: 480 seconds)
06:19 🔗 zenguy_pc has joined #archiveteam
06:25 🔗 amerrykan has quit IRC (Quit: Quitting)
06:34 🔗 amerrykan has joined #archiveteam
07:01 🔗 danneh_ has joined #archiveteam
07:03 🔗 Danneh__ has quit IRC (Ping timeout: 633 seconds)
07:40 🔗 n00b807 has joined #archiveteam
07:41 🔗 filippo_ has quit IRC (Ping timeout: 186 seconds)
07:41 🔗 n00b807 has quit IRC (Client Quit)
07:44 🔗 filippo_ has joined #archiveteam
07:59 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:02 🔗 dashcloud has joined #archiveteam
08:04 🔗 Sk1d has quit IRC (Read error: Operation timed out)
08:09 🔗 Sk1d has joined #archiveteam
08:12 🔗 ZorbaBeta has quit IRC (Read error: Connection reset by peer)
08:12 🔗 ZorbaBeta has joined #archiveteam
08:13 🔗 mistym has quit IRC (Remote host closed the connection)
08:19 🔗 Sk1d has quit IRC (Ping timeout: 265 seconds)
08:20 🔗 primus104 has joined #archiveteam
08:22 🔗 Sk1d has joined #archiveteam
08:25 🔗 Sk2d has joined #archiveteam
08:26 🔗 Sk1d has quit IRC (Read error: Operation timed out)
08:26 🔗 deathy___ has quit IRC (Read error: Connection reset by peer)
08:26 🔗 parsons__ has quit IRC (Read error: Connection reset by peer)
08:28 🔗 deathy___ has joined #archiveteam
08:28 🔗 parsons_ has joined #archiveteam
08:29 🔗 Sk2d has quit IRC (Ping timeout: 265 seconds)
08:30 🔗 Sk1d has joined #archiveteam
08:36 🔗 Sk2d has joined #archiveteam
08:37 🔗 Sk1d has quit IRC (Read error: Operation timed out)
08:37 🔗 Sk2d is now known as Sk1d
08:49 🔗 Sk1d has quit IRC (Ping timeout: 265 seconds)
08:51 🔗 Sk1d has joined #archiveteam
08:59 🔗 amerrykan has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 Sanqui has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 Coderjoe has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 robink has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 torvik has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 lysobit has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 marc has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 Baljem_ has quit IRC (west.us.hub irc.mzima.net)
08:59 🔗 cloudmons has quit IRC (west.us.hub irc.mzima.net)
09:00 🔗 Morbus has quit IRC (Quit: http://www.disobey.com/)
09:07 🔗 Sk2d has joined #archiveteam
09:10 🔗 Morbus has joined #archiveteam
09:10 🔗 amerrykan has joined #archiveteam
09:10 🔗 Sanqui has joined #archiveteam
09:10 🔗 Coderjoe has joined #archiveteam
09:10 🔗 robink has joined #archiveteam
09:10 🔗 torvik has joined #archiveteam
09:10 🔗 lysobit has joined #archiveteam
09:10 🔗 marc has joined #archiveteam
09:10 🔗 Baljem_ has joined #archiveteam
09:10 🔗 cloudmons has joined #archiveteam
09:10 🔗 Sk1d has quit IRC (Read error: Operation timed out)
09:10 🔗 Sk2d is now known as Sk1d
09:17 🔗 Sk1d has quit IRC (Ping timeout: 265 seconds)
09:20 🔗 Sk1d has joined #archiveteam
09:28 🔗 Sk1d has quit IRC (Read error: Operation timed out)
09:30 🔗 Sk1d has joined #archiveteam
09:30 🔗 dashcloud has quit IRC (Read error: Operation timed out)
09:30 🔗 MMovie has quit IRC (Ping timeout: 335 seconds)
09:35 🔗 Sk1d has quit IRC (Ping timeout: 265 seconds)
09:37 🔗 Sk1d has joined #archiveteam
09:38 🔗 dashcloud has joined #archiveteam
09:58 🔗 ohhdemgir joepie91, >most of the bits of some talks
09:58 🔗 schbirid has joined #archiveteam
10:01 🔗 APerti has quit IRC (Read error: Operation timed out)
10:17 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
10:20 🔗 Lord_Nigh has joined #archiveteam
10:20 🔗 balrog sets mode: +o Lord_Nigh
11:10 🔗 Ymgve has joined #archiveteam
11:22 🔗 primus104 has quit IRC (Leaving.)
11:34 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
11:47 🔗 ruukasu has joined #archiveteam
11:49 🔗 ruukasu has quit IRC (Client Quit)
11:58 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:21 🔗 joepie91 ohhdemgir: not all speakers wanted to be recorded
12:22 🔗 midas record first, delete later
12:22 🔗 midas the archiveteam way
12:23 🔗 joepie91 midas: doesn't work like that for these kind of confs
12:23 🔗 joepie91 but I just woke up, not going to argue this now
12:23 🔗 midas i understand
12:26 🔗 ohhdemgir XD
12:26 🔗 ohhdemgir joepie91, when you're awake get to wildarc, SN4T14 wanted you / irc server something or other blah
12:53 🔗 arbin has quit IRC (Read error: Connection reset by peer)
13:26 🔗 Trskl has quit IRC (Ping timeout: 186 seconds)
13:28 🔗 human39 has joined #archiveteam
13:28 🔗 bsmith093 has quit IRC (Read error: Operation timed out)
13:40 🔗 bsmith093 has joined #archiveteam
13:43 🔗 sankin has joined #archiveteam
13:44 🔗 ruukasu has joined #archiveteam
14:08 🔗 K4k has joined #archiveteam
14:12 🔗 MMovie has joined #archiveteam
14:15 🔗 arbin has joined #archiveteam
14:37 🔗 primus104 has joined #archiveteam
15:14 🔗 schbirid has quit IRC (Read error: Operation timed out)
15:19 🔗 Nemo_bis http://www.americanlibrariesmagazine.org/article/rip-ipl
15:26 🔗 schbirid has joined #archiveteam
15:26 🔗 K4k has quit IRC (WeeChat 1.0.1)
15:31 🔗 T31M has joined #archiveteam
15:38 🔗 aaaaaaaaa has joined #archiveteam
15:59 🔗 arbin has quit IRC (Read error: Connection reset by peer)
16:00 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
16:00 🔗 ruukasu has joined #archiveteam
16:04 🔗 ruukasu has quit IRC (Client Quit)
16:13 🔗 arbin has joined #archiveteam
16:14 🔗 mistym has joined #archiveteam
16:23 🔗 primus104 has quit IRC (Leaving.)
16:23 🔗 K4k has joined #archiveteam
16:31 🔗 APerti has joined #archiveteam
16:37 🔗 ruukasu has joined #archiveteam
17:03 🔗 mistym has quit IRC (Remote host closed the connection)
17:06 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
17:06 🔗 ruukasu has joined #archiveteam
17:25 🔗 mistym has joined #archiveteam
18:13 🔗 nertzy has quit IRC (Leaving)
18:20 🔗 nertzy has joined #archiveteam
19:02 🔗 the_fox has joined #archiveteam
19:14 🔗 the_fox Hello. There's a large website I feel is in danger of shutting down for good. furaffinity.net, the world's largest furry themed website, has recently taken on an admin that has a history of trashing websites when he's angry/bored with them. There are at least two fairly large projects (website and an MMO) he was an admin of where he wiped the servers and attempted to delete the backups
19:14 🔗 the_fox (luckily in both cases, there were additional backups that he didn't know about that allowed them to come back online). I'll spare you the drama filled details, but there have been quite a few other transgressions besides that caused by him. I fear that FA might not be lucky/have a competent enough owner to keep all the backups a secret from this guy, and that FA's days may be numbered.
19:14 🔗 the_fox Also it should be noted that FA's robots.txt specifically excludes the WayBack archive, so there's no hope of getting anything from there if everything goes to hell. I know furries aren't exactly the most popular bunch of people, but this website means a whole lot to me and many others. I'd hate to see it disappear for good.
19:17 🔗 joepie91 the_fox: please hold
19:17 🔗 joepie91 weird, I thought we had a page on the wiki about FA
19:18 🔗 joepie91 yipdw: ivan`: schbirid: xmc: any idea on the status of above? I vaguely recall FA being discussed before
19:18 🔗 schbirid i thought the same but cant remember
19:18 🔗 the_fox Sorry if I'm repeating what's already been said, I just now joined this channel.
19:18 🔗 DFJustin archivebot did a partial crawl of the forums but had to be aborted because it was too big
19:18 🔗 arkiver 15 million images?
19:19 🔗 joepie91 DFJustin: "too big" being?
19:19 🔗 the_fox Yep, 15 million.
19:19 🔗 joepie91 (idk in what 'era' of archivebot this was)
19:19 🔗 arkiver being 15 million images
19:19 🔗 the_fox Should note that many images are marked as "mature", and cannot be seen without signing in.
19:19 🔗 DFJustin there's 1,261,248 threads
19:19 🔗 yipdw joepie91: http://archive.fart.website/archivebot/viewer/job/ccl2v
19:19 🔗 yipdw 11/17, so recent
19:20 🔗 xmc I support doing a logged-in crawl of FA, but not so much putting it into wayback.
19:20 🔗 yipdw also if we need sign-in then archivebot is out anyway
19:20 🔗 joepie91 the_fox: how big of a hard drive do you have? :)
19:20 🔗 joepie91 yipdw: is that 462 gigs?
19:21 🔗 yipdw 462 MB
19:21 🔗 joepie91 or M?
19:21 🔗 the_fox I got a 1TB sitting around with nothing to do. If need be I can scrounge around and find maybe around another TB.
19:21 🔗 joepie91 that's not that big, is it?
19:21 🔗 yipdw it isn't, I suspect the job was prematurely aborted due to realization of "oh shit that's a lot"
19:21 🔗 joepie91 ahhh, heh
19:21 🔗 joepie91 might need to reconsider that now...
19:21 🔗 joepie91 also, xmc, any particular reason?
19:21 🔗 xmc the_fox: what makes you think that FA is near to shutting down?
19:22 🔗 joepie91 (he just explained that)
19:22 🔗 arkiver I can create a project for it and use someone's rsync, if someone'd like to save them
19:22 🔗 yipdw depending on your goals archivebot might not be useful anyway due to sign-in restriction
19:22 🔗 DFJustin a forum of that size would be better done with a specialized forum script or warrior
19:22 🔗 xmc joepie91: same reason i support doing a preemptive crawl of deviantart
19:22 🔗 xmc joepie91: ok.
19:22 🔗 joepie91 xmc: I mean the "not so much putting it into wayback"
19:22 🔗 xmc oh
19:22 🔗 xmc uh, because it'd be a cookie-using crawl
19:22 🔗 DFJustin archivebot maybe could do it but it's not well suited to it
19:22 🔗 DFJustin monster jobs clog up the works
19:25 🔗 joepie91 xmc: ok... is there a particular issue with that?
19:25 🔗 APerti has quit IRC ()
19:25 🔗 arkiver I can create project for it and use cookies and everything to log in
19:26 🔗 xmc not a technical issue. i think it's a social issue though, and one we haven't really talked about much
19:26 🔗 joepie91 from a legal perspective, there's no difference between a registration-less crawl and a crawl where an account is needed that can be registered in an automatic review-less fashion
19:26 🔗 joepie91 to my understanding
19:26 🔗 joepie91 so that shouldn't be a problem
19:26 🔗 xmc fair enough
19:26 🔗 joepie91 if accounts are manually reviewed or such, that can change things
19:26 🔗 the_fox Accounts are not manually reviewed.
19:26 🔗 joepie91 but if it's the bog standard "register a forum account, click confirmation link, done" then it should be fine
19:27 🔗 joepie91 (aside; it's hidden from the wayback anyway)
19:27 🔗 xmc i'm only advocating caution, not restraint
19:27 🔗 xmc :3
19:28 🔗 the_fox Also: FA just recovered from a massive DDoS, so the admins are probably still a bit jumpy about lots of unusual traffic.
19:28 🔗 joepie91 xmc: fair enough
19:29 🔗 joepie91 the_fox: what OS are you running?
19:29 🔗 joepie91 and arkiver, can you run a job with cookies then? a full-blown job of the entire site?
19:29 🔗 joepie91 disk-space wise
19:29 🔗 arkiver joepie91: yes, I did it before to get rid of shutdown messages popping up
19:29 🔗 the_fox Win7. I can easily load Ubuntu back up if I need to though.
19:30 🔗 raylee the_fox: who is this admin btw?
19:31 🔗 primus104 has joined #archiveteam
19:32 🔗 the_fox He goes by starrykitten on FA, but he was previously known as Zidonuke.
19:32 🔗 joepie91 arkiver: alright, can you coordinate with the_fox then?
19:32 🔗 arkiver sure
19:33 🔗 arkiver the_fox: so you do not have a date before it needs to be finished?
19:33 🔗 arkiver also, do you have any contact with the admins
19:33 🔗 arkiver ?
19:34 🔗 the_fox There's no set date. For all I know he might not try to wreck anything. But with his history, I feel sooner would be better than later. And no, no contact with any admins.
19:36 🔗 arkiver ok
19:36 🔗 arkiver will you stay around here for some days?
19:36 🔗 the_fox I will.
19:36 🔗 arkiver ok thank you.
19:36 🔗 arkiver I'll keep you informed about any progress
19:36 🔗 arkiver we do need an rsync
19:36 🔗 arkiver SketchCow ^
19:37 🔗 arkiver I'm going to try make an estimate of the size when the scripts are ready
19:37 🔗 ex-parrot has quit IRC (Leaving.)
19:37 🔗 ruukasu has quit IRC (Quit: WeeChat 1.0.1)
19:37 🔗 ruukasu has joined #archiveteam
19:37 🔗 the_fox Alright then.
19:47 🔗 yipdw wait how did I not make a what does the_fox say joke
19:47 🔗 yipdw ok done sorry
19:47 🔗 xmc lol
19:49 🔗 the_fox Hah
19:51 🔗 BlueMaxim has joined #archiveteam
20:02 🔗 DFJustin I considered it but refrained
20:02 🔗 joepie91 lol
20:03 🔗 joepie91 yipdw: you even evaded xmc's off-topic siren, good job :P
20:03 🔗 primus104 has quit IRC (Leaving.)
20:05 🔗 the_fox My fault for picking that name without thinking. I'm not really known as the_fox anywhere else, but I'd prefer to not let my true identity out. Only because I don't want that very same admin to know who I am if he gets wind of this (he also has a history of spying on and retaliating against users he doesn't like)
20:06 🔗 xmc fair enough
20:07 🔗 joepie91 the_fox: no worries :) I recommend not posting any identifiable information in this channel, it's publicly logged (afaik)
20:07 🔗 joepie91 that said, please see PM
20:11 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
20:13 🔗 the_fox has quit IRC ()
20:14 🔗 the_fox has joined #archiveteam
20:17 🔗 Lord_Nigh the_fox: from fauxgames?
20:17 🔗 Lord_Nigh er no, that's 'thefox' sorry
20:17 🔗 Lord_Nigh mixed up two similar nicks
20:22 🔗 xmc very different yes
20:22 🔗 the_fox Gotta step away for a while, will be back
20:25 🔗 arbin has quit IRC (Read error: Connection reset by peer)
20:26 🔗 mistym has quit IRC (Remote host closed the connection)
20:27 🔗 chfoo sets mode: +ooo arkiver ivan` yipdw
20:28 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
20:28 🔗 dashcloud has joined #archiveteam
20:29 🔗 philpem has joined #archiveteam
20:30 🔗 philpem the_fox, you about?
20:30 🔗 philpem the_fox, I did some work on archiving FurAffinity some time ago, wrote python scripts and stuff. I may still have them. Just had a friend pass me your messages and figured I'd call in.
20:34 🔗 joepie91 philpem: cc arkiver
20:34 🔗 balrog FA has some heavy throttling
20:35 🔗 philpem The problem you'll have now is, the sodding thing is sat behind Cloudflare.
20:36 🔗 Coderjoe has quit IRC (Read error: Operation timed out)
20:36 🔗 balrog uff. and how is that part set up?
20:36 🔗 philpem Unless Dragoneer has just changed the DNS entry, in which case the info on wikifur might still include server IPs.
20:36 🔗 balrog curl seems to work just fine
20:37 🔗 balrog so I don't know if it has the "enhanced" protection on
20:37 🔗 philpem Depends how hard you hit it I guess.
20:37 🔗 balrog yeah — would have to limit
20:37 🔗 philpem Running my six-thread monster grabber will probably trip a bandwidth limit
20:39 🔗 philpem Certainly got me a few IP bans on my old ISP ^^;
20:39 🔗 balrog what often helps is limit with a random factor
20:39 🔗 balrog wget's random limit feature basically
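Note on the "random limit" mentioned above: wget's --random-wait option varies the pause between requests from 0.5 to 1.5 times the --wait value. A minimal Python sketch of the same idea follows. The function name jittered_wait and the 2-second base are illustrative assumptions and are not taken from philpem's grabber or from wpull.

```python
import random


def jittered_wait(base_seconds, rng=random):
    """Return a pause between 0.5x and 1.5x of base_seconds.

    This mirrors the range wget documents for --random-wait. The name
    and signature are hypothetical, chosen only for illustration.
    """
    return base_seconds * rng.uniform(0.5, 1.5)


if __name__ == "__main__":
    # A crawler loop would call time.sleep(jittered_wait(2.0)) between
    # requests; here we only print a few sample delays.
    for _ in range(3):
        print(round(jittered_wait(2.0), 2))
```

Spreading the delays this way keeps the average request rate the same while avoiding the fixed interval that simple rate detectors look for.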
20:39 🔗 balrog philpem: have you seen chfoo's wpull?
20:39 🔗 balrog http://wpull.readthedocs.org/en/master/
20:40 🔗 arbin has joined #archiveteam
20:41 🔗 philpem balrog, nope
20:42 🔗 philpem interesting though
20:42 🔗 mistym has joined #archiveteam
20:43 🔗 Coderjoe has joined #archiveteam
20:49 🔗 dashcloud has quit IRC (Remote host closed the connection)
20:50 🔗 dashcloud has joined #archiveteam
20:57 🔗 tg_ has joined #archiveteam
20:58 🔗 gibigiana has quit IRC (Read error: No route to host)
20:58 🔗 Deewiant has quit IRC (Read error: No route to host)
20:58 🔗 aNthraXx has quit IRC (Read error: No route to host)
20:59 🔗 nertzy has quit IRC (Quit: This computer has gone to sleep)
20:59 🔗 brayden has joined #archiveteam
21:02 🔗 brayden_ has quit IRC (Read error: No route to host)
21:03 🔗 tfgbd has quit IRC (Read error: No route to host)
21:03 🔗 Ravenloft has quit IRC (Read error: No route to host)
21:06 🔗 Coderjoe has quit IRC (Ping timeout: 606 seconds)
21:07 🔗 brayden_ has joined #archiveteam
21:07 🔗 gibigiana has joined #archiveteam
21:08 🔗 Deewiant has joined #archiveteam
21:08 🔗 schbirid has quit IRC (Leaving)
21:08 🔗 brayden has quit IRC (Read error: No route to host)
21:09 🔗 K4k has quit IRC (Read error: Operation timed out)
21:09 🔗 primus104 has joined #archiveteam
21:13 🔗 Coderjoe has joined #archiveteam
21:14 🔗 Deewiant has quit IRC (Read error: No route to host)
21:14 🔗 Sk1d has quit IRC (Read error: No route to host)
21:17 🔗 aNthraXx has joined #archiveteam
21:17 🔗 brayden has joined #archiveteam
21:18 🔗 Sk1d has joined #archiveteam
21:19 🔗 brayden_ has quit IRC (Read error: No route to host)
21:19 🔗 Deewiant has joined #archiveteam
21:27 🔗 mistym has quit IRC (Remote host closed the connection)
21:28 🔗 aNthraXx has quit IRC (Read error: No route to host)
21:28 🔗 Deewiant has quit IRC (Read error: No route to host)
21:29 🔗 godane has quit IRC (Read error: No route to host)
21:33 🔗 brayden has quit IRC (Ping timeout: 606 seconds)
21:33 🔗 brayden has joined #archiveteam
21:33 🔗 aNthraXx has joined #archiveteam
21:34 🔗 godane has joined #archiveteam
21:35 🔗 Deewiant has joined #archiveteam
21:43 🔗 mistym has joined #archiveteam
21:52 🔗 sankin has quit IRC (Leaving.)
21:54 🔗 human39 has quit IRC (Leaving)
21:59 🔗 archvtype has joined #archiveteam
22:09 🔗 ruukasu has joined #archiveteam
22:13 🔗 archvtype www.allgame.com is going down 12/12 - I'm doing a mirror+warc
22:15 🔗 the_fox Hey, I'm back.
22:17 🔗 the_fox For those unaware: FA tends to have lots of outages. If you try to access it and it fails, don't panic, they're probably just having one of their famous downtimes.
22:18 🔗 philpem Famous, infamous, ... :P
22:18 🔗 philpem They've got scheduled downtime soonish too
22:20 🔗 chfoo archvtype: just a fyi, both archivebot and internet archive are grabbing that site right now
22:21 🔗 archvtype chfoo: ah, thanks; I'll lay off it then
22:23 🔗 the_fox Oh, I was wrong. FA's robots.txt doesn't explicitly disallow Internet Archive, it indiscriminately disallows all types of crawlers. Not sure if that makes a difference, just wanted to be clear about it.
22:24 🔗 xmc ok
22:24 🔗 xmc little to no difference
22:24 🔗 balrog the_fox: yep
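Note on the robots.txt distinction above: the sketch below uses Python's standard urllib.robotparser to compare a blanket rule with a rule aimed at a single crawler. Both robots.txt bodies are made-up examples, not FA's actual file; the URL with its www prefix and the agent token ia_archiver (the name conventionally associated with the Wayback Machine) are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, url):
    """Return True if robots_txt permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt bodies, for comparison only.
BLANKET = "User-agent: *\nDisallow: /\n"
TARGETED = "User-agent: ia_archiver\nDisallow: /\n"
URL = "http://www.furaffinity.net/"  # assumed URL form

if __name__ == "__main__":
    for label, body in (("blanket", BLANKET), ("targeted", TARGETED)):
        for agent in ("ia_archiver", "SomeOtherBot"):
            print(label, agent, allowed(body, agent, URL))
```

Either form blocks a crawler that honors ia_archiver rules, which matches xmc's "little to no difference"; the blanket form simply blocks every other compliant crawler as well.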
22:38 🔗 godane electric cars talked about in 1996 on talk of the nation: https://archive.org/details/npr-talk-of-the-nation-01-19-1996
22:44 🔗 the_fox I have no idea exactly what archiving a large site entails, but I'll throw it out there that I have a 100 Mbps down connection and 1TB of free space that I'm willing to put to good use.
22:46 🔗 philpem Realistically it's probably bigger than 1TB. In the last week with my previous ISP (after they tried to screw me over), I decided to turn on the FA grabber.
22:46 🔗 philpem Ran it for a few days then stopped. That was ~20Mbps down.
22:46 🔗 balrog how much data did you collect?
22:47 🔗 balrog also is it mostly photo or is there video content as well?
22:47 🔗 balrog video content is really, really painful
22:47 🔗 balrog photos — eh, can be dealt with
22:47 🔗 philpem Quite a few gigs. Lots of images (full size), SWFs, text, all sorts, and the HTML to go with them (metadata)
22:47 🔗 balrog but look at the recent AT projects of video sites
22:47 🔗 the_fox A few videos here and there, but the vast majority is pictures and text.
22:47 🔗 balrog yeah.
22:47 🔗 philpem Write some good HTML scraping code and you could archive FA and all the public info
22:48 🔗 balrog wpull should be a good place to start
22:48 🔗 balrog there are countless accounts that you have to be logged in to access :/
22:48 🔗 balrog but I guess those stay unarchived for ethical reasons
22:48 🔗 philpem A couple of throwaway accounts will sort that.
22:48 🔗 balrog or, yeah that
22:48 🔗 the_fox Do you have an average size of all the files you did manage to collect?
22:49 🔗 the_fox Just multiply that by 15,000,000, and we'll have a good idea of how much space will be needed.
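Note on the estimate proposed above: it is the mean sampled file size times the item count. A minimal Python sketch, assuming a list of sampled sizes in bytes is available; no such sample exists in this log, so the 200,000-byte figure in the demo is purely a placeholder, and the function name is hypothetical.

```python
def estimate_total_bytes(sample_sizes, item_count=15_000_000):
    """Scale the mean of sample_sizes (bytes) up to item_count items.

    The default item_count is the 15 million figure quoted in the
    channel. Integer arithmetic is used to avoid float rounding.
    """
    if not sample_sizes:
        raise ValueError("need at least one sampled file size")
    return sum(sample_sizes) * item_count // len(sample_sizes)


if __name__ == "__main__":
    # Placeholder sample: one file of 200,000 bytes (not a measured value).
    total = estimate_total_bytes([200_000])
    print(total / 10**12, "TB (decimal)")
```

The result is only as good as the sample: a few large SWF or video files would skew a small sample heavily, so a larger random sample gives a more trustworthy figure.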
22:49 🔗 philpem Sadly not. I reformatted the drive ages ago.
22:49 🔗 philpem I think I did anyway.
22:49 🔗 the_fox Ah.
22:49 🔗 philpem I still have the scripts somewhere, probably in a VM.
22:51 🔗 philpem Neer has changed some stuff which will probably break it. I wrote some tools to help people move from FA to/from Weasyl some time ago, OK'd it with Dragoneer (Sean "Princess" Piche / FA owner)... guy then turns around, grabs my code from bitbucket and starts changing stuff to thoroughly break it.
22:51 🔗 philpem So chances are, the grabber may not work any more either.
22:52 🔗 the_fox Any idea how much work would be needed to make it work again? (Just FYI, I'll be of no help with code. I never really got past "Hello world!")
22:54 🔗 philpem Guess a couple of hours of bashing it around and fixing regexps.
22:54 🔗 philpem It might be written around beautifulsoup, in which case it should still work.
22:54 🔗 philpem Realistically there's nothing stopping someone archiving FA except disk space and bandwidth.
22:55 🔗 yipdw if furaffinity is the usual web forum then we have tools to handle that
22:55 🔗 yipdw anyway this is getting longwinded, should start a separate channel
22:55 🔗 yipdw someone come up with a punny name
22:55 🔗 the_fox Furchive?
22:56 🔗 philpem does it have to be a "fur" pun? *cringe*
22:56 🔗 yipdw it has to be *a* pun
22:57 🔗 xmc fa also has videos and audios
22:57 🔗 the_fox Ice King? (FA's main servers are nicknamed Finn and Jake).
22:58 🔗 yipdw sure #iceking
23:04 🔗 xmc just to make it confusing
23:05 🔗 yipdw strained puns are the best puns
23:05 🔗 yipdw if a service named, say, FireQueen goes offline in the next few weeks we are of course fucked
23:06 🔗 garyrh #checkflame
23:06 🔗 yipdw never mind, garyrh saves
23:10 🔗 joepie91 furinfinity?
23:10 🔗 joepie91 :P
23:11 🔗 joepie91 idk, creative juices low tonight
23:11 🔗 chfoo joepie91: that's an actual site
23:12 🔗 joepie91 aw damnit
23:14 🔗 xmc #furdeficiency
23:15 🔗 joepie91 yes!
23:16 🔗 chfoo if you're not in #urlteam and you are running urlteam in a warrior, remember to check up on them because it may hang
23:18 🔗 chfoo so which channel for fa now?
23:19 🔗 Kazzy for now, we're in 'iceking
23:19 🔗 Kazzy #iceking
23:21 🔗 arkiver the_fox: I haven't read all of the discussion above, but you do still want me to create the warrior scripts right?
23:27 🔗 chfoo xmc: we can still use that in the project description which i just did right now
23:27 🔗 xmc k
23:28 🔗 chfoo (any leftover puns must not go to waste)
23:33 🔗 xmc punservation
23:34 🔗 garyrh punchive team
23:34 🔗 garyrh pun pun pun
23:48 🔗 Deewiant has quit IRC (Read error: Operation timed out)
23:50 🔗 Deewiant has joined #archiveteam
