[00:00] Well, that reduced the number of pending URLs from 390k to 80k. Nice. :D [00:21] *** tfgbd_znc has quit IRC (Ping timeout: 600 seconds) [00:54] *** BlueMaxim has joined #archiveteam-bs [01:04] *** timmc has left [01:08] *** wp494 has joined #archiveteam-bs [01:10] JAA: I kind of forgot about it. [01:16] arkiver: sweet baby jesus that's a ton of data arkiver [01:16] yep [01:16] so the future of the project is a little unclear [01:17] Ya, more than 200TB you need to start getting IA Tech team on it [01:17] well, we can just pump the data in [01:17] but they'll notice [01:17] especially with this size [01:17] It looks like an amazing datastore [01:17] and might not be happy with this much data [01:17] yeah [01:18] we coordinated with IA [01:18] but was paused after the estimate turned out to be 800 TB [01:18] anyway [01:18] It would be best if they rsynced it right into their own search tool for it, so it had the right metadata [01:19] not sure [01:19] but I'm off [01:19] good day [01:19] * jrwr grabs a stick to hold the fort [01:19] :P [01:26] *** geronimo has joined #archiveteam-bs [01:29] *** username1 has joined #archiveteam-bs [01:32] *** schbirid2 has quit IRC (Read error: Operation timed out) [01:35] *** geronimo has left [01:41] *** j08nY has quit IRC (Quit: Leaving) [01:43] *** fie has joined #archiveteam-bs [02:27] *** antonizoo has quit IRC () [02:33] *** antonizoo has joined #archiveteam-bs [02:36] *** ZexaronS has quit IRC (Leaving) [02:49] *** antonizoo has quit IRC () [02:53] *** antonizoo has joined #archiveteam-bs [03:47] *** Ravenloft has quit IRC (Ping timeout: 245 seconds) [03:59] *** pizzaiolo has quit IRC (Quit: pizzaiolo) [04:23] So just curious, anybody have any scripts for downloading say wikiProjects? [04:23] Trying to download one in particular and not the entirety of Wikipedia. [04:29] Otherwise, looks like I'll need to write a spider and a basic implementation of PageRank to sort each topic into the correct subgrouping... 
[04:33] Gilfoyle: We have a set of scripts for scraping the API of Mediawiki installations: https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py [04:34] A bit more info on the WikiTeam page: http://archiveteam.org/index.php?title=WikiTeam [04:36] *** divingkat has joined #archiveteam-bs [04:41] Has anyone recorded things on Video 8 cassettes? [04:55] *** Sk1d has quit IRC (Ping timeout: 250 seconds) [05:02] *** Sk1d has joined #archiveteam-bs [05:31] *** klg has joined #archiveteam-bs [05:49] *** phuzion has quit IRC (Remote host closed the connection) [05:49] *** phuzion has joined #archiveteam-bs [06:04] *** phuzion has quit IRC (Read error: Connection reset by peer) [06:05] *** phuzion has joined #archiveteam-bs [06:33] https://www.youtube.com/watch?v=8t5TYw2bkOk [07:00] *** tsp_ has joined #archiveteam-bs [07:08] *** Simpbrain has quit IRC (Read error: Operation timed out) [07:19] *** Simpbrain has joined #archiveteam-bs [08:20] *** vitzli has joined #archiveteam-bs [08:38] *** Honno has joined #archiveteam-bs [09:00] *** SHODAN_UI has joined #archiveteam-bs [09:29] *** underscor has joined #archiveteam-bs [09:29] *** swebb sets mode: +o underscor [09:48] *** gui7 has joined #archiveteam-bs [09:59] *** vitzli has quit IRC (Quit: Leaving) [10:00] *** divingkat has quit IRC (Quit: ChatZilla 0.9.93 [Firefox 53.0.3/20170518000419]) [10:02] okay, NOW pixiv is done, right? [10:02] or have we still not done the +18 sweep? [10:06] Doesn't look even slightly done. Still 200k items out. [10:14] *** SHODAN_UI has quit IRC (Remote host closed the connection) [10:14] *** j08nY has joined #archiveteam-bs [10:20] Yep, we still need to do the 18+ rooms. 
[10:33] *** icedice has joined #archiveteam-bs [11:36] *** pizzaiolo has joined #archiveteam-bs [12:08] *** SHODAN_UI has joined #archiveteam-bs [12:10] *** icedice has quit IRC (efnet.portlane.se se.hub) [12:10] *** tsp_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** robinak has quit IRC (efnet.portlane.se se.hub) [12:10] *** ItsYoda has quit IRC (efnet.portlane.se se.hub) [12:10] *** BartoCH has quit IRC (efnet.portlane.se se.hub) [12:10] *** DFJustin has quit IRC (efnet.portlane.se se.hub) [12:10] *** jiphex has quit IRC (efnet.portlane.se se.hub) [12:10] *** r3c0d3x has quit IRC (efnet.portlane.se se.hub) [12:10] *** Whopper has quit IRC (efnet.portlane.se se.hub) [12:10] *** Jon has quit IRC (efnet.portlane.se se.hub) [12:10] *** Kenshin has quit IRC (efnet.portlane.se se.hub) [12:10] *** Sanqui has quit IRC (efnet.portlane.se se.hub) [12:10] *** LastNinja has quit IRC (efnet.portlane.se se.hub) [12:10] *** kittymeow has quit IRC (efnet.portlane.se se.hub) [12:10] *** zhongfu has quit IRC (efnet.portlane.se se.hub) [12:10] *** HCross2 has quit IRC (efnet.portlane.se se.hub) [12:10] *** dan- has quit IRC (efnet.portlane.se se.hub) [12:10] *** voltagex has quit IRC (efnet.portlane.se se.hub) [12:10] *** davidar has quit IRC (efnet.portlane.se se.hub) [12:10] *** Meroje has quit IRC (efnet.portlane.se se.hub) [12:10] *** wm_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** kevinr has quit IRC (efnet.portlane.se se.hub) [12:10] *** tephra_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** ThisAsYou has quit IRC (efnet.portlane.se se.hub) [12:10] *** Ctrl-S___ has quit IRC (efnet.portlane.se se.hub) [12:10] *** deathy has quit IRC (efnet.portlane.se se.hub) [12:10] *** alembic has quit IRC (efnet.portlane.se se.hub) [12:10] *** hook54321 has quit IRC (efnet.portlane.se se.hub) [12:10] *** Famicoman has quit IRC (efnet.portlane.se se.hub) [12:10] *** JSharp___ has quit IRC (efnet.portlane.se se.hub) [12:10] *** tklk has quit IRC (efnet.portlane.se se.hub) 
[12:10] *** floogulin has quit IRC (efnet.portlane.se se.hub) [12:10] *** FalconK has quit IRC (efnet.portlane.se se.hub) [12:10] *** t2t2 has quit IRC (efnet.portlane.se se.hub) [12:10] *** Muad-Dib has quit IRC (efnet.portlane.se se.hub) [12:10] *** raphidae has quit IRC (efnet.portlane.se se.hub) [12:10] *** antonizoo has quit IRC (efnet.portlane.se se.hub) [12:10] *** JensRex has quit IRC (efnet.portlane.se se.hub) [12:10] *** alfie has quit IRC (efnet.portlane.se se.hub) [12:10] *** Rai-chan has quit IRC (efnet.portlane.se se.hub) [12:10] *** purplebot has quit IRC (efnet.portlane.se se.hub) [12:10] *** Aoede has quit IRC (efnet.portlane.se se.hub) [12:10] *** jtn2 has quit IRC (efnet.portlane.se se.hub) [12:10] *** Fletcher has quit IRC (efnet.portlane.se se.hub) [12:10] *** Hecatz has quit IRC (efnet.portlane.se se.hub) [12:10] *** bwn has quit IRC (efnet.portlane.se se.hub) [12:10] *** medowar has quit IRC (efnet.portlane.se se.hub) [12:10] *** voidsta has quit IRC (efnet.portlane.se se.hub) [12:10] *** Riviera has quit IRC (efnet.portlane.se se.hub) [12:10] *** i0npulse has quit IRC (efnet.portlane.se se.hub) [12:10] *** GLaDOS has quit IRC (efnet.portlane.se se.hub) [12:10] *** nyany has quit IRC (efnet.portlane.se se.hub) [12:10] *** PurpleSym has quit IRC (efnet.portlane.se se.hub) [12:10] *** altlabel has quit IRC (efnet.portlane.se se.hub) [12:10] *** yuitimoth has quit IRC (efnet.portlane.se se.hub) [12:10] *** bsmith093 has quit IRC (efnet.portlane.se se.hub) [12:10] *** antomatic has quit IRC (efnet.portlane.se se.hub) [12:10] *** Xibalba has quit IRC (efnet.portlane.se se.hub) [12:10] *** Boppen has quit IRC (efnet.portlane.se se.hub) [12:10] *** pizzaiolo has quit IRC (efnet.portlane.se se.hub) [12:10] *** Simpbrain has quit IRC (efnet.portlane.se se.hub) [12:10] *** brayden has quit IRC (efnet.portlane.se se.hub) [12:10] *** kurt has quit IRC (efnet.portlane.se se.hub) [12:10] *** wacky_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** HUBI 
has quit IRC (efnet.portlane.se se.hub) [12:10] *** SpaffGarg has quit IRC (efnet.portlane.se se.hub) [12:10] *** balrog has quit IRC (efnet.portlane.se se.hub) [12:10] *** kisspunch has quit IRC (efnet.portlane.se se.hub) [12:10] *** sep332_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** kanzure has quit IRC (efnet.portlane.se se.hub) [12:10] *** c4rc4s has quit IRC (efnet.portlane.se se.hub) [12:10] *** ranma has quit IRC (efnet.portlane.se se.hub) [12:10] *** whydomain has quit IRC (efnet.portlane.se se.hub) [12:10] *** chazchaz_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** winr4r has quit IRC (efnet.portlane.se se.hub) [12:10] *** closure has quit IRC (efnet.portlane.se se.hub) [12:10] *** godane has quit IRC (efnet.portlane.se se.hub) [12:10] *** robogoat has quit IRC (efnet.portlane.se se.hub) [12:10] *** wabu has quit IRC (efnet.portlane.se se.hub) [12:10] *** trs80 has quit IRC (efnet.portlane.se se.hub) [12:10] *** SadDM has quit IRC (efnet.portlane.se se.hub) [12:10] *** jspiros has quit IRC (efnet.portlane.se se.hub) [12:10] *** j08nY has quit IRC (efnet.portlane.se se.hub) [12:10] *** Odd0002 has quit IRC (efnet.portlane.se se.hub) [12:10] *** SilSte has quit IRC (efnet.portlane.se se.hub) [12:10] *** zerkalo has quit IRC (efnet.portlane.se se.hub) [12:10] *** RedType has quit IRC (efnet.portlane.se se.hub) [12:10] *** Nazca has quit IRC (efnet.portlane.se se.hub) [12:10] *** pikhq has quit IRC (efnet.portlane.se se.hub) [12:10] *** Xamayon has quit IRC (efnet.portlane.se se.hub) [12:10] *** mgrytbak has quit IRC (efnet.portlane.se se.hub) [12:10] *** acridAxid has quit IRC (efnet.portlane.se se.hub) [12:10] *** joepie91 has quit IRC (efnet.portlane.se se.hub) [12:10] *** cf has quit IRC (efnet.portlane.se se.hub) [12:10] *** chfoo has quit IRC (efnet.portlane.se se.hub) [12:10] *** eprillios has quit IRC (efnet.portlane.se se.hub) [12:10] *** tapedrive has quit IRC (efnet.portlane.se se.hub) [12:10] *** underscor has quit IRC 
(efnet.portlane.se se.hub) [12:10] *** espes__ has quit IRC (efnet.portlane.se se.hub) [12:10] *** kvieta has quit IRC (efnet.portlane.se se.hub) [12:10] *** klg has quit IRC (efnet.portlane.se se.hub) [12:10] *** Honno has quit IRC (efnet.portlane.se se.hub) [12:10] *** fie has quit IRC (efnet.portlane.se se.hub) [12:10] *** username1 has quit IRC (efnet.portlane.se se.hub) [12:10] *** BlueMaxim has quit IRC (efnet.portlane.se se.hub) [12:10] *** TheLovina has quit IRC (efnet.portlane.se se.hub) [12:10] *** dashcloud has quit IRC (efnet.portlane.se se.hub) [12:10] *** odemg has quit IRC (efnet.portlane.se se.hub) [12:10] *** zenguy has quit IRC (efnet.portlane.se se.hub) [12:10] *** superkuh has quit IRC (efnet.portlane.se se.hub) [12:10] *** xmc has quit IRC (efnet.portlane.se se.hub) [12:10] *** slyphic has quit IRC (efnet.portlane.se se.hub) [12:10] *** w0rp has quit IRC (efnet.portlane.se se.hub) [12:10] *** zino has quit IRC (efnet.portlane.se se.hub) [12:10] *** ranma_ has quit IRC (efnet.portlane.se se.hub) [12:10] *** Selavi has quit IRC (efnet.portlane.se se.hub) [12:10] *** Frogging has quit IRC (efnet.portlane.se se.hub) [12:10] *** htw has quit IRC (efnet.portlane.se se.hub) [12:10] *** dxrt has quit IRC (efnet.portlane.se se.hub) [12:10] *** twigfoot has quit IRC (efnet.portlane.se se.hub) [12:10] *** MrRadar has quit IRC (efnet.portlane.se se.hub) [12:10] *** atlogbot has quit IRC (efnet.portlane.se se.hub) [12:10] *** chazchaz has quit IRC (efnet.portlane.se se.hub) [12:10] *** atomicthu has quit IRC (efnet.portlane.se se.hub) [12:10] *** TC01 has quit IRC (efnet.portlane.se se.hub) [12:10] *** swebb has quit IRC (efnet.portlane.se se.hub) [12:10] *** Darkstar has quit IRC (efnet.portlane.se se.hub) [12:10] *** dcmorton has quit IRC (efnet.portlane.se se.hub) [12:10] *** Somebody2 has quit IRC (efnet.portlane.se se.hub) [12:10] *** Cameron_D has quit IRC (efnet.portlane.se se.hub) [12:10] *** arkiver has quit IRC (efnet.portlane.se se.hub) [12:10] 
*** Coderjo has quit IRC (efnet.portlane.se se.hub) [12:10] *** Jonimoose has quit IRC (efnet.portlane.se se.hub) [12:10] *** midas has quit IRC (efnet.portlane.se se.hub) [12:10] *** yipdw has quit IRC (efnet.portlane.se se.hub) [12:10] *** Baljem has quit IRC (efnet.portlane.se se.hub) [12:10] *** lainu has quit IRC (efnet.portlane.se se.hub) [12:10] *** gui7 has quit IRC (efnet.portlane.se se.hub) [12:10] *** phuzion has quit IRC (efnet.portlane.se se.hub) [12:10] *** wp494 has quit IRC (efnet.portlane.se se.hub) [12:10] *** decay has quit IRC (efnet.portlane.se se.hub) [12:10] *** jrwr has quit IRC (efnet.portlane.se se.hub) [12:10] *** C4K3 has quit IRC (efnet.portlane.se se.hub) [12:10] *** Stilett0 has quit IRC (efnet.portlane.se se.hub) [12:10] *** REiN^ has quit IRC (efnet.portlane.se se.hub) [12:10] *** ndiddy has quit IRC (efnet.portlane.se se.hub) [12:10] *** ivan has quit IRC (efnet.portlane.se se.hub) [12:10] *** Smiley has quit IRC (efnet.portlane.se se.hub) [12:10] *** Mayonaise has quit IRC (efnet.portlane.se se.hub) [12:10] *** pipt has quit IRC (efnet.portlane.se se.hub) [12:10] *** dboard has quit IRC (efnet.portlane.se se.hub) [12:10] *** rocode has quit IRC (efnet.portlane.se se.hub) [12:10] *** K4k has quit IRC (efnet.portlane.se se.hub) [12:10] *** will has quit IRC (efnet.portlane.se se.hub) [12:10] *** Petri152 has quit IRC (efnet.portlane.se se.hub) [12:10] *** JAA has quit IRC (efnet.portlane.se se.hub) [12:10] *** mhazinsk has quit IRC (efnet.portlane.se se.hub) [12:10] *** Lord_Nigh has quit IRC (efnet.portlane.se se.hub) [12:10] *** luckcolor has quit IRC (efnet.portlane.se se.hub) [12:10] *** aschmitz has quit IRC (efnet.portlane.se se.hub) [12:10] *** mundus20- has quit IRC (efnet.portlane.se se.hub) [12:10] *** PotcFdk has quit IRC (efnet.portlane.se se.hub) [12:10] *** johnny5 has quit IRC (efnet.portlane.se se.hub) [12:10] *** SHODAN_UI has quit IRC (efnet.portlane.se se.hub) [12:10] *** RichardG has quit IRC (efnet.portlane.se 
se.hub) [12:10] *** Gilfoyle has quit IRC (efnet.portlane.se se.hub) [12:10] *** greenie has quit IRC (efnet.portlane.se se.hub) [12:14] *** Xibalba has joined #archiveteam-bs [12:14] *** antomatic has joined #archiveteam-bs [12:14] *** bsmith093 has joined #archiveteam-bs [12:14] *** yuitimoth has joined #archiveteam-bs [12:14] *** altlabel has joined #archiveteam-bs [12:14] *** PurpleSym has joined #archiveteam-bs [12:14] *** nyany has joined #archiveteam-bs [12:14] *** GLaDOS has joined #archiveteam-bs [12:14] *** Boppen has joined #archiveteam-bs [12:14] *** i0npulse has joined #archiveteam-bs [12:14] *** Riviera has joined #archiveteam-bs [12:14] *** medowar has joined #archiveteam-bs [12:14] *** voidsta has joined #archiveteam-bs [12:14] *** bwn has joined #archiveteam-bs [12:14] *** Hecatz has joined #archiveteam-bs [12:14] *** Fletcher has joined #archiveteam-bs [12:14] *** jtn2 has joined #archiveteam-bs [12:14] *** Aoede has joined #archiveteam-bs [12:14] *** purplebot has joined #archiveteam-bs [12:14] *** Rai-chan has joined #archiveteam-bs [12:14] *** alfie has joined #archiveteam-bs [12:14] *** JensRex has joined #archiveteam-bs [12:14] *** antonizoo has joined #archiveteam-bs [12:14] *** ranma has joined #archiveteam-bs [12:14] *** whydomain has joined #archiveteam-bs [12:14] *** Baljem has joined #archiveteam-bs [12:14] *** lainu has joined #archiveteam-bs [12:14] *** chazchaz_ has joined #archiveteam-bs [12:14] *** winr4r has joined #archiveteam-bs [12:14] *** closure has joined #archiveteam-bs [12:14] *** c4rc4s has joined #archiveteam-bs [12:14] *** kanzure has joined #archiveteam-bs [12:14] *** yipdw has joined #archiveteam-bs [12:14] *** se.hub sets mode: +oooo antomatic PurpleSym closure yipdw [12:14] *** midas has joined #archiveteam-bs [12:14] *** Jonimoose has joined #archiveteam-bs [12:14] *** Coderjo has joined #archiveteam-bs [12:14] *** arkiver has joined #archiveteam-bs [12:14] *** Cameron_D has joined #archiveteam-bs [12:14] *** 
Somebody2 has joined #archiveteam-bs [12:14] *** dcmorton has joined #archiveteam-bs [12:14] *** sep332_ has joined #archiveteam-bs [12:14] *** Darkstar has joined #archiveteam-bs [12:14] *** kisspunch has joined #archiveteam-bs [12:14] *** johnny5 has joined #archiveteam-bs [12:14] *** swebb has joined #archiveteam-bs [12:14] *** se.hub sets mode: +ooo Jonimoose arkiver swebb [12:14] *** TC01 has joined #archiveteam-bs [12:14] *** atomicthu has joined #archiveteam-bs [12:14] *** PotcFdk has joined #archiveteam-bs [12:14] *** mundus20- has joined #archiveteam-bs [12:14] *** balrog has joined #archiveteam-bs [12:14] *** chazchaz has joined #archiveteam-bs [12:14] *** atlogbot has joined #archiveteam-bs [12:14] *** MrRadar has joined #archiveteam-bs [12:14] *** kvieta has joined #archiveteam-bs [12:14] *** espes__ has joined #archiveteam-bs [12:14] *** tapedrive has joined #archiveteam-bs [12:14] *** eprillios has joined #archiveteam-bs [12:14] *** chfoo has joined #archiveteam-bs [12:14] *** cf has joined #archiveteam-bs [12:14] *** joepie91 has joined #archiveteam-bs [12:14] *** dxrt has joined #archiveteam-bs [12:14] *** acridAxid has joined #archiveteam-bs [12:14] *** SpaffGarg has joined #archiveteam-bs [12:14] *** HUBI has joined #archiveteam-bs [12:14] *** wacky_ has joined #archiveteam-bs [12:14] *** aschmitz has joined #archiveteam-bs [12:14] *** luckcolor has joined #archiveteam-bs [12:14] *** Lord_Nigh has joined #archiveteam-bs [12:14] *** Xamayon has joined #archiveteam-bs [12:14] *** htw has joined #archiveteam-bs [12:14] *** mhazinsk has joined #archiveteam-bs [12:14] *** JAA has joined #archiveteam-bs [12:14] *** Petri152 has joined #archiveteam-bs [12:14] *** will has joined #archiveteam-bs [12:14] *** mgrytbak has joined #archiveteam-bs [12:14] *** kurt has joined #archiveteam-bs [12:14] *** K4k has joined #archiveteam-bs [12:14] *** Frogging has joined #archiveteam-bs [12:14] *** Selavi has joined #archiveteam-bs [12:14] *** rocode has joined 
#archiveteam-bs [12:14] *** ranma_ has joined #archiveteam-bs [12:14] *** dboard has joined #archiveteam-bs [12:14] *** greenie has joined #archiveteam-bs [12:14] *** trs80 has joined #archiveteam-bs [12:14] *** SadDM has joined #archiveteam-bs [12:14] *** jspiros has joined #archiveteam-bs [12:14] *** brayden has joined #archiveteam-bs [12:14] *** pipt has joined #archiveteam-bs [12:14] *** zino has joined #archiveteam-bs [12:14] *** Mayonaise has joined #archiveteam-bs [12:14] *** pikhq has joined #archiveteam-bs [12:14] *** se.hub sets mode: +ooo balrog SadDM brayden [12:14] *** w0rp has joined #archiveteam-bs [12:14] *** slyphic has joined #archiveteam-bs [12:14] *** Nazca has joined #archiveteam-bs [12:14] *** xmc has joined #archiveteam-bs [12:14] *** twigfoot has joined #archiveteam-bs [12:14] *** wabu has joined #archiveteam-bs [12:14] *** robogoat has joined #archiveteam-bs [12:14] *** RedType has joined #archiveteam-bs [12:14] *** superkuh has joined #archiveteam-bs [12:14] *** zerkalo has joined #archiveteam-bs [12:14] *** zenguy has joined #archiveteam-bs [12:14] *** Smiley has joined #archiveteam-bs [12:14] *** ivan has joined #archiveteam-bs [12:14] *** odemg has joined #archiveteam-bs [12:14] *** Gilfoyle has joined #archiveteam-bs [12:14] *** ndiddy has joined #archiveteam-bs [12:14] *** REiN^ has joined #archiveteam-bs [12:14] *** Stilett0 has joined #archiveteam-bs [12:14] *** C4K3 has joined #archiveteam-bs [12:14] *** dashcloud has joined #archiveteam-bs [12:14] *** TheLovina has joined #archiveteam-bs [12:14] *** RichardG has joined #archiveteam-bs [12:14] *** godane has joined #archiveteam-bs [12:14] *** jrwr has joined #archiveteam-bs [12:14] *** SilSte has joined #archiveteam-bs [12:14] *** Odd0002 has joined #archiveteam-bs [12:14] *** decay has joined #archiveteam-bs [12:14] *** BlueMaxim has joined #archiveteam-bs [12:14] *** wp494 has joined #archiveteam-bs [12:14] *** username1 has joined #archiveteam-bs [12:14] *** fie has joined 
#archiveteam-bs [12:14] *** klg has joined #archiveteam-bs [12:14] *** phuzion has joined #archiveteam-bs [12:14] *** Simpbrain has joined #archiveteam-bs [12:14] *** Honno has joined #archiveteam-bs [12:14] *** underscor has joined #archiveteam-bs [12:14] *** gui7 has joined #archiveteam-bs [12:14] *** j08nY has joined #archiveteam-bs [12:14] *** pizzaiolo has joined #archiveteam-bs [12:14] *** SHODAN_UI has joined #archiveteam-bs [12:14] *** se.hub sets mode: +oo xmc underscor [12:14] *** swebb sets mode: +o SketchCow [12:14] *** icedice has joined #archiveteam-bs [12:14] *** tsp_ has joined #archiveteam-bs [12:14] *** robinak has joined #archiveteam-bs [12:14] *** ItsYoda has joined #archiveteam-bs [12:14] *** BartoCH has joined #archiveteam-bs [12:14] *** DFJustin has joined #archiveteam-bs [12:14] *** jiphex has joined #archiveteam-bs [12:14] *** r3c0d3x has joined #archiveteam-bs [12:14] *** Whopper has joined #archiveteam-bs [12:14] *** Jon has joined #archiveteam-bs [12:14] *** Kenshin has joined #archiveteam-bs [12:14] *** Sanqui has joined #archiveteam-bs [12:14] *** LastNinja has joined #archiveteam-bs [12:14] *** kittymeow has joined #archiveteam-bs [12:14] *** zhongfu has joined #archiveteam-bs [12:14] *** HCross2 has joined #archiveteam-bs [12:14] *** dan- has joined #archiveteam-bs [12:14] *** voltagex has joined #archiveteam-bs [12:14] *** davidar has joined #archiveteam-bs [12:14] *** Meroje has joined #archiveteam-bs [12:14] *** wm_ has joined #archiveteam-bs [12:14] *** kevinr has joined #archiveteam-bs [12:14] *** tephra_ has joined #archiveteam-bs [12:14] *** Ctrl-S___ has joined #archiveteam-bs [12:14] *** ThisAsYou has joined #archiveteam-bs [12:14] *** alembic has joined #archiveteam-bs [12:14] *** deathy has joined #archiveteam-bs [12:14] *** hook54321 has joined #archiveteam-bs [12:14] *** Famicoman has joined #archiveteam-bs [12:14] *** JSharp___ has joined #archiveteam-bs [12:14] *** tklk has joined #archiveteam-bs [12:14] *** floogulin 
has joined #archiveteam-bs [12:14] *** FalconK has joined #archiveteam-bs [12:14] *** t2t2 has joined #archiveteam-bs [12:14] *** raphidae has joined #archiveteam-bs [12:14] *** Muad-Dib has joined #archiveteam-bs [12:14] *** efnet.port80.se sets mode: +o DFJustin [12:14] *** swebb sets mode: +o DFJustin [12:28] *** BlueMaxim has quit IRC (Quit: Leaving) [12:28] Tanobb grab complete. That was way quicker than I expected. I grabbed all 13 languages and the mobile versions as well. I skipped some redundant pages though, e.g. those listing all posts by one user in a thread. [14:05] *** ZexaronS has joined #archiveteam-bs [14:07] *** icedice has quit IRC (Quit: Leaving) [14:23] *** pizzaiolo has quit IRC (Read error: Operation timed out) [14:32] *** pizzaiolo has joined #archiveteam-bs [14:32] *** pizzaiolo has quit IRC (Read error: Connection reset by peer) [14:33] *** pizzaiolo has joined #archiveteam-bs [14:41] *** MrRadar_ has joined #archiveteam-bs [14:49] *** MrRadar has quit IRC (Read error: Operation timed out) [14:49] *** MrRadar_ is now known as MrRadar [15:56] *** odemg has quit IRC (Read error: Operation timed out) [16:00] *** odemg has joined #archiveteam-bs [19:10] arkiver: No, I'm using the same format bsmith093 used (https://github.com/JimmXinu/FanFicFare) which extracts the story text into markdown. [19:11] Any other way and my limited disk space would be completely used up. [19:11] I see [19:11] What are you currently using? [19:11] maybe the WARCs could be uploaded to IA for the wayback machine [19:11] if there's any copyright issues they'll probably block the pages or website in the wayback machine [19:13] arkiver: Example format: http://storage.savefanfiction.tk/Prince%20Consort-ffnet_8902231.txt (re-archived today) [19:14] All fanfiction.net URLs are robots.txt blocked anyway, actually. [19:14] Yet another good reason to grab them all as WARCs... 
so when the site goes down in the future and someone removes the robots.txt block there will be content to show [19:15] Also, I wonder how well git would handle if you put all these files into it... [19:15] *** SmileyG has joined #archiveteam-bs [19:17] you could point FFF at warcproxy maybe? [19:20] Total of 7,382,393 plaintext files, all in the same folder? Few hundred gigabytes? No idea if git would cope with that. [19:20] *** SmileyG has quit IRC (Read error: Connection reset by peer) [19:20] *** SmileyG has joined #archiveteam-bs [19:21] you could move them based on the first few letters of the sha1 of the name of the file [19:21] into their own folders, or even the first few letters of the title of the file [19:21] I've thought about doing WARCs and stuff like that, but I'm running everything on a couple of Raspberry Pis and a 2tb HD, so... [19:22] *** Smiley has quit IRC (Read error: Operation timed out) [19:22] thing is that once it's not in WARCs, it will probably never go into WARCs and therefore will never get into an archive like the wayback machine [19:23] tapedrive: you will probably get better performance in any case if you split that into subdirectories :D [19:23] username1: Yeah, I can't really do much in that directory any more [19:23] But it makes merging newly updated stories much easier. [19:24] arkiver: I'll think about WARCs for the future. To get them into wayback, do I just upload them to IA with type web? [19:26] and let us know about them [19:26] we'll first have to make sure the WARCs are valid of course [19:26] and not edited [19:26] And if I upload more WARCs into that item, will they be auto-added to wayback? [19:26] I think so [19:26] but multiple items might be better in that case [19:27] also again depending on the number of WARCs [19:27] you could do like 1 or 10 GB per item [19:29] Okay, I'll see if I can add that in. 
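The 19:21 suggestion to move each file into a folder named after the first few letters of the SHA-1 of its file name could look roughly like the Python sketch below. It is only an illustration: `shard_path`, the `stories` root, and the two-level, two-character layout are assumptions, not anything tapedrive's setup is known to use.

```python
import hashlib
from pathlib import Path


def shard_path(root: Path, filename: str, depth: int = 2, width: int = 2) -> Path:
    """Return root/<h0>/<h1>/filename, where h0, h1 are slices of sha1(filename).

    Hypothetical helper: depth and width are illustrative defaults.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Take `depth` consecutive slices of `width` hex characters each.
    buckets = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return root.joinpath(*buckets, filename)


def move_into_shard(root: Path, file: Path) -> Path:
    """Move one file from a flat folder into its sharded location."""
    target = shard_path(root, file.name)
    target.parent.mkdir(parents=True, exist_ok=True)
    file.rename(target)
    return target
```

With two levels of two hex characters there are 65,536 buckets, so 7,382,393 files would average a little over 110 per directory. Because the bucket depends only on the file name, a re-archived story lands in the same place, which keeps merging updates simple.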
[19:31] *** medowar has quit IRC (Read error: Connection reset by peer) [19:32] nice :) [19:54] *** tfgbd_znc has joined #archiveteam-bs [19:55] *** Ravenloft has joined #archiveteam-bs [20:03] quick question: do you guys know how to download from veoh.com? [20:03] there's this rare video that i can only find there [20:03] link? [20:03] http://www.veoh.com/watch/v1313996wddKMNqf?h1=Super+Spacefortress+Macross [20:04] it's the only copy i can find of the uncut, hilariously bad dub of the first macross movie [20:05] Ugh, Flash. [20:06] ndiddy: uh [20:06] ndiddy: youtube-dl "http://www.veoh.com/watch/v1313996wddKMNqf?h1=Super+Spacefortress+Macross" [20:06] done [20:06] i tried jdownloader but it only downloads the first 1/4 of it [20:07] go to source [20:07] search for fullPreviewHashLowPath [20:07] and download that URL [20:08] Low vs. High quality/resolution? [20:11] ah sorry [20:12] [download] 10.0% of 881.42MiB at 321.50KiB/s ETA 42:07 [20:12] i can up it elsewhere afterwards if needed [20:12] tell me if it downloads all the way [20:12] :) [20:12] also, youtube-dl gives me an "unsupported url" error [20:12] update [20:13] search for fullPreviewHashHighPath [20:13] that's the same resolution as when HQ is selected in the player [20:13] uh [20:13] i assumed so [20:13] same size [20:14] What does "Preview" mean in there though? [20:14] no idea [20:14] it looks to be the same size though [20:14] but it's the version that's loaded in the browser [20:14] yeah [20:14] the way they have the site set up is kinda strange [20:15] it downloads the first couple megs at full speed then throttles you to 300 kbps [20:15] Yeah, the whole page is really cancerous. [20:15] many streaming sites do that [20:15] The amount of third-party JavaScript code being pulled there is ridiculous. 
[20:16] That kind of throttling is pretty common because most people don't watch most of the video anyways [20:16] So they give you just enough to ensure there's a decent buffer, then feed you the rest at approx. the playback rate [20:16] So they don't waste too much bandwidth if you close the tab after a minute [20:16] have you tried watching the network tab while watching via flash? [20:17] no, why? [20:17] yeah, it's the same [20:17] to figure out how it works? [20:17] Could it be that the Preview is only the first 25% (what JDownloader grabs)? [20:17] looks like jdownloader was using a different url [20:18] http://fcache.veoh.com/file/f/h1313996.mp4?e=1497209817&ri=6000&rs=300&h=fa032ed6a31a75daab4b4503f60e35f9 [20:18] vs page editing, which gives you http://content.veoh.com/flash/p/2/v1313996wddKMNqf/h1313996.mp4?ct=bdd4eff4404837d31915158fe9f6a3fe3c9aaf63da409283 [20:18] I see. [20:19] damn it, same thing [20:20] arkiver: can you download the whole video? [20:20] well, yeah, why? [20:20] i just got 200 mb again from that link [20:21] in source search for fullPreviewHashHighPath [20:21] and download that [20:21] that's what i did [20:22] i'll see if watching the video in a muted tab while downloading fixes anything [20:22] i'm assuming that that url is the one the player buffers from [20:22] and it won't buffer more than 25% into the video [20:23] before you can see fullPreviewHashHighPath you need to have 18+ cookie of course [20:23] sure you didn't use fullPreviewHashLowPath? [20:23] that gives you a 200 MB one [20:23] fullPreviewHashHighPath gives 881 or so [20:23] 881 MB* [20:24] [download] 100% of 881.42MiB in 12:40 [20:24] ERROR: content too short (expected 924240150 bytes and served 239785474) [20:24] see what i mean [20:26] well, can you actually skip to later via flash? [20:26] yep [20:26] it's not like nnd or something [20:26] Does anything happen in the Network tab? [20:26] you mean, like the windows one? 
[20:26] no, the browser one [20:26] The browser's. [20:26] :O [20:26] is that a thing [20:26] zomg :P [20:26] you have a lot to learn [20:27] Press F12 in the tab [20:27] Essential tool #1 for web devs. :-P [20:27] and people trying to download special things [20:28] looks like there's a ?start parameter [20:31] sorry, &start [20:31] ex: http://content.veoh.com/flash/f/2/v1313996wddKMNqf/l1313996.mp4?ct=466b8df98e1762caf22b228f86934f1623d189f7d4c429bb&start=4659.76 [20:33] it will only send you the video starting from that time [20:34] i guess what i have to do is download the video from the first link, count how many seconds are in it, then redownload and splice all the clips together [20:40] *** SmileyG has quit IRC (Remote host closed the connection) [20:42] *** BlueMaxim has joined #archiveteam-bs [20:56] *** username1 has quit IRC (Quit: Leaving) [21:01] *** SHODAN_UI has quit IRC (Remote host closed the connection) [21:37] *** BartoCH has quit IRC (Ping timeout: 260 seconds) [21:48] *** dashcloud has quit IRC (Remote host closed the connection) [21:55] *** dashcloud has joined #archiveteam-bs [22:43] tapedrive: thank you for grabbing the text of that fanfiction, in any case. It's useful, even though WARCs would be very nice too. [22:45] I'm seeing how easy it would be to add in WARC to my system now. [22:45] Although I think it would mean an entire recrawl, which has taken several (about 10) months due to their rate limiting. [22:52] https://archive.org/details/SuperSpacefortressMacross [22:52] *** wp494 has quit IRC (Read error: Connection reset by peer) [23:04] *** Ravenloft has quit IRC (Ping timeout: 268 seconds) [23:21] *** antomati_ has joined #archiveteam-bs [23:21] *** swebb sets mode: +o antomati_ [23:22] *** antomatic has quit IRC (Read error: Operation timed out) [23:33] *** BartoCH has joined #archiveteam-bs [23:52] SketchCow: Fix it please - Could not store file "/tmp/phphIFGty" at "mwstore://local-backend/local-public/d/da/Chatpixivicon.gif". 
[23:52] it's still broken :(
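For the veoh download discussed from 20:03 onward, the `&start` parameter found at 20:31 suggests fetching the rest of the video in pieces: measure how many seconds the truncated file holds, then request the same URL with `start` set to that offset. Below is a minimal Python sketch of the URL-building step only. It assumes the parameter takes seconds, as in the logged example; `with_start` is a hypothetical helper, and measuring clip durations and splicing the pieces together are left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start(url: str, seconds: float) -> str:
    """Return url with its `start` query parameter set to the given offset.

    Hypothetical helper; any existing `start` value is replaced.
    """
    parts = urlsplit(url)
    # Keep every other query parameter (such as the `ct` token) in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "start"
    ]
    query.append(("start", f"{seconds:.2f}"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Each next request would use the total duration downloaded so far as its offset, repeating until a request returns no further video.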