[00:13] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[00:29] *** VADemon has quit IRC (Quit: left4dead)
[00:42] *** tomwsmf-a has joined #archiveteam
[01:02] *** BlueMaxim has joined #archiveteam
[02:12] *** Coderjoe has quit IRC (Read error: Connection reset by peer)
[02:32] *** Coderjoe has joined #archiveteam
[02:45] *** Start has joined #archiveteam
[03:09] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[03:17] *** philpem has quit IRC (Ping timeout: 260 seconds)
[04:09] *** RichardG has quit IRC (Ping timeout: 260 seconds)
[04:21] *** JesseW has joined #archiveteam
[04:29] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:35] *** Sk1d has joined #archiveteam
[05:00] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[05:38] *** ariscop has quit IRC (Quit: Leaving)
[05:40] *** ariscop has joined #archiveteam
[06:44] *** RichardG has joined #archiveteam
[07:00] *** RichardG has quit IRC (Ping timeout: 370 seconds)
[07:31] *** Honno has joined #archiveteam
[07:56] *** philpem has joined #archiveteam
[08:19] *** schbirid has joined #archiveteam
[08:23] *** godane has quit IRC (Read error: Operation timed out)
[08:36] *** godane has joined #archiveteam
[09:14] *** zhongfu has quit IRC (Remote host closed the connection)
[09:16] *** zhongfu has joined #archiveteam
[09:38] *** n3m0 has joined #archiveteam
[09:38] *** n3m0 is now known as skrp
[09:40] can i get a serious pm. 100T+ bsd over zfs system. 2 Million files. 1Million+ books. I've been archiving on my own, but would appreciate a merger
[10:19] *** RichardG has joined #archiveteam
[10:31] skrp: https://twitter.com/textfiles/status/736270734033575936
[10:33] *** Tomcat_ has joined #archiveteam
[10:36] *** M-davidar has joined #archiveteam
[10:41] *** M-davidar is now known as davidar_
[10:53] *** Tomcat__ has joined #archiveteam
[10:53] *** Tomcat__ has quit IRC (Remote host closed the connection!)
[10:53] *** Tomcat_ has quit IRC (Read error: Operation timed out)
[10:58] *** Tomcat_ has joined #archiveteam
[10:58] *** Tomcat_ has quit IRC (Connection closed)
[10:58] *** Tomcat_ has joined #archiveteam
[10:58] *** Tomcat_ has quit IRC (Connection closed)
[10:59] *** Tomcat_ has joined #archiveteam
[11:06] bai: :/ its hard to take tweeters seriously
[11:06] the days when men were men and birds were birds...
[11:12] *didn't mean to sound sexist. if he is a lonely housewife, its acceptable
[11:13] skrp: what do you want to merge?
[11:15] are your books from the libgen torrents?
[11:32] ivan: do you already have all the libgen?
[11:33] no they are wildcard actions ive done via torrent/http/ftp
[11:35] ive been working on a c coded archive system that works over zfs. keeping all the files in one pool, naturally deduplicated as each file is named after its $index~$sha256^$filename@$previous_path
[11:36] *** Tomcat_ has quit IRC (Remote host closed the connection)
[11:37] you add an input source and it extracts recursively everything while maintaining important 'metadata'
[11:40] ripper -t http -i www.pedrk.com -o /uber_dump --index 010518 #this gives it an ultra transient always updated always deduplicated [zfs dedup is a shod]
[11:48] *** BartoCH has quit IRC (Read error: Connection reset by peer)
[11:48] skrp: no, I am lacking libgen
[11:48] I've never worried about dedup because I just dump hundreds of TB into google drive :-)
[11:51] *** ivan is now known as ivan`
[11:51] ivan: lol i dont trust The Machine. I maintain my own servers with my accounting business funds
[11:51] Downloading a set of subreddits and other sites related to the EU Referendum
[11:52] *** ivan has joined #archiveteam
[11:52] ivan`: libgen is a very hard to get deal. but if you are willing to deal :D
[11:54] im trying to get into a group that shares my same philosophy "Data gets lost. Storage gets cheaper.
So get everything now"
[11:56] i think we're your people, even if some of us vary in scale - i barely own a single terabyte of my own data :)
[11:57] *** BartoCH has joined #archiveteam
[11:57] Sanqui: well with one TB you could back up many htmls
[11:58] absolutely - i help archive sites with the archivebot and keep some private material too.
[11:58] the internet is a glass cannon. all universities share the same 'ebsco host' systems which amount to only 200k files.
[11:59] once war hits the internet is bye bye. too insecure
[11:59] so thats why i call myself noah of my bsd zfs ark haha
[12:00] ivan`: the libgen is a lot larger than most ppl think, i suspect the russians also stored stenography information in them as well
[12:01] anyway, if you want to speak to somebody serious about your collection, SketchCow's your guy
[12:05] well ill be at bsdcan if anyone else from this group is a demon
[12:25] *** RichardG_ has joined #archiveteam
[12:26] *** RichardG has quit IRC (Read error: Connection reset by peer)
[12:39] RuTracker project is running again.
[12:52] *** ariscop has quit IRC (Ping timeout: 633 seconds)
[13:06] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[13:07] *** Simpbra1 has joined #archiveteam
[13:08] *** Simpbrain has quit IRC (Read error: Connection reset by peer)
[13:18] https://launchpad.net/~voltagex/+archive/ubuntu/wget-lua if anyone needs it - rebuilt wget-lua for newer Ubuntus (helpful for EC2 instances)
[13:36] *** metalcamp has joined #archiveteam
[14:02] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[14:05] *** WinterFox has quit IRC (Remote host closed the connection)
[14:09] New wikis are added to the wikis project:
[14:09] battlestarwiki.org
[14:09] editthis.info
[14:09] gamepedia.com
[14:09] miraheze.org
[14:09] referata.com
[14:09] wiki-site.com
[14:10] The lists are taken from the wikiteam project
[14:10] All external URLs from these sites will be grabbed in the wikis project.
[14:10] firng up scripts in a sec
[14:10] *firing
[14:11] Thanks
[14:11] It seems wiki-site.com is currently all failing, but that will be fixed.
[14:12] almost all*
[14:12] concurrency 4 go!
[14:14] The grab has been running since November 2015. In November 2016 we will regrab all sites, to fetch new external URLs and fetch changed external URLs.
[15:03] *** RichardG_ is now known as RichardG
[15:23] *** tfgbd has quit IRC (Read error: Connection reset by peer)
[15:48] *** JesseW has joined #archiveteam
[16:30] *** JesseW has quit IRC (Read error: Operation timed out)
[16:42] arkiver: can I just request more wikis to be added?
[17:00] *** JesseW has joined #archiveteam
[17:33] *** fie has quit IRC (Read error: Operation timed out)
[18:09] *** Zinob has joined #archiveteam
[18:10] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[18:10] ... oh ok, nvm. Just keep an eye on OpenCores.org?
[18:11] it's yahoosucks
[18:12] *** JesseW has quit IRC (Read error: Operation timed out)
[18:12] Zinob: what's the status of that website?
[18:12] *** Zei-Pii has joined #archiveteam
[18:14] luckcolor: It is all fine! But the company that hosts the site's biggest customer ran into problems.
[18:15] If you want i can schedule a grab on #archivebot
[18:17] *Zinob:
[18:17] Either that site will get more love now that they have less to do with the other big company OR it might go belly-up...
[18:17] So yeah.. a grab might be in order.
[18:18] Ok join the channel, in the topic there's the link to check the progress of the crawl
[18:18] I dont really care for the site personally, but it is for FPGA-design what SourceForge is for the GPL community
[18:19] Ok it's going
[18:21] Nice
[18:21] Let me know if you need anything else :p
[18:21] I usually pester a friend that is in the archive-team but i thought that i could check for myself for once :)
[18:22] Great stuff, Keep the good work up.
Your Wikipedia Archives have helped me a few times
[19:18] *** schbirid has quit IRC (Quit: Leaving)
[19:36] *** Jeroen52 has quit IRC (Ping timeout: 260 seconds)
[19:56] *** Jeroen52 has joined #archiveteam
[20:08] *** zino has quit IRC (Read error: Operation timed out)
[20:09] *** tomwsmf-a has joined #archiveteam
[20:22] *** tfgbd_znc has joined #archiveteam
[20:27] *** tfgbd_znc has quit IRC (Read error: Connection reset by peer)
[20:41] *** Aranje has joined #archiveteam
[21:17] *** VADemon has joined #archiveteam
[21:31] *** Zei-Pii has quit IRC (Read error: Connection reset by peer)
[21:34] *** maseck has quit IRC (Remote host closed the connection)
[21:40] *** maseck has joined #archiveteam
[22:47] Sanqui: always!
[22:47] For now only mediawikis are supported
[22:47] is there a formal way, or should I just put them here?
[22:47] What an item looks like:
[22:49] for example, mediawikieu:bulpedia.wikia.com/api.php:bulpedia.wikia.com/wiki/
[22:49] 'eu' in mediawikieu means 'external urls'
[22:50] bulpedia.wikia.com/api.php is the location of the api.php
[22:50] bulpedia.wikia.com/wiki/ is the prefix for the articles
[22:50] for example, the above wiki has a page http://bulpedia.wikia.com/wiki/Jokes
[22:50] so the prefix is bulpedia.wikia.com/wiki/
[22:51] it is different for different wikis
[22:51] if you have all that, it can be added to the warrior grab
[23:03] *** FalconK has quit IRC (Ping timeout: 260 seconds)
[23:04] *** BlueMaxim has joined #archiveteam
[23:05] arkiver: yeah, I can get that.
[23:17] *** FalconK has joined #archiveteam
[23:49] *** ariscop has joined #archiveteam
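
Editor's note on the item format described at 22:49-22:51: the three colon-separated parts (item type, api.php location, article prefix) can be put together or taken apart roughly as below. This is only a sketch, not the project's actual tooling. The names `WikiItem`, `parse_item` and `build_item` are made up for illustration, and it assumes that hosts carry no port number and that the sample article title contains no slash.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class WikiItem(NamedTuple):
    """Hypothetical container for one wikis-project item."""
    kind: str            # e.g. "mediawikieu" ("eu" = external urls)
    api: str             # location of api.php, without the scheme
    article_prefix: str  # prefix shared by article URLs, without the scheme


def parse_item(item: str) -> WikiItem:
    """Split 'kind:api:prefix' into its three parts."""
    # Assumption: no part contains a colon (so no host:port forms).
    parts = item.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected kind:api:prefix, got {item!r}")
    return WikiItem(*parts)


def build_item(api_url: str, article_url: str, kind: str = "mediawikieu") -> str:
    """Build an item string from an api.php URL and one sample article URL."""
    api = urlsplit(api_url)
    article = urlsplit(article_url)
    # Drop the article title (last path segment) to get the prefix.
    # Assumption: the title itself has no slash in it.
    prefix_path = article.path.rsplit("/", 1)[0] + "/"
    return f"{kind}:{api.netloc}{api.path}:{article.netloc}{prefix_path}"


if __name__ == "__main__":
    item = build_item(
        "http://bulpedia.wikia.com/api.php",
        "http://bulpedia.wikia.com/wiki/Jokes",
    )
    print(item)
    print(parse_item(item))
```

With the bulpedia example from the log, `build_item` gives back the same string quoted at 22:49. Since the prefix differs from wiki to wiki, it is worth checking the result by hand against a second article URL before submitting it.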