[00:18] *** icedice has joined #archiveteam-bs [00:53] *** Nick-PC has joined #archiveteam-bs [00:55] *** Nick-PC_ has quit IRC (Ping timeout: 252 seconds) [00:59] *** wyatt8740 has quit IRC (Read error: Operation timed out) [01:00] *** X-Scale has joined #archiveteam-bs [01:15] *** killsushi has quit IRC (Quit: Leaving) [01:46] Just keep goin [01:51] *** wyatt8740 has joined #archiveteam-bs [01:51] *** VerifiedJ has quit IRC (Quit: Leaving) [01:57] *** ShellyRol has quit IRC (Ping timeout: 745 seconds) [02:06] *** ShellyRol has joined #archiveteam-bs [02:31] *** jake_test has joined #archiveteam-bs [03:22] *** manjaro-u has quit IRC (Read error: Operation timed out) [03:59] *** Nick-PC has quit IRC (Ping timeout: 252 seconds) [04:22] *** scorche` has joined #archiveteam-bs [04:22] *** scorche has quit IRC (Read error: Operation timed out) [04:22] *** scorche` is now known as scorche [04:23] *** odemgi_ has joined #archiveteam-bs [04:26] *** qw3rty2 has joined #archiveteam-bs [04:27] *** odemgi has quit IRC (Read error: Operation timed out) [04:30] *** BartoCH has quit IRC (Ping timeout: 615 seconds) [04:30] *** Mateon1 has quit IRC (Read error: Connection reset by peer) [04:30] *** Mateon1 has joined #archiveteam-bs [04:31] *** britmob has quit IRC (Read error: Connection reset by peer) [04:31] *** BartoCH has joined #archiveteam-bs [04:32] *** paul2520 has quit IRC (Read error: Operation timed out) [04:32] *** markedL has quit IRC (Read error: Operation timed out) [04:32] *** britmob has joined #archiveteam-bs [04:33] *** qw3rty has quit IRC (Ping timeout: 745 seconds) [04:34] *** nyany__ has quit IRC (Read error: Operation timed out) [04:35] *** odemg has quit IRC (Ping timeout: 746 seconds) [04:36] *** godane has quit IRC (Ping timeout: 246 seconds) [04:36] *** paul2520 has joined #archiveteam-bs [04:37] *** markedL has joined #archiveteam-bs [04:39] *** odemg has joined #archiveteam-bs [04:40] *** kyledrake has quit IRC (Ping timeout: 496 seconds) [04:40] *** 
nyany_ has joined #archiveteam-bs [04:40] *** kyledrake has joined #archiveteam-bs [04:53] *** godane has joined #archiveteam-bs [05:03] *** Nick-PC has joined #archiveteam-bs [05:18] *** benjins has quit IRC (Read error: Operation timed out) [05:19] *** benjins has joined #archiveteam-bs [05:43] *** scorche has quit IRC (Read error: Operation timed out) [05:50] *** scorche has joined #archiveteam-bs [05:57] *** dewdrop has joined #archiveteam-bs [05:59] *** dewdropaw has quit IRC (Ping timeout: 255 seconds) [07:02] *** kiska18 has quit IRC (Remote host closed the connection) [07:02] *** Ryz has quit IRC (Remote host closed the connection) [07:02] *** kiska18 has joined #archiveteam-bs [07:02] *** Fusl__ sets mode: +o kiska18 [07:02] *** Fusl sets mode: +o kiska18 [07:02] *** Fusl_ sets mode: +o kiska18 [07:03] *** Ryz has joined #archiveteam-bs [07:42] *** godane has quit IRC (Ping timeout: 258 seconds) [08:03] *** benjins has quit IRC (Read error: Operation timed out) [08:05] *** benjins has joined #archiveteam-bs [08:20] *** LowLevelM has quit IRC (Read error: Operation timed out) [08:31] *** thelounge has joined #archiveteam-bs [10:33] *** odemgi has joined #archiveteam-bs [10:37] *** odemgi_ has quit IRC (Read error: Operation timed out) [12:09] *** deevious has quit IRC (Ping timeout: 252 seconds) [12:16] *** deevious has joined #archiveteam-bs [12:48] *** godane has joined #archiveteam-bs [12:50] Download of those Intel downloads my qwarc crawl discovered is running now on mips. [13:11] so i'm now at 1780k items [13:25] Fusl: Is there a particular reason why the Intel download WARCs (and the other stuff on abox hel1) aren't uploaded to IA yet, or are they just there because mips was running out of space? 
[13:33] *** britmob has quit IRC (Read error: Connection reset by peer) [13:34] *** Damme_ has joined #archiveteam-bs [13:34] *** icedice has quit IRC (Read error: Connection reset by peer) [13:34] *** phillipsj has quit IRC (Read error: Connection reset by peer) [13:34] *** britmob has joined #archiveteam-bs [13:34] *** Nick-PC_ has joined #archiveteam-bs [13:35] *** fredgido_ has joined #archiveteam-bs [13:35] *** deevious has quit IRC (Read error: Connection reset by peer) [13:35] *** Zerote has joined #archiveteam-bs [13:35] *** phillipsj has joined #archiveteam-bs [13:35] *** icedice has joined #archiveteam-bs [13:36] *** Lord_Nigh has quit IRC (Ping timeout: 252 seconds) [13:36] *** Flashfire has quit IRC (Read error: Connection reset by peer) [13:36] *** antomati_ has joined #archiveteam-bs [13:36] *** Ing3b0rg has quit IRC (Read error: Connection reset by peer) [13:36] *** dxrt has quit IRC (Read error: Connection reset by peer) [13:36] *** dxrt has joined #archiveteam-bs [13:36] *** pew has quit IRC (Ping timeout: 252 seconds) [13:37] *** Fusl__ sets mode: +o dxrt [13:37] *** Fusl_ sets mode: +o dxrt [13:37] *** benjinsmi has joined #archiveteam-bs [13:37] *** underscor has quit IRC (Read error: Connection reset by peer) [13:37] out of space and i dont find the proper time to upload them properly [13:37] *** benjins has quit IRC (Ping timeout: 252 seconds) [13:37] *** Nick-PC has quit IRC (Ping timeout: 252 seconds) [13:37] *** Damme has quit IRC (Ping timeout: 252 seconds) [13:37] *** TC01 has quit IRC (Read error: Connection reset by peer) [13:37] feel free to upload them if you want [13:37] *** fredgido has quit IRC (Ping timeout: 252 seconds) [13:37] *** Maylay_ has quit IRC (Ping timeout: 252 seconds) [13:38] Will do, thanks. 
[13:38] *** ats has quit IRC (Ping timeout: 252 seconds) [13:38] *** prq has quit IRC (Ping timeout: 252 seconds) [13:38] *** underscor has joined #archiveteam-bs [13:38] *** Zerote__ has quit IRC (Ping timeout: 252 seconds) [13:38] *** Fusl has quit IRC (Ping timeout: 252 seconds) [13:38] *** eientei95 has quit IRC (Read error: Connection reset by peer) [13:39] *** antomatic has quit IRC (Ping timeout: 252 seconds) [13:39] *** kiska has quit IRC (Read error: Connection reset by peer) [13:39] *** SketchCow has quit IRC (Ping timeout: 252 seconds) [13:39] *** Ing3b0rg has joined #archiveteam-bs [13:39] *** ats has joined #archiveteam-bs [13:39] *** SynMonger has quit IRC (Read error: Connection reset by peer) [13:40] *** SynMonger has joined #archiveteam-bs [13:40] *** SketchCow has joined #archiveteam-bs [13:40] *** Fusl__ sets mode: +o SketchCow [13:40] *** Fusl_ sets mode: +o SketchCow [13:41] *** TC01 has joined #archiveteam-bs [13:41] *** eientei95 has joined #archiveteam-bs [13:41] *** Fusl has joined #archiveteam-bs [13:41] *** Fusl__ sets mode: +o Fusl [13:41] *** Fusl_ sets mode: +o Fusl [13:41] *** Lord_Nigh has joined #archiveteam-bs [13:41] *** OrIdow6 has quit IRC (Ping timeout: 252 seconds) [13:41] *** prq has joined #archiveteam-bs [13:41] *** pew has joined #archiveteam-bs [13:42] *** OrIdow6 has joined #archiveteam-bs [13:42] *** Maylay has joined #archiveteam-bs [13:43] *** X-Scale has quit IRC (Quit: HydraIRC -> http://www.hydrairc.com <- Would you like to know more?) 
[13:44] *** X-Scale has joined #archiveteam-bs [13:45] *** deevious has joined #archiveteam-bs [13:45] *** Flashfire has joined #archiveteam-bs [14:06] *** icedice has quit IRC (Read error: Operation timed out) [14:16] *** katocala has quit IRC () [14:23] *** manjaro-u has joined #archiveteam-bs [14:41] *** deevious1 has joined #archiveteam-bs [14:42] *** deevious has quit IRC (Ping timeout: 252 seconds) [14:42] *** deevious1 is now known as deevious [14:50] *** Flashfire has quit IRC (Ping timeout: 252 seconds) [14:50] *** eientei95 has quit IRC (Ping timeout: 252 seconds) [14:51] *** ShellyRol has quit IRC (Read error: Operation timed out) [14:55] *** Damme_ has quit IRC (Read error: Connection reset by peer) [14:56] *** Damme_ has joined #archiveteam-bs [14:57] *** kiska has joined #archiveteam-bs [14:57] *** Fusl__ sets mode: +o kiska [14:57] *** Flashfire has joined #archiveteam-bs [14:57] *** Fusl sets mode: +o kiska [14:57] *** Fusl_ sets mode: +o kiska [15:00] *** eientei95 has joined #archiveteam-bs [15:04] *** ShellyRol has joined #archiveteam-bs [15:17] *** BlueMax has quit IRC (Read error: Connection reset by peer) [15:21] *** odemg has quit IRC (Remote host closed the connection) [15:45] *** ShellyRol has quit IRC (Ping timeout: 252 seconds) [15:46] *** ShellyRol has joined #archiveteam-bs [15:47] *** OrIdow6 has quit IRC (Ping timeout: 252 seconds) [15:48] *** Flashfire has quit IRC (Ping timeout: 252 seconds) [15:50] *** kiska has quit IRC (Ping timeout: 252 seconds) [15:50] *** OrIdow6 has joined #archiveteam-bs [15:52] *** kiska has joined #archiveteam-bs [15:52] *** Fusl__ sets mode: +o kiska [15:52] *** Fusl sets mode: +o kiska [15:52] *** Fusl_ sets mode: +o kiska [15:53] *** Flashfire has joined #archiveteam-bs [16:17] *** deevious has quit IRC (Remote host closed the connection) [16:17] *** kiska has quit IRC (Ping timeout: 252 seconds) [16:21] *** Flashfire has quit IRC (Ping timeout: 252 seconds) [16:24] *** kiska has joined #archiveteam-bs 
[16:24] *** Fusl__ sets mode: +o kiska [16:24] *** Fusl sets mode: +o kiska [16:24] *** Fusl_ sets mode: +o kiska [16:25] *** Flashfire has joined #archiveteam-bs [16:27] *** deevious has joined #archiveteam-bs [17:10] *** schbirid has joined #archiveteam-bs [17:10] *** nepeat has quit IRC (Read error: Connection reset by peer) [17:11] *** nepeat has joined #archiveteam-bs [17:13] SketchCow: Game Machine issues from 1981 to 1988 are here now: https://onitama.tv/gamemachine/archive.html [17:15] someone already put an archive of it here: https://archive.org/details/game-machine [17:35] *** manjaro-u has quit IRC (Ping timeout: 258 seconds) [17:40] godane: Thanks for the heads up, I'm going to split it up. [17:42] a part of me thought you had a collection for it but turns out you don't [17:43] anyways it can now get a collection [17:43] would love to see Japanese OCR of them! :) [17:44] the guy that uploaded them said they're OCR PDFs [17:46] hmm, the 1981 ones I was looking at from the original site were not. Maybe he did something [17:46] ok this is weird [17:46] *** kiska18 has quit IRC (Remote host closed the connection) [17:46] *** Ryz has quit IRC (Remote host closed the connection) [17:46] *** Ryz has joined #archiveteam-bs [17:46] *** kiska18 has joined #archiveteam-bs [17:46] *** Fusl sets mode: +o kiska18 [17:46] *** Fusl__ sets mode: +o kiska18 [17:46] *** Fusl_ sets mode: +o kiska18 [17:46] in Okular the text is DRM protected [17:46] so it can't be copied [17:48] good news [17:49] just doing a pdftk $file output output.pdf fixes the OCR [17:52] https://userbase.kde.org/Special:MyLanguage/Okular [17:52] > By default, Okular follows the PDF specification and don't allow copying text from DRM protected files. However, there is an option in the settings to disable DRM limitations in Settings -> General -> Program Features -> Obey DRM limitations. 
There was a small controversy in the Debian bug tracker a long time ago, about the default choice to Obey DRM limitations[4][5]. The choice was then explained by an Okular/KPdf developer[6]. [17:53] godane ^ [17:54] thanks for that [17:54] ++ [17:56] *** systwiALT has joined #archiveteam-bs [18:01] *** katocala has joined #archiveteam-bs [18:07] *** DogsRNice has joined #archiveteam-bs [18:08] *** Hani111 has joined #archiveteam-bs [18:11] *** raeyulca has quit IRC (Read error: Operation timed out) [18:11] Stiletto: We have Japanese OCR, as long as the language is set right [18:14] Hence godane's never-ending Japanese manual nightmare works [18:18] *** Hani has quit IRC (Ping timeout: 745 seconds) [18:18] *** Hani111 is now known as Hani [18:19] just know not all pdfs are Japanese [18:20] the last 1000 ids were English manual pages for driving a car [18:23] i want to say they're manuals for Lexus cars because of the logo on the dash in one of the manuals [18:23] id 93010 is where i see that Lexus logo [18:25] i'm also getting Lexus car manuals in Japanese too [18:25] I have to use however you set the language metadata [18:25] ok [18:26] i'm just saying that the 9xxxx ids may be a mess with English and Japanese manuals mixed [18:31] i'm not bothered if these random English pdfs in my Japanese manual grab get OCRed wrong [18:32] it looks like a big pdf split into smaller pdfs for some reason on their site [18:32] some look to be broken too [18:33] *** thelounge is now known as LowLevelM [18:48] Oh, I KNOW some are broken [18:51] what's the channel for #drawr? 
[18:52] #drawrnomore [19:14] *** manjaro-u has joined #archiveteam-bs [19:28] *** MaximeleG has joined #archiveteam-bs [19:59] https://archive.org/details/game_machine_magazine_jp [20:39] *** MaximeleG has quit IRC (Quit: MaximeleG) [20:47] *** Jens has quit IRC (Remote host closed the connection) [20:47] *** Jens has joined #archiveteam-bs [20:52] *** schbirid has quit IRC (Quit: Leaving) [21:24] SketchCow: I've been working on a crawl of Medium.com into WARCs, not using warrior etc but doing it myself. Can I upload it into the Inbox and get a collection please? [21:34] SketchCow: A couple years ago I watched your defcon17 presentation "that awesome time I was sued for two billion dollars" and really enjoyed it. I've just recently gotten more interested in archiving. fun to see you here. :) [22:13] *** klg has joined #archiveteam-bs [22:23] HCross: Nice. How did you do the discovery? [22:24] Parsed the site map [22:24] It’s all in there [22:28] D'oh, right, I was actually looking at that the other day. lol [22:31] prq: small potatoes by today's potato standards :) https://youtu.be/UN8bJb8biZU?t=512 [22:38] *** Nick-PC_ has quit IRC (Read error: Connection reset by peer) [22:44] *** HP_Archiv has joined #archiveteam-bs [22:57] HCross: Go for it [23:17] *** godane has quit IRC (Quit: Leaving.) [23:43] Hi, I have a request for one of the Ops - Urbandictionary.com This hasn't been archived yet [23:45] *** OrIdow6 has quit IRC (Quit: Leaving.) [23:52] how many pages is that? [23:53] markedL looks like 731 pages [23:53] Uh, yeah, no. [23:53] UD is huge. [23:53] *** BlueMax has joined #archiveteam-bs [23:53] But it's an invaluable cultural resource for linguistics, not kidding [23:53] Warrior for UD? [23:54] The 731 pages you see on the homepage are, idk, selected contributions or something? [23:54] https://archive.org/details/game_machine_magazine_jp wheeeeee [23:54] We may back it up [23:54] But not via warrior [23:55] I'm not sure JAA. 
I clicked 'last page' it brought me to #731 [23:55] I'm sure it's more than 731 pages. [23:55] Like, 90% of the Archive Team is the lyrics of "Daddy, Can I Turn This" [23:55] Like, orders of magnitude more. [23:56] Seconded [23:56] https://www.urbandictionary.com/define.php?term=Sex alone has 151 pages of definitions, and that's just one term. [23:56] Well just passing on a reminder then if you're not going to dedicate resources to it right this moment. But it should be backed up at some point [23:56] I just verified that 7,500 pages are just the "Z" pages [23:57] Oh, well then you were right JAA ^^ [23:59] There are 1015 sitemaps currently, each with 2k entries, so that's at least 2 million entries. [23:59] Yeah [23:59] But that might not include pagination, user profiles, tags, etc. [23:59] It can certainly be done if we want to though.
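The sitemap estimate above (1015 sitemaps at about 2k entries each) can be checked with a rough sketch like the one below. It assumes the sitemaps follow the standard sitemaps.org XML format; the function names and the sample document are hypothetical and only illustrate the counting, and nothing here fetches from the real site.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_locs(xml_text: str) -> int:
    """Count <loc> elements in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//sm:loc", SITEMAP_NS))


def estimate_entries(num_sitemaps: int, entries_per_sitemap: int) -> int:
    """Lower-bound estimate: sitemap count times entries per sitemap."""
    return num_sitemaps * entries_per_sitemap


# Hypothetical sample standing in for one downloaded sitemap file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/define.php?term=a</loc></url>
  <url><loc>https://example.com/define.php?term=b</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(count_locs(SAMPLE_SITEMAP))      # 2
    print(estimate_entries(1015, 2000))    # 2030000
```

As noted in the channel, this only counts what the sitemaps list; pagination, user profiles, and tags may not be included, so the real page count is likely higher.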