[00:07] *** IanR has quit IRC (Remote host closed the connection)
[00:15] *** IanR has joined #archiveteam-bs
[00:44] *** wp494 has quit IRC (Read error: Operation timed out)
[00:45] *** wp494 has joined #archiveteam-bs
[01:03] *** ryry has joined #archiveteam-bs
[01:10] SketchCow: so i'm going thru American Cinematographer 1926 someone scanned and noticed that 4 pages are missing from the june 1926 issue
[01:10] pages 4, 5, 8 and 9
[01:11] from this item: https://archive.org/details/amemato06asch
[01:17] *** robbierut has quit IRC (Read error: Operation timed out)
[01:31] *** adinbied has joined #archiveteam-bs
[01:58] *** hendi_ has joined #archiveteam-bs
[02:00] *** hendi has quit IRC (Read error: Operation timed out)
[02:13] *** BlueMax has joined #archiveteam-bs
[02:17] *** jdude104 has joined #archiveteam-bs
[02:31] SketchCow: i'm uploading US News and World Report magazine
[02:31] i scanned the 2 issues i found at savers
[02:31] one from 1997-09-15 and one from 1998-01-26
[02:32] looks like you guys have close to none on that one magazine
[02:35] I suspect we keep getting takedowns for that one.
[02:35] So is spam darked or is it deleted? I have always been curious about that sketchcow
[02:36] SketchCow: that sucks
[02:36] there is like no online archive of their back issues on their website
[02:36] Spam is darked
[02:41] I just darked over 1,000 spam items today
[02:42] What does spam usually consist of? I can't find much documentation
[02:43] You don't find any documentation
[02:43] But let's see....
[02:43] Taxi Businesses, Massage, Escorts, Malwarely go See The Full Movie, Best ____ in ___, etc.
[02:43] Anything where the item is not the item
[02:43] It's just an ad for something else
[02:43] We have some edge cases, but that's life
[02:44] Like, we have people who upload lifestyle blogging and then it's got these huge AND I DO AWESOME STUFF COME VISIT MY SITEEEEEEEEEEEEEEEEEEEEEE
[02:44] And they're shitball. Depending on the spam person doing the work, they stay or don't
[02:44] Does spam sometimes include web captures? Like save now? Cause I worry about some of the weird stuff I find and use Savenow on
[02:44] I tend to be nice. Others aren't.
[02:44] No.
[02:44] Unless you're just uploading horseshit into opensource
[02:44] Alright so it's often the same crap you find across the net uploaded everywhere
[02:45] Not unless you count one or 2 obscure kids PC games as horseshit but I think that went into the software section
[02:48] latest scan: https://archive.org/details/us-news-and-world-report-magazine-1997-09-15
[02:48] latest scan: https://archive.org/details/us-news-and-world-report-magazine-1998-01-26
[03:11] Want to share https://visidata.org/ because it is neat.
[03:23] *** verifiedj has quit IRC (Ping timeout: 252 seconds)
[03:25] *** Mateon1 has quit IRC (Remote host closed the connection)
[03:27] *** VerifiedJ has joined #archiveteam-bs
[03:30] *** Mateon1 has joined #archiveteam-bs
[03:35] *** ndiddy has quit IRC (Ping timeout: 506 seconds)
[03:46] *** Mateon1 has quit IRC (Remote host closed the connection)
[03:46] *** Mateon1 has joined #archiveteam-bs
[03:50] Is it possible to download specific sites from chromebot collections on IA?
[03:53] *** Despatche has joined #archiveteam-bs
[04:12] *** Stilett0 has joined #archiveteam-bs
[04:14] *** Stiletto has quit IRC (Ping timeout: 506 seconds)
[04:16] btw we can save magazines from nxtbooks with simple url patterns: http://transfer.nxtbook.com/nxtbooks/ac/ac0308/offline/ac_ac0308.pdf
[04:17] i may be able to find American Cinematographer issues going back to 2000 on nxtbook
[04:31] *** qw3rty111 has joined #archiveteam-bs
[04:34] *** qw3rty119 has quit IRC (Read error: Operation timed out)
[04:37] *** odemgi has joined #archiveteam-bs
[04:39] *** odemgi_ has quit IRC (Ping timeout: 252 seconds)
[04:45] *** odemg has quit IRC (Ping timeout: 615 seconds)
[04:52] *** odemg has joined #archiveteam-bs
[05:00] Exairnous: I think chromebot's uploads would eventually end up on IA. Either it will upload WARCs or the WARCs will be combined with those of ArchiveBot to form MegaWARCs. If the site is in the files, then you should be able to find and download it.
[05:01] Exairnous: The tricky part is to locate where the WARCs went.
[05:01] And that depends on the specific site you're after.
[05:06] *** ivan has quit IRC (Leaving)
[05:07] *** ivan has joined #archiveteam-bs
[05:24] t3: Yes. I found a chromebot warc that contains the sites I want. The problem is it contains a bunch of extraneous sites as well and I was hoping there was an alternative way to download just the content I care about.
[05:24] *** dhyan_nat has joined #archiveteam-bs
[05:25] Exairnous: Oh... I don't know if there are any alternative downloads. The only thing I can think of is to strip away the unnecessary bits from the WARC using some kind of WARC post-processing software.
[05:28] Ok. Thanks t3
[05:29] *** Stilett0 has quit IRC (Ping timeout: 492 seconds)
[05:30] Exairnous: Anytime.
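The idea of stripping unwanted sites out of a WARC could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the `warcio` package, which is not one of the tools named in the channel, and the names `wanted`, `filter_warc`, `src_path`, `dst_path` and `keep_hosts` are made up for this example. It keeps records by hostname only, so records that depend on dropped ones (for example revisit records) may need extra care.

```python
from urllib.parse import urlsplit


def wanted(target_uri, keep_hosts):
    """Return True if a WARC-Target-URI belongs to one of the hosts to keep.

    keep_hosts is expected to contain lower-case hostnames.
    """
    if not target_uri:
        return False  # e.g. warcinfo records carry no target URI
    # Some WARC writers wrap the URI in angle brackets; strip them first.
    host = urlsplit(target_uri.strip("<>")).hostname
    return host is not None and host in keep_hosts


def filter_warc(src_path, dst_path, keep_hosts):
    """Copy only the records for keep_hosts into a new gzipped WARC.

    Returns the number of records written.
    """
    # Assumption: warcio is installed (pip install warcio). It is imported
    # here so that wanted() stays usable without it.
    from warcio.archiveiterator import ArchiveIterator
    from warcio.warcwriter import WARCWriter

    keep_hosts = {h.lower() for h in keep_hosts}
    kept = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        writer = WARCWriter(dst, gzip=True)
        for record in ArchiveIterator(src):
            uri = record.rec_headers.get_header("WARC-Target-URI")
            if wanted(uri, keep_hosts):
                writer.write_record(record)
                kept += 1
    return kept
```

A call such as `filter_warc("chromebot.warc.gz", "only-my-site.warc.gz", ["example.com"])` (hypothetical file names and host) would write a smaller WARC containing just the records whose target URI is on that host.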
[05:33] *** Stiletto has joined #archiveteam-bs
[05:34] WTF i may be able to get all issues of American Cinematographer from at least 1980s on
[05:34] maybe get all issues
[05:35] fuck now i'm grabbing 1960-01 issue
[05:36] so i will be able to get all missing years that IA doesn't have
[05:37] Nice.
[05:37] this was all cause i was making issue pdfs based on the current volume scans on IA
[05:40] i may do an iphone image grab of some issues cause the pdfs take forever to render
[05:40] t3: Any WARC post-processing software you'd recommend?
[05:43] ok looks like all scans are on there
[05:43] i can grab 1930-01 issue
[05:44] and i hate the 1930-01 scan
[05:44] looks like the last words on edge of page were cut off in part
[05:46] ok earliest issue i can get is 1922-04
[05:48] Exairnous: I haven't tried any but there are some Python packages that might help. There is `warctools` and `warc`. I found some using `pip3 search warc`.
[05:51] t3: Ok. Thanks
[05:58] *** marked has quit IRC (Read error: Connection reset by peer)
[05:58] *** robbierut has joined #archiveteam-bs
[06:03] *** marked has joined #archiveteam-bs
[06:14] *** SketchCo1 has joined #archiveteam-bs
[06:14] *** Atom__ has joined #archiveteam-bs
[06:16] *** IanQ has joined #archiveteam-bs
[06:17] *** SketchCow has quit IRC (Read error: Connection reset by peer)
[06:19] *** odemgi has quit IRC (Read error: Connection reset by peer)
[06:19] *** IanR has quit IRC (Read error: Connection reset by peer)
[06:19] *** Polylith_ has joined #archiveteam-bs
[06:19] *** odemgi has joined #archiveteam-bs
[06:19] *** VerifiedJ has quit IRC (Ping timeout: 252 seconds)
[06:19] *** Atom-- has quit IRC (Ping timeout: 252 seconds)
[06:19] *** jut has quit IRC (Ping timeout: 252 seconds)
[06:19] *** yuitimoth has quit IRC (Ping timeout: 252 seconds)
[06:19] *** deevious has quit IRC (Ping timeout: 252 seconds)
[06:19] *** ColdIce has quit IRC (Ping timeout: 252 seconds)
[06:20] *** yuitimoth has joined #archiveteam-bs
[06:20] *** SmileyG has quit IRC (Ping timeout: 252 seconds)
[06:20] *** Polylith has quit IRC (Ping timeout: 252 seconds)
[06:20] *** Lord_Nigh has quit IRC (Ping timeout: 252 seconds)
[06:21] *** Lord_Nigh has joined #archiveteam-bs
[06:22] *** w0rmhole has quit IRC (Ping timeout: 252 seconds)
[06:22] *** dashcloud has quit IRC (Ping timeout: 252 seconds)
[06:25] *** jut has joined #archiveteam-bs
[06:25] *** ranma has quit IRC (Ping timeout: 252 seconds)
[06:25] *** Flashfire has quit IRC (Ping timeout: 252 seconds)
[06:25] *** kiska has quit IRC (Ping timeout: 252 seconds)
[06:25] *** Flashfire has joined #archiveteam-bs
[06:25] *** w0rmhole has joined #archiveteam-bs
[06:25] *** kiska has joined #archiveteam-bs
[06:25] *** ranma has joined #archiveteam-bs
[06:26] *** deevious has joined #archiveteam-bs
[06:26] *** svchfoo1 sets mode: +o kiska
[06:26] *** svchfoo3 sets mode: +o kiska
[06:26] *** ColdIce has joined #archiveteam-bs
[06:30] *** Smiley has joined #archiveteam-bs
[06:35] *** jdude104 has quit IRC (Read error: Operation timed out)
[06:46] *** Exairnous has quit IRC (Ping timeout: 265 seconds)
[06:47] *** kiska sets mode: +o kiska1
[06:47] *** kiska sets mode: +o kiskabak
[06:57] *** Exairnous has joined #archiveteam-bs
[07:02] *** Exairnous has quit IRC (Read error: Operation timed out)
[07:12] *** Exairnous has joined #archiveteam-bs
[07:39] *** polar has joined #archiveteam-bs
[08:13] *** Smiley has quit IRC (Remote host closed the connection)
[08:13] *** Smiley has joined #archiveteam-bs
[08:14] *** antomati_ has joined #archiveteam-bs
[08:16] *** dashcloud has joined #archiveteam-bs
[08:18] *** Exairnous has quit IRC (Read error: Operation timed out)
[08:20] *** ivan has quit IRC (Leaving)
[08:21] *** antomatic has quit IRC (Ping timeout: 615 seconds)
[08:22] *** ivan has joined #archiveteam-bs
[08:39] *** robbier97 has joined #archiveteam-bs
[08:45] *** robbierut has quit IRC (Read error: Operation timed out)
[08:52] *** icedice has joined #archiveteam-bs
[08:54] *** robbier97 has quit IRC (Read error: Connection reset by peer)
[08:55] *** robbierut has joined #archiveteam-bs
[09:34] *** polar has quit IRC (Quit: Page closed)
[09:39] *** polar has joined #archiveteam-bs
[09:40] *** Hintswen has quit IRC (Ping timeout: 265 seconds)
[09:41] *** fuzy802 has joined #archiveteam-bs
[09:43] *** Hintswen has joined #archiveteam-bs
[09:45] *** wp494 has quit IRC (Read error: Operation timed out)
[09:46] *** wp494 has joined #archiveteam-bs
[09:49] *** fuzzy8021 has quit IRC (Ping timeout: 615 seconds)
[09:51] *** fuzy802 is now known as fuzzy8021
[10:03] *** VerifiedJ has joined #archiveteam-bs
[10:07] *** fredgido has quit IRC (Read error: Connection reset by peer)
[10:10] *** fredgido has joined #archiveteam-bs
[10:15] *** fuzzy8021 has quit IRC (Read error: Operation timed out)
[10:15] *** fuzzy8021 has joined #archiveteam-bs
[10:23] *** BlueMax has quit IRC (Quit: Leaving)
[10:33] *** icedice has quit IRC (Quit: Leaving)
[11:50] *** netsound has quit IRC (Leaving)
[11:53] *** polar has quit IRC (Quit: Page closed)
[13:10] *** Pixi has quit IRC (Read error: Operation timed out)
[13:11] *** Pixi has joined #archiveteam-bs
[13:53] *** ryry has quit IRC (Ping timeout: 260 seconds)
[14:38] *** Hintswen| has joined #archiveteam-bs
[14:41] *** Despatche has quit IRC (Quit: Read error: Connection reset by deer)
[14:43] *** Hintswen has quit IRC (Ping timeout: 604 seconds)
[14:52] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[15:07] *** wp494 has joined #archiveteam-bs
[15:43] *** wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[17:01] *** wp494 has joined #archiveteam-bs
[17:14] *** deevious has quit IRC (Quit: deevious)
[17:49] *** Exairnous has joined #archiveteam-bs
[18:03] *** Exairnous has quit IRC (Read error: Operation timed out)
[18:09] *** SketchCo1 is now known as SketchCow
[18:46] *** Odd0002_ has joined #archiveteam-bs
[18:54] *** Odd0002 has quit IRC (Ping timeout: 615 seconds)
[18:54] *** Odd0002_ is now known as Odd0002
[19:12] SketchCow: i'm going to be uploading channelawesome videos from vid.me that i have to FOS
[19:12] it's over 100gb
[19:14] SketchCow: for me this is a way to make sure you have a copy of it
[19:15] and don't touch it without contacting me before uploading it
[19:15] it's going to take a very long time for me to upload this
[19:16] even at best speeds we are talking maybe a week for it to be completely uploaded
[19:42] *** killsushi has joined #archiveteam-bs
[20:23] *** ReimuHaku has quit IRC (Read error: Operation timed out)
[20:24] *** ReimuHaku has joined #archiveteam-bs
[20:32] time for mongodb then?
[20:35] *** Stilett0 has joined #archiveteam-bs
[20:35] *** Atom-- has joined #archiveteam-bs
[20:39] *** ndiddy has joined #archiveteam-bs
[20:39] *** netsound has joined #archiveteam-bs
[20:41] *** Stiletto has quit IRC (Read error: Operation timed out)
[20:41] *** Atom__ has quit IRC (Read error: Operation timed out)
[21:00] *** Hintswen| is now known as Hintswen
[21:06] *** schbirid has quit IRC (Remote host closed the connection)
[21:23] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[21:29] *** BlueMax has joined #archiveteam-bs
[21:37] *** dhyan_nat has joined #archiveteam-bs
[22:01] *** icedice has joined #archiveteam-bs
[22:07] *** dhyan_nat has quit IRC (Read error: Operation timed out)
[22:40] *** Ing3b0rg has joined #archiveteam-bs
[22:52] *** Exairnous has joined #archiveteam-bs
[22:55] *** t3 has quit IRC (Quit: Connection closed for inactivity)
[23:00] *** Exairnous has quit IRC (Read error: Operation timed out)
[23:32] *** t3 has joined #archiveteam-bs
[23:48] Soo, I want to be nitpicky...
[23:48] Yes, we did reach 1 PB, but not 1 PiB yet.
[23:48] The tracker shows GiB while labelling it as GB. :-(
[23:49] We're at 0.9939 PiB currently.
[23:49] the decimal data notations are a bit daft
[23:50] I mean, I don't really care which unit is used as long as it's used correctly.
[23:51] I always thought PB = PiB was correct usage, ignoring ancient professors and confused millennials, and anyone with a business interest in selling inferior products/services
[23:51] Personally, I always use the binary prefixes, but that's mostly because it's unambiguous.
[23:52] I'd like a decimal prefix to clarify the other direction, in the rare case I'd want base 10 binary
[23:53] Using "GB" to mean 2^30 bytes is wrong since giga- has been defined for ages to mean 10^9. But writing "GB" is unfortunately ambiguous because so many people use it incorrectly.
[23:53] When I write it, I always actually mean 10^9 bytes, but yeah, to avoid the ambiguity, I tend to use the other units instead.
[23:53] I grew up when incorrect was considered correct
[23:54] Yeah, "considered" but still factually wrong. :-)
[23:54] and correct is breaking a lot of ground truths
[23:54] It was wrong from the very start when some stoned engineer thought calling 1024 bytes a "kilobyte" was fine.
[23:54] Anyway, this doesn't belong in -bs anymore.
[23:55] this is ot, oh, no it isn't, oops
[23:56] topic line has #archiveteam-ot in it, misleading :-)
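The PB versus PiB arithmetic from this exchange can be checked with a few lines of Python. This is a small sketch, not anything from the tracker itself; the function names `pib_to_pb` and `mislabelled_gb_to_true_gb` are invented for illustration. It uses the SI meaning of the decimal prefixes (giga = 10^9, peta = 10^15) and the binary prefixes gibi = 2^30 and pebi = 2^50.

```python
# Decimal (SI) units, in bytes.
GB = 10**9
PB = 10**15

# Binary (IEC) units, in bytes.
GIB = 2**30
PIB = 2**50


def pib_to_pb(pib):
    """Convert a size in pebibytes to decimal petabytes."""
    return pib * PIB / PB


def mislabelled_gb_to_true_gb(shown):
    """Convert a number that is really GiB, but labelled 'GB', to decimal GB."""
    return shown * GIB / GB


if __name__ == "__main__":
    # The 0.9939 PiB quoted in the channel is already more than 1 PB.
    print(round(pib_to_pb(0.9939), 3))
    # A display of 1000 "GB" that is really GiB holds more than 1000 GB.
    print(mislabelled_gb_to_true_gb(1000))
```

Since 2^50 is about 12.6% larger than 10^15, a total can pass 1 PB well before it reaches 1 PiB, which is the gap being pointed out above.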