[00:01] *** gui7 has joined #archiveteam-bs
[00:02] *** gui7 has quit IRC (Client Quit)
[00:03] *** BlueMaxim has joined #archiveteam-bs
[00:21] *** JRWR-Thin has joined #archiveteam-bs
[00:27] *** BlueMaxim has quit IRC (Quit: Leaving)
[00:31] *** Whopper_ has quit IRC (Ping timeout: 246 seconds)
[00:31] *** Whopper has joined #archiveteam-bs
[00:35] so far I'm impressed with the speed of the scaleway ARM64-2GB for a warrior
[00:35] just wish it had more disk space
[00:35] does run-pipeline handle low diskspace?
[00:37] *** pizzaiolo has quit IRC (Remote host closed the connection)
[01:07] Yo, if you're running SPUF via scripts please make sure to update them. It's a wall of Warriors over on the tracker right now...
[01:07] (And no, the project is not done yet.)
[01:10] *** ndiddy has quit IRC (Read error: Operation timed out)
[01:20] *** j08nY has quit IRC (Quit: Leaving)
[01:22] *** JRWR-Thin has quit IRC (Ping timeout: 268 seconds)
[01:28] *** JRWR-iPAD has joined #archiveteam-bs
[01:35] *** username1 has joined #archiveteam-bs
[01:35] *** JRWR-iPAD has quit IRC (Ping timeout: 268 seconds)
[01:38] *** schbirid2 has quit IRC (Read error: Operation timed out)
[01:40] *** JRWR-iPAD has joined #archiveteam-bs
[02:05] *** ZexaronS has quit IRC (Leaving)
[02:17] JRWR: you can add additional disk space, no?
[02:18] it's an OVH box
[02:18] So I could spin up another box and move the DNS
[02:30] But I can't afford it
[03:11] JRWR: huh? you said Scaleway
[03:11] not OVH :P
[03:11] and Scaleway has block storage stuff iirc
[03:11] Wait
[03:11] What is the topic, I was talking about the Rsync Ingress I'm running
[03:13] [02:35] so far I'm impressed with the speed of the scaleway ARM64-2GB for a warrior
[03:13] [02:35] just wish it had more disk space
[03:33] *** yuitimoth has quit IRC (Remote host closed the connection)
[03:33] *** yuitimoth has joined #archiveteam-bs
[03:49] *** r3c0d3x has joined #archiveteam-bs
[04:15] arkiver xmc: Someone from pixiv is asking us to throttle our requests a bit
[04:16] Since our grab is triggering their automatic anti-DDoS systems
[04:17] I think this is day 3 of my yahoo answers thread running
[04:17] Kaz, GLaDOS: See above ^^^
[04:18] Over in #savepixiv
[04:44] *** BlueMaxim has joined #archiveteam-bs
[04:59] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:59] *** dashcloud has quit IRC (Ping timeout: 492 seconds)
[05:06] *** Sk1d has joined #archiveteam-bs
[05:31] Hrm
[05:31] We could use a tracker admin
[05:31] need to put the brakes on pixiv
[05:31] 40 RPS has been requested
[05:32] ok, I'll pause mine for now
[05:33] I'll stop mine as well
[06:00] *** SHODAN_UI has joined #archiveteam-bs
[06:36] *** bsmith093 has quit IRC (Read error: Operation timed out)
[07:23] so i'm close to going past last month's number of items
[07:24] i have like another 100 items to upload to pass it
[07:32] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[07:34] *** tuluu has quit IRC (Ping timeout: 260 seconds)
[07:52] *** bsmith093 has joined #archiveteam-bs
[08:07] *** bsmith093 has quit IRC (Ping timeout: 245 seconds)
[08:15] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[08:16] *** BlueMaxim has joined #archiveteam-bs
[08:25] *** bsmith093 has joined #archiveteam-bs
[08:28] http://www.npr.org/sections/alltechconsidered/2017/06/03/529155865/videotapes-are-becoming-unwatchable-as-archivists-work-to-save-them
[08:39] *** JRWR-iPAD has quit IRC (Quit: Page closed)
[08:45] *** jiphex has joined #archiveteam-bs
[09:40] *** BlueMaxim has quit IRC (Quit: Leaving)
[09:50] *** j08nY has joined #archiveteam-bs
[10:02] *** ZexaronS has joined #archiveteam-bs
[10:45] *** Stiletto has quit IRC (Read error: Operation timed out)
[12:14] I did a scrape of literotica.com, let me know if there are any issues with it.
[12:14] https://archive.org/details/literotica.com_2017-04
[14:16] *** pizzaiolo has joined #archiveteam-bs
[14:21] This may be of interest to some of you: https://www.reddit.com/r/DataHoarder/comments/6fku0s/2873_nes_roms_the_eye/
[14:33] *** alfie has quit IRC (Ping timeout: 260 seconds)
[14:44] *** alfie has joined #archiveteam-bs
[15:17] *** SHODAN_UI has joined #archiveteam-bs
[15:17] *** pizzaiolo has quit IRC (Read error: Connection reset by peer)
[15:19] *** pizzaiolo has joined #archiveteam-bs
[16:08] *** username1 is now known as spirit
[16:08] JAA: probably better to use the standard sets
[16:09] Aoede: thanks >:)
[16:20] spirit: not like I have anything better to do with my time :P
[16:20] well, in about 27 minutes I do
[16:23] ?
[16:28] says wget
[16:29] oh :D
[16:32] It's around 11GB uncompressed
[16:42] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[16:42] I was just pointed at this very, very interesting project: http://formats.kaitai.io/ -- cc SketchCow
[16:43] essentially standardized format specifications for a crapload of (binary) formats
[16:43] with parser and chart generation and whatnot
[16:43] (though the crapload is not yet big enough :P)
[16:45] Someone with an account on the File Formats wiki should add a link there: http://fileformats.archiveteam.org/wiki/Main_Page
[16:47] Interesting. Shall we throw it into ArchiveBot just in case?
[17:46] *** bmcginty has quit IRC (Ping timeout: 268 seconds)
[17:47] *** icedice has joined #archiveteam-bs
[17:52] *** bmcginty has joined #archiveteam-bs
[17:59] *** Pudsey has joined #archiveteam-bs
[18:04] *** Pudsey has quit IRC (Remote host closed the connection)
[18:36] *** JRWR-iPAD has joined #archiveteam-bs
[18:44] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[19:43] *** Honno has quit IRC (Quit: Leaving)
[19:48] *** Aranje has joined #archiveteam-bs
[20:02] *** kristian_ has joined #archiveteam-bs
[20:03] I'm looking into archiving Steam Greenlight. Some of it looks pretty straightforward, other things will be really annoying.
[20:06] what sort of stuff is there to archive? Comments and pages?
[20:09] The list of games which were released through Greenlight, announcements plus comments to those, and two discussion forums.
[20:09] As far as I can see
[20:11] ok
[20:11] The announcement comments are the most annoying thing I've seen so far as they're based entirely on JavaScripted POST requests.
[20:19] Bleh
[20:36] *** ZexaronS has quit IRC (Read error: Connection reset by peer)
[20:46] *** ZexaronS has joined #archiveteam-bs
[20:48] *** SHODAN_UI has joined #archiveteam-bs
[21:58] *** spirit has quit IRC (Quit: Leaving)
[21:59] *** dashcloud has joined #archiveteam-bs
[22:07] Ok, so those POST requests aren't actually necessary strictly speaking since all information can also be accessed directly as a static HTML page. That doesn't mean I won't try archiving that "API" as well, of course.
[22:08] Each game released through Greenlight obviously has its own page, including a discussion forum etc. I think I'll skip those for now.
[22:09] do you have a list of games
[22:09] or IDs
[22:09] Not yet, but I guess I can make one, why?
[22:09] well for archiving
[22:10] Well yeah, but I don't see those games going anywhere anytime soon. It's just the whole Greenlight framework which will disappear, as I understand it.
[22:10] It would certainly be a good idea to grab all of Steam at some point though.
[22:10] ah
[22:10] at some point yeah
[22:11] any idea how big this is?
[22:11] greenlight I mean
[22:11] 14k games overall over the course of its lifetime
[22:11] that's including all the crap ones
[22:12] ah
[22:12] good enough for archivebot I think
[22:12] Not big. 16k games and 6k forum threads are the main part.
[22:12] right yeah, so a few GB only
[22:13] I don't think ArchiveBot will handle this very well. It's spread across various directories on the steamcommunity.com domain.
[22:13] A specific wpull with the relevant --accept-regex rules will work better, I think.
[22:14] yep
[22:14] like a mini archivebocx
[22:14] bot*
[22:14] :P
[22:14] :-)
[22:14] JustAnotherArchivebot
[22:18] Hmm, actually, it may be necessary to grab the games as well. Still not big though.
[22:20] Just painful
[22:34] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[22:41] *** ZexaronS- has joined #archiveteam-bs
[22:43] *** ZexaronS has quit IRC (Read error: Operation timed out)
[22:54] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[23:13] *** kristian_ has quit IRC (Quit: Leaving)
[23:14] holy shit http://codeology.braintreepayments.com/archiveteam/archivebot
[23:14] this is a sweet repo visualization
[23:15] IS THAT FLYING SPAGHETTI MONSTER?
[23:16] what I really like is that the objects used to represent the repo aren't random
[23:16] Python code, for example, always shows up as those purple pyramidal-ish things
[23:16] stuff recognized as Makefile directives shows up as those spindly structures
[23:16] Archive bot is FSM
[23:16] Legit that's fucking GSM
[23:16] FSM*
[23:17] so you develop a visual (and, if this were to be e.g. 3D-printed, tactile/material) vocabulary for a repo's composition
[23:18] here's seesaw: http://codeology.braintreepayments.com/archiveteam/seesaw-kit
[23:19] my god, it's ALIVE
[23:19] http://codeology.braintreepayments.com/archiveteam/glowing-computing-machine
[23:20] anything with a lot of spindly things sticking out of it immediately signals "the build system comprises a substantial portion of the code"
[23:20] that's really cool
[23:22] *** JRWR has quit IRC (Quit: Page closed)
[23:23] *** JRWR has joined #archiveteam-bs
[23:24] *** icedice has quit IRC (Quit: Leaving)
[23:25] oh
[23:25] oh my god
[23:25] http://codeology.braintreepayments.com/featured/torvalds/linux
[23:25] Linux is AMAZING
[23:29] that's pretty cool
[23:30] http://codeology.braintreepayments.com/featured/microsoft/typescript
[23:30] looks like a bat
[23:31] haha, yep
[23:49] *** BlueMaxim has joined #archiveteam-bs
[23:51] *** zenguy has quit IRC (Ping timeout: 370 seconds)