[00:05] *** dashcloud has quit IRC (Ping timeout: 245 seconds)
[00:06] *** j08nY has quit IRC (Quit: Leaving)
[00:07] *** dashcloud has joined #archiveteam-bs
[00:50] *** dashcloud has quit IRC (Ping timeout: 245 seconds)
[00:58] *** dashcloud has joined #archiveteam-bs
[01:40] *** BlueMaxim has joined #archiveteam-bs
[01:52] *** tfgbd_znc has joined #archiveteam-bs
[02:09] *** REiN^ has quit IRC (Read error: Operation timed out)
[02:10] *** REiN^ has joined #archiveteam-bs
[02:44] *** ndiddy has quit IRC ()
[02:51] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:51] *** Stilett0 has joined #archiveteam-bs
[03:51] *** Stilett0 is now known as Stiletto
[03:52] *** phuzion has quit IRC (Ping timeout: 600 seconds)
[03:53] *** phuzion has joined #archiveteam-bs
[04:07] *** pizzaiolo has joined #archiveteam-bs
[04:08] *** pizzaiolo has quit IRC (Client Quit)
[04:15] SketchCow: just read your post on retromags
[04:16] that's good that he's just been resting
[04:56] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:01] *** Sk1d has joined #archiveteam-bs
[05:18] *** SHODAN_UI has joined #archiveteam-bs
[06:19] *** kristian_ has joined #archiveteam-bs
[06:27] archivebot ignores robots.txt, right?
[06:34] ranma: yes
[07:03] *** j08nY has joined #archiveteam-bs
[07:06] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[07:07] *** ivan has quit IRC (Leaving)
[07:11] *** ivan has joined #archiveteam-bs
[07:48] *** j08nY has quit IRC (Read error: Operation timed out)
[08:25] 06-09 03:17:50 <@xmc> it shouldn't go above/outside the directory named in the urls that you give it -- I actually observed the opposite. If you archive https://example.com/ and there's a 302 redirect on https://example.com/foo to https://otherpage.net/, it will recursively grab otherpage.net as well.
[08:26] Or maybe 301, don't remember.
[08:28] Checked the logs, it was a 303 See Other on job a98hr9u2potfhw6ikf0dnbsua.
[08:31] 06-08 21:38:57 <@xmc> [!ao on Twitter] won't get more than the most recent posts, but it'll go much faster -- ArchiveBot won't grab the entire tweet history regardless of the options, even with phantomjs, in my experience.
[08:48] *** kristian_ has quit IRC (Quit: Leaving)
[08:59] *** SHODAN_UI has joined #archiveteam-bs
[09:20] *** Jonison has joined #archiveteam-bs
[09:50] *** Jonison has quit IRC (Quit: Leaving)
[09:55] *** icedice has joined #archiveteam-bs
[10:25] *** gui7 has joined #archiveteam-bs
[10:28] *** j08nY has joined #archiveteam-bs
[10:28] ok so, question. there's this website in my native language that is an incredible treasure trove of soccer match data?
[10:29] I just need a bit of help getting started... regex is rusty lol
[10:31] Link?
[10:57] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[11:29] not that the average archivist would want to back this up, but https://www.reddit.com/r/DataHoarder/comments/6g4c3p/erosharecom_nsfw_shutting_down_june_30th/
[11:32] Yeah, EroShare, ImgBox, ImageBam, and SendVid are all shutting down end of June.
[11:33] TIL sendvid. someone mentioned imgbox, imagebam when i mentioned that link
[12:12] *** phuzion has quit IRC (Remote host closed the connection)
[12:24] *** phuzion has joined #archiveteam-bs
[12:42] *** pizzaiolo has joined #archiveteam-bs
[13:26] don't see why there couldn't be a project for it
[13:41] Well, sure. We'll need a list of URLs though. EroShare and SendVid seem to use 8-char base-36 IDs (2.8 * 10^12 combinations), ImgBox 8-char base-62 IDs (2.2 * 10^14), ImageBam 14/15-char base-16 IDs (1.2 * 10^18). ImageBam also has galleries with 32-char base-36 IDs it seems (6.3 * 10^49 !)...
[13:46] By the way, ImgBox, ImageBam, and SendVid are indeed operated by the same entity, Flixya Entertainment, LLC.
[13:58] They also ran VideoBam, ViRoll, and Snapixel previously. Apparently, shared.com was also theirs at some point, but it looks like they sold that.
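The ID-space figures quoted at [13:41] can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the helper name `id_space` is made up for illustration, and it assumes that "14/15-char" means the 14-character and 15-character ID spaces added together, which the log does not state explicitly.

```python
def id_space(base: int, length: int) -> int:
    """Number of distinct IDs of a fixed length over an alphabet of `base` symbols."""
    return base ** length


# Labels follow the [13:41] message; the ImageBam entry sums both lengths (assumption).
schemes = {
    "EroShare / SendVid (8-char base-36)": id_space(36, 8),
    "ImgBox (8-char base-62)": id_space(62, 8),
    "ImageBam (14- or 15-char base-16)": id_space(16, 14) + id_space(16, 15),
    "ImageBam galleries (32-char base-36)": id_space(36, 32),
}

for name, count in schemes.items():
    # Scientific notation with one decimal, to compare against the figures in the log.
    print(f"{name}: {count:.1e}")
```

Even the smallest of these spaces has trillions of candidates, which is why the discussion turns to getting a list of URLs rather than enumerating IDs.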
[14:01] ImageBam also has a second domain: imgbam.com
[14:02] they're all shutting down at once? o.o
[14:04] Yep.
[14:04] *** ZexaronS has joined #archiveteam-bs
[14:04] Not sure if there's a connection between EroShare and Flixya.
[14:04] I guess it's possible Flixya also runs EroShare but doesn't want to be associated with it or something like that. (eroshare.com is registered through a whois proxy.)
[14:05] But it might also just be a coincidence.
[14:05] The other three are just Flixya still not having figured out how to run a profitable image hosting website.
[14:05] image/video*
[14:06] nobody has figured that out and that's why they all shut down eventually
[14:06] :p
[14:06] or become so ad-laden as to be unusable
[14:07] :-P
[14:07] I've noticed imgur is becoming more and more obnoxious with their redirecting of hotlinks
[14:07] Yeah, same.
[14:09] picyou.com and ucash.in were also Flixya's at some point, but now seem to belong to another party (like shared.com).
[14:13] I found two more Flixya registrations: adhance.com, an advertising platform, and continue.com, a "traffic recapturing" service (think adf.ly). Both are now for sale.
[14:16] Also sharedhq.com, which has this beautiful quote: "[Flixya etc. founder] Ivan [Wong] is a serial entrepreneur and veteran web producer. He has been featured on the “New York Times” and excels in online advertising, analytics and project management." Perhaps he misunderstood the verb "to excel" as "I can calculate some online advertising, analytics, and project management stuff in Microsoft Excel"
[14:16] ?
[14:55] 'serial entrepreneur'
[14:55] is that now the new term for "trying to be like Yahoo"?
[15:15] hopping from one overfunded project to the next and leaving a trail of destruction
[15:15] :p
[15:17] *** ZexaronS has quit IRC (Leaving)
[15:28] *** odemg has joined #archiveteam-bs
[15:28] *** Gilfoyle has joined #archiveteam-bs
[17:23] *** Honno has joined #archiveteam-bs
[17:24] *** ReimuHaku has joined #archiveteam-bs
[17:25] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[17:41] SketchCow: so this guy did a ton of scans of New Computer Express: https://archive.org/details/@zzapmort
[18:30] yeah, lists of URLs are the most important for these new sites shutting down
[18:30] we can try to contact them
[18:31] wut, imagebam shutting down?
[18:31] huh, all of them?
[18:31] that's big
[18:32] Can we create a list of what is exactly shutting down and how it is all connected with each other?
[18:32] ImgBox, ImageBam, SendVid are all Flixya Entertainment, LLC services
[18:33] EroShare is the other service shutting down on 30 June. Not related to Flixya, at least publicly.
[18:34] I hope there's some way to list them without the IDs in the URLs
[18:35] list the content on them*
[18:35] *** godane has quit IRC (Ping timeout: 245 seconds)
[18:39] *** godane has joined #archiveteam-bs
[19:12] *** SHODAN_UI has joined #archiveteam-bs
[19:19] *** ItsYoda has quit IRC (Quit: rippppp to the yoda you used to know!)
[19:22] godane: I've written them for permission to re-render them as readable.
Good catch
[19:25] *** ZexaronS has joined #archiveteam-bs
[19:28] SketchCow: i only found it cause i was looking at retropdfs.wordpress.com
[19:29] and the guy had tons of New Computer Express missing in his collection
[19:29] so i started looking for the magazine and found it on archive.org
[19:30] SketchCow: he also has tons of Commodore inlays
[19:31] *** schbirid has joined #archiveteam-bs
[19:33] i did find it weird that he started using tiff for issues 131 and 135
[19:33] based on what i can tell the rest are just jpgs in zips
[19:34] *** gui7 has quit IRC (Read error: Operation timed out)
[19:59] *** ItsYoda has joined #archiveteam-bs
[21:04] is there a channel for eroshare-related stuff yet?
[21:11] *** ndiddy has joined #archiveteam-bs
[21:15] no, maybe it should be #nofap though
[21:17] +1 for xmc's suggestion
[21:18] I'll sit in it, if that's the route we'll go
[21:18] not sure if we're actually going to grab any of it though, does the archive *want* the data?
[21:43] haha
[21:44] I'm sure people 200 years from now would love to be able to look back at our quaint porn.
[21:45] yep
[21:47] "oh hah they still used their bodies back then, unlike now with our VR quantum hyperfornication"
[21:49] judging by item view counts, way more people want porn than anything else in the archive.org collections
[21:50] pixiv is done? nice
[21:50] should change the archiveteam's choice then
[21:51] Not quite, Nazca. We're going to grab tags and then do another pass for "R18" rooms
[21:51] Which require an account
[21:53] ETA for that?
[21:54] chfoo: how can I export the out items from a warrior project?
in this case it's for pixiv
[21:54] yipdw might know too ^
[21:57] welp
[21:57] my current project page broke
[21:57] it's completely empty now
[21:57] no matter what project I pick
[21:57] I already restarted the VM
[21:57] how weird
[22:01] hard booting without using the web menu worked
[22:17] arkiver: something like: redis-cli zrange pixiv:out 0 -1 > pixiv_out.txt
[22:26] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[22:29] *** Honno has quit IRC (Read error: Operation timed out)
[22:59] *** icedice has quit IRC (Quit: Leaving)
[23:02] arkiver: I don't know of a function in the tracker, but if you can ssh into tracker.archiveteam.org, you can run redis-cli zrange pixiv:out 0 -1
[23:28] *** ZexaronS has quit IRC (Leaving)
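For anyone who would rather script the export suggested at [22:17] and [23:02] than redirect `redis-cli` output, a rough Python equivalent might look like the sketch below. It is not how the tracker itself does anything: the function name `export_sorted_set` is invented for illustration, the client is assumed to be any object with a redis-py style `zrange(name, start, end)` method, and `pixiv:out` is assumed to be a sorted set only because the log uses `zrange` on it.

```python
from typing import Iterable, Protocol, Union


class SortedSetClient(Protocol):
    """Anything with a redis-py style zrange(); redis.Redis is assumed to fit."""

    def zrange(self, name: str, start: int, end: int) -> Iterable[Union[bytes, str]]:
        ...


def export_sorted_set(client: SortedSetClient, key: str, path: str) -> int:
    """Write every member of the sorted set `key` to `path`, one per line.

    Mirrors `redis-cli zrange <key> 0 -1 > <path>`; returns the member count.
    """
    members = client.zrange(key, 0, -1)  # 0..-1 selects the whole set
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for member in members:
            # redis-py returns bytes unless decode_responses=True was set.
            if isinstance(member, bytes):
                member = member.decode("utf-8", errors="replace")
            out.write(member + "\n")
            count += 1
    return count
```

With redis-py installed on a host that can reach the tracker's Redis, the call would presumably be `export_sorted_set(redis.Redis(), "pixiv:out", "pixiv_out.txt")`; the default connection settings here are a guess, not something the log confirms.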