[00:36] *** Raccoon has joined #archiveteam-bs
[00:51] Can ArchiveBot download from instagram?
[00:52] Nope
[00:52] Because Instagram banned all our pipelines and is really picky about rate limits.
[00:53] Is that a good target to add to tikbot?
[00:53] If your goal is to get banned from Instagram as well, sure.
[00:53] I can’t just follow rate limit?
[00:54] Well probably, if you figure out what the limit is.
[00:54] 1 req / 10s?
[00:54] I think the most I tried is 30 s delay between pagination requests in socialbot.
[00:55] oh and it still bans?
[00:55] Yup
[00:55] And that's only the pagination, not even retrieving the post pages.
[00:55] what if I use puppeteer to actually use the page?
[00:55] However, if you get around the rate limit, AB works fine with Instagram.
[00:56] Just needs to be done a bit carefully regarding how to queue it and ignores.
[00:56] they probably notice downloading 24/7
[00:57] it’d be very expensive but..... YouTube????
[00:57] That's also not the problem. You quickly get banned trying to do the pagination even from residential connections.
[00:57] like send a user to be archived
[00:58] That's already a thing, #youtubearchive on hackint.
[01:00] What about something like archivebot for the web?
[01:00] using puppeteer*
[01:00] so js pages can be saved
[01:00] crocoite/chromebot?
[01:00] Yeah is that a thing?
[01:01] https://archiveteam.org/index.php?title=Chromebot
[01:02] Interesting!
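The fixed-delay idea from the exchange above (1 req / 10 s, or the 30 s pagination delay tried in socialbot) could be sketched roughly as below. This is only an illustration, not how socialbot or ArchiveBot actually pace requests; `FixedDelayPacer` and its parameters are hypothetical names, and as noted in the discussion, a 30 s delay was still not enough to avoid Instagram bans.

```python
import time


class FixedDelayPacer:
    """Enforce a minimum interval between consecutive requests.

    Hypothetical helper for illustration; not part of any ArchiveTeam tool.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable so the pacer can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Sleep until min_interval has passed since the last call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        # Record when this request is allowed to go out.
        self._last = self._clock()
        return delay
```

Usage would be something like `pacer = FixedDelayPacer(30)` followed by `pacer.wait()` before each pagination request. The first call returns immediately; later calls block only for whatever part of the interval is still left.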
[01:30] *** lennier2 has joined #archiveteam-bs
[01:31] *** Arcorann has joined #archiveteam-bs
[01:32] *** lennier1 has quit IRC (Read error: Operation timed out)
[01:32] *** lennier2 is now known as lennier1
[01:32] *** Arcorann_ has joined #archiveteam-bs
[01:40] *** Arcorann has quit IRC (Read error: Operation timed out)
[01:45] *** anelki has joined #archiveteam-bs
[02:17] *** qw3rty has joined #archiveteam-bs
[02:20] *** robogoat_ has quit IRC (Ping timeout: 265 seconds)
[02:20] *** robogoat has joined #archiveteam-bs
[02:20] *** Tugboat has quit IRC (Ping timeout: 265 seconds)
[02:20] *** qw3rty_ has quit IRC (Ping timeout: 265 seconds)
[02:20] *** dxrt has quit IRC (Ping timeout: 265 seconds)
[02:20] *** lunik1 has quit IRC (Ping timeout: 265 seconds)
[02:21] *** _niklas has joined #archiveteam-bs
[02:21] *** atphoenix has quit IRC (Ping timeout: 265 seconds)
[02:22] *** dxrt has joined #archiveteam-bs
[02:22] *** atphoenix has joined #archiveteam-bs
[02:23] *** sknebel has joined #archiveteam-bs
[02:23] *** Zebranky has quit IRC (Ping timeout: 265 seconds)
[02:23] *** i0npulse has quit IRC (Ping timeout: 265 seconds)
[02:24] *** i0npulse has joined #archiveteam-bs
[02:24] *** OrIdow6 has quit IRC (Ping timeout: 265 seconds)
[02:24] *** Zebranky has joined #archiveteam-bs
[02:25] *** Fionera has joined #archiveteam-bs
[02:26] *** yano_ has joined #archiveteam-bs
[02:27] *** yano has quit IRC (Ping timeout: 265 seconds)
[02:30] *** i0npulse has quit IRC (Ping timeout: 265 seconds)
[02:33] *** i0npulse has joined #archiveteam-bs
[02:43] is there any way to make httrack aggressive?
this does not seem to work:
[02:43] httrack --disable-security-limits --advanced-wait 0 --max-rate=99999999999999999999999999999999999999999999999
[02:56] *** purplebot has joined #archiveteam-bs
[02:58] *** OrIdow6 has joined #archiveteam-bs
[02:58] *** Tugboat has joined #archiveteam-bs
[03:00] *** lunik1 has joined #archiveteam-bs
[03:12] this seems to work:
[03:12] httrack --disable-security-limits --max-rate=0 --sockets=99 --connection-per-second=0
[03:16] oh found this https://www.archiveteam.org/index.php?title=HTTrack_options
[03:31] ---
[03:31] how could i upload to IA from node? do i need to use the s3-ish API?
[03:35] *** Arcorann has joined #archiveteam-bs
[03:43] *** Arcorann_ has quit IRC (Ping timeout: 622 seconds)
[03:56] JAA Are you an IA admin? Can you make a collection for me?
[03:56] Oh I think I have to wait for 50 items to not be in a collection before making it
[03:57] *** qw3rty_ has joined #archiveteam-bs
[04:05] *** qw3rty has quit IRC (Read error: Operation timed out)
[04:18] *** atphoenix has quit IRC (Read error: Connection reset by peer)
[04:18] *** atphoenix has joined #archiveteam-bs
[04:19] *** systwi has quit IRC (Read error: Connection reset by peer)
[04:19] *** lennier1 has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** bsmith093 has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** Ryz has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** RichardG has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** Terbium has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** SketchCow has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** Larsenv has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** benjinsmi has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** Stiletto has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** sivoais_ has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** ephemer0l has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** tapedrive has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** Datechnom has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** kiska2 has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** legoktm has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** tonsofpcs has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** ndiddy has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** atomicthu has quit IRC (ircd.choopa.net irc.mzima.net)
[04:19] *** underscor has quit IRC (Read error: Operation timed out)
[04:19] *** asdf01011 has quit IRC (Ping timeout (120 seconds))
[04:20] *** britmob has quit IRC (Ping timeout (120 seconds))
[04:20] *** asdf01011 has joined #archiveteam-bs
[04:20] *** britmob has joined #archiveteam-bs
[04:20] *** systwi has joined #archiveteam-bs
[04:21] *** katocala has quit IRC (Remote host closed the connection)
[04:22] *** katocala has joined #archiveteam-bs
[04:24] *** underscor has joined #archiveteam-bs
[04:30] *** lennier1 has joined #archiveteam-bs
[04:30] *** bsmith093 has joined #archiveteam-bs
[04:30] *** Ryz has joined #archiveteam-bs
[04:30] *** RichardG has joined #archiveteam-bs
[04:30] *** Terbium has joined #archiveteam-bs
[04:30] *** SketchCow has joined #archiveteam-bs
[04:30] *** Larsenv has joined #archiveteam-bs
[04:30] *** benjinsmi has joined #archiveteam-bs
[04:30] *** Stiletto has joined #archiveteam-bs
[04:30] *** sivoais_ has joined #archiveteam-bs
[04:30] *** ephemer0l has joined #archiveteam-bs
[04:30] *** tapedrive has joined #archiveteam-bs
[04:30] *** Datechnom has joined #archiveteam-bs
[04:30] *** kiska2 has joined #archiveteam-bs
[04:30] *** legoktm has joined #archiveteam-bs
[04:30] *** tonsofpcs has joined #archiveteam-bs
[04:30] *** ndiddy has joined #archiveteam-bs
[04:30] *** atomicthu has joined #archiveteam-bs
[04:42] Yeah, they can be moved to their own collection after they've been uploaded
[04:42] If IA approves this in the first place (if they haven't already)
[04:44] I haven't heard of a client for IA S3 being written in node - unless one
exists, or unless it's super easy to write, if I were you, I'd just go through the python library (or the CLI tool that uses it)
[05:03] a long time ago, IA had a ftp server
[05:08] documentation still is here https://archive.org/help/contrib-advanced.php
[06:59] *** jshoard has joined #archiveteam-bs
[07:03] *** jshoard has quit IRC (Client Quit)
[07:06] *** jshoard has joined #archiveteam-bs
[07:06] *** jshoard has quit IRC (Remote host closed the connection!)
[07:08] *** Ryz has quit IRC (Read error: Operation timed out)
[07:09] *** jshoard has joined #archiveteam-bs
[07:10] *** Ryz has joined #archiveteam-bs
[07:28] *** lunik1 has quit IRC (Ping timeout: 265 seconds)
[07:29] *** lunik1 has joined #archiveteam-bs
[10:32] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:19] *** lunik1 has quit IRC (Ping timeout: 265 seconds)
[14:20] *** lunik1 has joined #archiveteam-bs
[15:00] *** Arcorann_ has joined #archiveteam-bs
[15:07] How could I archive a Discourse forum?
[15:07] *** Arcorann has quit IRC (Read error: Operation timed out)
[15:31] Wingy: I'm not an IA admin. You'll have to ask Jason or info@. Upload the stuff first, then give them a list of all items that should be in the collection.
[15:32] Okay no problem :)
[15:32] As for uploading, use the Python tool: https://archive.org/services/docs/api/internetarchive/
[15:32] Or yeah, implement the S3-like API if you hate yourself. :-)
[15:41] Wingy: Discourse works fairly well just with wpull/grab-site/AB. If it's an older version, things might break in the WBM though unless the user disables.
[15:41] disables JS*
[15:41] I ended up doing this for uploading: https://ghostbin.co/paste/4ckyy
[15:42] And the Discourse forum is shutting down but it's private and requires login to view
[15:43] *** Arcorann_ has quit IRC (Leaving)
[15:43] Ah, that makes it a bit trickier. Supplying cookies to wpull or grab-site should work though.
[15:43] Do a few threads, check playback with pywb.
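On the earlier upload question: for anyone who does want to talk to the S3-like API directly instead of using the Python tool, a single-file upload is roughly an authenticated HTTP PUT with metadata passed as headers. The sketch below only builds the request and does not send it. It is not the approach from the ghostbin paste (unknown here); `build_ias3_put` and the example identifier, filename, and keys are hypothetical, and the endpoint and header names are assumptions based on IA's S3-like API documentation, so check them there before relying on this.

```python
import urllib.parse
import urllib.request


def build_ias3_put(identifier, filename, data, access_key, secret_key, metadata):
    """Build (but do not send) a PUT request for IA's S3-like API.

    Hypothetical helper; endpoint and header names are assumptions taken
    from IA's S3-like API documentation. Only plain ASCII metadata values
    are handled in this sketch.
    """
    url = "https://s3.us.archive.org/{}/{}".format(
        urllib.parse.quote(identifier), urllib.parse.quote(filename)
    )
    headers = {
        # IA uses its own "LOW" scheme rather than AWS request signing.
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    # Item metadata goes into x-archive-meta-<field> headers.
    for field, value in metadata.items():
        headers["x-archive-meta-{}".format(field)] = str(value)
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


req = build_ias3_put(
    "my-test-item",                 # hypothetical item identifier
    "video.mp4",
    b"...file bytes...",
    "KEY", "SECRET",                # placeholders for your IA S3 keys
    {"mediatype": "movies", "title": "Test upload"},
)
```

Passing `req` to `urllib.request.urlopen` would perform the upload. The same method, URL, and headers carry over to a Node HTTP client, which is about all a minimal node uploader would need for the simple case.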
[15:44] *** Arcorann has joined #archiveteam-bs
[15:45] wait doesn't AB use grab-site?
[15:45] Nope
[15:46] grab-site is basically a local, single-machine version of AB though.
[15:48] I never did get my AT wiki password reset
[15:48] Maybe I should try email? Discord and IRC didn't work
[15:52] arkiver, jrwr: ^
[15:52] *** Arcorann has quit IRC (Read error: Connection reset by peer)
[15:59] Any way to archive Docker Hub in its entirety? With the 6-month deletion thing
[16:00] Nope, it's essentially impossible.
[16:00] You can't access the image history.
[16:18] *** VADemon has joined #archiveteam-bs
[16:55] what is the highest sane concurrency for grab-site?
[16:55] *** semisimpl has joined #archiveteam-bs
[16:55] depends on site??
[16:58] Yes
[16:58] And wpull has a hard-coded connection limit of 6 per host, which also plays into it.
[16:58] Not sure if ludios_wpull (the wpull fork grab-site uses) removed that limit.
[17:07] *** semisimpl has quit IRC (Quit: semisimpl)
[17:13] *** HP_Archiv has joined #archiveteam-bs
[17:21] *** HP_Archiv has quit IRC (Quit: Leaving)
[17:43] *** VADemon has quit IRC (left4dead)
[19:27] Is wpull still maintained?
[19:28] Hardly, just like much of our other tooling.
[19:30] patch the software, create a pull request on github and do some lobbying on the irc channel to get committed
[19:30] :)
[19:36] Well, yeah. On wpull, the situation's still kind of the same as it has been for a long time now: there's my big PR with various bugfixes and some new features, and it's basically blocked by a required detailed (i.e. very time-consuming) performance analysis.
[19:36] That's https://github.com/ArchiveTeam/wpull/pull/393 if anyone's wondering.
[19:38] JAA: Do you know if this format would be okay? https://archive.org/details/tikbot.test2-markrober
[19:40] Wingy: Let's take that to the TikTok channel.
[19:40] oh oops wrong channel sorry
[20:52] *** semisimpl has joined #archiveteam-bs
[23:11] *** BlueMax has joined #archiveteam-bs
[23:13] *** Arcorann has joined #archiveteam-bs
[23:14] *** semisimpl has quit IRC (Quit: semisimpl)
[23:22] *** jshoard has quit IRC (Quit: Leaving)