#archiveteam-bs 2020-09-05,Sat

Time Nickname Message
00:36 Raccoon has joined #archiveteam-bs
00:51 Wingy Can ArchiveBot download from instagram?
00:52 JAA Nope
00:52 JAA Because Instagram banned all our pipelines and is really picky about rate limits.
00:53 Wingy Is that a good target to add to tikbot?
00:53 JAA If your goal is to get banned from Instagram as well, sure.
00:53 Wingy I can't just follow rate limit?
00:54 JAA Well probably, if you figure out what the limit is.
00:54 Wingy 1 req / 10s?
00:54 JAA I think the most I tried is 30 s delay between pagination requests in socialbot.
00:55 Wingy oh and it still bans?
00:55 JAA Yup
00:55 JAA And that's only the pagination, not even retrieving the post pages.
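
The fixed delay between pagination requests discussed above could be sketched roughly as follows. This is only an illustration of the pacing idea, not socialbot's actual code: the class name `FixedDelayPacer` and its parameters are hypothetical, and per the log even a 30 s gap still led to bans, so pacing alone should not be expected to be enough.

```python
import time
from typing import Callable, Optional


class FixedDelayPacer:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the pacing logic can be tested
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until min_interval has passed since the previous call.

        Returns the number of seconds actually slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

A caller would create one pacer per target site, for example `FixedDelayPacer(30.0)`, and call `wait()` immediately before each pagination request.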
00:55 Wingy what if I use puppeteer to actually use the page?
00:55 JAA However, if you get around the rate limit, AB works fine with Instagram.
00:56 JAA Just needs to be done a bit carefully regarding how to queue it and ignores.
00:56 Wingy they probably notice downloading 24/7
00:57 Wingy it'd be very expensive but..... YouTube????
00:57 JAA That's also not the problem. You quickly get banned trying to do the pagination even from residential connections.
00:57 Wingy like send a user to be archived
00:58 JAA That's already a thing, #youtubearchive on hackint.
01:00 Wingy What about something like archivebot for the web?
01:00 Wingy using puppeteer*
01:00 Wingy so js pages can be saved
01:00 JAA crocoite/chromebot?
01:00 Wingy Yeah is that a thing?
01:01 JAA https://archiveteam.org/index.php?title=Chromebot
01:02 Wingy Interesting!
01:30 lennier2 has joined #archiveteam-bs
01:31 Arcorann has joined #archiveteam-bs
01:32 lennier1 has quit IRC (Read error: Operation timed out)
01:32 lennier2 is now known as lennier1
01:32 Arcorann_ has joined #archiveteam-bs
01:40 Arcorann has quit IRC (Read error: Operation timed out)
01:45 anelki has joined #archiveteam-bs
02:17 qw3rty has joined #archiveteam-bs
02:20 robogoat_ has quit IRC (Ping timeout: 265 seconds)
02:20 robogoat has joined #archiveteam-bs
02:20 Tugboat has quit IRC (Ping timeout: 265 seconds)
02:20 qw3rty_ has quit IRC (Ping timeout: 265 seconds)
02:20 dxrt has quit IRC (Ping timeout: 265 seconds)
02:20 lunik1 has quit IRC (Ping timeout: 265 seconds)
02:21 _niklas has joined #archiveteam-bs
02:21 atphoenix has quit IRC (Ping timeout: 265 seconds)
02:22 dxrt has joined #archiveteam-bs
02:22 atphoenix has joined #archiveteam-bs
02:23 sknebel has joined #archiveteam-bs
02:23 Zebranky has quit IRC (Ping timeout: 265 seconds)
02:23 i0npulse has quit IRC (Ping timeout: 265 seconds)
02:24 i0npulse has joined #archiveteam-bs
02:24 OrIdow6 has quit IRC (Ping timeout: 265 seconds)
02:24 Zebranky has joined #archiveteam-bs
02:25 Fionera has joined #archiveteam-bs
02:26 yano_ has joined #archiveteam-bs
02:27 yano has quit IRC (Ping timeout: 265 seconds)
02:30 i0npulse has quit IRC (Ping timeout: 265 seconds)
02:33 i0npulse has joined #archiveteam-bs
02:43 Wingy is there any way to make httrack aggressive? this does not seem to work:
02:43 Wingy httrack --disable-security-limits --advanced-wait 0 --max-rate=99999999999999999999999999999999999999999999999
02:56 purplebot has joined #archiveteam-bs
02:58 OrIdow6 has joined #archiveteam-bs
02:58 Tugboat has joined #archiveteam-bs
03:00 lunik1 has joined #archiveteam-bs
03:12 Wingy this seems to work:
03:12 Wingy httrack --disable-security-limits --max-rate=0 --sockets=99 --connection-per-second=0
03:16 Wingy oh found this https://www.archiveteam.org/index.php?title=HTTrack_options
03:31 Wingy ---
03:31 Wingy how could i upload to IA from node? do i need to use the s3-ish API?
03:35 Arcorann has joined #archiveteam-bs
03:43 Arcorann_ has quit IRC (Ping timeout: 622 seconds)
03:56 Wingy JAA Are you an IA admin? Can you make a collection for me?
03:56 Wingy Oh I think I have to wait for 50 items to not be in a collection before making it
03:57 qw3rty_ has joined #archiveteam-bs
04:05 qw3rty has quit IRC (Read error: Operation timed out)
04:18 atphoenix has quit IRC (Read error: Connection reset by peer)
04:18 atphoenix has joined #archiveteam-bs
04:19 systwi has quit IRC (Read error: Connection reset by peer)
04:19 lennier1 has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 bsmith093 has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 Ryz has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 RichardG has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 Terbium has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 SketchCow has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 Larsenv has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 benjinsmi has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 Stiletto has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 sivoais_ has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 ephemer0l has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 tapedrive has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 Datechnom has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 kiska2 has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 legoktm has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 tonsofpcs has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 ndiddy has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 atomicthu has quit IRC (ircd.choopa.net irc.mzima.net)
04:19 underscor has quit IRC (Read error: Operation timed out)
04:19 asdf01011 has quit IRC (Ping timeout (120 seconds))
04:20 britmob has quit IRC (Ping timeout (120 seconds))
04:20 asdf01011 has joined #archiveteam-bs
04:20 britmob has joined #archiveteam-bs
04:20 systwi has joined #archiveteam-bs
04:21 katocala has quit IRC (Remote host closed the connection)
04:22 katocala has joined #archiveteam-bs
04:24 underscor has joined #archiveteam-bs
04:30 lennier1 has joined #archiveteam-bs
04:30 bsmith093 has joined #archiveteam-bs
04:30 Ryz has joined #archiveteam-bs
04:30 RichardG has joined #archiveteam-bs
04:30 Terbium has joined #archiveteam-bs
04:30 SketchCow has joined #archiveteam-bs
04:30 Larsenv has joined #archiveteam-bs
04:30 benjinsmi has joined #archiveteam-bs
04:30 Stiletto has joined #archiveteam-bs
04:30 sivoais_ has joined #archiveteam-bs
04:30 ephemer0l has joined #archiveteam-bs
04:30 tapedrive has joined #archiveteam-bs
04:30 Datechnom has joined #archiveteam-bs
04:30 kiska2 has joined #archiveteam-bs
04:30 legoktm has joined #archiveteam-bs
04:30 tonsofpcs has joined #archiveteam-bs
04:30 ndiddy has joined #archiveteam-bs
04:30 atomicthu has joined #archiveteam-bs
04:42 OrIdow6 Yeah, they can be moved to their own collection after they've been uploaded
04:42 OrIdow6 If IA approves this in the first place (if they haven't already)
04:44 OrIdow6 I haven't heard of a client for IA S3 being written in node - unless one exists, or unless it's super easy to write, if I were you, I'd just go through the python library (or the CLI tool that uses it)
05:03 nico_32 a long time ago, IA had an FTP server
05:08 nico_32 documentation still is here https://archive.org/help/contrib-advanced.php
06:59 jshoard has joined #archiveteam-bs
07:03 jshoard has quit IRC (Client Quit)
07:06 jshoard has joined #archiveteam-bs
07:06 jshoard has quit IRC (Remote host closed the connection!)
07:08 Ryz has quit IRC (Read error: Operation timed out)
07:09 jshoard has joined #archiveteam-bs
07:10 Ryz has joined #archiveteam-bs
07:28 lunik1 has quit IRC (Ping timeout: 265 seconds)
07:29 lunik1 has joined #archiveteam-bs
10:32 BlueMaxim has quit IRC (Quit: Leaving)
14:19 lunik1 has quit IRC (Ping timeout: 265 seconds)
14:20 lunik1 has joined #archiveteam-bs
15:00 Arcorann_ has joined #archiveteam-bs
15:07 Wingy How could I archive a Discourse forum?
15:07 Arcorann has quit IRC (Read error: Operation timed out)
15:31 JAA Wingy: I'm not an IA admin. You'll have to ask Jason or info@. Upload the stuff first, then give them a list of all items that should be in the collection.
15:32 Wingy Okay no problem :)
15:32 JAA As for uploading, use the Python tool: https://archive.org/services/docs/api/internetarchive/
15:32 JAA Or yeah, implement the S3-like API if you hate yourself. :-)
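
For anyone curious what the S3-like route involves, here is a minimal sketch that only builds the upload request with the standard library and does not send it. It assumes the commonly documented IA S3 conventions (the `s3.us.archive.org` endpoint, `LOW access:secret` authorization, and `x-archive-*` headers); the function name `build_ia_put_request` and the example identifier are made up for illustration. The Python tool recommended above remains the simpler option.

```python
import urllib.parse
import urllib.request
from typing import Mapping

# Assumed endpoint of the Internet Archive's S3-like API.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_ia_put_request(
    identifier: str,
    filename: str,
    data: bytes,
    access_key: str,
    secret_key: str,
    metadata: Mapping[str, str],
) -> urllib.request.Request:
    """Build (but do not send) a PUT request uploading one file to an item."""
    url = "{}/{}/{}".format(
        IA_S3_ENDPOINT,
        urllib.parse.quote(identifier),
        urllib.parse.quote(filename),
    )
    req = urllib.request.Request(url, data=data, method="PUT")
    # IA uses its own "LOW" scheme instead of AWS request signing.
    req.add_header("Authorization", f"LOW {access_key}:{secret_key}")
    # Ask IA to create the item if it does not exist yet.
    req.add_header("x-archive-auto-make-bucket", "1")
    # Item metadata travels as x-archive-meta-<field> headers.
    for key, value in metadata.items():
        req.add_header(f"x-archive-meta-{key}", value)
    return req


if __name__ == "__main__":
    request = build_ia_put_request(
        "example-item-identifier",  # hypothetical item name
        "notes.txt",
        b"hello",
        "ACCESS_KEY",
        "SECRET_KEY",
        {"mediatype": "data", "title": "Example upload"},
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual upload, which requires real keys from the uploader's own IA account.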
15:41 JAA Wingy: Discourse works fairly well just with wpull/grab-site/AB. If it's an older version, things might break in the WBM though unless the user disables.
15:41 JAA disables JS*
15:41 Wingy I ended up doing this for uploading: https://ghostbin.co/paste/4ckyy
15:42 Wingy And the Discourse forum is shutting down but it's private and requires login to view
15:43 Arcorann_ has quit IRC (Leaving)
15:43 JAA Ah, that makes it a bit trickier. Supplying cookies to wpull or grab-site should work though.
15:43 JAA Do a few threads, check playback with pywb.
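
One way to prepare the cookies mentioned above is to write them into a Netscape-format cookies.txt file, the format read by wget-style `--load-cookies` options. The sketch below does that with Python's standard library. The helper name `write_cookies_txt`, the forum domain, and the cookie names `_t` and `_forum_session` are assumptions for illustration; the real names and values should be copied from a logged-in browser session, and how the file is handed to wpull or grab-site should be checked against those tools' own documentation.

```python
import http.cookiejar
import time
from typing import Mapping


def write_cookies_txt(
    path: str,
    domain: str,
    cookies: Mapping[str, str],
    lifetime_seconds: int = 86400,
) -> None:
    """Write name/value pairs for one domain to a Netscape-format cookie file."""
    jar = http.cookiejar.MozillaCookieJar(path)
    expires = int(time.time()) + lifetime_seconds
    for name, value in cookies.items():
        jar.set_cookie(
            http.cookiejar.Cookie(
                version=0,
                name=name,
                value=value,
                port=None,
                port_specified=False,
                domain=domain,
                domain_specified=domain.startswith("."),
                domain_initial_dot=domain.startswith("."),
                path="/",
                path_specified=True,
                secure=True,
                expires=expires,   # an expiry is needed so the cookie is saved
                discard=False,
                comment=None,
                comment_url=None,
                rest={},
            )
        )
    jar.save()  # writes to the path given to the constructor


if __name__ == "__main__":
    write_cookies_txt(
        "cookies.txt",
        "forum.example.org",  # hypothetical forum host
        {"_t": "PASTE_VALUE_HERE", "_forum_session": "PASTE_VALUE_HERE"},
    )
```

After a small test crawl with the cookie file, replaying the resulting WARC locally is a quick way to confirm that the logged-in pages were actually captured.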
15:44 Arcorann has joined #archiveteam-bs
15:45 Wingy wait doesn't AB use grab-site?
15:45 JAA Nope
15:46 JAA grab-site is basically a local, single-machine version of AB though.
15:48 Wingy I never did get my AT wiki password reset
15:48 Wingy Maybe I should try email? Discord and IRC didn't work
15:52 JAA arkiver, jrwr: ^
15:52 Arcorann has quit IRC (Read error: Connection reset by peer)
15:59 Wingy Any way to archive Docker Hub in its entirety? With the 6-month deletion thing
16:00 JAA Nope, it's essentially impossible.
16:00 JAA You can't access the image history.
16:18 VADemon has joined #archiveteam-bs
16:55 Wingy what is the highest sane concurrency for grab-site?
16:55 semisimpl has joined #archiveteam-bs
16:55 Wingy depends on site??
16:58 JAA Yes
16:58 JAA And wpull has a hard-coded connection limit of 6 per host, which also plays into it.
16:58 JAA Not sure if ludios_wpull (the wpull fork grab-site uses) removed that limit.
17:07 semisimpl has quit IRC (Quit: semisimpl)
17:13 HP_Archiv has joined #archiveteam-bs
17:21 HP_Archiv has quit IRC (Quit: Leaving)
17:43 VADemon has quit IRC (left4dead)
19:27 Frogging Is wpull still maintained?
19:28 JAA Hardly, just like much of our other tooling.
19:30 nico_32 patch the software, create a pull request on github and do some lobbying on the irc channel to get it committed
19:30 nico_32 :)
19:36 JAA Well, yeah. On wpull, the situation's still kind of the same as it has been for a long time now: there's my big PR with various bugfixes and some new features, and it's basically blocked by a required detailed (i.e. very time-consuming) performance analysis.
19:36 JAA That's https://github.com/ArchiveTeam/wpull/pull/393 if anyone's wondering.
19:38 Wingy JAA: Do you know if this format would be okay? https://archive.org/details/tikbot.test2-markrober
19:40 JAA Wingy: Let's take that to the TikTok channel.
19:40 Wingy oh oops wrong channel sorry
20:52 semisimpl has joined #archiveteam-bs
23:11 BlueMax has joined #archiveteam-bs
23:13 Arcorann has joined #archiveteam-bs
23:14 semisimpl has quit IRC (Quit: semisimpl)
23:22 jshoard has quit IRC (Quit: Leaving)
