| Time |
Nickname |
Message |
|
00:36
|
|
Raccoon has joined #archiveteam-bs |
|
00:51
|
Wingy |
Can ArchiveBot download from instagram? |
|
00:52
|
JAA |
Nope |
|
00:52
|
JAA |
Because Instagram banned all our pipelines and is really picky about rate limits. |
|
00:53
|
Wingy |
Is that a good target to add to tikbot? |
|
00:53
|
JAA |
If your goal is to get banned from Instagram as well, sure. |
|
00:53
|
Wingy |
I can't just follow rate limit? |
|
00:54
|
JAA |
Well probably, if you figure out what the limit is. |
|
00:54
|
Wingy |
1 req / 10s? |
|
00:54
|
JAA |
I think the most I tried is 30 s delay between pagination requests in socialbot. |
|
00:55
|
Wingy |
oh and it still bans? |
|
00:55
|
JAA |
Yup |
|
00:55
|
JAA |
And that's only the pagination, not even retrieving the post pages. |
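The fixed delay between pagination requests that JAA describes can be sketched roughly as below. This is only an illustration of the mechanics, not socialbot's actual code: `fetch_page` is a hypothetical callable standing in for whatever performs the request, and the 30 s default simply mirrors the figure in the chat, which by JAA's account was still not slow enough to avoid a ban.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

# fetch_page is a hypothetical callable: it takes a cursor (None for the first
# page) and returns (items, next_cursor); next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def paginate_with_delay(
    fetch_page: FetchPage,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[list]:
    """Yield one page at a time, waiting delay_s between consecutive requests."""
    cursor: Optional[str] = None
    first = True
    while True:
        if not first:
            sleep(delay_s)  # wait before every request except the first
        first = False
        items, cursor = fetch_page(cursor)
        yield items
        if cursor is None:
            break
```

The `sleep` parameter is injectable so the pacing can be checked without actually waiting.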
|
00:55
|
Wingy |
what if I use puppeteer to actually use the page? |
|
00:55
|
JAA |
However, if you get around the rate limit, AB works fine with Instagram. |
|
00:56
|
JAA |
Just needs to be done a bit carefully regarding how to queue it and ignores. |
|
00:56
|
Wingy |
they probably notice downloading 24/7 |
|
00:57
|
Wingy |
it'd be very expensive but..... YouTube???? |
|
00:57
|
JAA |
That's also not the problem. You quickly get banned trying to do the pagination even from residential connections. |
|
00:57
|
Wingy |
like send a user to be archived |
|
00:58
|
JAA |
That's already a thing, #youtubearchive on hackint. |
|
01:00
|
Wingy |
What about something like archivebot for the web? |
|
01:00
|
Wingy |
using puppeteer* |
|
01:00
|
Wingy |
so js pages can be saved |
|
01:00
|
JAA |
crocoite/chromebot? |
|
01:00
|
Wingy |
Yeah is that a thing? |
|
01:01
|
JAA |
https://archiveteam.org/index.php?title=Chromebot |
|
01:02
|
Wingy |
Interesting! |
|
01:30
|
|
lennier2 has joined #archiveteam-bs |
|
01:31
|
|
Arcorann has joined #archiveteam-bs |
|
01:32
|
|
lennier1 has quit IRC (Read error: Operation timed out) |
|
01:32
|
|
lennier2 is now known as lennier1 |
|
01:32
|
|
Arcorann_ has joined #archiveteam-bs |
|
01:40
|
|
Arcorann has quit IRC (Read error: Operation timed out) |
|
01:45
|
|
anelki has joined #archiveteam-bs |
|
02:17
|
|
qw3rty has joined #archiveteam-bs |
|
02:20
|
|
robogoat_ has quit IRC (Ping timeout: 265 seconds) |
|
02:20
|
|
robogoat has joined #archiveteam-bs |
|
02:20
|
|
Tugboat has quit IRC (Ping timeout: 265 seconds) |
|
02:20
|
|
qw3rty_ has quit IRC (Ping timeout: 265 seconds) |
|
02:20
|
|
dxrt has quit IRC (Ping timeout: 265 seconds) |
|
02:20
|
|
lunik1 has quit IRC (Ping timeout: 265 seconds) |
|
02:21
|
|
_niklas has joined #archiveteam-bs |
|
02:21
|
|
atphoenix has quit IRC (Ping timeout: 265 seconds) |
|
02:22
|
|
dxrt has joined #archiveteam-bs |
|
02:22
|
|
atphoenix has joined #archiveteam-bs |
|
02:23
|
|
sknebel has joined #archiveteam-bs |
|
02:23
|
|
Zebranky has quit IRC (Ping timeout: 265 seconds) |
|
02:23
|
|
i0npulse has quit IRC (Ping timeout: 265 seconds) |
|
02:24
|
|
i0npulse has joined #archiveteam-bs |
|
02:24
|
|
OrIdow6 has quit IRC (Ping timeout: 265 seconds) |
|
02:24
|
|
Zebranky has joined #archiveteam-bs |
|
02:25
|
|
Fionera has joined #archiveteam-bs |
|
02:26
|
|
yano_ has joined #archiveteam-bs |
|
02:27
|
|
yano has quit IRC (Ping timeout: 265 seconds) |
|
02:30
|
|
i0npulse has quit IRC (Ping timeout: 265 seconds) |
|
02:33
|
|
i0npulse has joined #archiveteam-bs |
|
02:43
|
Wingy |
is there any way to make httrack aggressive? this does not seem to work: |
|
02:43
|
Wingy |
httrack --disable-security-limits --advanced-wait 0 --max-rate=99999999999999999999999999999999999999999999999 |
|
02:56
|
|
purplebot has joined #archiveteam-bs |
|
02:58
|
|
OrIdow6 has joined #archiveteam-bs |
|
02:58
|
|
Tugboat has joined #archiveteam-bs |
|
03:00
|
|
lunik1 has joined #archiveteam-bs |
|
03:12
|
Wingy |
this seems to work: |
|
03:12
|
Wingy |
httrack --disable-security-limits --max-rate=0 --sockets=99 --connection-per-second=0 |
|
03:16
|
Wingy |
oh found this https://www.archiveteam.org/index.php?title=HTTrack_options |
|
03:31
|
Wingy |
--- |
|
03:31
|
Wingy |
how could i upload to IA from node? do i need to use the s3-ish API? |
|
03:35
|
|
Arcorann has joined #archiveteam-bs |
|
03:43
|
|
Arcorann_ has quit IRC (Ping timeout: 622 seconds) |
|
03:56
|
Wingy |
JAA Are you an IA admin? Can you make a collection for me? |
|
03:56
|
Wingy |
Oh I think I have to wait for 50 items to not be in a collection before making it |
|
03:57
|
|
qw3rty_ has joined #archiveteam-bs |
|
04:05
|
|
qw3rty has quit IRC (Read error: Operation timed out) |
|
04:18
|
|
atphoenix has quit IRC (Read error: Connection reset by peer) |
|
04:18
|
|
atphoenix has joined #archiveteam-bs |
|
04:19
|
|
systwi has quit IRC (Read error: Connection reset by peer) |
|
04:19
|
|
lennier1 has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
bsmith093 has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
Ryz has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
RichardG has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
Terbium has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
SketchCow has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
Larsenv has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
benjinsmi has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
Stiletto has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
sivoais_ has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
ephemer0l has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
tapedrive has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
Datechnom has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
kiska2 has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
legoktm has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
tonsofpcs has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
ndiddy has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
atomicthu has quit IRC (ircd.choopa.net irc.mzima.net) |
|
04:19
|
|
underscor has quit IRC (Read error: Operation timed out) |
|
04:19
|
|
asdf01011 has quit IRC (Ping timeout (120 seconds)) |
|
04:20
|
|
britmob has quit IRC (Ping timeout (120 seconds)) |
|
04:20
|
|
asdf01011 has joined #archiveteam-bs |
|
04:20
|
|
britmob has joined #archiveteam-bs |
|
04:20
|
|
systwi has joined #archiveteam-bs |
|
04:21
|
|
katocala has quit IRC (Remote host closed the connection) |
|
04:22
|
|
katocala has joined #archiveteam-bs |
|
04:24
|
|
underscor has joined #archiveteam-bs |
|
04:30
|
|
lennier1 has joined #archiveteam-bs |
|
04:30
|
|
bsmith093 has joined #archiveteam-bs |
|
04:30
|
|
Ryz has joined #archiveteam-bs |
|
04:30
|
|
RichardG has joined #archiveteam-bs |
|
04:30
|
|
Terbium has joined #archiveteam-bs |
|
04:30
|
|
SketchCow has joined #archiveteam-bs |
|
04:30
|
|
Larsenv has joined #archiveteam-bs |
|
04:30
|
|
benjinsmi has joined #archiveteam-bs |
|
04:30
|
|
Stiletto has joined #archiveteam-bs |
|
04:30
|
|
sivoais_ has joined #archiveteam-bs |
|
04:30
|
|
ephemer0l has joined #archiveteam-bs |
|
04:30
|
|
tapedrive has joined #archiveteam-bs |
|
04:30
|
|
Datechnom has joined #archiveteam-bs |
|
04:30
|
|
kiska2 has joined #archiveteam-bs |
|
04:30
|
|
legoktm has joined #archiveteam-bs |
|
04:30
|
|
tonsofpcs has joined #archiveteam-bs |
|
04:30
|
|
ndiddy has joined #archiveteam-bs |
|
04:30
|
|
atomicthu has joined #archiveteam-bs |
|
04:42
|
OrIdow6 |
Yeah, they can be moved to their own collection after they've been uploaded |
|
04:42
|
OrIdow6 |
If IA approves this in the first place (if they haven't already) |
|
04:44
|
OrIdow6 |
I haven't heard of a client for IA S3 being written in node - unless one exists, or unless it's super easy to write, if I were you, I'd just go through the python library (or the CLI tool that uses it) |
|
05:03
|
nico_32 |
a long time ago, IA had a ftp server |
|
05:08
|
nico_32 |
documentation still is here https://archive.org/help/contrib-advanced.php |
|
06:59
|
|
jshoard has joined #archiveteam-bs |
|
07:03
|
|
jshoard has quit IRC (Client Quit) |
|
07:06
|
|
jshoard has joined #archiveteam-bs |
|
07:06
|
|
jshoard has quit IRC (Remote host closed the connection!) |
|
07:08
|
|
Ryz has quit IRC (Read error: Operation timed out) |
|
07:09
|
|
jshoard has joined #archiveteam-bs |
|
07:10
|
|
Ryz has joined #archiveteam-bs |
|
07:28
|
|
lunik1 has quit IRC (Ping timeout: 265 seconds) |
|
07:29
|
|
lunik1 has joined #archiveteam-bs |
|
10:32
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
14:19
|
|
lunik1 has quit IRC (Ping timeout: 265 seconds) |
|
14:20
|
|
lunik1 has joined #archiveteam-bs |
|
15:00
|
|
Arcorann_ has joined #archiveteam-bs |
|
15:07
|
Wingy |
How could I archive a Discourse forum? |
|
15:07
|
|
Arcorann has quit IRC (Read error: Operation timed out) |
|
15:31
|
JAA |
Wingy: I'm not an IA admin. You'll have to ask Jason or info@. Upload the stuff first, then give them a list of all items that should be in the collection. |
|
15:32
|
Wingy |
Okay no problem :) |
|
15:32
|
JAA |
As for uploading, use the Python tool: https://archive.org/services/docs/api/internetarchive/ |
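A minimal sketch of an upload through the Python library JAA links to, assuming it has been installed (`pip install internetarchive`) and credentials have been set up with `ia configure`. The helper names and the `mediatype` default are illustrative choices, not anything agreed in the channel.

```python
def build_metadata(title, mediatype="data", **extra):
    """Assemble an item metadata dict; keys with a None value are dropped."""
    meta = {"title": title, "mediatype": mediatype}
    meta.update({k: v for k, v in extra.items() if v is not None})
    return meta


def upload_item(identifier, paths, metadata):
    """Upload local files into one item and return the HTTP status codes."""
    # Imported lazily so build_metadata works without the library installed.
    from internetarchive import upload

    responses = upload(identifier, files=list(paths), metadata=metadata)
    return [r.status_code for r in responses]
```

A call would look like `upload_item("example-identifier", ["dump.warc.gz"], build_metadata("Example dump"))`, where both the identifier and the file name are placeholders.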
|
15:32
|
JAA |
Or yeah, implement the S3-like API if you hate yourself. :-) |
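For anyone taking the S3-like route anyway, the request shape is roughly as sketched below. It only builds the request and does not send it. The endpoint and header names follow the Internet Archive's S3-like API documentation as best recalled here and should be checked against the current docs; the keys, identifier and file name are placeholders, and metadata values are assumed to be plain ASCII.

```python
import urllib.parse
import urllib.request

S3_ENDPOINT = "https://s3.us.archive.org"  # assumed endpoint, verify against the docs


def build_put_request(identifier, filename, data, access_key, secret_key, metadata=None):
    """Build (but do not send) a PUT request that uploads one file to an item."""
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        "x-amz-auto-make-bucket": "1",  # create the item if it does not exist yet
    }
    # Item metadata travels as x-archive-meta-<field> headers.
    for key, value in (metadata or {}).items():
        headers[f"x-archive-meta-{key}"] = str(value)
    url = f"{S3_ENDPOINT}/{identifier}/{urllib.parse.quote(filename)}"
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Sending it would be a matter of passing the result to `urllib.request.urlopen`. The Python library handles details this sketch skips, such as retries and non-ASCII metadata.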
|
15:41
|
JAA |
Wingy: Discourse works fairly well just with wpull/grab-site/AB. If it's an older version, things might break in the WBM though unless the user disables. |
|
15:41
|
JAA |
disables JS* |
|
15:41
|
Wingy |
I ended up doing this for uploading: https://ghostbin.co/paste/4ckyy |
|
15:42
|
Wingy |
And the Discourse forum is shutting down but it's private and requires login to view |
|
15:43
|
|
Arcorann_ has quit IRC (Leaving) |
|
15:43
|
JAA |
Ah, that makes it a bit trickier. Supplying cookies to wpull or grab-site should work though. |
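One way to prepare the cookies JAA mentions is to write them out in the Netscape `cookies.txt` format that wget-style tools read. The sketch below uses only the Python standard library. The cookie name, value and domain are placeholders: the real ones would be copied from a logged-in browser session. How the file is then handed to wpull or grab-site (probably a wget-style `--load-cookies` option) is an assumption to check against each tool's help output.

```python
import http.cookiejar
import time


def write_cookies_txt(path, domain, cookies, lifetime_s=86400):
    """Write name/value pairs for one domain to a Netscape-format cookie file."""
    jar = http.cookiejar.MozillaCookieJar(path)
    expires = int(time.time()) + lifetime_s
    for name, value in cookies.items():
        jar.set_cookie(http.cookiejar.Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=domain, domain_specified=domain.startswith("."),
            domain_initial_dot=domain.startswith("."),
            path="/", path_specified=True,
            secure=True, expires=expires, discard=False,
            comment=None, comment_url=None, rest={},
        ))
    # Keep session-style cookies too, so nothing is silently dropped on save.
    jar.save(ignore_discard=True, ignore_expires=True)
    return path
```

For example, `write_cookies_txt("cookies.txt", "forum.example.org", {"session_token": "..."})` would produce a one-cookie file for a hypothetical forum host.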
|
15:43
|
JAA |
Do a few threads, check playback with pywb. |
|
15:44
|
|
Arcorann has joined #archiveteam-bs |
|
15:45
|
Wingy |
wait doesn't AB use grab-site? |
|
15:45
|
JAA |
Nope |
|
15:46
|
JAA |
grab-site is basically a local, single-machine version of AB though. |
|
15:48
|
Wingy |
I never did get my AT wiki password reset |
|
15:48
|
Wingy |
Maybe I should try email? Discord and IRC didn't work |
|
15:52
|
JAA |
arkiver, jrwr: ^ |
|
15:52
|
|
Arcorann has quit IRC (Read error: Connection reset by peer) |
|
15:59
|
Wingy |
Any way to archive Docker Hub in its entirety? With the 6-month deletion thing |
|
16:00
|
JAA |
Nope, it's essentially impossible. |
|
16:00
|
JAA |
You can't access the image history. |
|
16:18
|
|
VADemon has joined #archiveteam-bs |
|
16:55
|
Wingy |
what is the highest sane concurrency for grab-site? |
|
16:55
|
|
semisimpl has joined #archiveteam-bs |
|
16:55
|
Wingy |
depends on site?? |
|
16:58
|
JAA |
Yes |
|
16:58
|
JAA |
And wpull has a hard-coded connection limit of 6 per host, which also plays into it. |
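To see why a per-host connection cap matters for the concurrency question, here is a toy model. It is not wpull's implementation; it is just two asyncio semaphores, with the figure of 6 taken from JAA's message. `fetch` is a hypothetical coroutine standing in for the real download.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit

PER_HOST_LIMIT = 6  # the per-host figure mentioned in the chat


async def crawl(urls, concurrency, fetch):
    """Run fetch(url) for every URL under a global cap and a per-host cap."""
    global_slots = asyncio.Semaphore(concurrency)
    host_slots = defaultdict(lambda: asyncio.Semaphore(PER_HOST_LIMIT))

    async def worker(url):
        host = urlsplit(url).hostname
        # Take the host slot first so tasks queued behind one busy host
        # do not tie up global slots that other hosts could use.
        async with host_slots[host], global_slots:
            return await fetch(url)

    return await asyncio.gather(*(worker(u) for u in urls))
```

With every URL on a single host, raising `concurrency` above 6 changes nothing in this model: at most 6 fetches are ever in flight, which is the effect JAA is pointing at.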
|
16:58
|
JAA |
Not sure if ludios_wpull (the wpull fork grab-site uses) removed that limit. |
|
17:07
|
|
semisimpl has quit IRC (Quit: semisimpl) |
|
17:13
|
|
HP_Archiv has joined #archiveteam-bs |
|
17:21
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
|
17:43
|
|
VADemon has quit IRC (left4dead) |
|
19:27
|
Frogging |
Is wpull still maintained? |
|
19:28
|
JAA |
Hardly, just like much of our other tooling. |
|
19:30
|
nico_32 |
patch the software, create a pull request on github and do some lobbying on the irc channel to get it committed |
|
19:30
|
nico_32 |
:) |
|
19:36
|
JAA |
Well, yeah. On wpull, the situation's still kind of the same as it has been for a long time now: there's my big PR with various bugfixes and some new features, and it's basically blocked by a required detailed (i.e. very time-consuming) performance analysis. |
|
19:36
|
JAA |
That's https://github.com/ArchiveTeam/wpull/pull/393 if anyone's wondering. |
|
19:38
|
Wingy |
JAA: Do you know if this format would be okay? https://archive.org/details/tikbot.test2-markrober |
|
19:40
|
JAA |
Wingy: Let's take that to the TikTok channel. |
|
19:40
|
Wingy |
oh oops wrong channel sorry |
|
20:52
|
|
semisimpl has joined #archiveteam-bs |
|
23:11
|
|
BlueMax has joined #archiveteam-bs |
|
23:13
|
|
Arcorann has joined #archiveteam-bs |
|
23:14
|
|
semisimpl has quit IRC (Quit: semisimpl) |
|
23:22
|
|
jshoard has quit IRC (Quit: Leaving) |