Time |
Nickname |
Message |
00:36
|
|
Raccoon has joined #archiveteam-bs |
00:51
|
Wingy |
Can ArchiveBot download from instagram? |
00:52
|
JAA |
Nope |
00:52
|
JAA |
Because Instagram banned all our pipelines and is really picky about rate limits. |
00:53
|
Wingy |
Is that a good target to add to tikbot? |
00:53
|
JAA |
If your goal is to get banned from Instagram as well, sure. |
00:53
|
Wingy |
I can't just follow rate limit? |
00:54
|
JAA |
Well probably, if you figure out what the limit is. |
00:54
|
Wingy |
1 req / 10s? |
00:54
|
JAA |
I think the most I tried is 30 s delay between pagination requests in socialbot. |
00:55
|
Wingy |
oh and it still bans? |
00:55
|
JAA |
Yup |
00:55
|
JAA |
And that's only the pagination, not even retrieving the post pages. |
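For illustration, a minimal Python sketch of the kind of fixed delay being discussed: a throttle that enforces a minimum interval between requests. The `Throttle` class and its parameters are hypothetical and not taken from socialbot or ArchiveBot. As noted above, a 30 s delay between pagination requests still led to bans, so this only shows the mechanism and is not a way around Instagram's limits.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the behaviour can be checked
        # without actually waiting.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each pagination request.
throttle = Throttle(min_interval=30)
```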
00:55
|
Wingy |
what if I use puppeteer to actually use the page? |
00:55
|
JAA |
However, if you get around the rate limit, AB works fine with Instagram. |
00:56
|
JAA |
Just needs to be done a bit carefully regarding how to queue it and ignores. |
00:56
|
Wingy |
they probably notice downloading 24/7 |
00:57
|
Wingy |
it'd be very expensive but..... YouTube???? |
00:57
|
JAA |
That's also not the problem. You quickly get banned trying to do the pagination even from residential connections. |
00:57
|
Wingy |
like send a user to be archived |
00:58
|
JAA |
That's already a thing, #youtubearchive on hackint. |
01:00
|
Wingy |
What about something like archivebot for the web? |
01:00
|
Wingy |
using puppeteer* |
01:00
|
Wingy |
so js pages can be saved |
01:00
|
JAA |
crocoite/chromebot? |
01:00
|
Wingy |
Yeah is that a thing? |
01:01
|
JAA |
https://archiveteam.org/index.php?title=Chromebot |
01:02
|
Wingy |
Interesting! |
01:30
|
|
lennier2 has joined #archiveteam-bs |
01:31
|
|
Arcorann has joined #archiveteam-bs |
01:32
|
|
lennier1 has quit IRC (Read error: Operation timed out) |
01:32
|
|
lennier2 is now known as lennier1 |
01:32
|
|
Arcorann_ has joined #archiveteam-bs |
01:40
|
|
Arcorann has quit IRC (Read error: Operation timed out) |
01:45
|
|
anelki has joined #archiveteam-bs |
02:17
|
|
qw3rty has joined #archiveteam-bs |
02:20
|
|
robogoat_ has quit IRC (Ping timeout: 265 seconds) |
02:20
|
|
robogoat has joined #archiveteam-bs |
02:20
|
|
Tugboat has quit IRC (Ping timeout: 265 seconds) |
02:20
|
|
qw3rty_ has quit IRC (Ping timeout: 265 seconds) |
02:20
|
|
dxrt has quit IRC (Ping timeout: 265 seconds) |
02:20
|
|
lunik1 has quit IRC (Ping timeout: 265 seconds) |
02:21
|
|
_niklas has joined #archiveteam-bs |
02:21
|
|
atphoenix has quit IRC (Ping timeout: 265 seconds) |
02:22
|
|
dxrt has joined #archiveteam-bs |
02:22
|
|
atphoenix has joined #archiveteam-bs |
02:23
|
|
sknebel has joined #archiveteam-bs |
02:23
|
|
Zebranky has quit IRC (Ping timeout: 265 seconds) |
02:23
|
|
i0npulse has quit IRC (Ping timeout: 265 seconds) |
02:24
|
|
i0npulse has joined #archiveteam-bs |
02:24
|
|
OrIdow6 has quit IRC (Ping timeout: 265 seconds) |
02:24
|
|
Zebranky has joined #archiveteam-bs |
02:25
|
|
Fionera has joined #archiveteam-bs |
02:26
|
|
yano_ has joined #archiveteam-bs |
02:27
|
|
yano has quit IRC (Ping timeout: 265 seconds) |
02:30
|
|
i0npulse has quit IRC (Ping timeout: 265 seconds) |
02:33
|
|
i0npulse has joined #archiveteam-bs |
02:43
|
Wingy |
is there any way to make httrack aggressive? this does not seem to work: |
02:43
|
Wingy |
httrack --disable-security-limits --advanced-wait 0 --max-rate=99999999999999999999999999999999999999999999999 |
02:56
|
|
purplebot has joined #archiveteam-bs |
02:58
|
|
OrIdow6 has joined #archiveteam-bs |
02:58
|
|
Tugboat has joined #archiveteam-bs |
03:00
|
|
lunik1 has joined #archiveteam-bs |
03:12
|
Wingy |
this seems to work: |
03:12
|
Wingy |
httrack --disable-security-limits --max-rate=0 --sockets=99 --connection-per-second=0 |
03:16
|
Wingy |
oh found this https://www.archiveteam.org/index.php?title=HTTrack_options |
03:31
|
Wingy |
--- |
03:31
|
Wingy |
how could i upload to IA from node? do i need to use the s3-ish API? |
03:35
|
|
Arcorann has joined #archiveteam-bs |
03:43
|
|
Arcorann_ has quit IRC (Ping timeout: 622 seconds) |
03:56
|
Wingy |
JAA Are you an IA admin? Can you make a collection for me? |
03:56
|
Wingy |
Oh I think I have to wait for 50 items to not be in a collection before making it |
03:57
|
|
qw3rty_ has joined #archiveteam-bs |
04:05
|
|
qw3rty has quit IRC (Read error: Operation timed out) |
04:18
|
|
atphoenix has quit IRC (Read error: Connection reset by peer) |
04:18
|
|
atphoenix has joined #archiveteam-bs |
04:19
|
|
systwi has quit IRC (Read error: Connection reset by peer) |
04:19
|
|
lennier1 has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
bsmith093 has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
Ryz has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
RichardG has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
Terbium has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
SketchCow has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
Larsenv has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
benjinsmi has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
Stiletto has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
sivoais_ has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
ephemer0l has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
tapedrive has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
Datechnom has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
kiska2 has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
legoktm has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
tonsofpcs has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
ndiddy has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
atomicthu has quit IRC (ircd.choopa.net irc.mzima.net) |
04:19
|
|
underscor has quit IRC (Read error: Operation timed out) |
04:19
|
|
asdf01011 has quit IRC (Ping timeout (120 seconds)) |
04:20
|
|
britmob has quit IRC (Ping timeout (120 seconds)) |
04:20
|
|
asdf01011 has joined #archiveteam-bs |
04:20
|
|
britmob has joined #archiveteam-bs |
04:20
|
|
systwi has joined #archiveteam-bs |
04:21
|
|
katocala has quit IRC (Remote host closed the connection) |
04:22
|
|
katocala has joined #archiveteam-bs |
04:24
|
|
underscor has joined #archiveteam-bs |
04:30
|
|
lennier1 has joined #archiveteam-bs |
04:30
|
|
bsmith093 has joined #archiveteam-bs |
04:30
|
|
Ryz has joined #archiveteam-bs |
04:30
|
|
RichardG has joined #archiveteam-bs |
04:30
|
|
Terbium has joined #archiveteam-bs |
04:30
|
|
SketchCow has joined #archiveteam-bs |
04:30
|
|
Larsenv has joined #archiveteam-bs |
04:30
|
|
benjinsmi has joined #archiveteam-bs |
04:30
|
|
Stiletto has joined #archiveteam-bs |
04:30
|
|
sivoais_ has joined #archiveteam-bs |
04:30
|
|
ephemer0l has joined #archiveteam-bs |
04:30
|
|
tapedrive has joined #archiveteam-bs |
04:30
|
|
Datechnom has joined #archiveteam-bs |
04:30
|
|
kiska2 has joined #archiveteam-bs |
04:30
|
|
legoktm has joined #archiveteam-bs |
04:30
|
|
tonsofpcs has joined #archiveteam-bs |
04:30
|
|
ndiddy has joined #archiveteam-bs |
04:30
|
|
atomicthu has joined #archiveteam-bs |
04:42
|
OrIdow6 |
Yeah, they can be moved to their own collection after they've been uploaded |
04:42
|
OrIdow6 |
If IA approves this in the first place (if they haven't already) |
04:44
|
OrIdow6 |
I haven't heard of a client for IA S3 being written in node - unless one exists, or unless it's super easy to write, if I were you, I'd just go through the python library (or the CLI tool that uses it) |
05:03
|
nico_32 |
a long time ago, IA had a ftp server |
05:08
|
nico_32 |
documentation still is here https://archive.org/help/contrib-advanced.php |
06:59
|
|
jshoard has joined #archiveteam-bs |
07:03
|
|
jshoard has quit IRC (Client Quit) |
07:06
|
|
jshoard has joined #archiveteam-bs |
07:06
|
|
jshoard has quit IRC (Remote host closed the connection!) |
07:08
|
|
Ryz has quit IRC (Read error: Operation timed out) |
07:09
|
|
jshoard has joined #archiveteam-bs |
07:10
|
|
Ryz has joined #archiveteam-bs |
07:28
|
|
lunik1 has quit IRC (Ping timeout: 265 seconds) |
07:29
|
|
lunik1 has joined #archiveteam-bs |
10:32
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
14:19
|
|
lunik1 has quit IRC (Ping timeout: 265 seconds) |
14:20
|
|
lunik1 has joined #archiveteam-bs |
15:00
|
|
Arcorann_ has joined #archiveteam-bs |
15:07
|
Wingy |
How could I archive a Discourse forum? |
15:07
|
|
Arcorann has quit IRC (Read error: Operation timed out) |
15:31
|
JAA |
Wingy: I'm not an IA admin. You'll have to ask Jason or info@. Upload the stuff first, then give them a list of all items that should be in the collection. |
15:32
|
Wingy |
Okay no problem :) |
15:32
|
JAA |
As for uploading, use the Python tool: https://archive.org/services/docs/api/internetarchive/ |
15:32
|
JAA |
Or yeah, implement the S3-like API if you hate yourself. :-) |
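For reference, a rough Python sketch of the two upload routes mentioned here. `upload_with_library` calls `upload()` from the `internetarchive` package linked above. `s3_headers` and `build_put_request` show roughly what a request to the S3-like API looks like, based on my understanding of IA's S3 documentation (an `authorization: LOW key:secret` header, `x-archive-meta-*` metadata headers, and `x-amz-auto-make-bucket` to create the item). The function names, identifier, file name, and metadata values are placeholders, and nothing here sends a request.

```python
import urllib.request
from urllib.parse import quote

S3_ENDPOINT = "https://s3.us.archive.org"


def s3_headers(access_key, secret_key, metadata):
    """Build request headers for an upload via IA's S3-like API."""
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask IA to create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    for key, value in metadata.items():
        headers[f"x-archive-meta-{key}"] = str(value)
    return headers


def build_put_request(identifier, filename, data, headers):
    """Prepare (but do not send) a PUT of one file into an item."""
    url = f"{S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


def upload_with_library(identifier, paths, metadata):
    """The route recommended above: let the internetarchive package do it."""
    # Imported lazily so the rest of this sketch works without the package.
    from internetarchive import upload

    return upload(identifier, files=paths, metadata=metadata)
```

Sending the prepared request would be a matter of passing it to `urllib.request.urlopen`. The library route reads credentials from the config written by `ia configure`, which is one reason it is the less painful option.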
15:41
|
JAA |
Wingy: Discourse works fairly well just with wpull/grab-site/AB. If it's an older version, things might break in the WBM though unless the user disables. |
15:41
|
JAA |
disables JS* |
15:41
|
Wingy |
I ended up doing this for uploading: https://ghostbin.co/paste/4ckyy |
15:42
|
Wingy |
And the Discourse forum is shutting down but it's private and requires login to view |
15:43
|
|
Arcorann_ has quit IRC (Leaving) |
15:43
|
JAA |
Ah, that makes it a bit trickier. Supplying cookies to wpull or grab-site should work though. |
15:43
|
JAA |
Do a few threads, check playback with pywb. |
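A hedged sketch of preparing the cookies for such a crawl: the Python below writes a Netscape-format `cookies.txt` using the standard library. As far as I know, wpull accepts a file like this through its wget-style `--load-cookies` option, and grab-site can forward extra options to wpull through `--wpull-args`; check each tool's `--help` before relying on that. The helper name, domain, and cookie name and value are hypothetical placeholders. The real ones would be copied from a logged-in browser session.

```python
import time
from http.cookiejar import Cookie, MozillaCookieJar


def write_cookies_txt(path, domain, cookies, lifetime=86400):
    """Write name/value pairs for one domain as a Netscape cookies.txt."""
    jar = MozillaCookieJar(path)
    expires = int(time.time()) + lifetime  # explicit expiry, one day by default
    for name, value in cookies.items():
        jar.set_cookie(Cookie(
            version=0, name=name, value=value,
            port=None, port_specified=False,
            domain=domain, domain_specified=True,
            domain_initial_dot=domain.startswith("."),
            path="/", path_specified=True,
            secure=True, expires=expires, discard=False,
            comment=None, comment_url=None, rest={},
        ))
    jar.save()


# Usage with placeholder values:
# write_cookies_txt("cookies.txt", "forum.example.org", {"session": "abc123"})
```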
15:44
|
|
Arcorann has joined #archiveteam-bs |
15:45
|
Wingy |
wait doesn't AB use grab-site? |
15:45
|
JAA |
Nope |
15:46
|
JAA |
grab-site is basically a local, single-machine version of AB though. |
15:48
|
Wingy |
I never did get my AT wiki password reset |
15:48
|
Wingy |
Maybe I should try email? Discord and IRC didn't work |
15:52
|
JAA |
arkiver, jrwr: ^ |
15:52
|
|
Arcorann has quit IRC (Read error: Connection reset by peer) |
15:59
|
Wingy |
Any way to archive Docker Hub in its entirety? With the 6-month deletion thing |
16:00
|
JAA |
Nope, it's essentially impossible. |
16:00
|
JAA |
You can't access the image history. |
16:18
|
|
VADemon has joined #archiveteam-bs |
16:55
|
Wingy |
what is the highest sane concurrency for grab-site? |
16:55
|
|
semisimpl has joined #archiveteam-bs |
16:55
|
Wingy |
depends on site?? |
16:58
|
JAA |
Yes |
16:58
|
JAA |
And wpull has a hard-coded connection limit of 6 per host, which also plays into it. |
16:58
|
JAA |
Not sure if ludios_wpull (the wpull fork grab-site uses) removed that limit. |
17:07
|
|
semisimpl has quit IRC (Quit: semisimpl) |
17:13
|
|
HP_Archiv has joined #archiveteam-bs |
17:21
|
|
HP_Archiv has quit IRC (Quit: Leaving) |
17:43
|
|
VADemon has quit IRC (left4dead) |
19:27
|
Frogging |
Is wpull still maintained? |
19:28
|
JAA |
Hardly, just like much of our other tooling. |
19:30
|
nico_32 |
patch the software, create a pull request on github and do some lobbying on the irc channel to get it committed |
19:30
|
nico_32 |
:) |
19:36
|
JAA |
Well, yeah. On wpull, the situation's still kind of the same as it has been for a long time now: there's my big PR with various bugfixes and some new features, and it's basically blocked by a required detailed (i.e. very time-consuming) performance analysis. |
19:36
|
JAA |
That's https://github.com/ArchiveTeam/wpull/pull/393 if anyone's wondering. |
19:38
|
Wingy |
JAA: Do you know if this format would be okay? https://archive.org/details/tikbot.test2-markrober |
19:40
|
JAA |
Wingy: Let's take that to the TikTok channel. |
19:40
|
Wingy |
oh oops wrong channel sorry |
20:52
|
|
semisimpl has joined #archiveteam-bs |
23:11
|
|
BlueMax has joined #archiveteam-bs |
23:13
|
|
Arcorann has joined #archiveteam-bs |
23:14
|
|
semisimpl has quit IRC (Quit: semisimpl) |
23:22
|
|
jshoard has quit IRC (Quit: Leaving) |