00:36 <nicolas17> lennier1: http://data.nicolas17.xyz/reckful/ I got the reckful clips yesterday, finally got around to uploading
00:36 <nicolas17> (and setting up a more generic URL so I'm not tied to S3 :P)
00:38 <lennier1> Thanks! What are you using to download clips/VODs?
00:42 <nicolas17> I started archiving two friends' VODs like two years ago, my python script for that has been growing since
00:43 <nicolas17> for this archive there were also some adhoc bash horrors
00:44 <nicolas17> like for f in *.json; do jshon -e video_url -u < $f; done | aria2c -i - --auto-file-renaming=false -R -j5
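[The one-liner above uses jshon to pull the `video_url` field out of each clip's JSON metadata file and pipes the list into aria2c. A rough Python sketch of the extraction step, assuming (as the jshon call implies) metadata files with a top-level `video_url` key; the helper name is ours, not part of anyone's actual script:]

```python
import json
from pathlib import Path

def build_aria2_input(metadata_dir: str) -> str:
    """Collect the video_url field from every clip's .json metadata file,
    one URL per line, in the shape expected by `aria2c -i -`."""
    urls = []
    for path in sorted(Path(metadata_dir).glob("*.json")):
        with open(path) as f:
            meta = json.load(f)
        urls.append(meta["video_url"])
    return "\n".join(urls)
```

[The returned text can then be piped to `aria2c -i - --auto-file-renaming=false -R -j5` exactly as in the shell version.]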
00:51 <JAA> "Horrors"? You're using the right tools for the job. That's wonderful.
00:52 <JAA> I'm guilty of processing WARCs, JSON, and HTML with grep & Co.
00:54 <nicolas17> renaming the clips :D for f in *.json; do video_id=$(jshon -e id -u < $f); video_url=$(jshon -e video_url -u < $f | sed 's/%7C/|/g'); filename=${video_url##*/}; ln -v "./$filename" "byid/$video_id.mp4"; done
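[The renaming loop can be sketched in Python too. One deliberate generalization: the shell version only decodes `%7C` back to `|`, while `urllib.parse.unquote` decodes every percent-escape. The helper and its `byid/` layout mirror the one-liner but are illustrative, not nicolas17's actual script:]

```python
import json
import os
from pathlib import Path
from urllib.parse import unquote

def link_clips_by_id(metadata_dir: str) -> None:
    """For each clip's .json metadata file, hard-link the downloaded file
    (named after the last path component of video_url, percent-decoded)
    to byid/<clip id>.mp4, like the `ln -v` in the shell loop."""
    base = Path(metadata_dir)
    byid = base / "byid"
    byid.mkdir(exist_ok=True)
    for path in base.glob("*.json"):
        meta = json.loads(path.read_text())
        filename = unquote(meta["video_url"].rsplit("/", 1)[-1])
        os.link(base / filename, byid / f"{meta['id']}.mp4")
```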
00:54 -- twigfoot has quit IRC (Operation timed out)
00:55 <JAA> Still wonderful.
00:59 <lennier1> I don't know how much trouble it would be, but would be useful to have it on-line somewhere. I've definitely found myself spending too much time downloading videos one at a time.
01:00 <JAA> Surely youtube-dl supports Twitch?
01:01 -- tobbez has left
01:07 <lennier1> Cool, sounds like it probably does (based on the search I just did). Twitch Leecher is the popular program, and does do the job, but gets kind of tedious if you want to download several VODs.
01:18 <Arcorann> What about chat?
01:20 <lennier1> The only program I know for that is Twitch Chat Downloader. https://github.com/PetterKraabol/Twitch-Chat-Downloader
01:41 <benjins> JAA I noticed that Twitter search is, at least for me, now sending out guest tokens via a script tag again, instead of via a set-cookie header. sn-scrape might be broken as a result
01:41 <JAA> benjins: Yep, thanks.
01:53 <nicolas17> JAA: youtube-dl supports Twitch and I'm taking advantage of it :)
01:54 <nicolas17> I use the Twitch API myself to get the list of VODs, but then I pass "https://www.twitch.tv/videos/640057509" to youtube-dl and get "https://d2nvs31859zcd8.cloudfront.net/b1b5218f5381aa7b04a1_reckful_38454871120_1472326520/chunked/index-muted-9C3480QJI9.m3u8" back
01:55 <nicolas17> reimplementing that part seems annoying, and wasteful if youtube-dl can already do it for me :)
01:59 <Ryz> !archive https://www.vgcats.com/ --explain "VGCats.com revamped (in temporary form) on 2020 May: https://www.vgcats.com/index.php#MAY-12-2020" --concurrency 1
01:59 <Ryz> ...Oops
02:01 -- twigfoot has joined #archiveteam-bs
02:48 -- ats has quit IRC (Ping timeout: 622 seconds)
02:54 -- ats has joined #archiveteam-bs
03:21 -- qw3rty_ has joined #archiveteam-bs
03:29 -- qw3rty__ has quit IRC (Read error: Operation timed out)
05:04 -- acridAxid has quit IRC (Quit: marauder)
05:07 -- acridAxid has joined #archiveteam-bs
05:14 -- mgrandi has joined #archiveteam-bs
05:27 -- fuzzy802 has joined #archiveteam-bs
05:32 -- fuzzy8021 has quit IRC (Read error: Operation timed out)
05:37 -- fuzzy802 is now known as fuzzy8021
06:29 -- drcd has quit IRC (Ping timeout: 186 seconds)
06:29 -- Deewiant has quit IRC (Ping timeout: 186 seconds)
06:30 -- drcd has joined #archiveteam-bs
06:30 -- Deewiant has joined #archiveteam-bs
06:46 -- kiska has quit IRC (Remote host closed the connection)
06:49 -- kiska has joined #archiveteam-bs
07:21 -- Raccoon has quit IRC (Ping timeout: 272 seconds)
07:32 -- Darkstar has quit IRC (Read error: Operation timed out)
07:35 -- BlueMax has quit IRC (Quit: Leaving)
07:37 -- Darkstar has joined #archiveteam-bs
08:34 -- mgrandi has quit IRC (Leaving)
08:39 -- schbirid has joined #archiveteam-bs
08:51 -- puppefan has joined #archiveteam-bs
08:55 -- puppefan has quit IRC (Ping timeout: 252 seconds)
09:17 -- Gallifrey has joined #archiveteam-bs
09:21 <Gallifrey> Hi, is this the right place to talk about manually* adding a large number (1000+) of pages to the Wayback Machine?
09:22 <Gallifrey> (*Or semi-manually, using my script that uses the pastpages/savepagenow library on github)
10:06 <OrIdow6> Gallifrey: I'd say here
10:29 <Gallifrey> Well, I suppose my first question is, at what point should I feel guilty about 'overloading' archive.org? 10k pages? 100k pages?
10:49 -- Smiley has quit IRC (Ping timeout: 272 seconds)
11:31 -- ats has quit IRC (se.hub irc.efnet.nl)
11:31 -- colona_ has quit IRC (se.hub irc.efnet.nl)
11:31 -- Ctrl has quit IRC (se.hub irc.efnet.nl)
11:31 -- wessel152 has quit IRC (se.hub irc.efnet.nl)
11:32 -- Arcorann has quit IRC (Read error: Connection reset by peer)
11:35 -- ats has joined #archiveteam-bs
11:35 -- colona_ has joined #archiveteam-bs
11:35 -- Ctrl has joined #archiveteam-bs
11:35 -- wessel152 has joined #archiveteam-bs
11:38 <OrIdow6> I don't think there's much of a good answer for that
11:55 -- wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
12:00 -- SmileyG has joined #archiveteam-bs
12:02 -- Arcorann has joined #archiveteam-bs
12:12 <Gallifrey> Second question is, are there any more 'official' channels for doing this work (e.g. ArchiveBot), rather than going rogue and doing it with a script? And if so, would it depend on the type of content I'm trying to archive? I don't know if several thousand handpicked Reddit URLs is a suitable use of ArchiveBot's time.
12:36 -- FlyWalk has joined #archiveteam-bs
12:36 -- FlyWalk has quit IRC (Client Quit)
12:37 -- WalkFly has joined #archiveteam-bs
12:42 -- Arcorann_ has joined #archiveteam-bs
12:45 -- Arcorann has quit IRC (Ping timeout: 265 seconds)
12:48 -- Datechnom has quit IRC (Read error: Connection reset by peer)
12:49 -- Datechnom has joined #archiveteam-bs
12:59 <JAA> Gallifrey: Are those threads on a particular subject or similar? If it's of general public interest, yes, we can run it through AB. Note though that it won't grab comment pagination and deeply nested comments. Also, it'll have to be old.reddit.com since the new design sucks.
13:04 -- HP_Archiv has joined #archiveteam-bs
13:17 <Gallifrey> This would be for soon-to-be-banned subreddits like /r/TumblrInAction, /r/WatchRedditDie + any others I can't think of. Already using old.reddit.com and grabbing the ?limit=500 link as well if there are >200 comments
13:18 <HP_Archiv> !a https://www.youtube.com/watch?v=pIZrHCXIPkY
13:18 <HP_Archiv> oops
13:18 <HP_Archiv> wrong channel
13:20 <Gallifrey> I also grabbed the www.reddit.com links for a sense of completion, plus the image or the link if there is one. But I ran into a problem when I archived a page every 10 seconds - the dreaded 429 error.
13:22 <Gallifrey> To be clear, this was not the WB machine rate-limiting my requests. Rather it was WB being rate-limited by Reddit itself.
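[Gallifrey's per-thread URL set can be written as a small helper: the old.reddit.com page, a `?limit=500` variant when the thread has more than 200 comments, and the www.reddit.com page for completeness. This is a hypothetical reconstruction from the two messages above; the function name, signature, and example path are ours:]

```python
def capture_urls(thread_path: str, num_comments: int) -> list:
    """Return the URL variants to submit to the Wayback Machine for one
    Reddit thread, given its path like '/r/SomeSub/comments/abc123/title/'."""
    urls = ["https://old.reddit.com" + thread_path]
    # old.reddit only shows ~200 comments by default; ?limit=500 captures more
    if num_comments > 200:
        urls.append("https://old.reddit.com" + thread_path + "?limit=500")
    # also grab the www.reddit.com rendering for completeness
    urls.append("https://www.reddit.com" + thread_path)
    return urls
```

[As noted above, pacing alone doesn't avoid the 429s here, since it's the Wayback Machine's own fetches that Reddit rate-limits.]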
13:23 -- HP_Archiv has quit IRC (Quit: Leaving)
13:25 -- qw3rty_ has quit IRC (se.hub efnet.deic.eu)
13:25 -- kiskaWee has quit IRC (se.hub efnet.deic.eu)
13:25 -- Maylay has quit IRC (se.hub efnet.deic.eu)
13:25 -- maxfan8 has quit IRC (se.hub efnet.deic.eu)
13:25 -- Arcorann_ has quit IRC (Read error: Connection reset by peer)
13:25 -- Arcorann_ has joined #archiveteam-bs
13:25 -- ats has quit IRC (se.hub irc.efnet.nl)
13:25 -- colona_ has quit IRC (se.hub irc.efnet.nl)
13:25 -- Ctrl has quit IRC (se.hub irc.efnet.nl)
13:25 -- wessel152 has quit IRC (se.hub irc.efnet.nl)
13:26 -- fredgido has joined #archiveteam-bs
13:30 -- ats has joined #archiveteam-bs
13:30 -- colona_ has joined #archiveteam-bs
13:30 -- Ctrl has joined #archiveteam-bs
13:30 -- wessel152 has joined #archiveteam-bs
13:31 -- fredgido_ has quit IRC (Read error: Operation timed out)
13:39 <VoynichCr> is there a project/collection for all Linux distros isos?
13:39 <VoynichCr> look any distro and you have a download page full of links to different versions, sizes, etc https://alpinelinux.org/downloads/
13:40 <VoynichCr> does it worth saving all them?
13:42 <VoynichCr> the same for their websites
13:45 -- qw3rty_ has joined #archiveteam-bs
13:45 -- kiskaWee has joined #archiveteam-bs
13:45 -- Maylay has joined #archiveteam-bs
13:45 -- maxfan8 has joined #archiveteam-bs
13:55 -- Arcorann_ has quit IRC (Read error: Connection reset by peer)
13:55 -- Arcorann_ has joined #archiveteam-bs
13:59 -- ats has quit IRC (se.hub irc.efnet.nl)
13:59 -- colona_ has quit IRC (se.hub irc.efnet.nl)
13:59 -- Ctrl has quit IRC (se.hub irc.efnet.nl)
13:59 -- wessel152 has quit IRC (se.hub irc.efnet.nl)
13:59 -- qw3rty_ has quit IRC (se.hub efnet.deic.eu)
13:59 -- kiskaWee has quit IRC (se.hub efnet.deic.eu)
13:59 -- Maylay has quit IRC (se.hub efnet.deic.eu)
13:59 -- maxfan8 has quit IRC (se.hub efnet.deic.eu)
14:00 -- Arcorann_ has quit IRC (Read error: Connection reset by peer)
14:00 -- Arcorann_ has joined #archiveteam-bs
14:02 -- ats has joined #archiveteam-bs
14:02 -- colona_ has joined #archiveteam-bs
14:02 -- Ctrl has joined #archiveteam-bs
14:02 -- wessel152 has joined #archiveteam-bs
14:02 -- qw3rty_ has joined #archiveteam-bs
14:02 -- kiskaWee has joined #archiveteam-bs
14:02 -- Maylay has joined #archiveteam-bs
14:02 -- maxfan8 has joined #archiveteam-bs
14:45 -- Arcorann_ has quit IRC (Read error: Connection reset by peer)
15:01 -- HP_Archiv has joined #archiveteam-bs
15:05 -- Ryz has quit IRC (Remote host closed the connection)
15:05 -- kiska1825 has quit IRC (Remote host closed the connection)
15:05 -- kiska1825 has joined #archiveteam-bs
15:06 -- Ryz has joined #archiveteam-bs
15:17 -- DogsRNice has joined #archiveteam-bs
15:27 <nicolas17> wtf
15:28 <nicolas17> apparently Twitch's clip API has a limit to how much it will return, even after you follow the pagination cursor stuff to the end
15:28 <nicolas17> someone on reddit suggested sending a starting timestamp
15:29 <nicolas17> and instead of 1019 clips all-time, I got 814 clips for the first day of the stream's history alone
15:36 -- lunik13 has quit IRC (Quit: :x)
15:39 -- lunik13 has joined #archiveteam-bs
16:22 -- HP_Archiv has quit IRC (Quit: Leaving)
17:02 -- schbirid has quit IRC (Quit: Leaving)
17:08 -- igloo25 has joined #archiveteam-bs
17:08 -- wp494 has joined #archiveteam-bs
17:11 -- Kaz_ has joined #archiveteam-bs
17:12 <Kaz_> being nick squatted is fun, isn't it
17:12 <Kaz_> (will confirm identity outside of irc, pls don't ban)
18:08 <jodizzle> Gallifrey: I think we've already had some people go through and archive various at-risk subreddits.
18:08 <jodizzle> Though there are various pagination limits.
18:10 <jodizzle> Ryz might know more.
18:12 <Ryz> Gallifrey, I usually just archive subreddits via ArchiveBot; I'm assuming you're archiving via WBM itself? Are you logged in? If not, there's a 12 links to archive per minute (rather than the 15 they claimed to say)
18:13 <Ryz> I grabbed both https://old.reddit.com/r/WatchRedditDie/ and https://old.reddit.com/r/TumblrInAction/ several days ago when they did a rumor announcement of banning those sub-Reddits with the updated rules
18:14 <jodizzle> But there are depths limits basically, right? AB doesn't paginate the entire history of the subreddit?
18:15 <Ryz> Unfortunately yes, there are pagination limits; even when browsing on the sub-Reddits normally; how far it goes back is a bit strange and random
18:15 <Ryz> Some go back up to 6 months; others a month
18:17 <jodizzle> Right. So I guess if Gallifrey has particular threads in mind that are old enough, we could still get those separately.
18:18 <nicolas17> okay, I don't know if this actually got *everything*, but
18:18 <nicolas17> 135173 twitch clips
18:35 -- Kaz_ is now known as Kaz
18:35 <Kaz> ayy, we back :)
18:50 -- qw3rty has joined #archiveteam-bs
18:50 -- SmileyG has quit IRC (Remote host closed the connection)
18:50 -- qw3rty_ has quit IRC (irc.efnet.nl efnet.deic.eu)
18:50 -- kiskaWee has quit IRC (irc.efnet.nl efnet.deic.eu)
18:50 -- Maylay has quit IRC (irc.efnet.nl efnet.deic.eu)
18:50 -- maxfan8 has quit IRC (irc.efnet.nl efnet.deic.eu)
18:54 -- Smiley has joined #archiveteam-bs
18:54 -- kiskaWee has joined #archiveteam-bs
18:54 -- Maylay has joined #archiveteam-bs
18:54 -- maxfan8 has joined #archiveteam-bs
19:19 -- xit has joined #archiveteam-bs
19:19 -- mgrandi has joined #archiveteam-bs
19:27 -- horkermon has joined #archiveteam-bs
19:28 -- mgrytbak has joined #archiveteam-bs
19:29 -- picklefac has joined #archiveteam-bs
19:29 <Kaz> whee, here comes the irccloud crowd
19:30 -- riking_ has joined #archiveteam-bs
19:30 -- diggan has joined #archiveteam-bs
19:33 -- DrasticAc has joined #archiveteam-bs
19:34 -- pnJay has joined #archiveteam-bs
19:36 -- justcool3 has joined #archiveteam-bs
19:37 -- Vito` has joined #archiveteam-bs
19:38 -- jesse-s has joined #archiveteam-bs
19:40 -- abartov__ has joined #archiveteam-bs
19:41 <nicolas17> man wtf is up with this Twitch API
19:42 -- jrwr has joined #archiveteam-bs
19:42 <nicolas17> I get clips from 2018-01-01 to 2018-07-01, then from 2018-07-01 to 2019-01-01... if I then ask for clips from 2018-01-01 to 2019-01-01 I get a few that weren't returned in either of the former two requests
19:44 -- lenary has joined #archiveteam-bs
19:44 -- amelia386 has joined #archiveteam-bs
19:45 -- starlord has joined #archiveteam-bs
19:45 -- revi has joined #archiveteam-bs
19:45 <mgrandi> are you using their graphql api thing?
19:47 <nicolas17> https://api.twitch.tv/helix/clips
19:47 -- lennier2 has joined #archiveteam-bs
19:49 -- Stilett0 has joined #archiveteam-bs
19:50 <mgrandi> the script i saw when people were freaking out over the dmca stuff was some graphql api
19:51 <nicolas17> well, deleting all clips is easier... it doesn't matter if you only get some arbitrary subset of 1000, once you delete them you can search again, until it's empty :P
19:51 <nicolas17> but I'll look into that...
19:52 <mgrandi> oh no, its using the helix api
19:52 <mgrandi> but it using some graphql api to get info about clips i guess
19:52 <mgrandi> https://github.com/danefairbanks/TwitchClipManager/blob/master/Program.cs#L315
19:53 <nicolas17> ah yes, to get the actual .mp4 URL I guess
19:53 -- Stiletto has quit IRC (Ping timeout: 622 seconds)
19:53 <nicolas17> I'm delegating that part to youtube-dl ^^
19:53 -- lennier1 has quit IRC (Read error: Operation timed out)
19:53 -- lennier2 is now known as lennier1
19:54 <nicolas17> see GetClipsApi
19:54 -- tchaypo_ has joined #archiveteam-bs
19:55 <mgrandi> yeah, thats using helix
19:55 <mgrandi> i've heard...not great things about helix
19:55 <mgrandi> so maybe its just some 'eventually consistent' thing
19:56 <nicolas17> apparently if you keep using the pagination cursor to get next page, you get n per page (the &first= parameter says how many) but a total of 900-1000
19:56 <nicolas17> I'll see if I can do something recursive instead of arbitrarily picking a small time period...
19:57 <nicolas17> if I get more than 500 clips, subdivide the time period and try again
19:57 -- xit has quit IRC ()
19:57 -- Stiletto has joined #archiveteam-bs
19:58 -- Stilett0 has quit IRC (Ping timeout: 260 seconds)
19:59 <mgrandi> there is also TwitchLeecher that has the same functionality you are wanting, maybe look into how it does it
19:59 <mgrandi> https://github.com/Franiac/TwitchLeecher/releases
20:00 -- JSharp___ has joined #archiveteam-bs
20:01 -- Kaz_ has joined #archiveteam-bs
20:02 -- Kaz has quit IRC (Quit: leaving)
20:02 -- Kaz_ is now known as Kaz
20:06 -- hook54321 has joined #archiveteam-bs
20:06 -- svchfoo1 sets mode: +o hook54321
20:15 -- SJon___ has joined #archiveteam-bs
20:18 -- fallenoak has joined #archiveteam-bs
20:18 -- HCross has joined #archiveteam-bs
20:19 -- mattl has joined #archiveteam-bs
20:21 -- ThisAsYou has joined #archiveteam-bs
20:25 -- Ctrl-S___ has joined #archiveteam-bs
20:26 -- c0mpass has joined #archiveteam-bs
21:00 -- kyledrake has joined #archiveteam-bs
21:05 -- Craigle has quit IRC (Quit: Ping timeout (120 seconds))
21:05 -- apache2_ has quit IRC (Remote host closed the connection)
21:05 -- PotcFdk has quit IRC (Quit: ~'o'/)
21:05 -- coderobe has quit IRC (Quit: Ping timeout (120 seconds))
21:06 -- DopefishJ has joined #archiveteam-bs
21:06 -- apache2 has joined #archiveteam-bs
21:06 -- Craigle has joined #archiveteam-bs
21:06 -- coderobe has joined #archiveteam-bs
21:06 -- mtntmnky_ has quit IRC (Remote host closed the connection)
21:07 -- mtntmnky_ has joined #archiveteam-bs
21:07 -- PotcFdk has joined #archiveteam-bs
21:12 -- Yurume_ has joined #archiveteam-bs
21:12 -- Yurume has quit IRC (Read error: Connection reset by peer)
21:14 -- HP_Archiv has joined #archiveteam-bs
21:15 -- DFJustin has quit IRC (Ping timeout: 745 seconds)
21:15 -- HP_Archiv has quit IRC (Client Quit)
21:16 -- HP_Archiv has joined #archiveteam-bs
21:20 -- fredgido has quit IRC (Read error: Operation timed out)
21:47 <nicolas17> "pop datetime interval from queue, request clips from Twitch, save any metadata that we don't already have, if the request returned >500 clips, split interval into two and push the two back into the queue"
21:47 <nicolas17> my queue currently has 190 5-day intervals and it keeps finding new videos x_x
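[The loop nicolas17 quotes can be sketched as breadth-first interval subdivision. This is a minimal model, not Twitch's actual API: `fetch(start, end)` stands in for a Helix clips request and is assumed to return (clip_id, metadata) pairs, and the >500 threshold follows the description above:]

```python
from collections import deque
from datetime import datetime

def harvest_clips(fetch, start: datetime, end: datetime, limit: int = 500):
    """Pop an interval from the queue, fetch its clips, save any metadata
    we don't already have; if the request returned more than `limit` clips
    (a sign the listing may have been silently truncated), split the
    interval in two and push both halves back into the queue."""
    seen = {}
    queue = deque([(start, end)])
    while queue:
        a, b = queue.popleft()
        clips = fetch(a, b)
        for clip_id, meta in clips:
            seen.setdefault(clip_id, meta)
        if len(clips) > limit:
            mid = a + (b - a) / 2
            queue.append((a, mid))
            queue.append((mid, b))
    return seen
```

[Real code would also want a minimum interval size to guarantee termination if a single slice keeps returning more than `limit` clips.]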
21:49 -- HP_Archiv has quit IRC (Quit: Leaving)
22:42 <Ryz> I recall someone was talking about archiving Twitch content earlier, is there?
22:42 <Ryz> There's another person that died via suicide - https://www.twitch.tv/ohlana - https://www.dexerto.com/general/twitch-streamer-ohlana-passes-away-at-26-from-suicide-1389840
22:43 <nicolas17> Ryz: apparently she has subscriber-only VODs
22:43 <Ryz> Well, that's an oof s:
22:43 <nicolas17> https://www.reddit.com/r/LivestreamFail/comments/hnfkd5/twitch_streamer_ohlana_has_passed_away_clip_is/fxbqe4x/
22:43 <Ryz> I didn't check further since there doesn't seem to be an indicator if the content is subscriber only~
22:49 <nicolas17> Ryz: apparently there was some kind of "suicide cluster" :/
22:51 <nicolas17> Ryz: https://twitter.com/venomous_pyscho/status/1280932651193503747
22:57 <Ryz> Mm, ran through the stuff I could accordingly~
23:07 <mgrandi> if you want an existing thing to download the raw videos, you can use TwitchLeecher to just download the videos, won't get any of the other stuff like the web pages or anyhting
23:10 -- mgrandi has quit IRC (Leaving)
23:15 -- Arcorann_ has joined #archiveteam-bs
23:53 <Gallifrey> Jodizzle - that's fantastic news! Do you know how many threads were archived per subreddit?
23:54 <JAA> Gallifrey: The pagination lists (about) 1000 threads on each of the top, new, etc. lists.
23:54 <JAA> So depending on how much overlap there is, anywhere between 1k and I think 10k?
23:54 <Gallifrey> JAA - That was going to be my next question!
23:55 <JAA> hot + new + rising + controversial + gilded + top hour + top day + top week + top month + top year + top all time, 1k each
23:55 <JAA> So between 1k and 11k
23:57 <JAA> There's no way to archive anything further than that except by bruteforcing all possible thread IDs (unless you know the URL, obviously).
23:57 <JAA> We'll have a project for archiving all of Reddit soon™. The channel for that is #shreddit on hackint.