Time |
Nickname |
Message |
00:02
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 246 seconds) |
00:04
🔗
|
|
ats_ has joined #archiveteam |
00:07
🔗
|
|
atlogbot has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
yipdw has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
WinterFox has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
ravetcofx has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
ats has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
edsu has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
robink has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
swebb has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
Laverne has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
Flierp has quit IRC (ny.us.hub irc.servercentral.net) |
00:07
🔗
|
|
ZizzyDizz has quit IRC (ny.us.hub irc.servercentral.net) |
00:14
🔗
|
|
Start has joined #archiveteam |
00:21
🔗
|
|
PovAddict has joined #archiveteam |
00:21
🔗
|
|
WinterFox has joined #archiveteam |
00:21
🔗
|
|
ravetcofx has joined #archiveteam |
00:21
🔗
|
|
Laverne has joined #archiveteam |
00:21
🔗
|
|
chazchaz has joined #archiveteam |
00:21
🔗
|
|
dserodio has joined #archiveteam |
00:21
🔗
|
|
atlogbot has joined #archiveteam |
00:21
🔗
|
|
Cameron_D has joined #archiveteam |
00:21
🔗
|
|
MrRadar has joined #archiveteam |
00:21
🔗
|
|
Flierp has joined #archiveteam |
00:21
🔗
|
|
ZizzyDizz has joined #archiveteam |
00:22
🔗
|
|
swebb_ is now known as swebb |
00:25
🔗
|
|
brayden has joined #archiveteam |
00:53
🔗
|
|
tfgbd_znc has joined #archiveteam |
01:11
🔗
|
balrog |
https://codebender.cc is shutting down |
01:15
🔗
|
|
Somebody has joined #archiveteam |
01:17
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
01:20
🔗
|
|
Start has joined #archiveteam |
01:23
🔗
|
|
Stiletto has joined #archiveteam |
01:37
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
01:59
🔗
|
arkiver |
balrog: thanks for the reminder |
02:11
🔗
|
icedice |
Can someone help me out a bit with an ignore set for w3bin.com? |
02:11
🔗
|
|
nekomune has quit IRC (Ping timeout: 244 seconds) |
02:11
🔗
|
icedice |
I think it's best if all https://w3bin.com/domain/ links are skipped, otherwise ArchiveBot will probably archive the domain record for every domain name on the Internet |
02:13
🔗
|
|
nekomune has joined #archiveteam |
02:21
🔗
|
|
tfgbd_znc has quit IRC (Read error: Connection reset by peer) |
02:26
🔗
|
|
tfgbd_znc has joined #archiveteam |
02:40
🔗
|
icedice |
never mind, figured it out |
03:05
🔗
|
|
Somebody has quit IRC (Ping timeout: 370 seconds) |
03:13
🔗
|
icedice |
Is there any ignore set that can be applied to a running archiving job that restricts archiving to a specific domain? |
03:18
🔗
|
|
jrwr has quit IRC (Leaving) |
03:21
🔗
|
|
Stiletto has quit IRC (Ping timeout: 246 seconds) |
03:43
🔗
|
|
Somebody has joined #archiveteam |
03:50
🔗
|
|
Stiletto has joined #archiveteam |
03:56
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
03:57
🔗
|
|
dashcloud has joined #archiveteam |
04:26
🔗
|
|
vitzli has joined #archiveteam |
04:29
🔗
|
|
BlueMaxim has joined #archiveteam |
04:37
🔗
|
Frogging |
Pebble app store torrent https://www.reddit.com/r/pebble/comments/5g0gmx/in_light_of_recent_news_i_archived_the_app_store/ |
04:47
🔗
|
PovAddict |
can't download |
04:47
🔗
|
PovAddict |
it complains about some weirdly-named file, even though I'm on Linux |
04:55
🔗
|
|
Yoshimura has quit IRC (Ping timeout: 255 seconds) |
04:59
🔗
|
Frogging |
make sure you're using the second link |
05:00
🔗
|
Frogging |
apparently the first one is broken |
05:01
🔗
|
PovAddict |
which first one? |
05:02
🔗
|
PovAddict |
when I clicked it already said "I've created a torrent UPDATED LINK" |
05:02
🔗
|
Frogging |
oh, hm. I just used the magnet link |
05:07
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
05:08
🔗
|
PovAddict |
me too |
05:08
🔗
|
PovAddict |
and it failed |
05:09
🔗
|
Frogging |
that's odd |
05:24
🔗
|
|
no2penci1 is now known as no2pencil |
05:29
🔗
|
|
PovAddict has quit IRC (Quit: zzz) |
05:45
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:48
🔗
|
|
Aranje has joined #archiveteam |
05:52
🔗
|
|
Sk1d has joined #archiveteam |
05:54
🔗
|
|
Aranje has quit IRC (Read error: Connection timed out) |
05:54
🔗
|
|
Aranje has joined #archiveteam |
06:02
🔗
|
|
ravetcofx has quit IRC (Read error: Operation timed out) |
06:10
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
06:14
🔗
|
|
Somebody has quit IRC (Ping timeout: 370 seconds) |
06:17
🔗
|
|
Froggypwn has quit IRC (Ping timeout: 244 seconds) |
06:18
🔗
|
|
ravetcofx has joined #archiveteam |
06:26
🔗
|
|
krazedkat has quit IRC (Ping timeout: 244 seconds) |
06:29
🔗
|
|
krazedkat has joined #archiveteam |
06:33
🔗
|
|
jsp12345 has quit IRC (Ping timeout: 492 seconds) |
06:41
🔗
|
|
krazedkat has quit IRC (Ping timeout: 244 seconds) |
06:42
🔗
|
|
krazedkat has joined #archiveteam |
06:55
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
07:04
🔗
|
|
alembic has joined #archiveteam |
07:07
🔗
|
|
alembic has quit IRC (Client Quit) |
07:10
🔗
|
|
alembic has joined #archiveteam |
07:11
🔗
|
|
Somebody has joined #archiveteam |
07:11
🔗
|
|
alembic has quit IRC (Client Quit) |
07:11
🔗
|
|
alembic has joined #archiveteam |
07:12
🔗
|
|
krazedkat has quit IRC (Quit: Leaving) |
07:15
🔗
|
|
REiN^ has quit IRC (Max SendQ exceeded) |
07:15
🔗
|
|
REiN^ has joined #archiveteam |
07:31
🔗
|
|
maelstrom has quit IRC (Quit: Leaving) |
07:39
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
07:40
🔗
|
|
Stiletto has joined #archiveteam |
07:47
🔗
|
|
db48x has joined #archiveteam |
08:23
🔗
|
|
Somebody has quit IRC (Ping timeout: 370 seconds) |
09:02
🔗
|
|
sHATNER_ is now known as sHATNER |
09:12
🔗
|
|
yipdw_ is now known as yipdw |
09:19
🔗
|
|
hawc145 is now known as HCross |
09:22
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
09:28
🔗
|
|
vitzli has joined #archiveteam |
09:34
🔗
|
|
HCross has quit IRC (Read error: Connection reset by peer) |
09:35
🔗
|
|
HCross has joined #archiveteam |
09:36
🔗
|
|
xx343 has quit IRC (Read error: Connection reset by peer) |
09:37
🔗
|
|
xx343 has joined #archiveteam |
09:43
🔗
|
|
Sanqui sets mode: -b *!*webchat@*.res.bhn.net |
09:43
🔗
|
|
ArchiveAL has joined #archiveteam |
09:43
🔗
|
ArchiveAL |
Ayy |
09:43
🔗
|
ArchiveAL |
According to the account creation page - "WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD" |
09:43
🔗
|
Sanqui |
note, #archiveteam-bs is for non priority conversation |
09:44
🔗
|
Sanqui |
ah, sure. 'yahoosucks' |
09:44
🔗
|
ArchiveAL |
Lol |
09:45
🔗
|
ArchiveAL |
literally just wanted to update the Desura article to add that Desura has been purchased, although the site is still technically in danger. |
09:46
🔗
|
Sanqui |
no worries. cheers! |
09:48
🔗
|
ArchiveAL |
also has any part of Desura's games library been saved, since i assume that's the only thing that is worth saving unless they have a forum |
09:49
🔗
|
yipdw |
Desura's a storefront; there's nothing there to get that won't run afoul of morals |
09:50
🔗
|
ArchiveAL |
i mean their freeware games |
09:50
🔗
|
ArchiveAL |
that have no other host |
09:50
🔗
|
ArchiveAL |
also @Sanqui why was i banned to begin with, was it literally just a range ban or did the previous owner of my (isp owned) router do some stupid stuff |
09:50
🔗
|
Sanqui |
the ban was for *!*webchat@*.res.bhn.net |
09:50
🔗
|
yipdw |
for good reasons |
09:50
🔗
|
Sanqui |
i grepped logs and haven't found it |
09:51
🔗
|
ArchiveAL |
so someone banned all brighthouse users? jeez. |
09:52
🔗
|
HCross |
tl;dr someone from your ISP was being a knob and kept rebooting their router to get around bans |
09:53
🔗
|
ArchiveAL |
. |
09:53
🔗
|
yipdw |
as far as the library goes, I don't know if anyone has a copy |
09:53
🔗
|
ArchiveAL |
Hm. |
09:53
🔗
|
Sanqui |
oh, my grep was insufficient |
09:53
🔗
|
ArchiveAL |
Oh btw why does the wiki homepage under Proposed projects say "Google Drive web hosting to be discontinued on August 31, 2016." i just checked my google drive to be sure, it's still running |
09:53
🔗
|
Sanqui |
honestly i'm on the verge of vouching for that person because they've helped with nifty but i'm too kind when in power |
09:54
🔗
|
Sanqui |
google drive *web hosting* |
09:54
🔗
|
Sanqui |
there was a freehost service |
09:54
🔗
|
ArchiveAL |
Oh. |
09:54
🔗
|
ArchiveAL |
hm, rip |
09:55
🔗
|
ArchiveAL |
also RadioShack is still going as well, new CEO as of Jan 2016 |
09:57
🔗
|
ArchiveAL |
the main page says they are closing* |
10:04
🔗
|
|
ArchiveAL has quit IRC (Quit: Page closed) |
10:37
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
10:38
🔗
|
|
BlueMaxim has joined #archiveteam |
10:39
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
10:39
🔗
|
|
ravetcofx has quit IRC (Read error: Operation timed out) |
10:49
🔗
|
|
WinterFox has joined #archiveteam |
11:40
🔗
|
|
BlueMaxim has quit IRC (Ping timeout: 370 seconds) |
12:16
🔗
|
|
signius_ has joined #archiveteam |
12:54
🔗
|
|
VADemon has joined #archiveteam |
13:42
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
14:27
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
15:28
🔗
|
|
tomcat has joined #archiveteam |
15:30
🔗
|
|
RichardG_ has joined #archiveteam |
15:34
🔗
|
|
RichardG has quit IRC (Ping timeout: 364 seconds) |
15:42
🔗
|
|
RichardG_ is now known as RichardG |
15:56
🔗
|
HCross |
https://www.thelayoff.com/t/KBEVoB1 only a rumor so far, but Solaris may be at threat due to Oracle layoffs |
15:59
🔗
|
|
fie has joined #archiveteam |
16:05
🔗
|
|
tomcat has quit IRC (Ping timeout: 194 seconds) |
16:07
🔗
|
Frogging |
ruh roh |
16:17
🔗
|
joepie91 |
HCross: it's Oracle. it's extremely likely that any horrible rumour in existence is true, and then some :) |
16:19
🔗
|
SketchCow |
I like that site. |
16:19
🔗
|
Frogging |
Orrible |
16:19
🔗
|
SketchCow |
It's like a nicer version of Fuckedcompany |
16:21
🔗
|
SketchCow |
I flooded the deriver and OCR queue! |
16:26
🔗
|
|
tomcat has joined #archiveteam |
16:34
🔗
|
godane |
same here |
16:34
🔗
|
godane |
everything is waiting to derive |
16:35
🔗
|
godane |
ok its not that bad now |
16:35
🔗
|
godane |
only 14 waiting to be derived |
16:35
🔗
|
HCross |
joepie91, yeah. I'll start DOWNLOADING ALL THE SOLARIS THINGS |
16:40
🔗
|
|
tomcat has quit IRC (Remote host closed the connection) |
16:42
🔗
|
Jon |
might have been mentioned already but DoomRL has been hit with a legal letter |
16:42
🔗
|
Jon |
https://doom.chaosforge.org/ |
16:43
🔗
|
SketchCow |
It's been archived a few times by us now. |
16:44
🔗
|
Jon |
cool, cool. ta. I notice it's not open source (apparently it's pascal too, for whatever that's worth) |
16:44
🔗
|
Jon |
I imagine it's going to go poof shortly |
16:44
🔗
|
|
RichardG has quit IRC (Ping timeout: 250 seconds) |
16:45
🔗
|
Jon |
ok back to iabak :) later all |
16:51
🔗
|
|
atomotic has joined #archiveteam |
16:59
🔗
|
|
RichardG has joined #archiveteam |
16:59
🔗
|
alembic |
Pebble is being acquired by Fitbit... https://techcrunch.com/2016/11/30/fitbit-pebble/ |
17:00
🔗
|
|
icedice has joined #archiveteam |
17:01
🔗
|
icedice |
What would a wildcard for sub-domains look like? |
17:02
🔗
|
icedice |
like if I want to exclude es.hostadvice.com, de.hostadvice.com, fr.hostadvice.com, it.hostadvice.com, and so on? |
17:02
🔗
|
|
nwf has joined #archiveteam |
17:02
🔗
|
icedice |
(All of the sub-domains just give 403s and are a waste of time) |
17:03
🔗
|
HCross |
*.foo.com |
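A glob such as `*.foo.com` may not be what the tool expects. A minimal sketch, assuming the ignore pattern is interpreted as a regular expression matched against the full URL (an assumption; this log does not confirm it). `SUBDOMAIN_IGNORE` and `is_ignored` are hypothetical names used only for illustration:

```python
import re

# Hypothetical pattern: matches any subdomain of hostadvice.com except "www".
# Assumes the pattern is applied as a regex against the full URL.
SUBDOMAIN_IGNORE = re.compile(r"^https?://(?!www\.)[^/]+\.hostadvice\.com/")


def is_ignored(url: str) -> bool:
    """Return True if the URL is on a non-www subdomain of hostadvice.com."""
    return SUBDOMAIN_IGNORE.search(url) is not None
```

The negative lookahead keeps `www.hostadvice.com` and the bare domain in scope, so only the language subdomains (es, de, fr, it, and so on) would be skipped.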
17:07
🔗
|
|
nwf__ has quit IRC (Read error: Operation timed out) |
17:13
🔗
|
DFJustin |
see #archivebot |
17:14
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
17:36
🔗
|
|
Somebody has joined #archiveteam |
17:41
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
18:09
🔗
|
|
owl has joined #archiveteam |
18:41
🔗
|
|
owl has quit IRC (Read error: Operation timed out) |
18:43
🔗
|
|
Somebody has quit IRC (Ping timeout: 370 seconds) |
19:05
🔗
|
|
cadbury_ has quit IRC (Ping timeout: 250 seconds) |
19:07
🔗
|
|
cadbury_ has joined #archiveteam |
19:07
🔗
|
|
icedice has joined #archiveteam |
19:12
🔗
|
|
owl has joined #archiveteam |
19:12
🔗
|
|
owl has quit IRC (Client Quit) |
19:29
🔗
|
|
nicolas17 has joined #archiveteam |
19:31
🔗
|
|
ravetcofx has joined #archiveteam |
19:34
🔗
|
|
drunksci has quit IRC (Remote host closed the connection) |
19:38
🔗
|
|
jsp12345 has joined #archiveteam |
19:42
🔗
|
|
VonGuard has joined #archiveteam |
19:53
🔗
|
|
icedice has quit IRC (Read error: Operation timed out) |
19:59
🔗
|
SketchCow |
Grab all |
20:26
🔗
|
|
drunksci has joined #archiveteam |
20:27
🔗
|
|
BlueMaxim has joined #archiveteam |
20:31
🔗
|
|
Coderjoe has joined #archiveteam |
20:32
🔗
|
|
jrwr has joined #archiveteam |
21:45
🔗
|
|
kcaj has quit IRC (Ping timeout: 506 seconds) |
21:56
🔗
|
|
drunksci has quit IRC () |
22:11
🔗
|
|
kcaj has joined #archiveteam |
23:09
🔗
|
|
WinterFox has joined #archiveteam |
23:09
🔗
|
|
kcaj has quit IRC (Ping timeout: 506 seconds) |
23:13
🔗
|
|
kcaj has joined #archiveteam |
23:17
🔗
|
|
ColdIce has joined #archiveteam |
23:18
🔗
|
ColdIce |
So, received news that a big site in my country is going to die at the beginning of the new year - now, how do I archive it? And how would I continue to update the archive up until new year? Bandwidth not an issue, neither is storage. |
23:18
🔗
|
xmc |
what's the site url? |
23:18
🔗
|
ColdIce |
klara-klok.no |
23:18
🔗
|
ColdIce |
You wouldn't understand much, but must save! |
23:19
🔗
|
xmc |
what is it? a government health website? |
23:20
🔗
|
ColdIce |
Sponsored by government, operated by people who are professionals (their line-of-work, trusted people) |
23:20
🔗
|
* |
xmc nods |
23:20
🔗
|
xmc |
how long has it been around? |
23:20
🔗
|
ColdIce |
And it's going to die, been there for 8 years now. So time to save it, before 1st of january |
23:20
🔗
|
xmc |
oh gosh, okay |
23:21
🔗
|
ColdIce |
whoops, 16 years **** |
23:21
🔗
|
xmc |
mostly i ask these questions to help triage and figure out what of our tools is best to use for it |
23:21
🔗
|
xmc |
not questioning whether it deserves to live |
23:21
🔗
|
ColdIce |
I understand, it's a site where you submit a question and receive a professional answer back that can be trusted, or an opinion with links to more help within my country |
23:21
🔗
|
xmc |
sounds like an archivebot job might be the right tool, does anyone here want to monitor such a thing? |
23:22
🔗
|
xmc |
hmmmmm |
23:22
🔗
|
xmc |
that sounds useful though maybe hard to archive in a useful form |
23:22
🔗
|
xmc |
(unless the questions are sorted into categories) |
23:22
🔗
|
ColdIce |
So questionably, can we index the questions? It's sorted into categories :) |
23:23
🔗
|
xmc |
do you know about how many questions there are? |
23:23
🔗
|
xmc |
do the questions have urls with numbers in them, or is it more complex |
23:23
🔗
|
ColdIce |
Seems that there was a reset on the site in 2008, so 8 years of data is lost for us sadly :( |
23:24
🔗
|
xmc |
ass |
23:24
🔗
|
xmc |
if you can generate a list of urls of all the questions you want to snag, that'll make it a lot easier and more reliable |
23:24
🔗
|
xmc |
anyway, i have to get back to work |
23:24
🔗
|
ColdIce |
Yes, every question has a number, and each question is assigned categories that relate to it |
23:24
🔗
|
xmc |
should probably do *something* useful today |
23:24
🔗
|
xmc |
oh that sounds delightful |
23:25
🔗
|
ColdIce |
Yep, should be easy to index, but I don't know the correct tool |
23:26
🔗
|
ColdIce |
which is why I'm asking for help |
23:26
🔗
|
xmc |
what's the lowest and highest question number you can find? |
23:26
🔗
|
xmc |
and why don't you give some question urls here |
23:27
🔗
|
ae_g_i_s |
xmc: i'm currently trying to find the easiest way to do that |
23:27
🔗
|
xmc |
ah cool |
23:27
🔗
|
ae_g_i_s |
they have rss feeds, but their IDs are jumping around |
23:27
🔗
|
xmc |
ae_g_i_s: would you like to take point on this? |
23:27
🔗
|
ae_g_i_s |
http://www.klara-klok.no/spoersmaal/644543 |
23:27
🔗
|
ae_g_i_s |
xmc: i've never uploaded anything and don't have my boxes set up yet, so i can only help this time, sry :/ |
23:27
🔗
|
xmc |
oh, no worries |
23:28
🔗
|
ae_g_i_s |
wonder if they rate limit |
23:28
🔗
|
xmc |
if you can make it suitable for archivebot then it'll be easy peasy |
23:28
🔗
|
ae_g_i_s |
let's waste some IP reputation |
23:28
🔗
|
xmc |
is that a recent question? because less than a million is good |
23:28
🔗
|
xmc |
i was expecting like maybe ten million |
23:28
🔗
|
xmc |
which is an awkward number |
23:29
🔗
|
ColdIce |
recent question would be id 644740 |
23:29
🔗
|
ae_g_i_s |
yeah, they're all of that general order |
23:30
🔗
|
xmc |
if they don't ip-ban: |
23:31
🔗
|
xmc |
i would suggest making a list of urls with all those numbers, from 1 to 645,000, and submitting it to archivebot with "!ao <" |
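A minimal sketch of building such a URL list, assuming every question lives at the `/spoersmaal/<id>` path seen in the example link earlier in this log. The function names and the output filename are hypothetical, and the ID range is a parameter because the chat has not settled on one:

```python
from typing import Iterator

# URL shape taken from the example question link quoted in this log.
QUESTION_URL = "http://www.klara-klok.no/spoersmaal/{id}"


def question_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one question URL per ID in the inclusive range [first_id, last_id]."""
    for question_id in range(first_id, last_id + 1):
        yield QUESTION_URL.format(id=question_id)


def write_url_list(path: str, first_id: int, last_id: int) -> int:
    """Write the URLs one per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for url in question_urls(first_id, last_id):
            handle.write(url + "\n")
            count += 1
    return count
```

Calling `write_url_list("klara-klok-urls.txt", 1, 645_000)` would produce the full range suggested here; the resulting file would still need to be hosted somewhere the bot can fetch it.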
23:31
🔗
|
ae_g_i_s |
looks like they don't |
23:31
🔗
|
ae_g_i_s |
i'm currently on RSS feed ~150 (starting at 0) with no slowdown |
23:31
🔗
|
xmc |
and then add a second job to archivebot with !a http://www.klara-klok.no and add an ignore for /spoersmaal/ |
23:32
🔗
|
xmc |
and then keep an eye on those two jobs to make sure they don't go off the rails |
23:32
🔗
|
ColdIce |
First post ever that is available is 70367 - but then again, the next one is 471748, which is dated the same day - weird |
23:32
🔗
|
xmc |
hmmmmmmmmm |
23:32
🔗
|
xmc |
so maybe 470,000 to 645,000 ? |
23:32
🔗
|
ColdIce |
but the date and categories are available on the page that would be archived, for us to date it |
23:33
🔗
|
* |
xmc away |
23:34
🔗
|
|
ndiddy has joined #archiveteam |
23:34
🔗
|
ColdIce |
Seem to be random, first entry ever 01.01.2008 that is available (8 years missing of data) has ID 70367, but we also have ID 60000 with post from 14.03.2013 |
23:36
🔗
|
ae_g_i_s |
ColdIce: yeah, same for the RSS feeds, they wildly jump from around 250 to 4800 |
23:36
🔗
|
arkiver |
or we do an archivejob with "!a <", with the list of URLs and the main site, so it will also follow links from the list of questions |
23:36
🔗
|
ae_g_i_s |
currently grabbing 0-5000 just to see which ones exist |
23:37
🔗
|
xmc |
arkiver: that will likely take forever |
23:37
🔗
|
xmc |
depending on how archivebot orders found links relative to given urls |
23:38
🔗
|
xmc |
er shit i'm supposed to be away |
23:38
🔗
|
arkiver |
I believe it first grabs the given URLs, then the found URLs |
23:41
🔗
|
ae_g_i_s |
good news: judging from the first ~2500 RSS feeds, the visible subcategories seem to be the only accessible ones; there are 7 feeds with actual content in those 2.5k |
23:43
🔗
|
ColdIce |
4 main categories on the site, 21 sub-categories. Please note, a question which has assigned sub-categories *might* also have a search URL attached where the rest of the categories are, indicated by "sok" in the URL |
23:43
🔗
|
ColdIce |
also questions do have sub-categories assigned tho |
23:45
🔗
|
ColdIce |
ae_g_i_s: where do you see the RSS feed? I only see RSS feed for last questions... |
23:47
🔗
|
ae_g_i_s |
ColdIce: yeah, but there's one feed for each subcategory |
23:47
🔗
|
ColdIce |
and that feed returns the latest questions |
23:48
🔗
|
ae_g_i_s |
yeah, it's not useful to grab all questions or even oldest ones |
23:48
🔗
|
ae_g_i_s |
as in, the RSS is not useful for that ;) |
23:51
🔗
|
ColdIce |
wouldn't it be easier to iterate each page within a sub-category? |
23:51
🔗
|
ColdIce |
Until we hit a specific HTML-text? |
23:52
🔗
|
ae_g_i_s |
i was just trying to find out which categories there are and the highest ID |
23:52
🔗
|
ae_g_i_s |
there seem to be 23 categories, latest 644740 (as you said), but they jump quite a bit :/ |
23:53
🔗
|
ae_g_i_s |
categories: http://lpaste.net/5455619423912067072 |
23:54
🔗
|
ColdIce |
Nice |
23:54
🔗
|
ae_g_i_s |
ColdIce: how did you grab the oldest ID? |
23:58
🔗
|
ColdIce |
Used the last answer page without searching in a specific category; oddly enough, the last page as of now is http://www.klara-klok.no/siste-svar?page=23500 |
23:58
🔗
|
ColdIce |
but the oldest id, is unknown due to randomness |
23:58
🔗
|
ColdIce |
like I said, I found id 60000 of post in 2013 and id 70367 of the oldest post ever |
23:58
🔗
|
ColdIce |
so ID is random and can't be used |
23:59
🔗
|
ColdIce |
could always iterate 1 to 23500 and fetch each html item |
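A rough sketch of that listing-page approach. The listing URL shape comes from the `siste-svar` link quoted above; the assumption that each listing page links to questions as `/spoersmaal/<digits>` is not confirmed in this log, and all function names are hypothetical. Fetching the pages is left out on purpose:

```python
import re
from typing import Iterator, List

# Listing URL shape taken from the "siste-svar" link quoted in the chat.
LISTING_URL = "http://www.klara-klok.no/siste-svar?page={page}"

# Assumption: listing pages link to questions as /spoersmaal/<digits>.
QUESTION_LINK = re.compile(r"/spoersmaal/(\d+)")


def listing_urls(first_page: int, last_page: int) -> Iterator[str]:
    """Yield listing-page URLs for the inclusive page range."""
    for page in range(first_page, last_page + 1):
        yield LISTING_URL.format(page=page)


def extract_question_ids(html: str) -> List[int]:
    """Return the distinct question IDs linked from one page, in page order."""
    seen = set()
    ids = []
    for match in QUESTION_LINK.finditer(html):
        question_id = int(match.group(1))
        if question_id not in seen:
            seen.add(question_id)
            ids.append(question_id)
    return ids
```

Collecting IDs this way sidesteps the gaps in the ID space, at the cost of fetching every listing page first.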
23:59
🔗
|
arkiver |
let's create a channel for this |
23:59
🔗
|
arkiver |
and discuss in that channel |