Time |
Nickname |
Message |
00:00
|
Fusl |
i'm merging these two download directories now, fixing my fkup |
00:14
|
|
odemgi_ has quit IRC (Remote host closed the connection) |
00:14
|
|
sknebel has quit IRC (Read error: Operation timed out) |
00:14
|
|
wacky has quit IRC (Read error: Operation timed out) |
00:14
|
|
human has quit IRC (Remote host closed the connection) |
00:15
|
|
odemgi_ has joined #archiveteam-bs |
00:15
|
|
Dark_Star has quit IRC (Read error: Operation timed out) |
00:15
|
|
human has joined #archiveteam-bs |
00:15
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
00:15
|
|
BlueMax has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
RichardG has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
brayden has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Stiletto has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
lenary has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
omglolbah has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
LFlare has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
BartoCH has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
ReimuHaku has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
DFJustin has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
SketchCow has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
yuitimoth has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
arbin_ has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Phoen1x_ has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
arkiver has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
zhongfu has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
sec^nd has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Kenshin has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
decay has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Jusque has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
acridAxid has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
tuluu has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Sanqui has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Jon has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
t3 has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
bsmith093 has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Ing3b0rg has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
obskyr has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
gandalf has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
atbk has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
ephemer0l has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
nyaomi has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
jeekl has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
eientei95 has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
cynthia has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
apache2 has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
hook54321 has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Zebranky has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
horkermon has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
colona has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Fusl_ has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
revi has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
octarine has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
chr1sm has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
HCross has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Vito` has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
bitspill has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
pnJay has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
riking has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
diggan has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
JSharp has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
tsr has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
bakJAA has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
ThisAsYou has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Ctrl-S_ has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
deathy has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
DrasticAc has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Meroje has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
kpcyrd has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Hecatz has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
Muad-Dib has quit IRC (hub.dk efnet.port80.se) |
00:15
|
|
sknebel has joined #archiveteam-bs |
00:15
|
|
halt has quit IRC (Read error: Operation timed out) |
00:16
|
|
wacky has joined #archiveteam-bs |
00:16
|
|
bitBaron has joined #archiveteam-bs |
00:17
|
|
DarkStar1 has joined #archiveteam-bs |
00:17
|
|
apache2 has joined #archiveteam-bs |
00:17
|
|
BlueMax has joined #archiveteam-bs |
00:17
|
|
RichardG has joined #archiveteam-bs |
00:17
|
|
brayden has joined #archiveteam-bs |
00:17
|
|
lenary has joined #archiveteam-bs |
00:17
|
|
Stiletto has joined #archiveteam-bs |
00:17
|
|
omglolbah has joined #archiveteam-bs |
00:17
|
|
LFlare has joined #archiveteam-bs |
00:17
|
|
BartoCH has joined #archiveteam-bs |
00:17
|
|
ReimuHaku has joined #archiveteam-bs |
00:17
|
|
DFJustin has joined #archiveteam-bs |
00:17
|
|
SketchCow has joined #archiveteam-bs |
00:17
|
|
yuitimoth has joined #archiveteam-bs |
00:17
|
|
arbin_ has joined #archiveteam-bs |
00:17
|
|
arkiver has joined #archiveteam-bs |
00:17
|
|
sec^nd has joined #archiveteam-bs |
00:17
|
|
zhongfu has joined #archiveteam-bs |
00:17
|
|
Kenshin has joined #archiveteam-bs |
00:17
|
|
decay has joined #archiveteam-bs |
00:17
|
|
Jusque has joined #archiveteam-bs |
00:17
|
|
acridAxid has joined #archiveteam-bs |
00:17
|
|
tuluu has joined #archiveteam-bs |
00:17
|
|
Sanqui has joined #archiveteam-bs |
00:17
|
|
Jon has joined #archiveteam-bs |
00:17
|
|
t3 has joined #archiveteam-bs |
00:17
|
|
Ing3b0rg has joined #archiveteam-bs |
00:17
|
|
bsmith093 has joined #archiveteam-bs |
00:17
|
|
obskyr has joined #archiveteam-bs |
00:17
|
|
gandalf has joined #archiveteam-bs |
00:17
|
|
atbk has joined #archiveteam-bs |
00:17
|
|
cynthia has joined #archiveteam-bs |
00:17
|
|
ephemer0l has joined #archiveteam-bs |
00:17
|
|
Fusl_ has joined #archiveteam-bs |
00:17
|
|
nyaomi has joined #archiveteam-bs |
00:17
|
|
jeekl has joined #archiveteam-bs |
00:17
|
|
eientei95 has joined #archiveteam-bs |
00:17
|
|
efnet.port80.se sets mode: +oooo SketchCow arkiver Sanqui eientei95 |
00:17
|
|
hook54321 has joined #archiveteam-bs |
00:17
|
|
Zebranky has joined #archiveteam-bs |
00:17
|
|
horkermon has joined #archiveteam-bs |
00:17
|
|
revi has joined #archiveteam-bs |
00:17
|
|
colona has joined #archiveteam-bs |
00:17
|
|
octarine has joined #archiveteam-bs |
00:17
|
|
chr1sm has joined #archiveteam-bs |
00:17
|
|
HCross has joined #archiveteam-bs |
00:17
|
|
efnet.port80.se sets mode: +oo hook54321 HCross |
00:17
|
|
Vito` has joined #archiveteam-bs |
00:17
|
|
bitspill has joined #archiveteam-bs |
00:17
|
|
pnJay has joined #archiveteam-bs |
00:17
|
|
riking has joined #archiveteam-bs |
00:17
|
|
diggan has joined #archiveteam-bs |
00:17
|
|
JSharp has joined #archiveteam-bs |
00:17
|
|
tsr has joined #archiveteam-bs |
00:17
|
|
bakJAA has joined #archiveteam-bs |
00:17
|
|
ThisAsYou has joined #archiveteam-bs |
00:17
|
|
Ctrl-S_ has joined #archiveteam-bs |
00:17
|
|
deathy has joined #archiveteam-bs |
00:17
|
|
DrasticAc has joined #archiveteam-bs |
00:17
|
|
Meroje has joined #archiveteam-bs |
00:17
|
|
kpcyrd has joined #archiveteam-bs |
00:17
|
|
Hecatz has joined #archiveteam-bs |
00:17
|
|
Muad-Dib has joined #archiveteam-bs |
00:17
|
|
efnet.port80.se sets mode: +o bakJAA |
00:17
|
|
JAA sets mode: +o bakJAA |
00:17
|
|
Phoen1x has joined #archiveteam-bs |
00:18
|
|
halt has joined #archiveteam-bs |
00:18
|
|
bakJAA sets mode: +o JAA |
00:22
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
00:26
|
|
Sk1d has joined #archiveteam-bs |
00:28
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
00:34
|
|
bitBaron has joined #archiveteam-bs |
01:05
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
01:08
|
|
Sk1d has joined #archiveteam-bs |
01:10
|
|
benjinsmi has quit IRC (Quit: Leaving) |
01:12
|
|
human has quit IRC (Remote host closed the connection) |
01:12
|
|
human has joined #archiveteam-bs |
01:13
|
|
atomicthu has quit IRC (Read error: Operation timed out) |
01:13
|
|
Fletcher has quit IRC (Read error: Operation timed out) |
01:14
|
|
halt has quit IRC (Read error: Operation timed out) |
01:15
|
|
atomicthu has joined #archiveteam-bs |
01:17
|
|
halt has joined #archiveteam-bs |
01:20
|
|
Fletcher has joined #archiveteam-bs |
01:25
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
01:27
|
|
Sk1d has joined #archiveteam-bs |
01:50
|
|
Mateon1 has quit IRC (Ping timeout: 612 seconds) |
01:54
|
|
Mateon1 has joined #archiveteam-bs |
02:00
|
|
ndiddy has quit IRC () |
02:01
|
|
human has quit IRC (Remote host closed the connection) |
02:02
|
|
human has joined #archiveteam-bs |
02:04
|
|
human has quit IRC (Remote host closed the connection) |
02:38
|
znak |
mr_archiv: I'm pretty sure the YouTube API has a free quota that resets every day, but I never actually tried pushing it to the limit. And yeah, creating a Google account w/o a phone number can be hard. |
02:39
|
znak |
Unrelated to YouTube comments, Super Mario Maker 2 was just announced, and now there's the question of how long Nintendo will keep hosting Super Mario Maker 1 levels. There are at least hundreds of thousands of levels. |
02:40
|
znak |
I seem to remember some people reverse-engineering the data formats, but does anyone know of an effort to archive levels? |
02:41
|
znak |
The closest thing I know of is https://github.com/tachyo/SMMData, which is metadata (only) of 300K levels, scraped from the web service at supermariomakerbookmark.nintendo.net. |
02:47
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
02:49
|
|
Sk1d has joined #archiveteam-bs |
02:51
|
|
halt has quit IRC (Read error: Operation timed out) |
02:53
|
|
halt has joined #archiveteam-bs |
02:59
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
03:02
|
|
Sk1d has joined #archiveteam-bs |
03:24
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
03:26
|
|
Sk1d has joined #archiveteam-bs |
03:32
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
03:36
|
|
Sk1d has joined #archiveteam-bs |
03:50
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
03:54
|
|
Sk1d has joined #archiveteam-bs |
04:00
|
|
glmd has joined #archiveteam-bs |
04:01
|
glmd |
Hey, we're trying to archive Blogger comments. |
04:01
|
glmd |
The worker is here: https://github.com/afrmtbl/blogspot-comment-backup |
04:01
|
glmd |
We'd appreciate some help. |
04:04
|
|
glmd has quit IRC (Client Quit) |
04:07
|
Fusl |
JAA: 614 zips |
04:10
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
04:14
|
|
odemgi has joined #archiveteam-bs |
04:15
|
|
bobmcjr has joined #archiveteam-bs |
04:17
|
bobmcjr |
I'm trying to archive a phpBB forum, but it's misconfigured so it doesn't comprehend cookies. The result is many duplicates due to the &sid= tacked on to the URL, and it seems that it might not finish. |
04:17
|
|
odemgi_ has quit IRC (Read error: Operation timed out) |
04:21
|
mr_archiv |
glmd, I am interested in this. Can you be more specific about what you need help with? |
04:24
|
bobmcjr |
glmd? |
04:25
|
mr_archiv |
bobmcjr, have you considered removing &sid from the URL before fetching them? |
04:26
|
bobmcjr |
phpBB autogenerates it as I crawl. It also seems to change |
04:26
|
mr_archiv |
urllib.parse.urlsplit then urllib.parse.parse_qs then urllib.parse.urlunsplit |
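A minimal Python sketch of that urllib approach, assuming the session parameter is literally named `sid` (as in the `&sid=` URLs above). `parse_qs` on its own only parses the query string; putting it back together also needs `urlencode`. This sketch uses `parse_qsl` instead, which keeps the parameter order. The function name `strip_sid` and the sample URL's `f`/`t` values are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_sid(url: str, param: str = "sid") -> str:
    """Return url with the (assumed) session-ID query parameter removed."""
    parts = urlsplit(url)
    # parse_qsl keeps parameter order and repeated keys; drop only the session ID
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    # urlencode rebuilds the query string; an empty list yields "" (no trailing "?")
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(strip_sid("http://forum.eltechs.com/viewtopic.php?f=2&t=10&sid=abc123"))
```

Running it prints `http://forum.eltechs.com/viewtopic.php?f=2&t=10`.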
04:28
|
bobmcjr |
I've just been using wget --mirror --warc-file |
04:29
|
mr_archiv |
Can you post-process the URLs via stdout? |
04:30
|
mr_archiv |
I have the real answer. You need to use a session. |
04:30
|
mr_archiv |
Then the sid will stay the same I think. |
04:31
|
mr_archiv |
Wait, never mind, I am wrong about that. |
04:32
|
mr_archiv |
I don't understand why the sid would change, sid = session ID. |
04:32
|
bobmcjr |
I'm trying to tell if the duplicates are from my session, or from hyperlinks people have embedded that have their sids |
04:33
|
mr_archiv |
Good point, I did not think of people accidentally leaking their session ID. |
04:33
|
mr_archiv |
Can you generate a list of URLs and save it to a file? |
04:34
|
mr_archiv |
Then post-process the file with a Python script then instruct wget to fetch all the URLs. |
04:34
|
mr_archiv |
After removing the session IDs you can use sort -u so they won't be fetched twice. |
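A sketch of that post-processing step, assuming a text file with one URL per line and, again, a query parameter named `sid`. The script and file names are hypothetical. `sorted(set(...))` stands in for `sort -u` (code-point order, which matches `LC_ALL=C sort -u` for ASCII URLs).

```python
import sys
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def without_sid(url):
    """Remove the (assumed) 'sid' query parameter, keeping everything else."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "sid"]
    return urlunsplit(parts._replace(query=urlencode(query)))


def dedupe_urls(lines):
    """Strip session IDs, skip blank lines, then sort and de-duplicate like `sort -u`."""
    return sorted({without_sid(line.strip()) for line in lines if line.strip()})


if __name__ == "__main__":
    # Hypothetical usage: python clean_urls.py urls.txt > clean.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            for url in dedupe_urls(f):
                print(url)
```

The cleaned list can then be handed to wget with `-i`/`--input-file`, e.g. `wget --input-file=clean.txt --warc-file=forum` (the WARC name is arbitrary).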
04:35
|
mr_archiv |
Actually, I just realized there is no point in doing this. |
04:35
|
mr_archiv |
Depending on how this is done it will end up being harder on the server than just fetching the duplicates. |
04:36
|
mr_archiv |
I think the only way is to remove the session ID before attempting to fetch the URLs |
04:36
|
bobmcjr |
I'll try again and just hope my sid remains stable so that there are a finite number of URLs. |
04:37
|
bobmcjr |
This is the first thing I've tried to crawl for proper archiving |
04:37
|
mr_archiv |
Generally speaking you don't have to go through the kind of trouble like I described. Traditional web crawling works for most websites. |
04:37
|
mr_archiv |
I mean for archiving. |
04:38
|
bobmcjr |
For reference it's this, which may or may not disappear in 24-48 hours: forum.eltechs.com/index.php |
04:42
|
bobmcjr |
Ok, my sid seems stable. I'll just tolerate duplicates. |
04:53
|
|
qw3rty113 has joined #archiveteam-bs |
04:59
|
|
qw3rty112 has quit IRC (Read error: Operation timed out) |
04:59
|
|
godane has quit IRC (Remote host closed the connection) |
05:02
|
|
godane has joined #archiveteam-bs |
05:17
|
|
wp494 has quit IRC (Read error: Operation timed out) |
05:17
|
|
wp494 has joined #archiveteam-bs |
05:19
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
05:22
|
|
Sk1d has joined #archiveteam-bs |
05:45
|
|
wp494 has quit IRC (Ping timeout: 252 seconds) |
05:47
|
|
wp494 has joined #archiveteam-bs |
06:16
|
bobmcjr |
Straight wget can't handle this. It is looping. Is there any more flexible software for crawling to assemble a WARC? |
06:28
|
jodizzle |
bobmcjr: Check out https://github.com/ludios/grab-site |
06:31
|
|
superkuh has joined #archiveteam-bs |
06:32
|
|
deevious has joined #archiveteam-bs |
07:06
|
|
Stilett0 has joined #archiveteam-bs |
07:16
|
|
Stiletto has quit IRC (Ping timeout: 615 seconds) |
07:47
|
|
wyatt8740 has quit IRC (Ping timeout: 360 seconds) |
07:48
|
|
Albardin has quit IRC (Read error: Operation timed out) |
07:48
|
|
paul2520 has quit IRC (Read error: Operation timed out) |
07:48
|
|
wacky has quit IRC (Write error: Broken pipe) |
07:48
|
|
kiska1 has quit IRC (Read error: Operation timed out) |
07:48
|
|
wacky has joined #archiveteam-bs |
07:50
|
|
PhrackD has quit IRC (Read error: Operation timed out) |
07:50
|
|
GLaDOS has quit IRC (Write error: Broken pipe) |
07:51
|
|
HashbangI has quit IRC (Read error: Operation timed out) |
07:51
|
|
sep332 has quit IRC (Read error: Operation timed out) |
07:51
|
|
dxrt_ has quit IRC (Read error: Operation timed out) |
07:51
|
|
qw3rty113 has quit IRC (Read error: Operation timed out) |
07:51
|
|
step has quit IRC (Read error: Operation timed out) |
07:53
|
|
qw3rty113 has joined #archiveteam-bs |
07:53
|
|
kiska1 has joined #archiveteam-bs |
07:53
|
|
Albardin has joined #archiveteam-bs |
07:55
|
|
kiska1 has quit IRC (Read error: Operation timed out) |
07:59
|
|
TigerbotH has quit IRC (Ping timeout: 600 seconds) |
07:59
|
|
bobmcjr has quit IRC (Read error: Operation timed out) |
07:59
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
07:59
|
|
Albardin has quit IRC (Read error: Operation timed out) |
08:00
|
|
Albardin has joined #archiveteam-bs |
08:02
|
|
qw3rty114 has joined #archiveteam-bs |
08:02
|
|
kiska1 has joined #archiveteam-bs |
08:02
|
|
Sk1d has joined #archiveteam-bs |
08:03
|
|
bobmcjr has joined #archiveteam-bs |
08:03
|
|
paul2520 has joined #archiveteam-bs |
08:03
|
|
qw3rty113 has quit IRC (Ping timeout: 600 seconds) |
08:05
|
|
TigerbotH has joined #archiveteam-bs |
08:05
|
|
dxrt_ has joined #archiveteam-bs |
08:05
|
|
dxrt sets mode: +o dxrt_ |
08:08
|
|
PhrackD has joined #archiveteam-bs |
08:08
|
|
Exairnous has quit IRC (Ping timeout: 252 seconds) |
08:09
|
|
step has joined #archiveteam-bs |
08:09
|
|
sep332 has joined #archiveteam-bs |
08:09
|
|
HashbangI has joined #archiveteam-bs |
08:11
|
|
GLaDOS has joined #archiveteam-bs |
08:58
|
JAA |
Fusl: lol nice. Do you have a log file for a complete grab? (If not, I'll wget --spider it.) |
09:00
|
JAA |
bobmcjr: Interesting, I haven't seen that before. Yeah, it's going to be very tricky to archive that. You could ignore any URL which doesn't have the initial SID in it, but then you'd miss parts of the forums (it would stop recursing when the SID changes). Or you could strip out the SID, as wpull does (and hence ArchiveBot and grab-site), but then all links will be broken. :-/ There probably is no |
09:00
|
JAA |
good solution. |
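JAA's first option (only follow URLs that still carry the initial SID) can be sketched as a URL filter. This assumes the SID is the `sid` query parameter and that you noted its value from the first page fetched; the function name is hypothetical, and how such a check is wired into a crawler depends on the tool.

```python
from urllib.parse import urlsplit, parse_qs


def carries_initial_sid(url: str, initial_sid: str) -> bool:
    """Keep a URL only if its 'sid' query parameter equals the SID from the first page."""
    sids = parse_qs(urlsplit(url).query).get("sid", [])
    return sids == [initial_sid]
```

Anything without exactly that SID is rejected, which is why, as JAA notes, the crawl stops recursing wherever the SID changes.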
09:01
|
JAA |
Try ignoring the "Delete all board cookies" link in the footer at least, though. That should help with the changing SIDs. |
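Excluding that link can be sketched as a regular-expression ignore list, assuming it points at `ucp.php?mode=delete_cookies` as in a stock phpBB 3 footer (worth checking on the actual forum). The pattern and helper names are illustrative. With plain wget, a simpler pattern such as `mode=delete_cookies` can be passed via `--reject-regex` in recent wget versions.

```python
import re

# Illustrative ignore list; assumes the stock phpBB 3 "Delete all board cookies" link target
IGNORE_PATTERNS = [
    re.compile(r"/ucp\.php\?(?:[^#]*&)?mode=delete_cookies(?:&|#|$)"),
]


def is_ignored(url: str) -> bool:
    """True if any ignore pattern matches somewhere in the URL (re.search semantics)."""
    return any(pattern.search(url) for pattern in IGNORE_PATTERNS)
```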
09:02
|
JAA |
Or maybe you could go after URLs like http://forum.eltechs.com/viewtopic.php?p=3329 instead. Unbrowsable, but at least all the content would be there. |
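Generating such a flat URL list is straightforward. A sketch, assuming post IDs are roughly sequential and that the upper bound is read off the newest post on the forum (the range in the example is made up). In phpBB, `viewtopic.php?p=N` shows the topic page containing post N, so neighbouring IDs will often return the same page, and deleted posts leave gaps.

```python
from typing import List


def post_urls(base: str, first_id: int, last_id: int) -> List[str]:
    """Build phpBB 'viewtopic.php?p=N' URLs for post IDs first_id..last_id inclusive."""
    base = base.rstrip("/")  # avoid a double slash if base ends with '/'
    return [f"{base}/viewtopic.php?p={n}" for n in range(first_id, last_id + 1)]


if __name__ == "__main__":
    # Hypothetical range; pick the real upper bound from the newest post on the forum
    for url in post_urls("http://forum.eltechs.com", 1, 5):
        print(url)
```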
09:04
|
JAA |
A two-phase approach might also work: first archive only the thread lists, then extract the thread URLs and grab those. No recursion, simple lists of URLs. That might even preserve browsability mostly. |
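A sketch of the second phase: given the HTML of an already archived thread-list page (`viewforum.php`), pull out the topic links so they can be fetched from a flat list. It assumes stock phpBB 3 markup where topic links look like `./viewtopic.php?f=2&t=15&sid=...`; the class and function names are made up. URLs are kept exactly as linked (SID included), only resolved against the page URL and stripped of `#fragments`, since matching the archived links is what keeps the result browsable. Topics spanning several pages (`&start=` links) are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlsplit, parse_qs


class TopicLinkParser(HTMLParser):
    """Collect links to phpBB topic pages (viewtopic.php?...t=N...) from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # HTMLParser already decodes entities such as &amp; in attribute values
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links like ./viewtopic.php and drop any #fragment
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        parts = urlsplit(url)
        if parts.path.endswith("/viewtopic.php") and "t" in parse_qs(parts.query):
            self.urls.add(url)


def topic_urls(html, base_url):
    """Return the sorted, de-duplicated topic URLs linked from a thread-list page."""
    parser = TopicLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.urls)
```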
09:24
|
|
Phoen1x has quit IRC (Quit: Leaving) |
10:00
|
|
BlueMax has quit IRC (Quit: Leaving) |
10:01
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
10:02
|
|
Stiletto has joined #archiveteam-bs |
11:49
|
|
lindalap has joined #archiveteam-bs |
11:49
|
|
lindalap has quit IRC (Client Quit) |
12:26
|
|
bitBaron has joined #archiveteam-bs |
13:06
|
|
sep332 has quit IRC (ZNC 1.6.3+deb1ubuntu0.1 - http://znc.in) |
13:35
|
kiska |
urgh on the blox shutdown |
13:48
|
|
bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
14:02
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
14:04
|
|
Sk1d has joined #archiveteam-bs |
14:07
|
PurpleSym |
JAA: There's a blog search, making discovery fairly easy: http://www.blox.pl/html/2883585,13107202,81.html?y |
14:07
|
PurpleSym |
I'll try to get a list of blogs. |
14:11
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
14:14
|
Fusl |
JAA: nope i don't |
14:14
|
|
Sk1d has joined #archiveteam-bs |
14:45
|
JAA |
PurpleSym: Hmm, I found this list: http://www.blox.pl/html/65537.html |
14:45
|
JAA |
Also the category lists linked on the right on that page. |
14:48
|
PurpleSym |
There's no pagination on the first one, so it can hardly be a complete list of blogs. |
14:49
|
|
wp494 has quit IRC (Ping timeout: 268 seconds) |
14:49
|
|
wp494 has joined #archiveteam-bs |
14:49
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
14:50
|
PurpleSym |
Categories are looking better. I'll scrape them as well. |
14:51
|
JAA |
There are tags also: http://www.blox.pl/blog/tags/index |
14:52
|
PurpleSym |
Thanks, I'll see how close I can get to 689136 URLs with these lists. |
14:53
|
|
Sk1d has joined #archiveteam-bs |
14:53
|
JAA |
One more: http://www.blox.pl/blog/index |
14:54
|
JAA |
With the groups at the top ( |
14:54
|
PurpleSym |
Hm, or is it 665438 blogs? |
14:54
|
JAA |
"Polecamy", "Biznes", etc.) |
14:54
|
JAA |
665k blogs |
15:29
|
|
Fusl has quit IRC (Quit: oh god please send help) |
15:31
|
|
Fusl has joined #archiveteam-bs |
15:58
|
arkiver |
blox.pl sounds like a warrior project |
15:58
|
arkiver |
:) |
15:59
|
arkiver |
Can probably set something up tomorrow if it's not too difficult to archive |
16:04
|
PurpleSym |
arkiver: I should have a first list ready by then. As for archiving: If wpull preserves cookies, you should ignore URLs like this one: http://xxx.blox.pl/html?mobile=yes |
16:43
|
|
bitBaron has joined #archiveteam-bs |
16:53
|
|
bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
16:59
|
|
bitBaron has joined #archiveteam-bs |
18:44
|
|
astrid has left ][ |
18:57
|
|
adwus has joined #archiveteam-bs |
19:08
|
|
Exairnous has joined #archiveteam-bs |
19:09
|
|
adwus has quit IRC (Ping timeout: 360 seconds) |
19:43
|
|
schbirid has joined #archiveteam-bs |
19:56
|
|
godane has quit IRC (Read error: Connection reset by peer) |
19:57
|
|
sep332 has joined #archiveteam-bs |
19:58
|
|
BartoCH has quit IRC (Ping timeout: 615 seconds) |
20:02
|
|
BartoCH has joined #archiveteam-bs |
20:14
|
|
godane has joined #archiveteam-bs |
20:22
|
|
BartoCH has quit IRC (Ping timeout: 615 seconds) |
20:32
|
|
fredgido has quit IRC (Read error: Connection reset by peer) |
20:36
|
|
m007a83_ has joined #archiveteam-bs |
20:39
|
|
m007a83 has quit IRC (Ping timeout: 252 seconds) |
20:48
|
|
bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…) |
20:57
|
|
bitBaron has joined #archiveteam-bs |
21:06
|
|
BartoCH has joined #archiveteam-bs |
21:09
|
|
bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…) |
21:09
|
|
bitBaron has joined #archiveteam-bs |
21:13
|
|
Despatche has quit IRC (Read error: Operation timed out) |
21:19
|
|
VerifiedJ has joined #archiveteam-bs |
21:21
|
|
VerifiedJ has quit IRC (Client Quit) |
21:22
|
|
VerifiedJ has joined #archiveteam-bs |
21:39
|
|
fredgido has joined #archiveteam-bs |
22:16
|
|
fredgido_ has joined #archiveteam-bs |
22:16
|
|
Albardin has quit IRC (Read error: Operation timed out) |
22:16
|
|
paul2520 has quit IRC (Read error: Operation timed out) |
22:16
|
|
dxrt_ has quit IRC (Write error: Broken pipe) |
22:16
|
|
kiska1 has quit IRC (Read error: Operation timed out) |
22:17
|
|
sep332 has quit IRC (Write error: Broken pipe) |
22:17
|
|
HashbangI has quit IRC (Read error: Operation timed out) |
22:18
|
|
PotcFdk has quit IRC (Read error: Operation timed out) |
22:18
|
|
step has quit IRC (Read error: Operation timed out) |
22:19
|
|
qw3rty114 has quit IRC (Read error: Operation timed out) |
22:21
|
|
Fusl has quit IRC (Read error: Operation timed out) |
22:21
|
|
PhrackD has quit IRC (Ping timeout: 600 seconds) |
22:22
|
|
Fusl has joined #archiveteam-bs |
22:22
|
|
Albardin has joined #archiveteam-bs |
22:22
|
|
kiska1 has joined #archiveteam-bs |
22:22
|
|
qw3rty114 has joined #archiveteam-bs |
22:22
|
|
fredgido has quit IRC (Read error: Operation timed out) |
22:22
|
|
TigerbotH has quit IRC (Read error: Connection reset by peer) |
22:23
|
|
paul2520 has joined #archiveteam-bs |
22:23
|
|
GLaDOS has quit IRC (Ping timeout: 600 seconds) |
22:25
|
|
GLaDOS has joined #archiveteam-bs |
22:25
|
|
PhrackD has joined #archiveteam-bs |
22:26
|
|
TigerbotH has joined #archiveteam-bs |
22:26
|
|
sep332 has joined #archiveteam-bs |
22:27
|
|
BlueMax has joined #archiveteam-bs |
22:28
|
|
HashbangI has joined #archiveteam-bs |
22:30
|
|
dxrt_ has joined #archiveteam-bs |
22:30
|
|
dxrt sets mode: +o dxrt_ |
22:30
|
|
step has joined #archiveteam-bs |
22:43
|
|
icedice has joined #archiveteam-bs |
23:11
|
|
PotcFdk has joined #archiveteam-bs |
23:13
|
|
schbirid has quit IRC (Remote host closed the connection) |
23:14
|
|
ndiddy has joined #archiveteam-bs |
23:48
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
23:48
|
|
Mateon1 has joined #archiveteam-bs |
23:50
|
|
wp494 has quit IRC (Read error: Operation timed out) |
23:50
|
|
icedice has quit IRC (Read error: Operation timed out) |
23:51
|
|
wp494 has joined #archiveteam-bs |
23:56
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
23:59
|
|
Sk1d has joined #archiveteam-bs |