#archiveteam-bs 2019-02-27,Wed


Time Nickname Message
00:00 πŸ”— Fusl i'm merging these two download directories now, fixing my fkup
00:14 πŸ”— odemgi_ has quit IRC (Remote host closed the connection)
00:14 πŸ”— sknebel has quit IRC (Read error: Operation timed out)
00:14 πŸ”— wacky has quit IRC (Read error: Operation timed out)
00:14 πŸ”— human has quit IRC (Remote host closed the connection)
00:15 πŸ”— odemgi_ has joined #archiveteam-bs
00:15 πŸ”— Dark_Star has quit IRC (Read error: Operation timed out)
00:15 πŸ”— human has joined #archiveteam-bs
00:15 πŸ”— bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
00:15 πŸ”— BlueMax has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— RichardG has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— brayden has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Stiletto has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— lenary has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— omglolbah has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— LFlare has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— BartoCH has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— ReimuHaku has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— DFJustin has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— SketchCow has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— yuitimoth has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— arbin_ has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Phoen1x_ has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— arkiver has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— zhongfu has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— sec^nd has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Kenshin has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— decay has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Jusque has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— acridAxid has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— tuluu has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Sanqui has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Jon has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— t3 has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— bsmith093 has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Ing3b0rg has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— obskyr has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— gandalf has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— atbk has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— ephemer0l has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— nyaomi has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— jeekl has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— eientei95 has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— cynthia has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— apache2 has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— hook54321 has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Zebranky has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— horkermon has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— colona has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Fusl_ has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— revi has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— octarine has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— chr1sm has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— HCross has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Vito` has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— bitspill has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— pnJay has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— riking has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— diggan has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— JSharp has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— tsr has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— bakJAA has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— ThisAsYou has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Ctrl-S_ has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— deathy has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— DrasticAc has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Meroje has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— kpcyrd has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Hecatz has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— Muad-Dib has quit IRC (hub.dk efnet.port80.se)
00:15 πŸ”— sknebel has joined #archiveteam-bs
00:15 πŸ”— halt has quit IRC (Read error: Operation timed out)
00:16 πŸ”— wacky has joined #archiveteam-bs
00:16 πŸ”— bitBaron has joined #archiveteam-bs
00:17 πŸ”— DarkStar1 has joined #archiveteam-bs
00:17 πŸ”— apache2 has joined #archiveteam-bs
00:17 πŸ”— BlueMax has joined #archiveteam-bs
00:17 πŸ”— RichardG has joined #archiveteam-bs
00:17 πŸ”— brayden has joined #archiveteam-bs
00:17 πŸ”— lenary has joined #archiveteam-bs
00:17 πŸ”— Stiletto has joined #archiveteam-bs
00:17 πŸ”— omglolbah has joined #archiveteam-bs
00:17 πŸ”— LFlare has joined #archiveteam-bs
00:17 πŸ”— BartoCH has joined #archiveteam-bs
00:17 πŸ”— ReimuHaku has joined #archiveteam-bs
00:17 πŸ”— DFJustin has joined #archiveteam-bs
00:17 πŸ”— SketchCow has joined #archiveteam-bs
00:17 πŸ”— yuitimoth has joined #archiveteam-bs
00:17 πŸ”— arbin_ has joined #archiveteam-bs
00:17 πŸ”— arkiver has joined #archiveteam-bs
00:17 πŸ”— sec^nd has joined #archiveteam-bs
00:17 πŸ”— zhongfu has joined #archiveteam-bs
00:17 πŸ”— Kenshin has joined #archiveteam-bs
00:17 πŸ”— decay has joined #archiveteam-bs
00:17 πŸ”— Jusque has joined #archiveteam-bs
00:17 πŸ”— acridAxid has joined #archiveteam-bs
00:17 πŸ”— tuluu has joined #archiveteam-bs
00:17 πŸ”— Sanqui has joined #archiveteam-bs
00:17 πŸ”— Jon has joined #archiveteam-bs
00:17 πŸ”— t3 has joined #archiveteam-bs
00:17 πŸ”— Ing3b0rg has joined #archiveteam-bs
00:17 πŸ”— bsmith093 has joined #archiveteam-bs
00:17 πŸ”— obskyr has joined #archiveteam-bs
00:17 πŸ”— gandalf has joined #archiveteam-bs
00:17 πŸ”— atbk has joined #archiveteam-bs
00:17 πŸ”— cynthia has joined #archiveteam-bs
00:17 πŸ”— ephemer0l has joined #archiveteam-bs
00:17 πŸ”— Fusl_ has joined #archiveteam-bs
00:17 πŸ”— nyaomi has joined #archiveteam-bs
00:17 πŸ”— jeekl has joined #archiveteam-bs
00:17 πŸ”— eientei95 has joined #archiveteam-bs
00:17 πŸ”— efnet.port80.se sets mode: +oooo SketchCow arkiver Sanqui eientei95
00:17 πŸ”— hook54321 has joined #archiveteam-bs
00:17 πŸ”— Zebranky has joined #archiveteam-bs
00:17 πŸ”— horkermon has joined #archiveteam-bs
00:17 πŸ”— revi has joined #archiveteam-bs
00:17 πŸ”— colona has joined #archiveteam-bs
00:17 πŸ”— octarine has joined #archiveteam-bs
00:17 πŸ”— chr1sm has joined #archiveteam-bs
00:17 πŸ”— HCross has joined #archiveteam-bs
00:17 πŸ”— efnet.port80.se sets mode: +oo hook54321 HCross
00:17 πŸ”— Vito` has joined #archiveteam-bs
00:17 πŸ”— bitspill has joined #archiveteam-bs
00:17 πŸ”— pnJay has joined #archiveteam-bs
00:17 πŸ”— riking has joined #archiveteam-bs
00:17 πŸ”— diggan has joined #archiveteam-bs
00:17 πŸ”— JSharp has joined #archiveteam-bs
00:17 πŸ”— tsr has joined #archiveteam-bs
00:17 πŸ”— bakJAA has joined #archiveteam-bs
00:17 πŸ”— ThisAsYou has joined #archiveteam-bs
00:17 πŸ”— Ctrl-S_ has joined #archiveteam-bs
00:17 πŸ”— deathy has joined #archiveteam-bs
00:17 πŸ”— DrasticAc has joined #archiveteam-bs
00:17 πŸ”— Meroje has joined #archiveteam-bs
00:17 πŸ”— kpcyrd has joined #archiveteam-bs
00:17 πŸ”— Hecatz has joined #archiveteam-bs
00:17 πŸ”— Muad-Dib has joined #archiveteam-bs
00:17 πŸ”— efnet.port80.se sets mode: +o bakJAA
00:17 πŸ”— JAA sets mode: +o bakJAA
00:17 πŸ”— Phoen1x has joined #archiveteam-bs
00:18 πŸ”— halt has joined #archiveteam-bs
00:18 πŸ”— bakJAA sets mode: +o JAA
00:22 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
00:26 πŸ”— Sk1d has joined #archiveteam-bs
00:28 πŸ”— bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
00:34 πŸ”— bitBaron has joined #archiveteam-bs
01:05 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
01:08 πŸ”— Sk1d has joined #archiveteam-bs
01:10 πŸ”— benjinsmi has quit IRC (Quit: Leaving)
01:12 πŸ”— human has quit IRC (Remote host closed the connection)
01:12 πŸ”— human has joined #archiveteam-bs
01:13 πŸ”— atomicthu has quit IRC (Read error: Operation timed out)
01:13 πŸ”— Fletcher has quit IRC (Read error: Operation timed out)
01:14 πŸ”— halt has quit IRC (Read error: Operation timed out)
01:15 πŸ”— atomicthu has joined #archiveteam-bs
01:17 πŸ”— halt has joined #archiveteam-bs
01:20 πŸ”— Fletcher has joined #archiveteam-bs
01:25 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
01:27 πŸ”— Sk1d has joined #archiveteam-bs
01:50 πŸ”— Mateon1 has quit IRC (Ping timeout: 612 seconds)
01:54 πŸ”— Mateon1 has joined #archiveteam-bs
02:00 πŸ”— ndiddy has quit IRC ()
02:01 πŸ”— human has quit IRC (Remote host closed the connection)
02:02 πŸ”— human has joined #archiveteam-bs
02:04 πŸ”— human has quit IRC (Remote host closed the connection)
02:38 πŸ”— znak mr_archiv: I'm pretty sure the YouTube API has a free quota that resets every day, but I never actually tried pushing it to the limit. And yeah, creating a Google account w/o a phone number can be hard.
02:39 πŸ”— znak Unrelated to YouTube comments, Super Mario Maker 2 was just announced, and now there's the question of how long Nintendo will keep hosting Super Mario Maker 1 levels. There are at least hundreds of thousands of levels.
02:40 πŸ”— znak I seem to remember some people reverse-engineering the data formats, but does anyone know of an effort to archive levels?
02:41 πŸ”— znak The closest thing I know of is https://github.com/tachyo/SMMData, which is metadata (only) of 300K levels, scraped from the web service at supermariomakerbookmark.nintendo.net.
02:47 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
02:49 πŸ”— Sk1d has joined #archiveteam-bs
02:51 πŸ”— halt has quit IRC (Read error: Operation timed out)
02:53 πŸ”— halt has joined #archiveteam-bs
02:59 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
03:02 πŸ”— Sk1d has joined #archiveteam-bs
03:24 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
03:26 πŸ”— Sk1d has joined #archiveteam-bs
03:32 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
03:36 πŸ”— Sk1d has joined #archiveteam-bs
03:50 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
03:54 πŸ”— Sk1d has joined #archiveteam-bs
04:00 πŸ”— glmd has joined #archiveteam-bs
04:01 πŸ”— glmd Hey we're trying to archive Blogger comments.
04:01 πŸ”— glmd The worker is here: https://github.com/afrmtbl/blogspot-comment-backup
04:01 πŸ”— glmd We'd appreciate some help.
04:04 πŸ”— glmd has quit IRC (Client Quit)
04:07 πŸ”— Fusl JAA: 614 zips
04:10 πŸ”— bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
04:14 πŸ”— odemgi has joined #archiveteam-bs
04:15 πŸ”— bobmcjr has joined #archiveteam-bs
04:17 πŸ”— bobmcjr I'm trying to archive a phpBB forum, but it's misconfigured so it doesn't comprehend cookies. The result is many duplicates due to the &sid= tacked on to the URL, and it seems that it might not finish.
04:17 πŸ”— odemgi_ has quit IRC (Read error: Operation timed out)
04:21 πŸ”— mr_archiv glmd, I am interested in this. Can you be more specific about what you need help with?
04:24 πŸ”— bobmcjr glmd?
04:25 πŸ”— mr_archiv bobmcjr, have you considered removing &sid from the URL before fetching them?
04:26 πŸ”— bobmcjr phpBB autogenerates it as I crawl. It also seems to change
04:26 πŸ”— mr_archiv urllib.parse.urlsplit then urllib.parse.parse_qs then urllib.parse.urlunsplit
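
The three calls mr_archiv names can be combined into a small helper. This is a minimal sketch rather than anything posted in the channel: the function name `strip_sid` is hypothetical, and `parse_qsl` plus `urlencode` stand in for `parse_qs` so that the order of the remaining parameters is preserved.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit


def strip_sid(url, param="sid"):
    """Return url with every `param` query parameter removed."""
    parts = urlsplit(url)
    # keep_blank_values keeps parameters such as "?foo=" intact
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # Rebuild the URL with the filtered query string
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

As JAA points out later in the log, fetching the stripped URLs means the links inside the archived pages, which still carry a SID, no longer match what was fetched.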
04:28 πŸ”— bobmcjr I've just been using wget --mirror --warc-file
04:29 πŸ”— mr_archiv Can you post-process the URLs via stdout?
04:30 πŸ”— mr_archiv I have the real answer. You need to use a session.
04:30 πŸ”— mr_archiv Then the sid will stay the same I think.
04:31 πŸ”— mr_archiv Wait nevermind, I am wrong about that.
04:32 πŸ”— mr_archiv I don't understand why the sid would change, sid = session ID.
04:32 πŸ”— bobmcjr I'm trying to tell if the duplicates are from my session, or from hyperlinks people have embedded that have their sids
04:33 πŸ”— mr_archiv Good point, I did not think of people accidentally leaking their session ID.
04:33 πŸ”— mr_archiv Can you generate a list of URLs and save it to a file?
04:34 πŸ”— mr_archiv Then post-process the file with a Python script then instruct wget to fetch all the URLs.
04:34 πŸ”— mr_archiv After removing the session IDs you can use sort -u so they won't be fetched twice.
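
The post-processing step described here (remove the session IDs, then deduplicate as `sort -u` would) might look like the following. It is a sketch under assumptions: the function name `dedupe_urls` is hypothetical, and the input is assumed to be one URL per line.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit


def dedupe_urls(lines, param="sid"):
    """Drop `param` from each URL and return the sorted unique results."""
    unique = set()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        query = urlencode(
            [
                (key, value)
                for key, value in parse_qsl(parts.query, keep_blank_values=True)
                if key != param
            ]
        )
        unique.add(urlunsplit(parts._replace(query=query)))
    # Sorting the set gives the same result as `sort -u`
    return sorted(unique)
```

Writing the returned list to a file, one URL per line, gives something wget can read through its `--input-file` option.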
04:35 πŸ”— mr_archiv Actually I just realized, there is no point in doing this.
04:35 πŸ”— mr_archiv Depending on how this is done it will end up being harder on the server than just fetching the duplicates.
04:36 πŸ”— mr_archiv I think the only way is to remove the session ID before attempting to fetch the URLs
04:36 πŸ”— bobmcjr I'll try again and just hope my sid remains stable so that there are a finite number of URLs.
04:37 πŸ”— bobmcjr This is the first thing I've tried to crawl for proper archiving
04:37 πŸ”— mr_archiv Generally speaking you don't have to go through the kind of trouble I described. Traditional web crawling works for most websites.
04:37 πŸ”— mr_archiv I mean for archiving.
04:38 πŸ”— bobmcjr For reference it's this, which may or may not disappear in 24-48 hours: forum.eltechs.com/index.php
04:42 πŸ”— bobmcjr Ok, my sid seems stable. I'll just tolerate duplicates.
04:53 πŸ”— qw3rty113 has joined #archiveteam-bs
04:59 πŸ”— qw3rty112 has quit IRC (Read error: Operation timed out)
04:59 πŸ”— godane has quit IRC (Remote host closed the connection)
05:02 πŸ”— godane has joined #archiveteam-bs
05:17 πŸ”— wp494 has quit IRC (Read error: Operation timed out)
05:17 πŸ”— wp494 has joined #archiveteam-bs
05:19 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
05:22 πŸ”— Sk1d has joined #archiveteam-bs
05:45 πŸ”— wp494 has quit IRC (Ping timeout: 252 seconds)
05:47 πŸ”— wp494 has joined #archiveteam-bs
06:16 πŸ”— bobmcjr Straight wget can't handle this. It is looping. Is there any more flexible software for crawling to assemble a WARC?
06:28 πŸ”— jodizzle bobmcjr: Check out https://github.com/ludios/grab-site
06:31 πŸ”— superkuh has joined #archiveteam-bs
06:32 πŸ”— deevious has joined #archiveteam-bs
07:06 πŸ”— Stilett0 has joined #archiveteam-bs
07:16 πŸ”— Stiletto has quit IRC (Ping timeout: 615 seconds)
07:47 πŸ”— wyatt8740 has quit IRC (Ping timeout: 360 seconds)
07:48 πŸ”— Albardin has quit IRC (Read error: Operation timed out)
07:48 πŸ”— paul2520 has quit IRC (Read error: Operation timed out)
07:48 πŸ”— wacky has quit IRC (Write error: Broken pipe)
07:48 πŸ”— kiska1 has quit IRC (Read error: Operation timed out)
07:48 πŸ”— wacky has joined #archiveteam-bs
07:50 πŸ”— PhrackD has quit IRC (Read error: Operation timed out)
07:50 πŸ”— GLaDOS has quit IRC (Write error: Broken pipe)
07:51 πŸ”— HashbangI has quit IRC (Read error: Operation timed out)
07:51 πŸ”— sep332 has quit IRC (Read error: Operation timed out)
07:51 πŸ”— dxrt_ has quit IRC (Read error: Operation timed out)
07:51 πŸ”— qw3rty113 has quit IRC (Read error: Operation timed out)
07:51 πŸ”— step has quit IRC (Read error: Operation timed out)
07:53 πŸ”— qw3rty113 has joined #archiveteam-bs
07:53 πŸ”— kiska1 has joined #archiveteam-bs
07:53 πŸ”— Albardin has joined #archiveteam-bs
07:55 πŸ”— kiska1 has quit IRC (Read error: Operation timed out)
07:59 πŸ”— TigerbotH has quit IRC (Ping timeout: 600 seconds)
07:59 πŸ”— bobmcjr has quit IRC (Read error: Operation timed out)
07:59 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
07:59 πŸ”— Albardin has quit IRC (Read error: Operation timed out)
08:00 πŸ”— Albardin has joined #archiveteam-bs
08:02 πŸ”— qw3rty114 has joined #archiveteam-bs
08:02 πŸ”— kiska1 has joined #archiveteam-bs
08:02 πŸ”— Sk1d has joined #archiveteam-bs
08:03 πŸ”— bobmcjr has joined #archiveteam-bs
08:03 πŸ”— paul2520 has joined #archiveteam-bs
08:03 πŸ”— qw3rty113 has quit IRC (Ping timeout: 600 seconds)
08:05 πŸ”— TigerbotH has joined #archiveteam-bs
08:05 πŸ”— dxrt_ has joined #archiveteam-bs
08:05 πŸ”— dxrt sets mode: +o dxrt_
08:08 πŸ”— PhrackD has joined #archiveteam-bs
08:08 πŸ”— Exairnous has quit IRC (Ping timeout: 252 seconds)
08:09 πŸ”— step has joined #archiveteam-bs
08:09 πŸ”— sep332 has joined #archiveteam-bs
08:09 πŸ”— HashbangI has joined #archiveteam-bs
08:11 πŸ”— GLaDOS has joined #archiveteam-bs
08:58 πŸ”— JAA Fusl: lol nice. Do you have a log file for a complete grab? (If not, I'll wget --spider it.)
09:00 πŸ”— JAA bobmcjr: Interesting, I haven't seen that before. Yeah, it's going to be very tricky to archive that. You could ignore any URL which doesn't have the initial SID in it, but then you'd miss parts of the forums (it would stop recursing when the SID changes). Or you could strip out the SID, as wpull does (and hence ArchiveBot and grab-site), but then all links will be broken. :-/ There probably is no
09:00 πŸ”— JAA good solution.
09:01 πŸ”— JAA Try ignoring the "Delete all board cookies" in the footer at least though. That should help with the changing SIDs.
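
One way to express that ignore rule is a URL filter. This sketch assumes the footer link points at `ucp.php?mode=delete_cookies`, as in a stock phpBB 3 install; the log does not confirm the actual link target on this forum, and `should_fetch` is a hypothetical name.

```python
import re

# Assumption: the "Delete all board cookies" link carries mode=delete_cookies,
# as in a stock phpBB 3 install; check the forum's footer before relying on it.
IGNORE = re.compile(r"[?&]mode=delete_cookies(?:&|$)")


def should_fetch(url):
    """Return False for URLs that would reset the board cookies."""
    return IGNORE.search(url) is None
```

With plain wget, a similar pattern could be passed to `--reject-regex`.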
09:02 πŸ”— JAA Or maybe you could go after URLs like http://forum.eltechs.com/viewtopic.php?p=3329 instead. Unbrowsable, but at least all the content would be there.
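
A list of such per-post URLs can be generated by counting through post IDs. This is only a sketch: `post_urls` is a hypothetical helper, the ID range is an assumption (the log does not say what the highest post ID is), and IDs of deleted posts would presumably return an error page.

```python
def post_urls(first_id, last_id, base="http://forum.eltechs.com/viewtopic.php"):
    """Yield one viewtopic.php?p=N URL per post ID in the inclusive range."""
    for post_id in range(first_id, last_id + 1):
        yield f"{base}?p={post_id}"
```

For example, `"\n".join(post_urls(1, 3329))` produces a plain URL list with no recursion needed, where 3329 is simply the ID from JAA's example and not a known upper bound.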
09:04 πŸ”— JAA A two-phase approach might also work: first archive only the thread lists, then extract the thread URLs and grab those. No recursion, simple lists of URLs. That might even preserve browsability mostly.
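
The second phase of that approach (extracting thread URLs from the archived thread lists) could be sketched with the standard library's HTML parser. The names `ThreadLinks` and `thread_urls` are hypothetical, and the code assumes thread links have the form `viewtopic.php?...t=N` and that the forum accepts a URL carrying only the `t` parameter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, parse_qs


class ThreadLinks(HTMLParser):
    """Collect canonical thread URLs from one thread-list page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.topics = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on
        parts = urlsplit(urljoin(self.base_url, href))
        if not parts.path.endswith("/viewtopic.php"):
            return
        query = parse_qs(parts.query)
        if "t" in query:
            # Keep only the topic ID, so differing SIDs collapse to one URL
            topic = query["t"][0]
            self.topics.add(f"{parts.scheme}://{parts.netloc}{parts.path}?t={topic}")


def thread_urls(html, base_url):
    """Return the sorted unique thread URLs linked from `html`."""
    parser = ThreadLinks(base_url)
    parser.feed(html)
    return sorted(parser.topics)
```

Running this over every saved thread-list page and merging the results gives the flat URL list for the second grab.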
09:24 πŸ”— Phoen1x has quit IRC (Quit: Leaving)
10:00 πŸ”— BlueMax has quit IRC (Quit: Leaving)
10:01 πŸ”— Stilett0 has quit IRC (Read error: Operation timed out)
10:02 πŸ”— Stiletto has joined #archiveteam-bs
11:49 πŸ”— lindalap has joined #archiveteam-bs
11:49 πŸ”— lindalap has quit IRC (Client Quit)
12:26 πŸ”— bitBaron has joined #archiveteam-bs
13:06 πŸ”— sep332 has quit IRC (ZNC 1.6.3+deb1ubuntu0.1 - http://znc.in)
13:35 πŸ”— kiska urgh on the blox shutdown
13:48 πŸ”— bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
14:02 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
14:04 πŸ”— Sk1d has joined #archiveteam-bs
14:07 πŸ”— PurpleSym JAA: There’s a blog search, making discovery fairly easy: http://www.blox.pl/html/2883585,13107202,81.html?y
14:07 πŸ”— PurpleSym I’ll try to get a list of blogs.
14:11 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
14:14 πŸ”— Fusl JAA: nope i don't
14:14 πŸ”— Sk1d has joined #archiveteam-bs
14:45 πŸ”— JAA PurpleSym: Hmm, I found this list: http://www.blox.pl/html/65537.html
14:45 πŸ”— JAA Also the category lists linked on the right on that page.
14:48 πŸ”— PurpleSym There’s no pagination on the first one, so it can hardly be a complete list of blogs.
14:49 πŸ”— wp494 has quit IRC (Ping timeout: 268 seconds)
14:49 πŸ”— wp494 has joined #archiveteam-bs
14:49 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
14:50 πŸ”— PurpleSym Categories are looking better. I’ll scrape them as well.
14:51 πŸ”— JAA There are tags also: http://www.blox.pl/blog/tags/index
14:52 πŸ”— PurpleSym Thanks, I’ll see how close I can get to 689136 URLs with these lists.
14:53 πŸ”— Sk1d has joined #archiveteam-bs
14:53 πŸ”— JAA One more: http://www.blox.pl/blog/index
14:54 πŸ”— JAA With the groups at the top (
14:54 πŸ”— PurpleSym Hm, or is it 665438 blogs?
14:54 πŸ”— JAA "Polecamy", "Biznes", etc.)
14:54 πŸ”— JAA 665k blogs
15:29 πŸ”— Fusl has quit IRC (Quit: oh god please send help)
15:31 πŸ”— Fusl has joined #archiveteam-bs
15:58 πŸ”— arkiver blox.pl sounds like a warrior project
15:58 πŸ”— arkiver :)
15:59 πŸ”— arkiver Can probably set something up tomorrow if it's not too difficult to archive
16:04 πŸ”— PurpleSym arkiver: I should have a first list ready by then. As for archiving: If wpull preserves cookies, you should ignore URLs like this one: http://xxx.blox.pl/html?mobile=yes
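
A check for the URLs PurpleSym describes might look like this. It is a sketch only: `is_mobile_variant` is a hypothetical name, and it assumes the mobile switch is always spelled `mobile=yes` in the query string.

```python
from urllib.parse import urlsplit, parse_qs


def is_mobile_variant(url):
    """Return True if the URL carries the mobile=yes query parameter."""
    return parse_qs(urlsplit(url).query).get("mobile") == ["yes"]
```

A crawler that keeps cookies would skip any URL for which this returns True, presumably so that the mobile view does not stick for the rest of the crawl.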
16:43 πŸ”— bitBaron has joined #archiveteam-bs
16:53 πŸ”— bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
16:59 πŸ”— bitBaron has joined #archiveteam-bs
18:44 πŸ”— astrid has left ][
18:57 πŸ”— adwus has joined #archiveteam-bs
19:08 πŸ”— Exairnous has joined #archiveteam-bs
19:09 πŸ”— adwus has quit IRC (Ping timeout: 360 seconds)
19:43 πŸ”— schbirid has joined #archiveteam-bs
19:56 πŸ”— godane has quit IRC (Read error: Connection reset by peer)
19:57 πŸ”— sep332 has joined #archiveteam-bs
19:58 πŸ”— BartoCH has quit IRC (Ping timeout: 615 seconds)
20:02 πŸ”— BartoCH has joined #archiveteam-bs
20:14 πŸ”— godane has joined #archiveteam-bs
20:22 πŸ”— BartoCH has quit IRC (Ping timeout: 615 seconds)
20:32 πŸ”— fredgido has quit IRC (Read error: Connection reset by peer)
20:36 πŸ”— m007a83_ has joined #archiveteam-bs
20:39 πŸ”— m007a83 has quit IRC (Ping timeout: 252 seconds)
20:48 πŸ”— bitBaron has quit IRC (My computer has gone to sleep. 😴😪ZZZzzz…)
20:57 πŸ”— bitBaron has joined #archiveteam-bs
21:06 πŸ”— BartoCH has joined #archiveteam-bs
21:09 πŸ”— bitBaron has quit IRC (Quit: My computer has gone to sleep. 😴😪ZZZzzz…)
21:09 πŸ”— bitBaron has joined #archiveteam-bs
21:13 πŸ”— Despatche has quit IRC (Read error: Operation timed out)
21:19 πŸ”— VerifiedJ has joined #archiveteam-bs
21:21 πŸ”— VerifiedJ has quit IRC (Client Quit)
21:22 πŸ”— VerifiedJ has joined #archiveteam-bs
21:39 πŸ”— fredgido has joined #archiveteam-bs
22:16 πŸ”— fredgido_ has joined #archiveteam-bs
22:16 πŸ”— Albardin has quit IRC (Read error: Operation timed out)
22:16 πŸ”— paul2520 has quit IRC (Read error: Operation timed out)
22:16 πŸ”— dxrt_ has quit IRC (Write error: Broken pipe)
22:16 πŸ”— kiska1 has quit IRC (Read error: Operation timed out)
22:17 πŸ”— sep332 has quit IRC (Write error: Broken pipe)
22:17 πŸ”— HashbangI has quit IRC (Read error: Operation timed out)
22:18 πŸ”— PotcFdk has quit IRC (Read error: Operation timed out)
22:18 πŸ”— step has quit IRC (Read error: Operation timed out)
22:19 πŸ”— qw3rty114 has quit IRC (Read error: Operation timed out)
22:21 πŸ”— Fusl has quit IRC (Read error: Operation timed out)
22:21 πŸ”— PhrackD has quit IRC (Ping timeout: 600 seconds)
22:22 πŸ”— Fusl has joined #archiveteam-bs
22:22 πŸ”— Albardin has joined #archiveteam-bs
22:22 πŸ”— kiska1 has joined #archiveteam-bs
22:22 πŸ”— qw3rty114 has joined #archiveteam-bs
22:22 πŸ”— fredgido has quit IRC (Read error: Operation timed out)
22:22 πŸ”— TigerbotH has quit IRC (Read error: Connection reset by peer)
22:23 πŸ”— paul2520 has joined #archiveteam-bs
22:23 πŸ”— GLaDOS has quit IRC (Ping timeout: 600 seconds)
22:25 πŸ”— GLaDOS has joined #archiveteam-bs
22:25 πŸ”— PhrackD has joined #archiveteam-bs
22:26 πŸ”— TigerbotH has joined #archiveteam-bs
22:26 πŸ”— sep332 has joined #archiveteam-bs
22:27 πŸ”— BlueMax has joined #archiveteam-bs
22:28 πŸ”— HashbangI has joined #archiveteam-bs
22:30 πŸ”— dxrt_ has joined #archiveteam-bs
22:30 πŸ”— dxrt sets mode: +o dxrt_
22:30 πŸ”— step has joined #archiveteam-bs
22:43 πŸ”— icedice has joined #archiveteam-bs
23:11 πŸ”— PotcFdk has joined #archiveteam-bs
23:13 πŸ”— schbirid has quit IRC (Remote host closed the connection)
23:14 πŸ”— ndiddy has joined #archiveteam-bs
23:48 πŸ”— Mateon1 has quit IRC (Read error: Operation timed out)
23:48 πŸ”— Mateon1 has joined #archiveteam-bs
23:50 πŸ”— wp494 has quit IRC (Read error: Operation timed out)
23:50 πŸ”— icedice has quit IRC (Read error: Operation timed out)
23:51 πŸ”— wp494 has joined #archiveteam-bs
23:56 πŸ”— Sk1d has quit IRC (Read error: Operation timed out)
23:59 πŸ”— Sk1d has joined #archiveteam-bs
