[00:10] *** wyatt874- is now known as wyatt8740
[00:27] *** fie__ has joined #archiveteam
[00:31] *** fie_ has quit IRC (Ping timeout: 268 seconds)
[01:04] *** JesseW has joined #archiveteam
[01:07] *** primus104 has quit IRC (Leaving.)
[01:21] *** w0rp has quit IRC (Ping timeout: 268 seconds)
[01:21] *** w0rp has joined #archiveteam
[02:21] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:30] *** dashcloud has joined #archiveteam
[03:05] *** SilSte has quit IRC (Read error: Operation timed out)
[03:05] *** SilSte has joined #archiveteam
[03:05] *** BlueMaxim has joined #archiveteam
[03:15] *** odie5533_ has quit IRC (Read error: Operation timed out)
[03:18] *** db48x has joined #archiveteam
[03:35] *** odie5533 has joined #archiveteam
[03:35] *** odie5533 has quit IRC (Connection closed)
[04:24] *** aaaaaaaaa has quit IRC (Leaving)
[06:07] *** Ungstein has joined #archiveteam
[06:16] *** khaoohs_ has joined #archiveteam
[06:22] *** khaoohs__ has quit IRC (Read error: Operation timed out)
[06:25] *** Dark_Star has quit IRC (Ping timeout: 600 seconds)
[06:42] *** JesseW has quit IRC (Read error: Operation timed out)
[06:44] has anyone looked into archiving the Arch wiki?
[06:45] doesn't look like it's in Knowledge/Wikis in the archiveteam wiki template
[06:55] *** Elegance has quit IRC (Quit: :(){ :|:& };:)
[06:56] *** primus104 has joined #archiveteam
[06:57] *** Elegance has joined #archiveteam
[07:38] *** Rotab has joined #archiveteam
[07:42] *** primus104 has quit IRC (Leaving.)
[07:56] *** schbirid has joined #archiveteam
[08:04] *** atomotic has joined #archiveteam
[08:19] *** signius_ has quit IRC (Ping timeout: 252 seconds)
[08:37] *** signius has joined #archiveteam
[09:12] *** primus104 has joined #archiveteam
[10:33] ploopkazo: https://github.com/lahwaacz/arch-wiki-docs and http://kmkeen.com/arch-wiki-lite/ are available as Arch packages, at least
[10:35] Deewiant: no xml/zim dumps though?
[11:00] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:06] *** primus104 has quit IRC (Leaving.)
[11:07] ploopkazo: No idea, and no idea what could be missing from those
[11:10] *** Ungstein has quit IRC (Quit: Leaving.)
[11:15] *** Ungstein has joined #archiveteam
[11:21] *** ete has joined #archiveteam
[11:30] *** xk_id has joined #archiveteam
[11:33] *** xk_id has quit IRC (Remote host closed the connection)
[11:47] *** Froggypwn has quit IRC (Quit: ~ Trillian Astra - www.trillian.im ~)
[11:52] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:57] *** xk_id has joined #archiveteam
[12:06] *** atomotic has joined #archiveteam
[12:27] *** xk_id has quit IRC (Remote host closed the connection)
[12:29] *** xk_id has joined #archiveteam
[12:49] *** xk_id has quit IRC (Remote host closed the connection)
[12:51] *** xk_id has joined #archiveteam
[13:01] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[13:02] *** xk_id has quit IRC (Remote host closed the connection)
[13:04] *** xk_id has joined #archiveteam
[13:07] *** mahavira has joined #archiveteam
[13:08] http://www.archiveteam.org/index.php?title=Blogger Why doesn't this article mention ./sitemap.xml?
[13:09] *** xk_id has quit IRC (Remote host closed the connection)
[13:09] whatever.blogspot.com/sitemap.xml will more often than not provide direct links to every post.
[13:11] *** xk_id has joined #archiveteam
[13:11] *** Froggypwn has joined #archiveteam
[13:13] *** brayden has quit IRC (Quit: Leaving)
[13:23] *** primus104 has joined #archiveteam
[13:27] *** xk_id has quit IRC (Remote host closed the connection)
[13:28] *** xk_id has joined #archiveteam
[13:28] *** zenguy_pc has joined #archiveteam
[13:30] *** xk_id has quit IRC (Remote host closed the connection)
[13:33] *** xk_id has joined #archiveteam
[13:34] *** xk_id has quit IRC (Remote host closed the connection)
[13:43] *** xk_id has joined #archiveteam
[13:47] *** xk_id has quit IRC (Remote host closed the connection)
[13:56] *** Start has quit IRC (Quit: Disconnected.)
[14:01] *** Sk1d has quit IRC (Quit: ZNC - http://znc.in)
[14:02] *** PurpleSym has joined #archiveteam
[14:07] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:41] *** Start has joined #archiveteam
[15:28] mahavira: in most cases like that it's usually just because nobody wrote it down
[15:28] FWIW, wpull uses sitemap.xml if it is available
[15:28] so depending on the machinery it is not necessary to explicitly mention sitemap availability
[15:29] that said, feel free to add that information
[15:30] I'm trying to lynx -dump urls from ./sitemap.xml but google blocks me after a couple requests. I'm about to start firefox and use extensions to do the job. Any ideas?
[15:31] slowing down can help
[15:31] wget -wait 15 --random-wait didn't work. What do you think would work?
[15:31] user-agent tricks, Accept header stuff
[15:31] Been there, done that
[15:32] there's a body of successes at http://archive.fart.website/archivebot/viewer/?q=blogspot
[15:32] and there's a bunch of stuff that a server can glom onto, so if you started out too aggressive then further tweaks can become irrelevant
[15:32] try another machine, etc.
[15:33] Another machine is not an option, I guess I'll just have to wait for a few hours before trying to do anything again.
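Editor's note: the sitemap approach discussed above (pull post links out of ./sitemap.xml, and go slowly) could be sketched roughly as follows. This is only an illustration under assumptions, not what anyone in the channel actually ran: it assumes the sitemap follows the standard sitemaps.org schema, that it may be either a flat list of posts or an index pointing at further sitemaps, and the function names, user-agent string, and delay values are all made up for the example. As noted in the channel, slowing down may not help once a server has already started blocking.

```python
import random
import time
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_data):
    """Parse sitemap XML and return (kind, urls).

    kind is the root tag without its namespace, normally "urlset"
    (a list of pages) or "sitemapindex" (a list of further sitemaps).
    """
    root = ET.fromstring(xml_data)
    kind = root.tag.replace(SITEMAP_NS, "")
    urls = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, urls


def fetch(url, user_agent="example-corpus-bot/0.1"):
    """Fetch a URL and return the raw bytes (user agent is a placeholder)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def collect_post_urls(sitemap_url, min_delay=15.0, max_delay=30.0):
    """Return the page URLs listed in a sitemap, following index files.

    Sleeps a random interval before each child sitemap request; the
    delay bounds are arbitrary example values.
    """
    kind, locs = extract_locs(fetch(sitemap_url))
    if kind != "sitemapindex":
        return locs
    posts = []
    for child in locs:
        time.sleep(random.uniform(min_delay, max_delay))
        posts.extend(collect_post_urls(child, min_delay, max_delay))
    return posts
```

Something like collect_post_urls("http://whatever.blogspot.com/sitemap.xml") would then give a list of URLs to hand to whatever downloader is in use; the parsing step makes no network requests on its own, so it can be tried on a saved copy of the sitemap first.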
[15:34] *** xk_id has joined #archiveteam
[15:34] I'm trying to download all blogs from X location ( blogspot profiles are tagged by location ), so that I can build a linguistic corpus out of them.
[15:34] *** JesseW has joined #archiveteam
[15:42] *** primus104 has quit IRC (Leaving.)
[15:51] *** xk_id has quit IRC (Remote host closed the connection)
[15:53] *** xk_id has joined #archiveteam
[15:59] *** antomati_ has joined #archiveteam
[16:01] *** Dark_Star has joined #archiveteam
[16:02] *** Start has quit IRC (Quit: Disconnected.)
[16:03] *** xk_id has quit IRC (Ping timeout: 600 seconds)
[16:05] *** edsu_ is now known as edsu
[16:06] *** antomatic has quit IRC (Ping timeout: 492 seconds)
[16:07] *** JesseW has quit IRC (Read error: Operation timed out)
[16:26] *** primus104 has joined #archiveteam
[16:40] *** Start has joined #archiveteam
[16:44] *** atomotic has joined #archiveteam
[16:49] *** mahavira has quit IRC (Ping timeout: 483 seconds)
[16:50] *** xk_id has joined #archiveteam
[16:54] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:55] *** philpem has joined #archiveteam
[17:00] *** primus104 has quit IRC (Leaving.)
[17:33] *** primus104 has joined #archiveteam
[17:37] *** Start has quit IRC (Quit: Disconnected.)
[17:45] *** scyther has joined #archiveteam
[17:59] *** superkuh has quit IRC (Remote host closed the connection)
[18:05] *** Spiders has joined #archiveteam
[18:16] *** aaaaaaaaa has joined #archiveteam
[18:18] looks like archive.moe is having some problems.
[18:19] *** db48x has quit IRC (Read error: Connection reset by peer)
[18:23] *** Spiders has quit IRC (Quit: Page closed)
[18:33] *** Fusl has quit IRC (Ping timeout: 600 seconds)
[18:40] *** Fusl has joined #archiveteam
[19:07] *** phuzion has quit IRC (Remote host closed the connection)
[19:10] *** phuzion has joined #archiveteam
[19:53] *** Start has joined #archiveteam
[19:58] *** Start has quit IRC (Client Quit)
[20:03] *** phuzion has quit IRC (Remote host closed the connection)
[20:05] *** phuzion has joined #archiveteam
[20:05] *** schbirid has quit IRC (Remote host closed the connection)
[20:30] *** scyther has quit IRC (Quit: Leaving)
[20:47] *** xk_id has quit IRC (Remote host closed the connection)
[20:48] *** xk_id has joined #archiveteam
[20:50] *** xk_id_ has joined #archiveteam
[20:52] *** xk_id has quit IRC (Ping timeout: 252 seconds)
[20:54] *** PurpleSym has quit IRC (Remote host closed the connection)
[21:04] *** chfoo has quit IRC (Quit: chfoo)
[21:09] *** Start has joined #archiveteam
[21:12] *** chfoo has joined #archiveteam
[21:15] *** Start has quit IRC (Quit: Disconnected.)
[21:19] *** HarryCros is now known as HCross
[21:22] *** chfoo has quit IRC (Quit: chfoo)
[21:23] *** xk_id_ has quit IRC (Remote host closed the connection)
[21:26] *** chfoo has joined #archiveteam
[21:53] SketchCow: how's FOS on space?
[21:55] blingee is restarted, the grab of rutracker and thingiverse is running and comcast is almost finished
[21:56] thingiverse has a lot of tiny files due to the fast grab we did
[22:00] *** BlueMaxim has joined #archiveteam
[22:09] *** Boltsie has quit IRC (Remote host closed the connection)
[22:09] *** JSharp has quit IRC (Remote host closed the connection)
[22:09] *** russss__ has quit IRC (Write error: Broken pipe)
[22:09] *** zyphlar has quit IRC (Write error: Broken pipe)
[22:10] *** nox has quit IRC (Read error: Operation timed out)
[22:11] *** nox has joined #archiveteam
[22:22] *** _desu_ has quit IRC (Remote host closed the connection)
[22:22] *** filippo__ has quit IRC (Remote host closed the connection)
[22:22] *** Ctrl-S has quit IRC (Read error: Connection reset by peer)
[22:22] *** antonizoo has quit IRC (Remote host closed the connection)
[22:23] *** russss__ has joined #archiveteam
[22:26] *** JSharp has joined #archiveteam
[22:32] *** Start has joined #archiveteam
[22:41] *** zyphlar has joined #archiveteam
[23:06] *** xk_id has joined #archiveteam
[23:20] FOS has 4tb free
[23:22] *** Boltsie has joined #archiveteam
[23:47] *** primus104 has quit IRC (Leaving.)
[23:48] *** Fusl has quit IRC (Ping timeout: 255 seconds)
[23:55] *** filippo__ has joined #archiveteam
[23:55] *** mksplg has quit IRC (Ping timeout: 506 seconds)
[23:58] *** _desu_ has joined #archiveteam