#archiveteam 2018-02-18,Sun

Time Nickname Message
00:58 🔗 kitties has joined #archiveteam
01:28 🔗 Freddo has joined #archiveteam
01:28 🔗 RichardG has quit IRC (Ping timeout: 245 seconds)
01:28 🔗 Freddo has quit IRC (Client Quit)
01:52 🔗 namespace If I have UC Berkeley Lectures sitting around.
01:52 🔗 namespace Where do I put them?
01:53 🔗 namespace I downloaded some and then never uploaded them.
02:01 🔗 Burak has joined #archiveteam
02:01 🔗 Svekla has quit IRC (Read error: Connection reset by peer)
02:34 🔗 ItsYoda has quit IRC (Ping timeout: 260 seconds)
03:38 🔗 ItsYoda has joined #archiveteam
04:34 🔗 conradev has quit IRC (Quit: ...)
04:37 🔗 conradev has joined #archiveteam
04:43 🔗 qw3rty115 has joined #archiveteam
04:46 🔗 ItsYoda has quit IRC (Ping timeout: 260 seconds)
04:46 🔗 qw3rty114 has quit IRC (Read error: Operation timed out)
05:13 🔗 ranma has quit IRC (Ping timeout: 260 seconds)
05:15 🔗 ItsYoda has joined #archiveteam
05:33 🔗 vitzli has joined #archiveteam
06:02 🔗 kitties has quit IRC (Quit: Connection closed for inactivity)
06:07 🔗 indrora has joined #archiveteam
06:08 🔗 indrora Wikispaces declared Jul 31 as the day all non-private wikis are going down forever. Turns out their API allows for a sitemap.xml which is as complete as I can surmise, which makes it good for scraping.
06:11 🔗 indrora There is one problem: For huge sites, it returns not one single sitemap, but a "multipart" sitemap.
06:16 🔗 indrora (why not use --mirror? Because AJAX)
06:40 🔗 vitzli has quit IRC (Quit: Leaving)
08:51 🔗 bRick5772 has joined #archiveteam
09:03 🔗 schbirid has joined #archiveteam
10:31 🔗 paparus has joined #archiveteam
10:32 🔗 paparus has quit IRC (Client Quit)
10:32 🔗 Stiletto has joined #archiveteam
10:32 🔗 muramasa has quit IRC (Read error: Operation timed out)
10:33 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
10:34 🔗 muramasa has joined #archiveteam
10:34 🔗 BlueMax has quit IRC (Leaving)
14:05 🔗 RichardG has joined #archiveteam
14:59 🔗 ranavalon has joined #archiveteam
15:01 🔗 ranavalon has quit IRC (Client Quit)
15:08 🔗 Pixi has quit IRC (Quit: Pixi)
15:08 🔗 Pixi has joined #archiveteam
16:07 🔗 SketchCow Wikispaces should be the thing we go after
16:07 🔗 SketchCow It's awful
16:28 🔗 Burak has quit IRC (Read error: Connection reset by peer)
16:28 🔗 Burak has joined #archiveteam
16:29 🔗 atrocity has quit IRC (Read error: Connection reset by peer)
16:59 🔗 RichardG has quit IRC (Read error: Operation timed out)
17:03 🔗 RichardG has joined #archiveteam
17:06 🔗 djbeadle has joined #archiveteam
17:10 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
17:11 🔗 RichardG has joined #archiveteam
18:30 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
18:31 🔗 Mateon1 has joined #archiveteam
18:48 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
18:49 🔗 RichardG has joined #archiveteam
18:51 🔗 ats has quit IRC (Read error: Operation timed out)
18:51 🔗 ats has joined #archiveteam
19:17 🔗 obelisk has joined #archiveteam
19:29 🔗 schbirid has quit IRC (Leaving)
19:29 🔗 schbirid has joined #archiveteam
20:06 🔗 BlueMax has joined #archiveteam
20:29 🔗 ___ has joined #archiveteam
20:30 🔗 ___ has quit IRC (Client Quit)
20:32 🔗 octothorp has quit IRC (Read error: Connection reset by peer)
20:33 🔗 octothorp has joined #archiveteam
20:39 🔗 sekolyn has joined #archiveteam
20:39 🔗 octothorp has quit IRC (Read error: Connection reset by peer)
20:44 🔗 |Ripley| has quit IRC (Quit: ZNC 1.6.3 - http://znc.in)
20:55 🔗 lexiconda has joined #archiveteam
20:55 🔗 lexiconda is now known as lexicon
21:01 🔗 K4k has quit IRC (Read error: Connection reset by peer)
21:11 🔗 obelisk has quit IRC (Remote host closed the connection)
21:20 🔗 indrora okay, so I have some terrible bash logic that checks if the sitemap is multipart and does some terrible egrep/sed pipelining to get the "complete" sitemap
21:21 🔗 indrora ` grep -q '<sitemapindex' "sitemap.${WIKI}" && egrep -o "http://${WIKI}[^<]+" "sitemap.${WIKI}" | sed 's:&amp;:\&:g' | wget -O- -q -i - >sitemap.complete `
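
The one-liner above can be split into two small functions, which makes the quoting easier to check. This is only a sketch, not indrora's actual script: the function names and the example hostname are hypothetical, and it assumes that WIKI holds the wiki's hostname and that the sitemap has already been saved as sitemap.${WIKI}. Falling back to a plain copy for single-part sitemaps is also an assumption; the log does not say how that case was handled.

```shell
#!/usr/bin/env bash
# Sketch only. Function names are hypothetical, not from the log.

# Succeeds if the file named by $1 is a multipart sitemap (a sitemap index).
is_sitemap_index() {
    grep -q '<sitemapindex' "$1"
}

# Reads a sitemap index on stdin and prints one child sitemap URL per line.
# $1 is the wiki hostname. Double quotes let ${host} expand inside the
# pattern; inside single quotes the shell would pass it through literally.
# Note: dots in the hostname act as regex wildcards here, which is loose
# but harmless for this purpose.
extract_sitemap_urls() {
    local host="$1"
    grep -Eo "http://${host}[^<]+" | sed 's/&amp;/\&/g'
}

# Possible usage (performs network requests, so left commented out):
# if is_sitemap_index "sitemap.${WIKI}"; then
#     extract_sitemap_urls "${WIKI}" <"sitemap.${WIKI}" \
#         | wget -q -O- -i - >sitemap.complete
# else
#     cp "sitemap.${WIKI}" sitemap.complete   # assumed fallback
# fi
```

The g flag on the sed substitution matters when a URL carries more than one query parameter, since each separator is escaped as &amp; in the XML.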
21:32 🔗 |Ripley| has joined #archiveteam
21:57 🔗 bRick5772 has quit IRC (Quit: Leaving.)
22:19 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
22:26 🔗 djbeadle has quit IRC (djbeadle)
22:48 🔗 dogsrcool has quit IRC (Quit: Ping timeout (120 seconds))
22:49 🔗 dogsrcool has joined #archiveteam
22:59 🔗 robink has quit IRC (Read error: Connection reset by peer)
23:01 🔗 robink has joined #archiveteam
23:08 🔗 indrora Okay, now to make wget ignore the terrible in-browser JS hackery
23:31 🔗 jschwart has quit IRC (Konversation terminated!)
23:40 🔗 godane has quit IRC (Read error: Operation timed out)
23:40 🔗 indrora Fantastic, figured that one out.
23:43 🔗 godane has joined #archiveteam
23:45 🔗 JAA indrora: Let's move this to #archiveteam-bs please. This channel is mostly intended for announcements.
23:46 🔗 JAA (As in, "ohshitohshitohshit this site is going down!!"-type messages.)
23:47 🔗 godane has quit IRC (Client Quit)
23:48 🔗 godane has joined #archiveteam
23:49 🔗 robink has quit IRC (Read error: Connection reset by peer)
23:51 🔗 robink has joined #archiveteam
