Time |
Nickname |
Message |
00:58
🔗
|
|
kitties has joined #archiveteam |
01:28
🔗
|
|
Freddo has joined #archiveteam |
01:28
🔗
|
|
RichardG has quit IRC (Ping timeout: 245 seconds) |
01:28
🔗
|
|
Freddo has quit IRC (Client Quit) |
01:52
🔗
|
namespace |
If I have UC Berkeley Lectures sitting around. |
01:52
🔗
|
namespace |
Where do I put them? |
01:53
🔗
|
namespace |
I downloaded some and then never uploaded them. |
02:01
🔗
|
|
Burak has joined #archiveteam |
02:01
🔗
|
|
Svekla has quit IRC (Read error: Connection reset by peer) |
02:34
🔗
|
|
ItsYoda has quit IRC (Ping timeout: 260 seconds) |
03:38
🔗
|
|
ItsYoda has joined #archiveteam |
04:34
🔗
|
|
conradev has quit IRC (Quit: ...) |
04:37
🔗
|
|
conradev has joined #archiveteam |
04:43
🔗
|
|
qw3rty115 has joined #archiveteam |
04:46
🔗
|
|
ItsYoda has quit IRC (Ping timeout: 260 seconds) |
04:46
🔗
|
|
qw3rty114 has quit IRC (Read error: Operation timed out) |
05:13
🔗
|
|
ranma has quit IRC (Ping timeout: 260 seconds) |
05:15
🔗
|
|
ItsYoda has joined #archiveteam |
05:33
🔗
|
|
vitzli has joined #archiveteam |
06:02
🔗
|
|
kitties has quit IRC (Quit: Connection closed for inactivity) |
06:07
🔗
|
|
indrora has joined #archiveteam |
06:08
🔗
|
indrora |
Wikispaces declared Jul 31 as the day all non-private wikis are going down forever. Turns out their API allows for a sitemap.xml which is as complete as I can surmise, which makes it good for scraping. |
06:11
🔗
|
indrora |
There is one problem: For huge sites, it returns not one single sitemap, but a "multipart" sitemap |
06:16
🔗
|
indrora |
(why not use --mirror? Because AJAX) |
06:40
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
08:51
🔗
|
|
bRick5772 has joined #archiveteam |
09:03
🔗
|
|
schbirid has joined #archiveteam |
10:31
🔗
|
|
paparus has joined #archiveteam |
10:32
🔗
|
|
paparus has quit IRC (Client Quit) |
10:32
🔗
|
|
Stiletto has joined #archiveteam |
10:32
🔗
|
|
muramasa has quit IRC (Read error: Operation timed out) |
10:33
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
10:34
🔗
|
|
muramasa has joined #archiveteam |
10:34
🔗
|
|
BlueMax has quit IRC (Leaving) |
14:05
🔗
|
|
RichardG has joined #archiveteam |
14:59
🔗
|
|
ranavalon has joined #archiveteam |
15:01
🔗
|
|
ranavalon has quit IRC (Client Quit) |
15:08
🔗
|
|
Pixi has quit IRC (Quit: Pixi) |
15:08
🔗
|
|
Pixi has joined #archiveteam |
16:07
🔗
|
SketchCow |
Wikispaces should be the thing we go after |
16:07
🔗
|
SketchCow |
It's awful |
16:28
🔗
|
|
Burak has quit IRC (Read error: Connection reset by peer) |
16:28
🔗
|
|
Burak has joined #archiveteam |
16:29
🔗
|
|
atrocity has quit IRC (Read error: Connection reset by peer) |
16:59
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
17:03
🔗
|
|
RichardG has joined #archiveteam |
17:06
🔗
|
|
djbeadle has joined #archiveteam |
17:10
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
17:11
🔗
|
|
RichardG has joined #archiveteam |
18:30
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
18:31
🔗
|
|
Mateon1 has joined #archiveteam |
18:48
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
18:49
🔗
|
|
RichardG has joined #archiveteam |
18:51
🔗
|
|
ats has quit IRC (Read error: Operation timed out) |
18:51
🔗
|
|
ats has joined #archiveteam |
19:17
🔗
|
|
obelisk has joined #archiveteam |
19:29
🔗
|
|
schbirid has quit IRC (Leaving) |
19:29
🔗
|
|
schbirid has joined #archiveteam |
20:06
🔗
|
|
BlueMax has joined #archiveteam |
20:29
🔗
|
|
___ has joined #archiveteam |
20:30
🔗
|
|
___ has quit IRC (Client Quit) |
20:32
🔗
|
|
octothorp has quit IRC (Read error: Connection reset by peer) |
20:33
🔗
|
|
octothorp has joined #archiveteam |
20:39
🔗
|
|
sekolyn has joined #archiveteam |
20:39
🔗
|
|
octothorp has quit IRC (Read error: Connection reset by peer) |
20:44
🔗
|
|
|Ripley| has quit IRC (Quit: ZNC 1.6.3 - http://znc.in) |
20:55
🔗
|
|
lexiconda has joined #archiveteam |
20:55
🔗
|
|
lexiconda is now known as lexicon |
21:01
🔗
|
|
K4k has quit IRC (Read error: Connection reset by peer) |
21:11
🔗
|
|
obelisk has quit IRC (Remote host closed the connection) |
21:20
🔗
|
indrora |
okay, so I have some terrible bash logic that checks if the sitemap is multipart and does some terrible egrep/sed pipelining to get the "complete" sitemap |
21:21
🔗
|
indrora |
` grep -q '<sitemapindex' sitemap.${WIKI} && cat sitemap.${WIKI} | egrep -o "http://${WIKI}[^<]+" | sed 's:&amp;:\&:g' | wget -O- -q -i - >sitemap.complete ` |
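The one-liner above could be unpacked roughly as follows. This is only a sketch of the same idea, not indrora's actual script: the function names `extract_child_sitemaps` and `complete_sitemap` are made up for illustration, and it assumes `sitemap.$WIKI` has already been downloaded and that child sitemap URLs start with `http://$WIKI`. It also adds a fallback for the non-multipart case, which the original command does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the log): read a sitemap index on stdin and
# print the child sitemap URLs it lists, decoding "&amp;" back to "&".
extract_child_sitemaps() {
    local wiki="$1"   # wiki host name; the exact form is an assumption
    grep -Eo "http://${wiki}[^<]+" | sed 's:&amp;:\&:g'
}

# Hypothetical wrapper: build sitemap.complete for one wiki.
# Assumes sitemap.$wiki already exists in the current directory.
complete_sitemap() {
    local wiki="$1"
    if grep -q '<sitemapindex' "sitemap.${wiki}"; then
        # Multipart: fetch every child sitemap and concatenate the results.
        extract_child_sitemaps "$wiki" < "sitemap.${wiki}" \
            | wget -q -O- -i - > sitemap.complete
    else
        # Single sitemap: it is already the complete listing.
        cp "sitemap.${wiki}" sitemap.complete
    fi
}
```

Double quotes around the `grep` pattern matter here: inside single quotes the shell would pass `${WIKI}` through literally instead of expanding it.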
21:32
🔗
|
|
|Ripley| has joined #archiveteam |
21:57
🔗
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
22:19
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
22:26
🔗
|
|
djbeadle has quit IRC (djbeadle) |
22:48
🔗
|
|
dogsrcool has quit IRC (Quit: Ping timeout (120 seconds)) |
22:49
🔗
|
|
dogsrcool has joined #archiveteam |
22:59
🔗
|
|
robink has quit IRC (Read error: Connection reset by peer) |
23:01
🔗
|
|
robink has joined #archiveteam |
23:08
🔗
|
indrora |
Okay, now to make wget ignore the terrible in-browser JS hackery |
23:31
🔗
|
|
jschwart has quit IRC (Konversation terminated!) |
23:40
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
23:40
🔗
|
indrora |
Fantastic, figured that one out. |
23:43
🔗
|
|
godane has joined #archiveteam |
23:45
🔗
|
JAA |
indrora: Let's move this to #archiveteam-bs please. This channel is mostly intended for announcements. |
23:46
🔗
|
JAA |
(As in, "ohshitohshitohshit this site is going down!!"-type messages.) |
23:47
🔗
|
|
godane has quit IRC (Client Quit) |
23:48
🔗
|
|
godane has joined #archiveteam |
23:49
🔗
|
|
robink has quit IRC (Read error: Connection reset by peer) |
23:51
🔗
|
|
robink has joined #archiveteam |