#archiveteam 2018-02-18, Sun


***kitties has joined #archiveteam [00:58]
....... (idle for 30mn)
Freddo has joined #archiveteam
RichardG has quit IRC (Ping timeout: 245 seconds)
Freddo has quit IRC (Client Quit)
[01:28]
..... (idle for 24mn)
<namespace> If I have UC Berkeley lectures sitting around,
Where do I put them?
I downloaded some and then never uploaded them.
[01:52]
***Burak has joined #archiveteam
Svekla has quit IRC (Read error: Connection reset by peer)
[02:01]
....... (idle for 33mn)
ItsYoda has quit IRC (Ping timeout: 260 seconds) [02:34]
............. (idle for 1h4mn)
ItsYoda has joined #archiveteam [03:38]
............ (idle for 56mn)
conradev has quit IRC (Quit: ...)
conradev has joined #archiveteam
[04:34]
qw3rty115 has joined #archiveteam
ItsYoda has quit IRC (Ping timeout: 260 seconds)
qw3rty114 has quit IRC (Read error: Operation timed out)
[04:43]
...... (idle for 27mn)
ranma has quit IRC (Ping timeout: 260 seconds)
ItsYoda has joined #archiveteam
[05:13]
.... (idle for 18mn)
vitzli has joined #archiveteam [05:33]
...... (idle for 29mn)
kitties has quit IRC (Quit: Connection closed for inactivity) [06:02]
indrora has joined #archiveteam [06:07]
<indrora> Wikispaces has declared July 31 the day all non-private wikis go down for good. It turns out their API can produce a sitemap.xml that is, as far as I can tell, complete, which makes it good for scraping.
There is one problem: for huge sites, it returns not a single sitemap but a "multipart" sitemap (a sitemap index pointing to several child sitemaps).
[06:08]
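Under the sitemaps.org protocol, a regular sitemap has a `<urlset>` root, while a sitemap index (the "multipart" case) has a `<sitemapindex>` root; both list their URLs in `<loc>` elements. A minimal sketch for telling the two apart and pulling out the URLs, assuming the Wikispaces sitemaps follow that layout and that `grep -o` (GNU or BSD grep) is available. The helper names `sitemap_kind` and `sitemap_locs` are hypothetical.

```shell
#!/bin/sh
# Sketch, assuming the sitemaps.org layout: <urlset> root for a plain sitemap,
# <sitemapindex> root for a "multipart" one, URLs inside <loc> elements.

# sitemap_kind FILE (hypothetical helper): print "index", "urlset" or "unknown".
sitemap_kind() {
    if grep -q '<sitemapindex' "$1"; then
        echo index
    elif grep -q '<urlset' "$1"; then
        echo urlset
    else
        echo unknown
    fi
}

# sitemap_locs FILE (hypothetical helper): print every <loc> value, one per
# line, with &amp; decoded to &. Works on minified XML because grep -o prints
# each match separately; assumes no <loc> value is split across lines.
sitemap_locs() {
    grep -o '<loc>[^<]*</loc>' "$1" |
        sed -e 's|<loc>||' -e 's|</loc>||' -e 's/&amp;/\&/g'
}
```

For a multipart sitemap, `sitemap_locs` yields the child sitemap URLs, each of which then has to be fetched to get the actual page URLs.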
(Why not use wget --mirror? Because the site loads its content via AJAX.) [06:16]
..... (idle for 24mn)
***vitzli has quit IRC (Quit: Leaving) [06:40]
........................... (idle for 2h11mn)
bRick5772 has joined #archiveteam [08:51]
schbirid has joined #archiveteam [09:03]
.................. (idle for 1h28mn)
paparus has joined #archiveteam
paparus has quit IRC (Client Quit)
Stiletto has joined #archiveteam
muramasa has quit IRC (Read error: Operation timed out)
Stilett0 has quit IRC (Read error: Operation timed out)
muramasa has joined #archiveteam
BlueMax has quit IRC (Leaving)
[10:31]
........................................... (idle for 3h31mn)
RichardG has joined #archiveteam [14:05]
........... (idle for 54mn)
ranavalon has joined #archiveteam
ranavalon has quit IRC (Client Quit)
[14:59]
Pixi has quit IRC (Quit: Pixi)
Pixi has joined #archiveteam
[15:08]
............ (idle for 59mn)
<SketchCow> Wikispaces should be the thing we go after
It's awful
[16:07]
..... (idle for 21mn)
***Burak has quit IRC (Read error: Connection reset by peer)
Burak has joined #archiveteam
atrocity has quit IRC (Read error: Connection reset by peer)
[16:28]
....... (idle for 30mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam
djbeadle has joined #archiveteam
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam
[16:59]
................ (idle for 1h19mn)
Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam
[18:30]
.... (idle for 17mn)
RichardG has quit IRC (Read error: Connection reset by peer)
RichardG has joined #archiveteam
ats has quit IRC (Read error: Operation timed out)
ats has joined #archiveteam
[18:48]
...... (idle for 26mn)
obelisk has joined #archiveteam [19:17]
schbirid has quit IRC (Leaving)
schbirid has joined #archiveteam
[19:29]
........ (idle for 37mn)
BlueMax has joined #archiveteam [20:06]
..... (idle for 23mn)
___ has joined #archiveteam
___ has quit IRC (Client Quit)
octothorp has quit IRC (Read error: Connection reset by peer)
octothorp has joined #archiveteam
[20:29]
sekolyn has joined #archiveteam
octothorp has quit IRC (Read error: Connection reset by peer)
[20:39]
|Ripley| has quit IRC (Quit: ZNC 1.6.3 - http://znc.in) [20:44]
lexiconda has joined #archiveteam
lexiconda is now known as lexicon
[20:55]
K4k has quit IRC (Read error: Connection reset by peer) [21:01]
obelisk has quit IRC (Remote host closed the connection) [21:11]
<indrora> Okay, so I have some terrible bash logic that checks whether the sitemap is multipart and, if so, does some terrible egrep/sed pipelining to fetch the "complete" sitemap:
` grep -q '<sitemapindex' "sitemap.${WIKI}" && grep -oE "http://${WIKI}[^<]+" "sitemap.${WIKI}" | sed 's/&amp;/\&/g' | wget -q -O- -i - > sitemap.complete `
[21:20]
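The one-liner above can also be wrapped in a function. This is a sketch only: `complete_sitemap` and the `FETCH` hook are hypothetical names, `FETCH` defaults to `wget -q -O-` (run once per child sitemap URL), and unlike the original `&&` chain it passes a plain, non-index sitemap through unchanged instead of producing nothing.

```shell
#!/bin/sh
# Hypothetical fetch hook: any command that takes a URL and writes the body to
# stdout. Defaults to wget; override it (e.g. with a shell function) to test offline.
FETCH=${FETCH:-"wget -q -O-"}

# complete_sitemap FILE (hypothetical helper): if FILE is a sitemap index,
# fetch every child sitemap it lists and print their concatenated contents;
# otherwise print FILE unchanged.
complete_sitemap() {
    if ! grep -q '<sitemapindex' "$1"; then
        cat "$1"
        return
    fi
    # Pull child sitemap URLs out of <loc> elements and decode &amp; to &.
    grep -o '<loc>[^<]*</loc>' "$1" |
        sed -e 's|<loc>||' -e 's|</loc>||' -e 's/&amp;/\&/g' |
        while IFS= read -r url; do
            # $FETCH is left unquoted on purpose so "wget -q -O-" splits into
            # words; stdin is detached so the fetcher cannot eat the URL list.
            $FETCH "$url" </dev/null || echo "warning: failed to fetch $url" >&2
        done
}
```

Usage mirrors the original: `complete_sitemap "sitemap.${WIKI}" > sitemap.complete`. Fetching children one at a time costs a process per URL, but it lets a single failed child be reported without silently truncating the rest.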
***|Ripley| has joined #archiveteam [21:32]
...... (idle for 25mn)
bRick5772 has quit IRC (Quit: Leaving.) [21:57]
..... (idle for 22mn)
BlueMax has quit IRC (Read error: Connection reset by peer) [22:19]
djbeadle has quit IRC (djbeadle) [22:26]
..... (idle for 22mn)
dogsrcool has quit IRC (Quit: Ping timeout (120 seconds))
dogsrcool has joined #archiveteam
[22:48]
robink has quit IRC (Read error: Connection reset by peer)
robink has joined #archiveteam
[22:59]
<indrora> Okay, now to make wget ignore the terrible in-browser JS hackery [23:08]
..... (idle for 23mn)
***jschwart has quit IRC (Konversation terminated!) [23:31]
godane has quit IRC (Read error: Operation timed out) [23:40]
<indrora> Fantastic, figured that one out. [23:40]
***godane has joined #archiveteam [23:43]
<JAA> indrora: Let's move this to #archiveteam-bs, please. This channel is mostly intended for announcements.
(As in, "ohshitohshitohshit this site is going down!!"-type messages.)
[23:45]
***godane has quit IRC (Client Quit)
godane has joined #archiveteam
robink has quit IRC (Read error: Connection reset by peer)
robink has joined #archiveteam
[23:47]
