Time |
Nickname |
Message |
00:01
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
00:14
🔗
|
|
DoomTay has joined #archiveteam |
00:22
🔗
|
|
Coderjoe has joined #archiveteam |
00:31
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
00:42
🔗
|
|
JesseW has joined #archiveteam |
01:09
🔗
|
|
BlueMaxim has joined #archiveteam |
01:46
🔗
|
|
ravetcofx has quit IRC (Remote host closed the connection) |
01:50
🔗
|
|
ravetcofx has joined #archiveteam |
02:05
🔗
|
|
kristian_ has joined #archiveteam |
02:21
🔗
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
02:31
🔗
|
|
yipdw_ is now known as yipdw |
03:15
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
03:49
🔗
|
|
redlob_ has joined #archiveteam |
03:50
🔗
|
|
redlob has quit IRC (Read error: Operation timed out) |
04:24
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:29
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
04:31
🔗
|
|
Sk1d has joined #archiveteam |
04:43
🔗
|
|
JesseW has joined #archiveteam |
04:52
🔗
|
|
tfgbd_znc has quit IRC (Ping timeout: 633 seconds) |
04:56
🔗
|
|
DoomTay has joined #archiveteam |
05:06
🔗
|
|
tfgbd_znc has joined #archiveteam |
05:06
🔗
|
|
Honno has joined #archiveteam |
05:20
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
05:28
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
05:51
🔗
|
|
TC02 has quit IRC (Read error: Connection reset by peer) |
05:51
🔗
|
|
TC02 has joined #archiveteam |
05:55
🔗
|
|
vitzli has joined #archiveteam |
06:03
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:07
🔗
|
|
dashcloud has joined #archiveteam |
06:41
🔗
|
|
tomwsmf has quit IRC (Read error: Operation timed out) |
07:13
🔗
|
|
midas1 is now known as midas |
07:30
🔗
|
|
signius has quit IRC (Read error: Operation timed out) |
07:31
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
07:35
🔗
|
|
brayden has joined #archiveteam |
07:35
🔗
|
|
swebb sets mode: +o brayden |
07:37
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
07:45
🔗
|
|
signius has joined #archiveteam |
07:45
🔗
|
|
redlob_ has quit IRC (Read error: Operation timed out) |
07:51
🔗
|
|
redlob has joined #archiveteam |
08:16
🔗
|
|
kristian_ has quit IRC (Leaving) |
08:58
🔗
|
|
z00nx has joined #archiveteam |
09:05
🔗
|
|
schbirid has joined #archiveteam |
09:29
🔗
|
|
WinterFox has joined #archiveteam |
09:58
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
10:07
🔗
|
|
dashcloud has joined #archiveteam |
11:17
🔗
|
Jon |
hey guys, does anyone know of anyone archiving old .plan files from famous-ish people? e.g. John Carmack's .plan is widely available, but apparently some other ID people published them too |
11:18
🔗
|
schbirid |
i have a lot |
11:20
🔗
|
schbirid |
buuuut i am on and off working on turning them into a nice interface or twitter bot so i am not too keen on sharing =( |
11:22
🔗
|
schbirid |
bluesnews has a nice archive |
11:27
🔗
|
Jon |
schbirid: thanks |
11:27
🔗
|
Jon |
schbirid: please don't die or something before sharing :P |
11:27
🔗
|
schbirid |
:D |
11:28
🔗
|
midas |
yeah schbirid, dont do that. |
11:28
🔗
|
schbirid |
midas has no rights to say anything about this kind of stuff until the jamendo stuff is up |
11:28
🔗
|
midas |
ill shut up |
11:28
🔗
|
midas |
:p |
11:29
🔗
|
midas |
next week ill be switching back to my old ISP, so fast internet again |
11:38
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
11:39
🔗
|
|
Stiletto has joined #archiveteam |
11:44
🔗
|
Nemo_bis |
aww sorry for the failed switch |
11:55
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
11:59
🔗
|
|
dashcloud has joined #archiveteam |
12:06
🔗
|
|
espes__ has quit IRC (Read error: Connection reset by peer) |
12:06
🔗
|
|
yeoldeto1 has quit IRC (Read error: Connection reset by peer) |
12:17
🔗
|
|
yeoldetoa has joined #archiveteam |
12:18
🔗
|
|
espes__ has joined #archiveteam |
12:22
🔗
|
|
Sanqui has quit IRC (Remote host closed the connection) |
12:27
🔗
|
|
Sanqui has joined #archiveteam |
12:56
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
13:00
🔗
|
|
WinterFox has quit IRC (Ping timeout: 492 seconds) |
13:42
🔗
|
|
Froggypwn has joined #archiveteam |
13:50
🔗
|
|
sep332 has quit IRC (Quit: konversation out) |
14:00
🔗
|
|
sep332 has joined #archiveteam |
14:39
🔗
|
|
JesseW has joined #archiveteam |
15:01
🔗
|
JesseW |
The sites described here are probably worth a look by ArchiveTeam (ping godane): http://listheory.prattsils.org/cataloging-plunder-thoughts-on-the-digital-text-sharing-underground/ |
15:04
🔗
|
SketchCow |
We archived UbuWeb some time ago. |
15:15
🔗
|
godane |
cool |
15:15
🔗
|
JesseW |
excellent |
15:18
🔗
|
godane |
i put this url in archivebot https://www.memoryoftheworld.org/ |
15:19
🔗
|
godane |
we only have one page of that site archived, based on status |
15:25
🔗
|
|
tomwsmf has joined #archiveteam |
15:29
🔗
|
JesseW |
godane: thanks! |
15:32
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
15:32
🔗
|
|
JesseW has joined #archiveteam |
15:51
🔗
|
|
Aranje has joined #archiveteam |
15:57
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
15:58
🔗
|
|
K4k has quit IRC (Quit: WeeChat 1.5) |
15:58
🔗
|
|
K4k has joined #archiveteam |
16:06
🔗
|
SketchCow |
(It's 6.2tb of material left on the machine, although I have no idea how much of that is, say, some backups) |
16:06
🔗
|
|
DoomTay has joined #archiveteam |
16:07
🔗
|
SketchCow |
Or the aforementioned "Pipes". For example, the Hip-Hop pipe is 257gb for some reason. |
16:07
🔗
|
SketchCow |
Ha ha, I see why. |
16:08
🔗
|
SketchCow |
(There's a 230gb project in it, subdirectory.) |
16:16
🔗
|
SketchCow |
Oh see, 13gb of the 27gb remaining were hip-hop albums I forgot to shove in! (They're portions of the albums where an ill-advised combining of two archives happened) |
16:32
🔗
|
|
redlob_ has joined #archiveteam |
16:33
🔗
|
|
redlob has quit IRC (Read error: Operation timed out) |
16:36
🔗
|
DoomTay |
So Lego CHIMA basically concluded back in 2015, so there's no telling how much longer http://www.lego.com/en-us/chima will stay up. The thing about sticking it in ArchiveBot is that it seems it just won't download the actual video files despite giving it PhantomJS AND youtube-dl AND UA spoofing |
16:46
🔗
|
|
JW_work1 has quit IRC (Quit: Leaving.) |
16:52
🔗
|
|
bauruine has joined #archiveteam |
17:02
🔗
|
|
AlexLehm has joined #archiveteam |
17:04
🔗
|
|
JW_work has joined #archiveteam |
18:30
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
19:17
🔗
|
|
nicolas17 has joined #archiveteam |
19:18
🔗
|
nicolas17 |
hi archivers |
19:18
🔗
|
nicolas17 |
the forum is the only part of the OpenStreetMap infrastructure that isn't managed by the main OSM operations team, and its administrator seems to be Missing In Action, I think nobody else has server access |
19:19
🔗
|
nicolas17 |
they are at the point of considering setting up a new forum and pointing the forum.openstreetmap.org hostname to it |
19:20
🔗
|
nicolas17 |
but that means losing the existing data |
19:37
🔗
|
JW_work |
nicolas17: is there a link to the discussion about setting up a new forum? |
19:42
🔗
|
nicolas17 |
sec |
19:43
🔗
|
nicolas17 |
gee it'd be nice if gmane was up |
19:43
🔗
|
JW_work |
ha. it's being worked on |
19:43
🔗
|
nicolas17 |
https://lists.openstreetmap.org/pipermail/talk/2016-August/076580.html last post |
19:45
🔗
|
nicolas17 |
if I'm going to scrape, looks like I can get the raw bbcode by logging in and trying to quote a post |
19:50
🔗
|
JW_work |
we'll do a scrape in WARC format |
19:50
🔗
|
JW_work |
it should be available in a couple of days |
19:51
🔗
|
nicolas17 |
I once archived a small forum with plain old wget -r, and it got a *lot* of redundant stuff, like following links that returned the same thread in a different order, or thread?id=1 and post?id=2 giving pretty much the same content |
19:53
🔗
|
bai |
isn't there often a url parameter you can pass to those sorts of forums to get the crawler-friendly page, with less links, all canonicalized? |
19:53
🔗
|
bai |
or maybe based on googlebot useragent |
19:53
🔗
|
nicolas17 |
hm maybe |
19:54
🔗
|
nicolas17 |
that crap I archived with wget was an old phpbb |
20:03
🔗
|
nicolas17 |
bai: I just tried setting googlebot UA and I get the same page |
20:08
🔗
|
bai |
yeah I think there's a url parameter, like when you click a forum post link on google and you get the black-and-grey-on-white printer friendly view |
20:09
🔗
|
bai |
not having much luck searching for what that option is though |
20:09
🔗
|
nicolas17 |
are you sure this software supports such a thing? :P |
20:09
🔗
|
bai |
also that may be specifically phpBB or one of the other popular ones, dunno about fluxBB |
20:09
🔗
|
nicolas17 |
ah |
20:23
🔗
|
xmc |
nicolas17: yeah we have a set of options to make crawling forums work much better |
20:32
🔗
|
|
DoomTay has joined #archiveteam |
20:52
🔗
|
schbirid |
i archived the forums some months ago, can't find any trace of it though =) |
20:52
🔗
|
schbirid |
iirc it was a bit harder than usual |
20:52
🔗
|
schbirid |
but otherwise standard forum stuff |
20:53
🔗
|
nicolas17 |
"Digital objects last forever – or 5 years, whichever comes first" |
20:56
🔗
|
schbirid |
if no one else raises their hand, i will start a wget for it right now |
20:56
🔗
|
nicolas17 |
schbirid: I interpreted JW_work's message as hand raising already? |
20:57
🔗
|
xmc |
schbirid: go for it |
20:57
🔗
|
JW_work |
schbirid: I just started a #archivebot job |
20:57
🔗
|
JW_work |
but duplicate is probably fine |
20:57
🔗
|
schbirid |
yeah :) |
20:57
🔗
|
JW_work |
especially as the archivebot job seems to have failed... |
20:58
🔗
|
schbirid |
on it |
21:00
🔗
|
schbirid |
man, fluxbb is nice and clean |
21:06
🔗
|
yipdw |
JW_work: yes, there's a reason why it failed |
21:06
🔗
|
yipdw |
<JW_work> !a http://http://forum.openstreetmap.org/ --ignore-sets=forums |
21:07
🔗
|
JW_work |
yeah, I see that now :-/ |
21:07
🔗
|
schbirid |
runs well here |
21:09
🔗
|
DoomTay |
I wonder what would happen if the site gets overwhelmed. BZPower would say "The servers are too busy to handle your request". No idea what status code it would return though |
21:12
🔗
|
|
MMovie has quit IRC (Read error: Connection reset by peer) |
21:18
🔗
|
|
MMovie has joined #archiveteam |
21:21
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
21:28
🔗
|
|
RichardG has joined #archiveteam |
21:35
🔗
|
schbirid |
running nice and smooth, good night for now |
21:35
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:39
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
21:57
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
22:17
🔗
|
|
Gfy has quit IRC (Read error: Operation timed out) |
22:27
🔗
|
|
Gfy has joined #archiveteam |
22:45
🔗
|
|
redlob_ has quit IRC (Read error: Operation timed out) |
22:46
🔗
|
|
RichardG has joined #archiveteam |
22:51
🔗
|
|
redlob has joined #archiveteam |
23:02
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
23:39
🔗
|
|
AlexLehm has quit IRC (Ping timeout: 260 seconds) |
23:40
🔗
|
|
RichardG_ has joined #archiveteam |
23:41
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
23:42
🔗
|
|
ZeoNet has joined #archiveteam |
23:47
🔗
|
|
RichardG_ has quit IRC (Ping timeout: 370 seconds) |
23:48
🔗
|
|
RichardG has joined #archiveteam |
23:58
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 244 seconds) |
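
Note on the 21:06 exchange: the ArchiveBot job quoted there was given a URL with a doubled scheme (http://http://forum.openstreetmap.org/), which is the reason yipdw points to for the failure. A minimal sketch of a pre-submission check follows, assuming you want to collapse a repeated leading http:// or https:// before passing a URL to a crawler. The function name collapse_repeated_scheme is hypothetical and is not part of ArchiveBot or wget.

```python
import re

# One or more http:// or https:// prefixes at the very start of the string.
# The capture group ends up holding the scheme from the last repetition.
_SCHEME_RUN = re.compile(r'^(?:(https?)://)+', re.IGNORECASE)


def collapse_repeated_scheme(url):
    """Return url with a duplicated leading scheme collapsed to a single one.

    Hypothetical helper for illustration only. URLs without an http(s)
    scheme are returned unchanged.
    """
    match = _SCHEME_RUN.match(url)
    if match is None:
        return url
    # Keep the last scheme in the run and everything after the run.
    return match.group(1).lower() + "://" + url[match.end():]


if __name__ == "__main__":
    print(collapse_repeated_scheme("http://http://forum.openstreetmap.org/"))
```

Running it on the URL from the log prints the single-scheme form, which could then be submitted by hand. When the repeated schemes differ, this sketch keeps whichever comes last, which is an arbitrary choice.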