| Time |
Nickname |
Message |
|
00:01
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
|
00:14
🔗
|
|
DoomTay has joined #archiveteam |
|
00:22
🔗
|
|
Coderjoe has joined #archiveteam |
|
00:31
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
|
00:42
🔗
|
|
JesseW has joined #archiveteam |
|
01:09
🔗
|
|
BlueMaxim has joined #archiveteam |
|
01:46
🔗
|
|
ravetcofx has quit IRC (Remote host closed the connection) |
|
01:50
🔗
|
|
ravetcofx has joined #archiveteam |
|
02:05
🔗
|
|
kristian_ has joined #archiveteam |
|
02:21
🔗
|
|
Aranje has quit IRC (Ping timeout: 260 seconds) |
|
02:31
🔗
|
|
yipdw_ is now known as yipdw |
|
03:15
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
|
03:49
🔗
|
|
redlob_ has joined #archiveteam |
|
03:50
🔗
|
|
redlob has quit IRC (Read error: Operation timed out) |
|
04:24
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
|
04:29
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
|
04:31
🔗
|
|
Sk1d has joined #archiveteam |
|
04:43
🔗
|
|
JesseW has joined #archiveteam |
|
04:52
🔗
|
|
tfgbd_znc has quit IRC (Ping timeout: 633 seconds) |
|
04:56
🔗
|
|
DoomTay has joined #archiveteam |
|
05:06
🔗
|
|
tfgbd_znc has joined #archiveteam |
|
05:06
🔗
|
|
Honno has joined #archiveteam |
|
05:20
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
|
05:28
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
05:51
🔗
|
|
TC02 has quit IRC (Read error: Connection reset by peer) |
|
05:51
🔗
|
|
TC02 has joined #archiveteam |
|
05:55
🔗
|
|
vitzli has joined #archiveteam |
|
06:03
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
06:07
🔗
|
|
dashcloud has joined #archiveteam |
|
06:41
🔗
|
|
tomwsmf has quit IRC (Read error: Operation timed out) |
|
07:13
🔗
|
|
midas1 is now known as midas |
|
07:30
🔗
|
|
signius has quit IRC (Read error: Operation timed out) |
|
07:31
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
|
07:35
🔗
|
|
brayden has joined #archiveteam |
|
07:35
🔗
|
|
swebb sets mode: +o brayden |
|
07:37
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
|
07:45
🔗
|
|
signius has joined #archiveteam |
|
07:45
🔗
|
|
redlob_ has quit IRC (Read error: Operation timed out) |
|
07:51
🔗
|
|
redlob has joined #archiveteam |
|
08:16
🔗
|
|
kristian_ has quit IRC (Leaving) |
|
08:58
🔗
|
|
z00nx has joined #archiveteam |
|
09:05
🔗
|
|
schbirid has joined #archiveteam |
|
09:29
🔗
|
|
WinterFox has joined #archiveteam |
|
09:58
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
10:07
🔗
|
|
dashcloud has joined #archiveteam |
|
11:17
🔗
|
Jon |
hey guys, does anyone know of anyone archiving old .plan files from famous-ish people? e.g. John Carmack's .plan is widely available, but apparently some other ID people published them too |
|
11:18
🔗
|
schbirid |
i have a lot |
|
11:20
🔗
|
schbirid |
buuuut i am on and off working on turning them into a nice interface or twitter bot so i am not too keen on sharing =( |
|
11:22
🔗
|
schbirid |
bluesnews has a nice archive |
|
11:27
🔗
|
Jon |
schbirid: thanks |
|
11:27
🔗
|
Jon |
schbirid: please don't die or something before sharing :P |
|
11:27
🔗
|
schbirid |
:D |
|
11:28
🔗
|
midas |
yeah schbirid, dont do that. |
|
11:28
🔗
|
schbirid |
midas has no rights to say anything about this kind of stuff until the jamendo stuff is up |
|
11:28
🔗
|
midas |
ill shut up |
|
11:28
🔗
|
midas |
:p |
|
11:29
🔗
|
midas |
next week ill be switching back to my old ISP, so fast internet again |
|
11:38
🔗
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
|
11:39
🔗
|
|
Stiletto has joined #archiveteam |
|
11:44
🔗
|
Nemo_bis |
aww sorry for the failed switch |
|
11:55
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
11:59
🔗
|
|
dashcloud has joined #archiveteam |
|
12:06
🔗
|
|
espes__ has quit IRC (Read error: Connection reset by peer) |
|
12:06
🔗
|
|
yeoldeto1 has quit IRC (Read error: Connection reset by peer) |
|
12:17
🔗
|
|
yeoldetoa has joined #archiveteam |
|
12:18
🔗
|
|
espes__ has joined #archiveteam |
|
12:22
🔗
|
|
Sanqui has quit IRC (Remote host closed the connection) |
|
12:27
🔗
|
|
Sanqui has joined #archiveteam |
|
12:56
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
13:00
🔗
|
|
WinterFox has quit IRC (Ping timeout: 492 seconds) |
|
13:42
🔗
|
|
Froggypwn has joined #archiveteam |
|
13:50
🔗
|
|
sep332 has quit IRC (Quit: konversation out) |
|
14:00
🔗
|
|
sep332 has joined #archiveteam |
|
14:39
🔗
|
|
JesseW has joined #archiveteam |
|
15:01
🔗
|
JesseW |
The sites described here are probably worth a look by ArchiveTeam (ping godane): http://listheory.prattsils.org/cataloging-plunder-thoughts-on-the-digital-text-sharing-underground/ |
|
15:04
🔗
|
SketchCow |
We archived UbuWeb some time ago. |
|
15:15
🔗
|
godane |
cool |
|
15:15
🔗
|
JesseW |
excellent |
|
15:18
🔗
|
godane |
i put this url in archivebot https://www.memoryoftheworld.org/ |
|
15:19
🔗
|
godane |
we only have one page of that site archived, based on status |
|
15:25
🔗
|
|
tomwsmf has joined #archiveteam |
|
15:29
🔗
|
JesseW |
godane: thanks! |
|
15:32
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
|
15:32
🔗
|
|
JesseW has joined #archiveteam |
|
15:51
🔗
|
|
Aranje has joined #archiveteam |
|
15:57
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
|
15:58
🔗
|
|
K4k has quit IRC (Quit: WeeChat 1.5) |
|
15:58
🔗
|
|
K4k has joined #archiveteam |
|
16:06
🔗
|
SketchCow |
(It's 6.2tb of material left on the machine, although I have no idea how much of that is, say, some backups) |
|
16:06
🔗
|
|
DoomTay has joined #archiveteam |
|
16:07
🔗
|
SketchCow |
Or the aforementioned "Pipes". For example, the Hip-Hop pipe is 257gb for some reason. |
|
16:07
🔗
|
SketchCow |
Ha ha, I see why. |
|
16:08
🔗
|
SketchCow |
(There's a 230gb project in it, subdirectory.) |
|
16:16
🔗
|
SketchCow |
Oh see, 13gb of the 27gb remaining were hip-hop albums I forgot to shove in! (They're portions of the albums from when an ill-advised combining of two archives happened) |
|
16:32
🔗
|
|
redlob_ has joined #archiveteam |
|
16:33
🔗
|
|
redlob has quit IRC (Read error: Operation timed out) |
|
16:36
🔗
|
DoomTay |
So Lego CHIMA basically concluded back in 2015, so there's no telling how much longer http://www.lego.com/en-us/chima will stay up. The thing about sticking it in ArchiveBot is that it seems it just won't download the actual video files despite giving it PhantomJS AND youtube-dl AND UA spoofing |
|
16:46
🔗
|
|
JW_work1 has quit IRC (Quit: Leaving.) |
|
16:52
🔗
|
|
bauruine has joined #archiveteam |
|
17:02
🔗
|
|
AlexLehm has joined #archiveteam |
|
17:04
🔗
|
|
JW_work has joined #archiveteam |
|
18:30
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
19:17
🔗
|
|
nicolas17 has joined #archiveteam |
|
19:18
🔗
|
nicolas17 |
hi archivers |
|
19:18
🔗
|
nicolas17 |
the forum is the only part of the OpenStreetMap infrastructure that isn't managed by the main OSM operations team, and its administrator seems to be Missing In Action, I think nobody else has server access |
|
19:19
🔗
|
nicolas17 |
they are at the point of considering setting up a new forum and pointing the forum.openstreetmap.org hostname to it |
|
19:20
🔗
|
nicolas17 |
but that means losing the existing data |
|
19:37
🔗
|
JW_work |
nicolas17: is there a link to the discussion about setting up a new forum? |
|
19:42
🔗
|
nicolas17 |
sec |
|
19:43
🔗
|
nicolas17 |
gee it'd be nice if gmane was up |
|
19:43
🔗
|
JW_work |
ha. it's being worked on |
|
19:43
🔗
|
nicolas17 |
https://lists.openstreetmap.org/pipermail/talk/2016-August/076580.html last post |
|
19:45
🔗
|
nicolas17 |
if I'm going to scrape, looks like I can get the raw bbcode by logging in and trying to quote a post |
|
19:50
🔗
|
JW_work |
we'll do a scrape in WARC format |
|
19:50
🔗
|
JW_work |
it should be available in a couple of days |
|
19:51
🔗
|
nicolas17 |
I once archived a small forum with plain old wget -r, and it got a *lot* of redundant stuff, like following links that returned the same thread in a different order, or thread?id=1 and post?id=2 giving pretty much the same content |
|
19:53
🔗
|
bai |
isn't there often a url parameter you can pass to those sorts of forums to get the crawler-friendly page, with less links, all canonicalized? |
|
19:53
🔗
|
bai |
or maybe based on googlebot useragent |
|
19:53
🔗
|
nicolas17 |
hm maybe |
|
19:54
🔗
|
nicolas17 |
that crap I archived with wget was an old phpbb |
|
20:03
🔗
|
nicolas17 |
bai: I just tried setting googlebot UA and I get the same page |
|
20:08
🔗
|
bai |
yeah I think there's a url parameter, like when you click a forum post link on google and you get the black-and-grey-on-white printer friendly view |
|
20:09
🔗
|
bai |
not having much luck searching for what that option is though |
|
20:09
🔗
|
nicolas17 |
are you sure this software supports such thing? :P |
|
20:09
🔗
|
bai |
also that may be specifically phpBB or one of the other popular ones, dunno about fluxBB |
|
20:09
🔗
|
nicolas17 |
ah |
|
20:23
🔗
|
xmc |
nicolas17: yeah we have a set of options to make crawling forums work much better |
|
20:32
🔗
|
|
DoomTay has joined #archiveteam |
|
20:52
🔗
|
schbirid |
i archived the forums some months ago, can't find any trace of it though =) |
|
20:52
🔗
|
schbirid |
iirc it was a bit harder than usual |
|
20:52
🔗
|
schbirid |
but otherwise standard forum stuff |
|
20:53
🔗
|
nicolas17 |
"Digital objects last forever – or 5 years, whichever comes first" |
|
20:56
🔗
|
schbirid |
if no one else raises their hand, i will start a wget for it right now |
|
20:56
🔗
|
nicolas17 |
schbirid: I interpreted JW_work's message as hand raising already? |
|
20:57
🔗
|
xmc |
schbirid: go for it |
|
20:57
🔗
|
JW_work |
schbirid: I just started a #archivebot job |
|
20:57
🔗
|
JW_work |
but duplicate is probably fine |
|
20:57
🔗
|
schbirid |
yeah :) |
|
20:57
🔗
|
JW_work |
especially as the archivebot job seems to have failed... |
|
20:58
🔗
|
schbirid |
on it |
|
21:00
🔗
|
schbirid |
man, fluxbb is nice and clean |
|
21:06
🔗
|
yipdw |
JW_work: yes, there's a reason why it failed |
|
21:06
🔗
|
yipdw |
<JW_work> !a http://http://forum.openstreetmap.org/ --ignore-sets=forums |
|
21:07
🔗
|
JW_work |
yeah, I see that now :-/ |
|
21:07
🔗
|
schbirid |
runs well here |
|
21:09
🔗
|
DoomTay |
I wonder what would happen if the site gets overwhelmed. BZPower would say "The servers are too busy to handle your request". No idea what status code it would return though |
|
21:12
🔗
|
|
MMovie has quit IRC (Read error: Connection reset by peer) |
|
21:18
🔗
|
|
MMovie has joined #archiveteam |
|
21:21
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
|
21:28
🔗
|
|
RichardG has joined #archiveteam |
|
21:35
🔗
|
schbirid |
running nice and smooth, good night for now |
|
21:35
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
|
21:39
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
|
21:57
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
|
22:17
🔗
|
|
Gfy has quit IRC (Read error: Operation timed out) |
|
22:27
🔗
|
|
Gfy has joined #archiveteam |
|
22:45
🔗
|
|
redlob_ has quit IRC (Read error: Operation timed out) |
|
22:46
🔗
|
|
RichardG has joined #archiveteam |
|
22:51
🔗
|
|
redlob has joined #archiveteam |
|
23:02
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
|
23:39
🔗
|
|
AlexLehm has quit IRC (Ping timeout: 260 seconds) |
|
23:40
🔗
|
|
RichardG_ has joined #archiveteam |
|
23:41
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
|
23:42
🔗
|
|
ZeoNet has joined #archiveteam |
|
23:47
🔗
|
|
RichardG_ has quit IRC (Ping timeout: 370 seconds) |
|
23:48
🔗
|
|
RichardG has joined #archiveteam |
|
23:58
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 244 seconds) |