Time |
Nickname |
Message |
00:01
🔗
|
|
Cameron_D has joined #archiveteam-bs |
00:08
🔗
|
|
Pudsey has quit IRC (Remote host closed the connection) |
00:32
🔗
|
|
pigpengui is now known as pinguin |
00:41
🔗
|
|
brayden_ has quit IRC (Read error: Operation timed out) |
00:42
🔗
|
|
brayden has joined #archiveteam-bs |
00:42
🔗
|
|
swebb sets mode: +o brayden |
00:42
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
00:47
🔗
|
|
box41 has joined #archiveteam-bs |
01:06
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
01:07
🔗
|
|
dashcloud has joined #archiveteam-bs |
01:13
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
01:14
🔗
|
|
box41 has quit IRC (Ping timeout: 268 seconds) |
01:15
🔗
|
|
powerKitt has joined #archiveteam-bs |
01:17
🔗
|
|
DFJustin has joined #archiveteam-bs |
01:17
🔗
|
|
swebb sets mode: +o DFJustin |
01:20
🔗
|
|
powerArch has joined #archiveteam-bs |
02:58
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
03:00
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
03:03
🔗
|
|
powerKitt has quit IRC (Quit: Page closed) |
03:21
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
03:23
🔗
|
|
pinguin has quit IRC (Read error: Connection reset by peer) |
03:54
🔗
|
|
ndiddy has quit IRC () |
04:58
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
05:03
🔗
|
|
Sk1d has joined #archiveteam-bs |
06:14
🔗
|
|
jspiros has quit IRC (Read error: Operation timed out) |
06:17
🔗
|
|
Mayonaise has quit IRC (Read error: Operation timed out) |
06:18
🔗
|
|
jspiros has joined #archiveteam-bs |
06:31
🔗
|
|
Mayonaise has joined #archiveteam-bs |
06:50
🔗
|
|
kristian_ has joined #archiveteam-bs |
07:18
🔗
|
godane |
SketchCow: i'm uploading some of my VHS Captures to FOS |
07:19
🔗
|
godane |
i have over 60gb of my VHS tapes i have captured |
07:20
🔗
|
|
schbirid has joined #archiveteam-bs |
08:14
🔗
|
|
j08nY has joined #archiveteam-bs |
08:38
🔗
|
|
Honno has quit IRC (Read error: Connection reset by peer) |
08:42
🔗
|
|
Honno has joined #archiveteam-bs |
08:52
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
09:03
🔗
|
JAA |
Yet another example why robots.txt is awful: the website of the UN has a pretty restrictive file, but it still allowed significant parts to be scraped. At some point, it looks like they moved everything to localised pages, i.e. un.org/foo became un.org/en/foo. And naturally, they forgot to update robots.txt. |
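A minimal sketch of why this happens, using Python's standard `urllib.robotparser`: robots.txt rules are matched by path prefix, so a rule written for `/foo` does not apply to `/en/foo` after a move. Both rule sets below, the `/foo` path and the `example.org` host are illustrative assumptions, not the actual contents of the UN's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets written before content moved from /foo to /en/foo.
DENY_LIST = ["User-agent: *", "Disallow: /foo"]
ALLOW_LIST = ["User-agent: *", "Allow: /foo", "Disallow: /"]


def can_fetch(rules: list[str], path: str) -> bool:
    """Return True if a generic crawler may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules)
    # example.org is a placeholder host; only the path matters here.
    return parser.can_fetch("*", "http://example.org" + path)


if __name__ == "__main__":
    for name, rules in (("deny-list", DENY_LIST), ("allow-list", ALLOW_LIST)):
        for path in ("/foo", "/en/foo"):
            print(name, path, can_fetch(rules, path))
```

Under the deny-list the moved page becomes fetchable, and under the allow-list it becomes blocked. In both cases the stale file no longer says what its authors presumably intended.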
09:09
🔗
|
|
GE has joined #archiveteam-bs |
09:13
🔗
|
GE |
http://forums.steampowered.com/forums/announcement.php?f=14 Anyone want to do an archive job? |
09:20
🔗
|
jtn2 |
GE: see #outofsteam, http://www.archiveteam.org/index.php?title=Steam |
09:21
🔗
|
GE |
Oh cool |
09:22
🔗
|
JAA |
"so you could save off or archive some of the content" |
09:22
🔗
|
JAA |
"some" |
09:29
🔗
|
|
kyounko|2 has joined #archiveteam-bs |
09:35
🔗
|
|
kyounko has quit IRC (Ping timeout: 492 seconds) |
09:54
🔗
|
|
brayden_ has joined #archiveteam-bs |
09:54
🔗
|
|
swebb sets mode: +o brayden_ |
09:54
🔗
|
|
brayden has quit IRC (Read error: Connection reset by peer) |
10:58
🔗
|
|
j08nY has quit IRC (Read error: Operation timed out) |
11:21
🔗
|
|
j08nY has joined #archiveteam-bs |
11:35
🔗
|
|
Aoede has quit IRC (Ping timeout: 268 seconds) |
11:38
🔗
|
|
j08nY has quit IRC (Read error: Operation timed out) |
11:49
🔗
|
|
Aoede has joined #archiveteam-bs |
11:54
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
11:56
🔗
|
|
BartoCH has joined #archiveteam-bs |
11:57
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
11:57
🔗
|
|
Aoede has quit IRC (Ping timeout: 268 seconds) |
12:01
🔗
|
|
Honno_ has joined #archiveteam-bs |
12:04
🔗
|
|
Honno__ has joined #archiveteam-bs |
12:05
🔗
|
|
Aoede has joined #archiveteam-bs |
12:06
🔗
|
|
j08nY has joined #archiveteam-bs |
12:07
🔗
|
|
Honno has quit IRC (Ping timeout: 370 seconds) |
12:09
🔗
|
|
Honno_ has quit IRC (Ping timeout: 370 seconds) |
12:23
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
12:23
🔗
|
|
Aoede has quit IRC (Ping timeout: 268 seconds) |
12:34
🔗
|
|
Aoede has joined #archiveteam-bs |
12:39
🔗
|
|
Aoede has quit IRC (Ping timeout: 268 seconds) |
12:40
🔗
|
|
Aoede has joined #archiveteam-bs |
13:03
🔗
|
|
Boppen has quit IRC (hub.dk irc.du.se) |
13:10
🔗
|
|
Boppen has joined #archiveteam-bs |
14:04
🔗
|
|
Whopper_ has joined #archiveteam-bs |
14:09
🔗
|
|
Whopper has quit IRC (Ping timeout: 633 seconds) |
14:12
🔗
|
|
Honno has joined #archiveteam-bs |
14:13
🔗
|
|
Honno__ has quit IRC (Ping timeout: 370 seconds) |
14:18
🔗
|
|
Stiletto has joined #archiveteam-bs |
14:21
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
14:36
🔗
|
|
GE has joined #archiveteam-bs |
14:39
🔗
|
|
Honno has quit IRC (Ping timeout: 370 seconds) |
15:19
🔗
|
SketchCow |
All the data will be saved.. with few exceptions |
15:39
🔗
|
|
Honno has joined #archiveteam-bs |
16:13
🔗
|
|
RichardG_ has joined #archiveteam-bs |
16:19
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
16:27
🔗
|
|
Aranje has joined #archiveteam-bs |
16:37
🔗
|
|
kristian_ has joined #archiveteam-bs |
16:57
🔗
|
|
icedice has joined #archiveteam-bs |
18:10
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
18:18
🔗
|
|
GE_ has joined #archiveteam-bs |
18:19
🔗
|
|
GE_ has quit IRC (Client Quit) |
18:21
🔗
|
|
GE has quit IRC (Ping timeout: 255 seconds) |
18:48
🔗
|
|
RichardG_ is now known as RichardG |
19:27
🔗
|
|
powerSPUF has joined #archiveteam-bs |
19:27
🔗
|
powerSPUF |
RIP in Peace http://mafiathesyndicate.com |
19:28
🔗
|
powerSPUF |
Apparently a phpbb update corrupted all the posts on the forum last month and they lost about five years of mafia games. No backups, of course. |
19:29
🔗
|
|
Akiva has joined #archiveteam-bs |
19:30
🔗
|
powerSPUF |
I'm getting this info on mafiathesyndicate second hand from a friend who used it. |
19:37
🔗
|
powerSPUF |
A preventive forum backup team like WikiTeam might come in handy in these situations. |
19:40
🔗
|
|
kristian_ has joined #archiveteam-bs |
19:42
🔗
|
Kaz |
forum devs are too stupid for their own good |
19:42
🔗
|
Kaz |
5 years without taking a backup, amazing |
19:46
🔗
|
powerSPUF |
https://www.phpbb.com/community/viewtopic.php?f=556&p=14717716 My favorite thing is that they were told to back up the database in this support thread for the forum software they're using, and they still didn't do it. |
19:52
🔗
|
|
Stiletto has quit IRC () |
19:52
🔗
|
powerSPUF |
I do think making a ForumTeam would be a good idea, to make backing up forums and the information they might hold easier. |
19:52
🔗
|
powerSPUF |
There's only going to be so many kinds of forum software. |
19:53
🔗
|
xmc |
i've got a project running to do that actually, but it's slow going |
19:53
🔗
|
xmc |
needs about 30 more hours of code before it's production grade in my view |
19:55
🔗
|
powerSPUF |
Oh? What's the plan for your project, xmc? |
19:55
🔗
|
schbirid |
i added wget lines for some forums to the wiki ages ago |
19:56
🔗
|
schbirid |
forums are fun to grab |
19:57
🔗
|
powerSPUF |
I know I came across a vbulletin page on there that linked me to the cityofheroes-grab code. |
19:58
🔗
|
powerSPUF |
Looks like I could adapt it to the Steam Users' Forum grab, since both forums used the same software. |
19:58
🔗
|
powerSPUF |
*adapt it for |
19:58
🔗
|
schbirid |
just make sure to identify useless pages (like login pages with unique urls) and infinite loops (calendars etc) |
19:59
🔗
|
schbirid |
i also always dont grab single post urls as that is so incredibly redundant |
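The filtering advice above could be sketched as a small URL filter. This is only an illustration: the script names (`login.php`, `calendar.php`, `showpost.php`) are assumed vBulletin-style paths, and a real grab would need a list built from the target forum.

```python
import re
from urllib.parse import urlsplit

# Assumed vBulletin-style script names; adjust per forum.
#   login.php    -> useless pages, often with unique URLs
#   calendar.php -> effectively infinite loops
#   showpost.php -> single-post views that duplicate thread pages
SKIP_PATH = re.compile(r"/(?:login|calendar|showpost)\.php$")


def should_fetch(url: str) -> bool:
    """Return False for URLs whose path matches a skipped script."""
    return SKIP_PATH.search(urlsplit(url).path) is None


if __name__ == "__main__":
    for url in (
        "http://forum.example/showthread.php?t=1",
        "http://forum.example/showpost.php?p=5",
        "http://forum.example/calendar.php?month=1&year=2030",
    ):
        print(url, should_fetch(url))
```

wget's `--reject-regex` option accepts a comparable pattern, matched against the complete URL.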
20:00
🔗
|
|
Akiva_ has joined #archiveteam-bs |
20:02
🔗
|
powerSPUF |
The main thing a script would be good for is using the "quote" feature of forum software to get the raw bbcode of posts. |
20:03
🔗
|
powerSPUF |
Of course, the question then becomes "what format should data be stored in" |
20:03
🔗
|
|
Akiva has quit IRC (Ping timeout: 245 seconds) |
20:06
🔗
|
powerSPUF |
Possibly WARC for the rendered pages (forums, threads, posts and members) as well as JSON for the raw bbcode and metadata. |
20:09
🔗
|
powerSPUF |
Then I could write scripts using the JSON to convert the forum dump to a format that another forum software can support. (Ideally a good open source forum software that supports all the features that the forum which the dump was exported from did.) |
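One possible shape for the JSON side of that idea, stored as one record per line next to the WARC. The field names and the `PostRecord` type are hypothetical; nothing in the discussion fixes a schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PostRecord:
    """Hypothetical per-post record holding raw bbcode and metadata."""
    post_id: int
    thread_id: int
    author: str
    posted_at: str  # ISO 8601 timestamp as a string
    bbcode: str     # raw markup, e.g. as obtained through the quote feature


def to_json_line(post: PostRecord) -> str:
    """Serialise one post as a single JSON line."""
    return json.dumps(asdict(post), ensure_ascii=False, sort_keys=True)


def from_json_line(line: str) -> PostRecord:
    """Rebuild a post from a line written by to_json_line."""
    return PostRecord(**json.loads(line))


if __name__ == "__main__":
    post = PostRecord(1, 2, "someone", "2000-01-01T00:00:00Z", "[b]hello[/b]")
    print(to_json_line(post))
```

A converter for another forum package would then only need to read these lines, without parsing rendered HTML.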
20:12
🔗
|
schbirid |
just warc |
20:12
🔗
|
schbirid |
if someone wants to recreate the structure, that is enough |
20:12
🔗
|
schbirid |
[quote] would not be available as non-member for many/most forums, seems a waste of time |
20:13
🔗
|
xmc |
powerSPUF: i'm working on a thing that turns webforums into usenet feeds |
20:13
🔗
|
xmc |
archival-grade, but extensively transformative so probably not archiveteam material |
20:14
🔗
|
xmc |
if you use wget-warc-lua, we have a lua script on the github to grab in completeness a couple different forum types |
20:14
🔗
|
xmc |
https://github.com/ArchiveTeam/wget-lua-forum-scripts |
20:16
🔗
|
powerSPUF |
schbirid: This script would use the user's login credentials for that. |
20:20
🔗
|
schbirid |
that would be quite different from archiving public websites but whatever floats your boat |
20:23
🔗
|
powerSPUF |
Update on the mafia forum situation |
20:23
🔗
|
powerSPUF |
Apparently they did have 1 single backup, but it's corrupted somehow. |
20:24
🔗
|
|
GE has joined #archiveteam-bs |
20:24
🔗
|
powerSPUF |
I'm attempting to get a copy from the forum admins through my friend. I'll see what I can manage to recover. |
20:31
🔗
|
powerSPUF |
xmc, do you have any idea how to build a windows version of wget-warc-lua |
20:31
🔗
|
xmc |
no, and we usually recommend against working in windows because its filesystem tends to eat metadata |
20:33
🔗
|
|
Akiva_ has quit IRC (Ping timeout: 245 seconds) |
20:34
🔗
|
powerSPUF |
Guess I'll need to make a bootable linux USB with my archiving tools. |
20:39
🔗
|
xmc |
a virtual machine is also a good option, vmware player is free and pretty straightforward to use |
20:39
🔗
|
xmc |
most archiving isn't very cpu intensive |
20:46
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
20:53
🔗
|
|
Akiva_ has joined #archiveteam-bs |
21:09
🔗
|
|
Akiva_ has quit IRC (Remote host closed the connection) |
21:22
🔗
|
|
powerSPUF has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com )) |
21:30
🔗
|
|
qwebirc54 has joined #archiveteam-bs |
21:32
🔗
|
|
qwebirc54 is now known as powerKitt |
22:09
🔗
|
|
Stilett0 has joined #archiveteam-bs |
22:18
🔗
|
|
dashcloud has joined #archiveteam-bs |
22:28
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
22:33
🔗
|
|
GE has quit IRC (Remote host closed the connection) |
22:34
🔗
|
|
powerKitt has quit IRC (Ping timeout: 268 seconds) |
22:36
🔗
|
|
powerKitt has joined #archiveteam-bs |
22:39
🔗
|
powerKitt |
Is there any way to define a custom robots.txt and sitemap.xml for wget? |
22:40
🔗
|
|
nicolas17 has joined #archiveteam-bs |
22:40
🔗
|
powerKitt |
nevermind |
22:40
🔗
|
nicolas17 |
apparently "Windows XP isn't supported anymore" also means Microsoft went out of their way to delete already-released updates from their download site |
22:42
🔗
|
nicolas17 |
https://support.microsoft.com/en-us/help/916089 -> https://support.microsoft.com/en-us/help/927891 -> http://www.microsoft.com/downloads/details.aspx?familyid=7a81b0cd-a0b9-497e-8a89-404327772e5a -> 404 not found |
22:42
🔗
|
|
Stiletto has joined #archiveteam-bs |
22:43
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
22:43
🔗
|
powerKitt |
Good job Microsoft. |
22:45
🔗
|
nicolas17 |
the internet archive wayback machine happens to have the English version of that file, but on a Windows XP with another language it says the update doesn't match the system language and doesn't install |
22:46
🔗
|
nicolas17 |
and I can't find "WindowsXP-KB927891-v3-x86-ESN.exe" anywhere else |
22:46
🔗
|
nicolas17 |
so uh |
22:46
🔗
|
dashcloud |
did you check the catalog? |
22:46
🔗
|
dashcloud |
Microsoft Catalog that is |
22:47
🔗
|
nicolas17 |
should we start archiving the files that remain? |
22:47
🔗
|
nicolas17 |
dashcloud: after my complete failure to find something else on the Catalog a few days ago, I didn't even think of checking now |
22:47
🔗
|
nicolas17 |
looks like it's there :O |
22:57
🔗
|
powerKitt |
https://github.com/ArchiveTeam/wget-lua-forum-scripts/blob/master/vbulletin.lua Can someone who is better at Wget LUA make it so I can just variables for the member, thread, and forum max numbers? |
22:58
🔗
|
powerKitt |
*can just use variables for |
22:58
🔗
|
|
Ravenloft has joined #archiveteam-bs |
22:58
🔗
|
powerKitt |
and be able to just increment through them |
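The request above, sketched in Python rather than as a change to `vbulletin.lua`: keep the three maximum IDs as variables and increment through them to produce seed URLs. The URL templates are assumed vBulletin-style patterns, and `forum.example` and the limits are placeholders.

```python
from typing import Dict, Iterator

BASE_URL = "http://forum.example"  # placeholder host

# Hypothetical limits; set these to the highest IDs seen on the forum.
MAX_IDS: Dict[str, int] = {"member": 3, "thread": 2, "forum": 1}

# Assumed vBulletin-style URL patterns.
TEMPLATES: Dict[str, str] = {
    "member": "{base}/member.php?u={n}",
    "thread": "{base}/showthread.php?t={n}",
    "forum": "{base}/forumdisplay.php?f={n}",
}


def seed_urls(base: str = BASE_URL,
              max_ids: Dict[str, int] = MAX_IDS) -> Iterator[str]:
    """Yield one URL per ID from 1 up to each configured maximum."""
    for kind, limit in max_ids.items():
        for n in range(1, limit + 1):
            yield TEMPLATES[kind].format(base=base, n=n)


if __name__ == "__main__":
    for url in seed_urls():
        print(url)
```

The printed list could be saved to a file and handed to wget with `--input-file`.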
23:08
🔗
|
hook54321 |
Is the archiveteam logo copyrighted? |
23:09
🔗
|
xmc |
yes, but you can use it |
23:10
🔗
|
hook54321 |
For what? |
23:10
🔗
|
dashcloud |
it looks like they switched from the kb format: https://support.microsoft.com/kb/65260 to this format: https://support.microsoft.com/en-us/help/900000 |
23:10
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
23:12
🔗
|
xmc |
hook54321: well what are you doing |
23:14
🔗
|
hook54321 |
the robotics team at my high school is trying to decide on a new "theme" for the team for next year, but all of the themes people made are horrible. I'm thinking of proposing archiveteam as a theme, mostly for a joke, I doubt it would win. |
23:16
🔗
|
powerKitt |
~~give archivebot a physical form so it can save physical collections~~ |
23:16
🔗
|
powerKitt |
(that was meant to be strikethrough) |