[00:01] *** Cameron_D has joined #archiveteam-bs
[00:08] *** Pudsey has quit IRC (Remote host closed the connection)
[00:32] *** pigpengui is now known as pinguin
[00:41] *** brayden_ has quit IRC (Read error: Operation timed out)
[00:42] *** brayden has joined #archiveteam-bs
[00:42] *** swebb sets mode: +o brayden
[00:42] *** j08nY has quit IRC (Quit: Leaving)
[00:47] *** box41 has joined #archiveteam-bs
[01:06] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[01:07] *** dashcloud has joined #archiveteam-bs
[01:13] *** DFJustin has quit IRC (Remote host closed the connection)
[01:14] *** box41 has quit IRC (Ping timeout: 268 seconds)
[01:15] *** powerKitt has joined #archiveteam-bs
[01:17] *** DFJustin has joined #archiveteam-bs
[01:17] *** swebb sets mode: +o DFJustin
[01:20] *** powerArch has joined #archiveteam-bs
[02:58] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[03:00] *** BlueMaxim has joined #archiveteam-bs
[03:03] *** powerKitt has quit IRC (Quit: Page closed)
[03:21] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:23] *** pinguin has quit IRC (Read error: Connection reset by peer)
[03:54] *** ndiddy has quit IRC ()
[04:58] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:03] *** Sk1d has joined #archiveteam-bs
[06:14] *** jspiros has quit IRC (Read error: Operation timed out)
[06:17] *** Mayonaise has quit IRC (Read error: Operation timed out)
[06:18] *** jspiros has joined #archiveteam-bs
[06:31] *** Mayonaise has joined #archiveteam-bs
[06:50] *** kristian_ has joined #archiveteam-bs
[07:18] SketchCow: i'm uploading some of my VHS Captures to FOS
[07:19] i have over 60gb of my VHS tapes i have captured
[07:20] *** schbirid has joined #archiveteam-bs
[08:14] *** j08nY has joined #archiveteam-bs
[08:38] *** Honno has quit IRC (Read error: Connection reset by peer)
[08:42] *** Honno has joined #archiveteam-bs
[08:52] *** kristian_ has quit IRC (Quit: Leaving)
[09:03] Yet another example why robots.txt is awful: the website of the UN has a pretty restrictive file, but it still allowed significant parts to be scraped. At some point, it looks like they moved everything to localised pages, i.e. un.org/foo became un.org/en/foo. And naturally, they forgot to update robots.txt.
[09:09] *** GE has joined #archiveteam-bs
[09:13] http://forums.steampowered.com/forums/announcement.php?f=14 Anyone want to do an archive job?
[09:20] GE: see #outofsteam, http://www.archiveteam.org/index.php?title=Steam
[09:21] Oh cool
[09:22] "so you could save off or archive some of the content"
[09:22] "some"
[09:29] *** kyounko|2 has joined #archiveteam-bs
[09:35] *** kyounko has quit IRC (Ping timeout: 492 seconds)
[09:54] *** brayden_ has joined #archiveteam-bs
[09:54] *** swebb sets mode: +o brayden_
[09:54] *** brayden has quit IRC (Read error: Connection reset by peer)
[10:58] *** j08nY has quit IRC (Read error: Operation timed out)
[11:21] *** j08nY has joined #archiveteam-bs
[11:35] *** Aoede has quit IRC (Ping timeout: 268 seconds)
[11:38] *** j08nY has quit IRC (Read error: Operation timed out)
[11:49] *** Aoede has joined #archiveteam-bs
[11:54] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[11:56] *** BartoCH has joined #archiveteam-bs
[11:57] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:57] *** Aoede has quit IRC (Ping timeout: 268 seconds)
[12:01] *** Honno_ has joined #archiveteam-bs
[12:04] *** Honno__ has joined #archiveteam-bs
[12:05] *** Aoede has joined #archiveteam-bs
[12:06] *** j08nY has joined #archiveteam-bs
[12:07] *** Honno has quit IRC (Ping timeout: 370 seconds)
[12:09] *** Honno_ has quit IRC (Ping timeout: 370 seconds)
[12:23] *** GE has quit IRC (Remote host closed the connection)
[12:23] *** Aoede has quit IRC (Ping timeout: 268 seconds)
[12:34] *** Aoede has joined #archiveteam-bs
[12:39] *** Aoede has quit IRC (Ping timeout: 268 seconds)
[12:40] *** Aoede has joined #archiveteam-bs
[13:03] *** Boppen has quit IRC (hub.dk irc.du.se)
[13:10] *** Boppen has joined #archiveteam-bs
[14:04] *** Whopper_ has joined #archiveteam-bs
[14:09] *** Whopper has quit IRC (Ping timeout: 633 seconds)
[14:12] *** Honno has joined #archiveteam-bs
[14:13] *** Honno__ has quit IRC (Ping timeout: 370 seconds)
[14:18] *** Stiletto has joined #archiveteam-bs
[14:21] *** Stilett0 has quit IRC (Read error: Operation timed out)
[14:36] *** GE has joined #archiveteam-bs
[14:39] *** Honno has quit IRC (Ping timeout: 370 seconds)
[15:19] All the data will be saved.. with few exceptions
[15:39] *** Honno has joined #archiveteam-bs
[16:13] *** RichardG_ has joined #archiveteam-bs
[16:19] *** RichardG has quit IRC (Read error: Operation timed out)
[16:27] *** Aranje has joined #archiveteam-bs
[16:37] *** kristian_ has joined #archiveteam-bs
[16:57] *** icedice has joined #archiveteam-bs
[18:10] *** kristian_ has quit IRC (Quit: Leaving)
[18:18] *** GE_ has joined #archiveteam-bs
[18:19] *** GE_ has quit IRC (Client Quit)
[18:21] *** GE has quit IRC (Ping timeout: 255 seconds)
[18:48] *** RichardG_ is now known as RichardG
[19:27] *** powerSPUF has joined #archiveteam-bs
[19:27] RIP in Peace http://mafiathesyndicate.com
[19:28] Apparently a phpbb update corrupted all the posts on the forum last month and they lost about five years of mafia games. No backups, of course.
[19:29] *** Akiva has joined #archiveteam-bs
[19:30] I'm getting this info on mafiathesyndicate second hand from a friend who used it.
[19:37] A preventive forum backup team like WikiTeam might come in handy in these situations.
[19:40] *** kristian_ has joined #archiveteam-bs
[19:42] forum devs are too stupid for their own good
[19:42] 5 years without taking a backup, amazing
[19:46] https://www.phpbb.com/community/viewtopic.php?f=556&p=14717716 My favorite thing is that they were told to back up the database in this support thread for the forum software they're using, and they still didn't do it.
[19:52] *** Stiletto has quit IRC ()
[19:52] I do think making a ForumTeam would be a good idea, to make backing up forums and the information they might hold easier.
[19:52] There's only going to be so many kinds of forum software.
[19:53] i've got a project running to do that actually, but it's slow going
[19:53] needs about 30 more hours of code before it's production grade in my view
[19:55] Oh? What's the plan for your project, xmc?
[19:55] i added wget lines for some forums to the wiki ages ago
[19:56] forums are fun to grab
[19:57] I know I came across a vbulletin page on there that linked me to the cityofheroes-grab code.
[19:58] Looks like I could adapt it to the Steam Users' Forum grab, since both forums used the same software.
[19:58] *adapt it for
[19:58] just make sure to identify useless pages (like login pages with unique urls) and infinite loops (calendars etc)
[19:59] i also always dont grab single post urls as that is so incredibly redundant
[20:00] *** Akiva_ has joined #archiveteam-bs
[20:02] The main thing a script would be good for is using the "quote" feature of forum software to get the raw bbcode of posts.
[20:03] Of course, the question then becomes "what format should data be stored in"
[20:03] *** Akiva has quit IRC (Ping timeout: 245 seconds)
[20:06] Possibly WARC for the rendered pages (forums, threads, posts and members) as well as JSON for the raw bbcode and metadata.
[20:09] Then I could write scripts using the JSON to convert the forum dump to a format that another forum software can support. (Ideally a good open source forum software that supports all the features that the forum which the dump was exported from did.)
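The JSON half of the storage idea above (WARC for rendered pages, JSON for raw bbcode and metadata) might look roughly like the sketch below. Nothing in the channel specifies a format, so the field names (`post_id`, `thread_id`, `author`, `posted_at`, `bbcode`) and the one-object-per-line layout are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ForumPost:
    # All field names are hypothetical; adapt them to what the forum exposes.
    post_id: int
    thread_id: int
    author: str
    posted_at: str  # timestamp kept as an ISO 8601 string
    bbcode: str     # raw post body, e.g. as returned by the "quote" feature


def to_json_line(post: ForumPost) -> str:
    """Serialize one post as a single line of JSON."""
    return json.dumps(asdict(post), ensure_ascii=False, sort_keys=True)


def from_json_line(line: str) -> ForumPost:
    """Rebuild a post from a line written by to_json_line."""
    return ForumPost(**json.loads(line))


# Round-trip a made-up post to show the record shape.
example = ForumPost(
    post_id=1,
    thread_id=10,
    author="example_user",
    posted_at="2017-01-01T00:00:00Z",
    bbcode="[b]hello[/b] world",
)
line = to_json_line(example)
restored = from_json_line(line)
```

Keeping one post per line means a converter for another forum package could stream the dump without loading it whole; that layout is a design choice of this sketch, not something the discussion settled on.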
[20:12] just warc
[20:12] if someone wants to recreate the structure, that is enough
[20:12] [quote] would not be available as non-member for many/most forums, seems a waste of time
[20:13] powerSPUF: i'm working on a thing that turns webforums into usenet feeds
[20:13] archival-grade, but extensively transformative so probably not archiveteam material
[20:14] if you use wget-warc-lua, we have a lua script on the github to grab in completeness a couple different forum types
[20:14] https://github.com/ArchiveTeam/wget-lua-forum-scripts
[20:16] schbirid: This script would use the user's login credentials for that.
[20:20] that would be quite different from archiving public websites but whatever floats your boat
[20:23] Update on the mafia forum situation
[20:23] Apparently they did have 1 single backup, but it's corrupted somehow.
[20:24] *** GE has joined #archiveteam-bs
[20:24] I'm attempting to get a copy from the forum admins through my friend. I'll see what I can manage to recover.
[20:31] xmc, do you have any idea how to build a windows version of wget-warc-lua
[20:31] no, and we usually recommend against working in windows because its filesystem tends to eat metadata
[20:33] *** Akiva_ has quit IRC (Ping timeout: 245 seconds)
[20:34] Guess I'll need to make a bootable linux USB with my archiving tools.
[20:39] a virtual machine is also a good option, vmware player is free and pretty straightforward to use
[20:39] most archiving isn't very cpu intensive
[20:46] *** schbirid has quit IRC (Quit: Leaving)
[20:53] *** Akiva_ has joined #archiveteam-bs
[21:09] *** Akiva_ has quit IRC (Remote host closed the connection)
[21:22] *** powerSPUF has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[21:30] *** qwebirc54 has joined #archiveteam-bs
[21:32] *** qwebirc54 is now known as powerKitt
[22:09] *** Stilett0 has joined #archiveteam-bs
[22:18] *** dashcloud has joined #archiveteam-bs
[22:28] *** kristian_ has quit IRC (Quit: Leaving)
[22:33] *** GE has quit IRC (Remote host closed the connection)
[22:34] *** powerKitt has quit IRC (Ping timeout: 268 seconds)
[22:36] *** powerKitt has joined #archiveteam-bs
[22:39] Is there any way to define a custom robots.txt and sitemap.xml for wget?
[22:40] *** nicolas17 has joined #archiveteam-bs
[22:40] nevermind
[22:40] apparently "Windows XP isn't supported anymore" also means Microsoft went out of their way to delete already-released updates from their download site
[22:42] https://support.microsoft.com/en-us/help/916089 -> https://support.microsoft.com/en-us/help/927891 -> http://www.microsoft.com/downloads/details.aspx?familyid=7a81b0cd-a0b9-497e-8a89-404327772e5a -> 404 not found
[22:42] *** Stiletto has joined #archiveteam-bs
[22:43] *** Stilett0 has quit IRC (Read error: Operation timed out)
[22:43] Good job Microsoft.
[22:45] the internet archive wayback machine happens to have the English version of that file, but on a Windows XP with another language it says the update doesn't match the system language and doesn't install
[22:46] and I can't find "WindowsXP-KB927891-v3-x86-ESN.exe" anywhere else
[22:46] so uh
[22:46] did you check the catalog?
[22:46] Microsoft Catalog that is
[22:47] should we start archiving the files that remain?
[22:47] dashcloud: after my complete failure to find something else on the Catalog a few days ago, I didn't even think of checking now
[22:47] looks like it's there :O
[22:57] https://github.com/ArchiveTeam/wget-lua-forum-scripts/blob/master/vbulletin.lua Can someone who is better at Wget LUA make it so I can just variables for the member, thread, and forum max numbers?
[22:58] *can just use variables for
[22:58] *** Ravenloft has joined #archiveteam-bs
[22:58] and be able to just increment through them
[23:08] Is the archiveteam logo copyrighted?
[23:09] yes, but you can use it
[23:10] For what?
[23:10] it looks like they switched from the kb format: https://support.microsoft.com/kb/65260 to this format: https://support.microsoft.com/en-us/help/900000
[23:10] *** BlueMaxim has joined #archiveteam-bs
[23:12] hook54321: well what are you doing
[23:14] the robotics team at my high school is trying to decide on a new "theme" for the team for next year, but all of the themes people made are horrible. I'm thinking of proposing archiveteam as a theme, mostly for a joke, I doubt it would win.
[23:16] ~~give archivebot a physical form so it can save physical collections~~
[23:16] (that was meant to be strikethrough)
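On the earlier question about using variables for the member, thread, and forum maximums and incrementing through them: the idea can be sketched outside Lua as a plain URL generator. This is not a change to `vbulletin.lua`. The base URL and maximum values are placeholders, and the `member.php?u=`, `showthread.php?t=`, and `forumdisplay.php?f=` paths are assumptions based on common vBulletin layouts that would need checking against the target forum.

```python
from typing import Dict, Iterator

# Assumed vBulletin-style paths; verify against the actual forum before use.
URL_TEMPLATES: Dict[str, str] = {
    "member": "{base}/member.php?u={n}",
    "thread": "{base}/showthread.php?t={n}",
    "forum": "{base}/forumdisplay.php?f={n}",
}


def enumerate_urls(base: str, max_ids: Dict[str, int]) -> Iterator[str]:
    """Yield one URL per ID from 1 up to the configured maximum for each kind."""
    base = base.rstrip("/")
    for kind, template in URL_TEMPLATES.items():
        # A kind missing from max_ids is skipped (maximum treated as 0).
        for n in range(1, max_ids.get(kind, 0) + 1):
            yield template.format(base=base, n=n)


# Placeholder host and deliberately tiny maximums for illustration.
urls = list(
    enumerate_urls(
        "http://forum.example.com/",
        {"member": 2, "thread": 3, "forum": 1},
    )
)
```

The resulting list could be written to a file, one URL per line, and handed to wget with `-i`.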