#archiveteam-bs 2017-05-17, Wed

Time Nickname Message
00:01 🔗 Cameron_D has joined #archiveteam-bs
00:08 🔗 Pudsey has quit IRC (Remote host closed the connection)
00:32 🔗 pigpengui is now known as pinguin
00:41 🔗 brayden_ has quit IRC (Read error: Operation timed out)
00:42 🔗 brayden has joined #archiveteam-bs
00:42 🔗 swebb sets mode: +o brayden
00:42 🔗 j08nY has quit IRC (Quit: Leaving)
00:47 🔗 box41 has joined #archiveteam-bs
01:06 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
01:07 🔗 dashcloud has joined #archiveteam-bs
01:13 🔗 DFJustin has quit IRC (Remote host closed the connection)
01:14 🔗 box41 has quit IRC (Ping timeout: 268 seconds)
01:15 🔗 powerKitt has joined #archiveteam-bs
01:17 🔗 DFJustin has joined #archiveteam-bs
01:17 🔗 swebb sets mode: +o DFJustin
01:20 🔗 powerArch has joined #archiveteam-bs
02:58 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
03:00 🔗 BlueMaxim has joined #archiveteam-bs
03:03 🔗 powerKitt has quit IRC (Quit: Page closed)
03:21 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:23 🔗 pinguin has quit IRC (Read error: Connection reset by peer)
03:54 🔗 ndiddy has quit IRC ()
04:58 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:03 🔗 Sk1d has joined #archiveteam-bs
06:14 🔗 jspiros has quit IRC (Read error: Operation timed out)
06:17 🔗 Mayonaise has quit IRC (Read error: Operation timed out)
06:18 🔗 jspiros has joined #archiveteam-bs
06:31 🔗 Mayonaise has joined #archiveteam-bs
06:50 🔗 kristian_ has joined #archiveteam-bs
07:18 🔗 godane SketchCow: i'm uploading some of my VHS Captures to FOS
07:19 🔗 godane i have captured over 60gb of my VHS tapes
07:20 🔗 schbirid has joined #archiveteam-bs
08:14 🔗 j08nY has joined #archiveteam-bs
08:38 🔗 Honno has quit IRC (Read error: Connection reset by peer)
08:42 🔗 Honno has joined #archiveteam-bs
08:52 🔗 kristian_ has quit IRC (Quit: Leaving)
09:03 🔗 JAA Yet another example of why robots.txt is awful: the UN website has a pretty restrictive file, but it still allowed significant parts to be scraped. At some point, it looks like they moved everything to localised pages, i.e. un.org/foo became un.org/en/foo. And naturally, they forgot to update robots.txt.
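To illustrate the failure mode JAA describes, the sketch below checks a few paths against a hypothetical robots.txt written for an unlocalised URL layout, using Python's standard `urllib.robotparser`. The file contents, host name and user-agent string are made up for illustration and are not the UN's actual rules; the point is only that prefix rules keyed to `/News/` say nothing about `/en/News/`, so the verdict changes once pages move and the file is not updated.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt written for the old, unlocalised layout.
# Illustrative only; this is NOT the UN's real file.
OLD_LAYOUT_ROBOTS = """\
User-agent: *
Allow: /News/
Allow: /Depts/
Disallow: /
"""


def verdicts(robots_txt, paths, agent="ExampleArchiveBot"):
    """Return {path: True if allowed} for each path under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.org is a placeholder host; only the path part is matched.
    return {p: parser.can_fetch(agent, "https://www.example.org" + p) for p in paths}


if __name__ == "__main__":
    paths = ["/News/index.html", "/en/News/index.html",
             "/Depts/index.html", "/en/Depts/index.html"]
    for path, ok in verdicts(OLD_LAYOUT_ROBOTS, paths).items():
        print(f"{path:25} {'allowed' if ok else 'blocked'}")
```

Python's parser applies the first matching rule in file order, while many crawlers apply the longest matching path; for this particular file both approaches give the same verdicts (the old paths are allowed, the localised ones fall through to `Disallow: /`).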
09:09 🔗 GE has joined #archiveteam-bs
09:13 🔗 GE http://forums.steampowered.com/forums/announcement.php?f=14 Anyone want to do an archive job?
09:20 🔗 jtn2 GE: see #outofsteam, http://www.archiveteam.org/index.php?title=Steam
09:21 🔗 GE Oh cool
09:22 🔗 JAA "so you could save off or archive some of the content"
09:22 🔗 JAA "some"
09:29 🔗 kyounko|2 has joined #archiveteam-bs
09:35 🔗 kyounko has quit IRC (Ping timeout: 492 seconds)
09:54 🔗 brayden_ has joined #archiveteam-bs
09:54 🔗 swebb sets mode: +o brayden_
09:54 🔗 brayden has quit IRC (Read error: Connection reset by peer)
10:58 🔗 j08nY has quit IRC (Read error: Operation timed out)
11:21 🔗 j08nY has joined #archiveteam-bs
11:35 🔗 Aoede has quit IRC (Ping timeout: 268 seconds)
11:38 🔗 j08nY has quit IRC (Read error: Operation timed out)
11:49 🔗 Aoede has joined #archiveteam-bs
11:54 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
11:56 🔗 BartoCH has joined #archiveteam-bs
11:57 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:57 🔗 Aoede has quit IRC (Ping timeout: 268 seconds)
12:01 🔗 Honno_ has joined #archiveteam-bs
12:04 🔗 Honno__ has joined #archiveteam-bs
12:05 🔗 Aoede has joined #archiveteam-bs
12:06 🔗 j08nY has joined #archiveteam-bs
12:07 🔗 Honno has quit IRC (Ping timeout: 370 seconds)
12:09 🔗 Honno_ has quit IRC (Ping timeout: 370 seconds)
12:23 🔗 GE has quit IRC (Remote host closed the connection)
12:23 🔗 Aoede has quit IRC (Ping timeout: 268 seconds)
12:34 🔗 Aoede has joined #archiveteam-bs
12:39 🔗 Aoede has quit IRC (Ping timeout: 268 seconds)
12:40 🔗 Aoede has joined #archiveteam-bs
13:03 🔗 Boppen has quit IRC (hub.dk irc.du.se)
13:10 🔗 Boppen has joined #archiveteam-bs
14:04 🔗 Whopper_ has joined #archiveteam-bs
14:09 🔗 Whopper has quit IRC (Ping timeout: 633 seconds)
14:12 🔗 Honno has joined #archiveteam-bs
14:13 🔗 Honno__ has quit IRC (Ping timeout: 370 seconds)
14:18 🔗 Stiletto has joined #archiveteam-bs
14:21 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
14:36 🔗 GE has joined #archiveteam-bs
14:39 🔗 Honno has quit IRC (Ping timeout: 370 seconds)
15:19 🔗 SketchCow All the data will be saved.. with few exceptions
15:39 🔗 Honno has joined #archiveteam-bs
16:13 🔗 RichardG_ has joined #archiveteam-bs
16:19 🔗 RichardG has quit IRC (Read error: Operation timed out)
16:27 🔗 Aranje has joined #archiveteam-bs
16:37 🔗 kristian_ has joined #archiveteam-bs
16:57 🔗 icedice has joined #archiveteam-bs
18:10 🔗 kristian_ has quit IRC (Quit: Leaving)
18:18 🔗 GE_ has joined #archiveteam-bs
18:19 🔗 GE_ has quit IRC (Client Quit)
18:21 🔗 GE has quit IRC (Ping timeout: 255 seconds)
18:48 🔗 RichardG_ is now known as RichardG
19:27 🔗 powerSPUF has joined #archiveteam-bs
19:27 🔗 powerSPUF RIP in Peace http://mafiathesyndicate.com
19:28 🔗 powerSPUF Apparently a phpBB update corrupted all the posts on the forum last month, and they lost about five years of mafia games. No backups, of course.
19:29 🔗 Akiva has joined #archiveteam-bs
19:30 🔗 powerSPUF I'm getting this info on mafiathesyndicate second hand from a friend who used it.
19:37 🔗 powerSPUF A preventive forum backup team like WikiTeam might come in handy in these situations.
19:40 🔗 kristian_ has joined #archiveteam-bs
19:42 🔗 Kaz forum devs are too stupid for their own good
19:42 🔗 Kaz 5 years without taking a backup, amazing
19:46 🔗 powerSPUF https://www.phpbb.com/community/viewtopic.php?f=556&p=14717716 My favorite thing is that they were told to back up the database in this support thread for the forum software they're using, and they still didn't do it.
19:52 🔗 Stiletto has quit IRC ()
19:52 🔗 powerSPUF I do think making a ForumTeam would be a good idea, to make backing up forums and the information they might hold easier.
19:52 🔗 powerSPUF There are only going to be so many kinds of forum software.
19:53 🔗 xmc i've got a project running to do that actually, but it's slow going
19:53 🔗 xmc needs about 30 more hours of code before it's production grade in my view
19:55 🔗 powerSPUF Oh? What's the plan for your project, xmc?
19:55 🔗 schbirid i added wget lines for some forums to the wiki ages ago
19:56 🔗 schbirid forums are fun to grab
19:57 🔗 powerSPUF I know I came across a vBulletin page on there that linked me to the cityofheroes-grab code.
19:58 🔗 powerSPUF Looks like I could adapt it to the Steam Users' Forum grab, since both forums used the same software.
19:58 🔗 powerSPUF *adapt it for
19:58 🔗 schbirid just make sure to identify useless pages (like login pages with unique urls) and infinite loops (calendars, etc.)
19:59 🔗 schbirid i also never grab single-post urls, as they are so incredibly redundant
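schbirid's advice (skip login pages with unique URLs, calendars and other infinite loops, and single-post URLs) can be expressed as a URL filter applied before anything is queued. The sketch below uses only the Python standard library; the URL patterns (`login.php`, `calendar.php`, `showpost.php`, `ucp.php`) and the session parameter names (`s`, `sid`) are assumptions modelled on common vBulletin/phpBB layouts, not rules taken from any ArchiveTeam script.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed skip rules, modelled on common vBulletin/phpBB URL layouts.
# Check every pattern against the real forum before relying on it.
SKIP_RULES = [
    ("login", re.compile(r"/(login|register|ucp)\.php$", re.IGNORECASE)),
    ("calendar", re.compile(r"/calendar\.php$", re.IGNORECASE)),
    ("single post", re.compile(r"/showpost\.php$", re.IGNORECASE)),
]

# Assumed names of session-ID query parameters that make identical pages look unique.
SESSION_PARAMS = {"s", "sid"}


def canonicalize(url):
    """Drop session-ID parameters so the same page is only queued once."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(query)))


def skip_reason(url):
    """Return the name of the first matching skip rule, or None to keep the URL."""
    path = urlsplit(url).path
    for reason, pattern in SKIP_RULES:
        if pattern.search(path):
            return reason
    return None


if __name__ == "__main__":
    for raw in ["http://forum.example/calendar.php?month=5",
                "http://forum.example/showthread.php?s=abc123&t=42"]:
        url = canonicalize(raw)
        print(url, "->", skip_reason(url) or "keep")
```

Canonicalizing before filtering means session-stamped copies of the same thread collapse to one URL instead of each looking like a new page.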
20:00 🔗 Akiva_ has joined #archiveteam-bs
20:02 🔗 powerSPUF The main thing a script would be good for is using the "quote" feature of forum software to get the raw bbcode of posts.
20:03 🔗 powerSPUF Of course, the question then becomes "what format should data be stored in"
20:03 🔗 Akiva has quit IRC (Ping timeout: 245 seconds)
20:06 🔗 powerSPUF Possibly WARC for the rendered pages (forums, threads, posts and members) as well as JSON for the raw bbcode and metadata.
20:09 🔗 powerSPUF Then I could write scripts using the JSON to convert the forum dump to a format that another forum software can support. (Ideally good open-source forum software that supports every feature of the forum the dump was exported from.)
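As a rough illustration of the JSON half of powerSPUF's proposal, here is one possible per-post record, serialized as JSON Lines to sit next to a WARC of the rendered pages. Every field name is a hypothetical choice, not an established ArchiveTeam format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PostRecord:
    """One forum post; all field names here are hypothetical."""
    forum_id: int
    thread_id: int
    post_id: int
    author: str
    posted_at: str   # timestamp as scraped, ideally normalized to ISO 8601
    bbcode: str      # raw markup, e.g. copied from the "quote" reply box
    source_url: str  # page the post came from, so it can be matched to the WARC


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False, sort_keys=True)
                     for r in records)


if __name__ == "__main__":
    post = PostRecord(3, 120, 4567, "example_user", "2017-05-17T19:28:00Z",
                      "[b]Day 1[/b] begins.", "http://forum.example/showthread.php?t=120")
    print(to_jsonl([post]))
```

Keeping `source_url` in each record is what lets a converter go back to the WARC copy if the bbcode turns out to be incomplete.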
20:12 🔗 schbirid just warc
20:12 🔗 schbirid if someone wants to recreate the structure, that is enough
20:12 🔗 schbirid [quote] would not be available to non-members on many/most forums, seems a waste of time
20:13 🔗 xmc powerSPUF: i'm working on a thing that turns webforums into usenet feeds
20:13 🔗 xmc archival-grade, but extensively transformative so probably not archiveteam material
20:14 🔗 xmc if you use wget-warc-lua, we have lua scripts on the github to completely grab a couple of different forum types
20:14 🔗 xmc https://github.com/ArchiveTeam/wget-lua-forum-scripts
20:16 🔗 powerSPUF schbirid: This script would use the user's login credentials for that.
20:20 🔗 schbirid that would be quite different from archiving public websites but whatever floats your boat
20:23 🔗 powerSPUF Update on the mafia forum situation
20:23 🔗 powerSPUF Apparently they did have 1 single backup, but it's corrupted somehow.
20:24 🔗 GE has joined #archiveteam-bs
20:24 🔗 powerSPUF I'm attempting to get a copy from the forum admins through my friend. I'll see what I can manage to recover.
20:31 🔗 powerSPUF xmc, do you have any idea how to build a Windows version of wget-warc-lua?
20:31 🔗 xmc no, and we usually recommend against working in windows because its filesystem tends to eat metadata
20:33 🔗 Akiva_ has quit IRC (Ping timeout: 245 seconds)
20:34 🔗 powerSPUF Guess I'll need to make a bootable linux USB with my archiving tools.
20:39 🔗 xmc a virtual machine is also a good option, vmware player is free and pretty straightforward to use
20:39 🔗 xmc most archiving isn't very cpu intensive
20:46 🔗 schbirid has quit IRC (Quit: Leaving)
20:53 🔗 Akiva_ has joined #archiveteam-bs
21:09 🔗 Akiva_ has quit IRC (Remote host closed the connection)
21:22 🔗 powerSPUF has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
21:30 🔗 qwebirc54 has joined #archiveteam-bs
21:32 🔗 qwebirc54 is now known as powerKitt
22:09 🔗 Stilett0 has joined #archiveteam-bs
22:18 🔗 dashcloud has joined #archiveteam-bs
22:28 🔗 kristian_ has quit IRC (Quit: Leaving)
22:33 🔗 GE has quit IRC (Remote host closed the connection)
22:34 🔗 powerKitt has quit IRC (Ping timeout: 268 seconds)
22:36 🔗 powerKitt has joined #archiveteam-bs
22:39 🔗 powerKitt Is there any way to define a custom robots.txt and sitemap.xml for wget?
22:40 🔗 nicolas17 has joined #archiveteam-bs
22:40 🔗 powerKitt nevermind
22:40 🔗 nicolas17 apparently "Windows XP isn't supported anymore" also means Microsoft went out of their way to delete already-released updates from their download site
22:42 🔗 nicolas17 https://support.microsoft.com/en-us/help/916089 -> https://support.microsoft.com/en-us/help/927891 -> http://www.microsoft.com/downloads/details.aspx?familyid=7a81b0cd-a0b9-497e-8a89-404327772e5a -> 404 not found
22:42 🔗 Stiletto has joined #archiveteam-bs
22:43 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
22:43 🔗 powerKitt Good job Microsoft.
22:45 🔗 nicolas17 the internet archive wayback machine happens to have the English version of that file, but on a Windows XP with another language it says the update doesn't match the system language and doesn't install
22:46 🔗 nicolas17 and I can't find "WindowsXP-KB927891-v3-x86-ESN.exe" anywhere else
22:46 🔗 nicolas17 so uh
22:46 🔗 dashcloud did you check the catalog?
22:46 🔗 dashcloud Microsoft Catalog that is
22:47 🔗 nicolas17 should we start archiving the files that remain?
22:47 🔗 nicolas17 dashcloud: after my complete failure to find something else on the Catalog a few days ago, I didn't even think of checking now
22:47 🔗 nicolas17 looks like it's there :O
22:57 🔗 powerKitt https://github.com/ArchiveTeam/wget-lua-forum-scripts/blob/master/vbulletin.lua Can someone who is better at Wget LUA make it so I can just variables for the member, thread, and forum max numbers?
22:58 🔗 powerKitt *can just use variables for
22:58 🔗 Ravenloft has joined #archiveteam-bs
22:58 🔗 powerKitt and be able to just increment through them
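One way to get the "increment through IDs" behaviour powerKitt asks for, without modifying `vbulletin.lua` itself, is to generate the seed URLs separately and hand the list to wget with its `--input-file` (`-i`) option. The sketch assumes the usual vBulletin 3.x URL patterns (`forumdisplay.php?f=`, `showthread.php?t=`, `member.php?u=`) and uses placeholder maximum IDs; it has not been checked against how the Lua script expects its seeds.

```python
# Placeholder upper bounds; replace with the highest IDs actually seen on the forum.
MAX_FORUM_ID = 50
MAX_THREAD_ID = 1000
MAX_MEMBER_ID = 500

# Assumed vBulletin 3.x URL patterns; confirm against the target forum.
PATTERNS = {
    "forum": "forumdisplay.php?f={}",
    "thread": "showthread.php?t={}",
    "member": "member.php?u={}",
}


def seed_urls(base, max_ids):
    """Yield one URL per ID, from 1 up to and including the max, for each page type."""
    for kind, max_id in max_ids.items():
        template = PATTERNS[kind]
        for i in range(1, max_id + 1):
            yield base + template.format(i)


if __name__ == "__main__":
    base = "http://forums.steampowered.com/forums/"
    limits = {"forum": MAX_FORUM_ID, "thread": MAX_THREAD_ID, "member": MAX_MEMBER_ID}
    for url in seed_urls(base, limits):
        print(url)
```

Saved as, say, a hypothetical `make_seeds.py`, it could be run as `python3 make_seeds.py > seed-urls.txt`, and the file then passed to wget via `--input-file=seed-urls.txt` alongside whatever other options the grab already uses. Gaps in the ID ranges (deleted threads or members) will simply produce error pages that the crawl has to tolerate.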
23:08 🔗 hook54321 Is the archiveteam logo copyrighted?
23:09 🔗 xmc yes, but you can use it
23:10 🔗 hook54321 For what?
23:10 🔗 dashcloud it looks like they switched from the kb format: https://support.microsoft.com/kb/65260 to this format: https://support.microsoft.com/en-us/help/900000
23:10 🔗 BlueMaxim has joined #archiveteam-bs
23:12 🔗 xmc hook54321: well what are you doing
23:14 🔗 hook54321 the robotics team at my high school is trying to decide on a new "theme" for the team for next year, but all of the themes people made are horrible. I'm thinking of proposing archiveteam as a theme, mostly as a joke, I doubt it would win.
23:16 🔗 powerKitt ~~give archivebot a physical form so it can save physical collections~~
23:16 🔗 powerKitt (that was meant to be strikethrough)
