#archiveteam-bs 2017-05-17,Wed


Who | What | When
***Cameron_D has joined #archiveteam-bs [00:01]
Pudsey has quit IRC (Remote host closed the connection) [00:08]
..... (idle for 24mn)
pigpengui is now known as pinguin [00:32]
brayden_ has quit IRC (Read error: Operation timed out)
brayden has joined #archiveteam-bs
swebb sets mode: +o brayden
j08nY has quit IRC (Quit: Leaving)
[00:41]
box41 has joined #archiveteam-bs [00:47]
.... (idle for 19mn)
dashcloud has quit IRC (Read error: Connection reset by peer)
dashcloud has joined #archiveteam-bs
[01:06]
DFJustin has quit IRC (Remote host closed the connection)
box41 has quit IRC (Ping timeout: 268 seconds)
powerKitt has joined #archiveteam-bs
DFJustin has joined #archiveteam-bs
swebb sets mode: +o DFJustin
powerArch has joined #archiveteam-bs
[01:13]
.................... (idle for 1h38mn)
BlueMaxim has quit IRC (Read error: Operation timed out)
BlueMaxim has joined #archiveteam-bs
powerKitt has quit IRC (Quit: Page closed)
[02:58]
.... (idle for 18mn)
dashcloud has quit IRC (Read error: Operation timed out)
pinguin has quit IRC (Read error: Connection reset by peer)
[03:21]
....... (idle for 31mn)
ndiddy has quit IRC () [03:54]
............. (idle for 1h4mn)
Sk1d has quit IRC (Ping timeout: 194 seconds) [04:58]
Sk1d has joined #archiveteam-bs [05:03]
............... (idle for 1h11mn)
jspiros has quit IRC (Read error: Operation timed out)
Mayonaise has quit IRC (Read error: Operation timed out)
jspiros has joined #archiveteam-bs
[06:14]
Mayonaise has joined #archiveteam-bs [06:31]
.... (idle for 19mn)
kristian_ has joined #archiveteam-bs [06:50]
...... (idle for 28mn)
godane: SketchCow: i'm uploading some of my VHS Captures to FOS
i have over 60gb of my VHS tapes i have captured
[07:18]
***schbirid has joined #archiveteam-bs [07:20]
........... (idle for 54mn)
j08nY has joined #archiveteam-bs [08:14]
..... (idle for 24mn)
Honno has quit IRC (Read error: Connection reset by peer)
Honno has joined #archiveteam-bs
[08:38]
kristian_ has quit IRC (Quit: Leaving) [08:52]
JAA: Yet another example why robots.txt is awful: the website of the UN has a pretty restrictive file, but it still allowed significant parts to be scraped. At some point, it looks like they moved everything to localised pages, i.e. un.org/foo became un.org/en/foo. And naturally, they forgot to update robots.txt. [09:03]
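A minimal sketch of the effect JAA describes, using Python's standard `urllib.robotparser`. The rule set, the `example.org` host, and the `example-crawler` user agent are hypothetical; they are not taken from the actual UN robots.txt. It only shows that a path-prefix rule written for the old layout stops matching once content moves under a language prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written before the site moved to localised paths.
OLD_RULES = """\
User-agent: *
Disallow: /foo
""".splitlines()


def allowed(path: str) -> bool:
    """Return True if the hypothetical rules permit fetching `path`."""
    parser = RobotFileParser()
    parser.parse(OLD_RULES)
    return parser.can_fetch("example-crawler", "http://example.org" + path)


if __name__ == "__main__":
    print("/foo allowed:", allowed("/foo"))        # old location, covered by the rule
    print("/en/foo allowed:", allowed("/en/foo"))  # moved copy, rule no longer matches
```

Disallow rules are matched as path prefixes, so `/en/foo` falls outside a rule written for `/foo` unless the file is updated.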
***GE has joined #archiveteam-bs [09:09]
GE: http://forums.steampowered.com/forums/announcement.php?f=14 Anyone want to do an archive job? [09:13]
jtn2: GE: see #outofsteam, http://www.archiveteam.org/index.php?title=Steam [09:20]
GE: Oh cool [09:21]
JAA: "so you could save off or archive some of the content"
"some"
[09:22]
***kyounko|2 has joined #archiveteam-bs [09:29]
kyounko has quit IRC (Ping timeout: 492 seconds) [09:35]
.... (idle for 19mn)
brayden_ has joined #archiveteam-bs
swebb sets mode: +o brayden_
brayden has quit IRC (Read error: Connection reset by peer)
[09:54]
............. (idle for 1h4mn)
j08nY has quit IRC (Read error: Operation timed out) [10:58]
..... (idle for 23mn)
j08nY has joined #archiveteam-bs [11:21]
Aoede has quit IRC (Ping timeout: 268 seconds)
j08nY has quit IRC (Read error: Operation timed out)
[11:35]
Aoede has joined #archiveteam-bs [11:49]
BartoCH has quit IRC (Ping timeout: 260 seconds)
BartoCH has joined #archiveteam-bs
BlueMaxim has quit IRC (Quit: Leaving)
Aoede has quit IRC (Ping timeout: 268 seconds)
Honno_ has joined #archiveteam-bs
Honno__ has joined #archiveteam-bs
Aoede has joined #archiveteam-bs
j08nY has joined #archiveteam-bs
Honno has quit IRC (Ping timeout: 370 seconds)
Honno_ has quit IRC (Ping timeout: 370 seconds)
[11:54]
GE has quit IRC (Remote host closed the connection)
Aoede has quit IRC (Ping timeout: 268 seconds)
[12:23]
Aoede has joined #archiveteam-bs [12:34]
Aoede has quit IRC (Ping timeout: 268 seconds)
Aoede has joined #archiveteam-bs
[12:39]
..... (idle for 23mn)
Boppen has quit IRC (hub.dk irc.du.se) [13:03]
Boppen has joined #archiveteam-bs [13:10]
........... (idle for 54mn)
Whopper_ has joined #archiveteam-bs [14:04]
Whopper has quit IRC (Ping timeout: 633 seconds)
Honno has joined #archiveteam-bs
Honno__ has quit IRC (Ping timeout: 370 seconds)
[14:09]
Stiletto has joined #archiveteam-bs
Stilett0 has quit IRC (Read error: Operation timed out)
[14:18]
.... (idle for 15mn)
GE has joined #archiveteam-bs
Honno has quit IRC (Ping timeout: 370 seconds)
[14:36]
......... (idle for 40mn)
SketchCow: All the data will be saved.. with few exceptions [15:19]
..... (idle for 20mn)
***Honno has joined #archiveteam-bs [15:39]
....... (idle for 34mn)
RichardG_ has joined #archiveteam-bs [16:13]
RichardG has quit IRC (Read error: Operation timed out) [16:19]
Aranje has joined #archiveteam-bs [16:27]
kristian_ has joined #archiveteam-bs [16:37]
..... (idle for 20mn)
icedice has joined #archiveteam-bs [16:57]
............... (idle for 1h13mn)
kristian_ has quit IRC (Quit: Leaving) [18:10]
GE_ has joined #archiveteam-bs
GE_ has quit IRC (Client Quit)
GE has quit IRC (Ping timeout: 255 seconds)
[18:18]
...... (idle for 27mn)
RichardG_ is now known as RichardG [18:48]
........ (idle for 39mn)
powerSPUF has joined #archiveteam-bs [19:27]
powerSPUF: RIP in Peace http://mafiathesyndicate.com
Apparently a phpbb update corrupted all the posts on the forum last month and they lost about five years of mafia games. No backups, of course.
[19:27]
***Akiva has joined #archiveteam-bs [19:29]
powerSPUF: I'm getting this info on mafiathesyndicate second hand from a friend who used it. [19:30]
A preventive forum backup team like WikiTeam might come in handy in these situations. [19:37]
***kristian_ has joined #archiveteam-bs [19:40]
Kaz: forum devs are too stupid for their own good
5 years without taking a backup, amazing
[19:42]
powerSPUF: https://www.phpbb.com/community/viewtopic.php?f=556&p=14717716 My favorite thing is that they were told to back up the database in this support thread for the forum software they're using, and they still didn't do it. [19:46]
***Stiletto has quit IRC () [19:52]
powerSPUF: I do think making a ForumTeam would be a good idea, to make backing up forums and the information they might hold easier.
There's only going to be so many kinds of forum software.
[19:52]
xmc: i've got a project running to do that actually, but it's slow going
needs about 30 more hours of code before it's production grade in my view
[19:53]
powerSPUF: Oh? What's the plan for your project, xmc? [19:55]
schbirid: i added wget lines for some forums to the wiki ages ago
forums are fun to grab
[19:55]
powerSPUF: I know I came across a vbulletin page on there that linked me to the cityofheroes-grab code.
Looks like I could adapt it to the Steam Users' Forum grab, since both forums used the same software.
*adapt it for
[19:57]
schbirid: just make sure to identify useless pages (like login pages with unique urls) and infinite loops (calendars etc)
i also never grab single post urls as that is so incredibly redundant
[19:58]
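schbirid's advice can be sketched as a URL filter applied before queueing a page. This is only an illustration: the script names below follow common vBulletin conventions (`calendar.php`, `showpost.php`, `showthread.php?p=`), and both the list and the `should_fetch` helper are assumptions rather than rules taken from any ArchiveTeam script.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed vBulletin-style script names that are useless or unbounded to crawl.
SKIP_SCRIPTS = {
    "login.php",     # login pages, often with unique session URLs
    "register.php",
    "newreply.php",  # posting forms
    "calendar.php",  # calendars link to next/previous months without end
    "showpost.php",  # single-post views duplicate thread content
}


def should_fetch(url: str) -> bool:
    """Return False for URLs that match the skip rules above."""
    parts = urlsplit(url)
    script = parts.path.rsplit("/", 1)[-1]
    if script in SKIP_SCRIPTS:
        return False
    query = parse_qs(parts.query)
    # showthread.php?p=<post id> is a single-post permalink into a thread.
    if script == "showthread.php" and "p" in query and "t" not in query:
        return False
    return True


if __name__ == "__main__":
    for candidate in (
        "http://forum.example/showthread.php?t=42&page=2",
        "http://forum.example/showpost.php?p=7",
        "http://forum.example/calendar.php?month=5&year=2017",
    ):
        print(candidate, should_fetch(candidate))
```

A real grab would need the skip list adjusted per forum software, since other packages use different script names.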
***Akiva_ has joined #archiveteam-bs [20:00]
powerSPUF: The main thing a script would be good for is using the "quote" feature of forum software to get the raw bbcode of posts.
Of course, the question then becomes "what format should data be stored in"
[20:02]
***Akiva has quit IRC (Ping timeout: 245 seconds) [20:03]
powerSPUF: Possibly WARC for the rendered pages (forums, threads, posts and members) as well as JSON for the raw bbcode and metadata.
Then I could write scripts using the JSON to convert the forum dump to a format that another forum software can support. (Ideally a good open source forum software that supports all the features that the forum which the dump was exported from did.)
[20:06]
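One possible shape for the JSON side of powerSPUF's proposal, as a rough sketch. The `PostRecord` fields and the one-record-per-line layout are hypothetical; nothing in the discussion fixes a schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PostRecord:
    """Hypothetical per-post record holding raw bbcode plus basic metadata."""
    post_id: int
    thread_id: int
    author: str
    posted_at: str  # ISO 8601 timestamp, kept as a plain string
    bbcode: str     # raw markup as obtained via the forum's quote feature


def to_json_line(record: PostRecord) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


example = PostRecord(
    post_id=1,
    thread_id=1,
    author="example_user",
    posted_at="2017-05-17T20:06:00Z",
    bbcode="[b]hello[/b]",
)
line = to_json_line(example)

if __name__ == "__main__":
    print(line)
```

Writing one JSON object per line would let a later conversion script stream through a large dump without loading it all at once.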
schbirid: just warc
if someone wants to recreate the structure, that is enough
[quote] would not be available as non-member for many/most forums, seems a waste of time
[20:12]
xmc: powerSPUF: i'm working on a thing that turns webforums into usenet feeds
archival-grade, but extensively transformative so probably not archiveteam material
if you use wget-warc-lua, we have a lua script on the github to grab in completeness a couple different forum types
https://github.com/ArchiveTeam/wget-lua-forum-scripts
[20:13]
powerSPUF: schbirid: This script would use the user's login credentials for that. [20:16]
schbirid: that would be quite different from archiving public websites but whatever floats your boat [20:20]
powerSPUF: Update on the mafia forum situation
Apparently they did have 1 single backup, but it's corrupted somehow.
[20:23]
***GE has joined #archiveteam-bs [20:24]
powerSPUF: I'm attempting to get a copy from the forum admins through my friend. I'll see what I can manage to recover. [20:24]
xmc, do you have any idea how to build a windows version of wget-warc-lua [20:31]
xmc: no, and we usually recommend against working in windows because its filesystem tends to eat metadata [20:31]
***Akiva_ has quit IRC (Ping timeout: 245 seconds) [20:33]
powerSPUF: Guess I'll need to make a bootable linux USB with my archiving tools. [20:34]
xmc: a virtual machine is also a good option, vmware player is free and pretty straightforward to use
most archiving isn't very cpu intensive
[20:39]
***schbirid has quit IRC (Quit: Leaving) [20:46]
Akiva_ has joined #archiveteam-bs [20:53]
.... (idle for 16mn)
Akiva_ has quit IRC (Remote host closed the connection) [21:09]
powerSPUF has quit IRC (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com )) [21:22]
qwebirc54 has joined #archiveteam-bs
qwebirc54 is now known as powerKitt
[21:30]
........ (idle for 37mn)
Stilett0 has joined #archiveteam-bs [22:09]
dashcloud has joined #archiveteam-bs [22:18]
kristian_ has quit IRC (Quit: Leaving) [22:28]
GE has quit IRC (Remote host closed the connection)
powerKitt has quit IRC (Ping timeout: 268 seconds)
powerKitt has joined #archiveteam-bs
[22:33]
powerKitt: Is there any way to define a custom robots.txt and sitemap.xml for wget? [22:39]
***nicolas17 has joined #archiveteam-bs [22:40]
powerKitt: nevermind [22:40]
nicolas17: apparently "Windows XP isn't supported anymore" also means Microsoft went out of their way to delete already-released updates from their download site
https://support.microsoft.com/en-us/help/916089 -> https://support.microsoft.com/en-us/help/927891 -> http://www.microsoft.com/downloads/details.aspx?familyid=7a81b0cd-a0b9-497e-8a89-404327772e5a -> 404 not found
[22:40]
***Stiletto has joined #archiveteam-bs
Stilett0 has quit IRC (Read error: Operation timed out)
[22:42]
powerKitt: Good job Microsoft. [22:43]
nicolas17: the internet archive wayback machine happens to have the English version of that file, but on a Windows XP with another language it says the update doesn't match the system language and doesn't install
and I can't find "WindowsXP-KB927891-v3-x86-ESN.exe" anywhere else
so uh
[22:45]
dashcloud: did you check the catalog?
Microsoft Catalog that is
[22:46]
nicolas17: should we start archiving the files that remain?
dashcloud: after my complete failure to find something else on the Catalog a few days ago, I didn't even think of checking now
looks like it's there :O
[22:47]
powerKitt: https://github.com/ArchiveTeam/wget-lua-forum-scripts/blob/master/vbulletin.lua Can someone who is better at Wget LUA make it so I can just variables for the member, thread, and forum max numbers?
*can just use variables for
[22:57]
***Ravenloft has joined #archiveteam-bs [22:58]
powerKitt: and be able to just increment through them [22:58]
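What powerKitt asks for (maximum member, thread, and forum numbers as variables, then incrementing through them) can be sketched outside Lua as a seed-list generator. This is Python rather than a change to vbulletin.lua, and the URL patterns (`forumdisplay.php?f=`, `showthread.php?t=`, `member.php?u=`) are assumed from typical vBulletin installs, as are the `forum.example` host and the example maxima.

```python
from itertools import chain
from typing import Iterator


def id_urls(base: str, script: str, param: str, max_id: int) -> Iterator[str]:
    """Yield base/script?param=N for N from 1 to max_id inclusive."""
    return (f"{base}/{script}?{param}={i}" for i in range(1, max_id + 1))


def seed_urls(base: str, max_forum: int, max_thread: int, max_member: int) -> Iterator[str]:
    """Chain forum, thread, and member URLs into one seed list."""
    return chain(
        id_urls(base, "forumdisplay.php", "f", max_forum),
        id_urls(base, "showthread.php", "t", max_thread),
        id_urls(base, "member.php", "u", max_member),
    )


if __name__ == "__main__":
    # Hypothetical maxima; real values would come from the target forum.
    for url in seed_urls("http://forum.example", max_forum=2, max_thread=3, max_member=2):
        print(url)
```

The printed list could be saved to a file and passed to wget with `-i`, leaving the Lua script to handle what gets followed from each page.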
hook54321: Is the archiveteam logo copyrighted? [23:08]
xmc: yes, but you can use it [23:09]
hook54321: For what? [23:10]
dashcloud: it looks like they switched from the kb format: https://support.microsoft.com/kb/65260 to this format: https://support.microsoft.com/en-us/help/900000 [23:10]
***BlueMaxim has joined #archiveteam-bs [23:10]
xmc: hook54321: well what are you doing [23:12]
hook54321: the robotics team at my high school is trying to decide on a new "theme" for the team for next year, but all of the themes people made are horrible. I'm thinking of proposing archiveteam as a theme, mostly for a joke, I doubt it would win. [23:14]
powerKitt: ~~give archivebot a physical form so it can save physical collections~~
(that was meant to be strikethrough)
[23:16]
