[00:13] *** BlueMaxim has joined #archiveteam-bs
[00:33] *** j08nY has quit IRC (Quit: Leaving)
[01:07] *** Nazca has quit IRC (Read error: Connection reset by peer)
[01:08] *** Nazca has joined #archiveteam-bs
[01:12] *** Stilett0 has quit IRC (Ping timeout: 250 seconds)
[01:27] looks like IA is down
[01:46] *** Stilett0 has joined #archiveteam-bs
[02:23] oh god
[02:23] http://www.html5zombo.com/
[02:30] oops
[02:39] Question, I have a Knowledge Base from a Point of Sale Product that is still in heavy use today
[02:39] but the company that owns it has completely wiped the damn thing and no longer provides ANY copies of it
[02:39] I have a copy in Text Files currently, What do
[02:39] (As Text files)
[02:40] I have good news about Daum Tvpot http://www.archiveteam.org/index.php?title=Daum_Tvpot --- the company has decided to follow the proper archiving process in light of public feedback
[02:40] personally I was busy enough and unable to track that, but I'm quite glad that the company does have a good sense of being correct
[02:42] (I did make developers aware of the public concerns, at the least)
[02:56] *** icedice has quit IRC (Ping timeout: 245 seconds)
[02:56] jrwr: upload whatever you have to an item as files, I guess, and make sure it's well-tagged
[02:58] Since it's copyrighted all to hell
[02:58] I sent a copy to Mr Scott for safekeeping
[03:34] We're all gonna die
[04:02] *** ndiddy has quit IRC ()
[04:15] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:21] *** Sk1d has joined #archiveteam-bs
[04:22] RIP archive.org
[04:22] wat
[04:22] i can't get to it
[04:22] RIP Net Neutrality
[04:23] down for everyone or just me says it's up
[04:28] SketchCow: Now now, it's not THAT bad of a KB
[04:28] :3
[04:31] do you have an arcade machine, SketchCow?
[04:44] archive.org is up for me (Chicagoland US)
[04:53] what's odd is it doesn't work by IP address
[04:54] so they're blocking it or connecting is failing for some reason
[04:58] *** superkuh has quit IRC (Remote host closed the connection)
[05:00] *** superkuh has joined #archiveteam-bs
[05:08] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[05:14] 207.241.224.2 is what I get, godane
[05:15] but it instantly redirects me to archive.org
[05:18] that was the ip i was using
[05:22] *** j08nY has joined #archiveteam-bs
[05:26] ah ok, guess it just redirects to "archive.org" instead of serving up the site contents or something
[05:33] Wikipedia is permanently deleting some "covfefe" pages on some wikis, including the history, so people can't go back and view the content that was on the pages.
[05:34] Example: https://simple.wikipedia.org/wiki/Covfefe
[05:34] Google Cache of page: http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fsimple.wikipedia.org%2Fwiki%2FCovfefe
[05:48] http://deletionpedia.org/w/index.php?title=Special:RecentChanges&hidebots=0
[06:04] *** j08nY has quit IRC (Read error: Operation timed out)
[06:34] *** DopefishJ is now known as DFJustin
[06:42] *** schbirid has joined #archiveteam-bs
[07:03] *** Stilett0 has quit IRC (ny.us.hub irc.colosolutions.net)
[07:03] *** trs80 has quit IRC (ny.us.hub irc.colosolutions.net)
[07:03] *** RedType has quit IRC (ny.us.hub irc.colosolutions.net)
[07:03] *** SadDM has quit IRC (ny.us.hub irc.colosolutions.net)
[07:03] *** timmc has quit IRC (ny.us.hub irc.colosolutions.net)
[07:03] *** jspiros has quit IRC (ny.us.hub irc.colosolutions.net)
[07:04] *** Stilett0 has joined #archiveteam-bs
[07:04] *** trs80 has joined #archiveteam-bs
[07:04] *** RedType has joined #archiveteam-bs
[07:04] *** SadDM has joined #archiveteam-bs
[07:04] *** timmc has joined #archiveteam-bs
[07:04] *** jspiros has joined #archiveteam-bs
[07:04] *** irc.colosolutions.net sets mode: +o SadDM
[07:04] *** swebb sets mode: +o SadDM
[07:08] *** Jonison has joined #archiveteam-bs
[07:10] *** SHODAN_UI has joined #archiveteam-bs
[07:48] *** j08nY has joined #archiveteam-bs
[08:04] *** j08nY has quit IRC (Quit: Leaving)
[08:44] consensus is code for deletionist
[08:49] Just do what the corporations do and make 10 different alternate personalities, never edit at the same hours (graph mapping for edit times is a thing), use IP addresses in different places, edit on a wide variety of topics and rarely come together unless it can be seen as a valid coincidence, boom, you have your "consensus" which due to the way "votes" are advertised is rarely more than
[08:49] 10 or so ...
[08:49] ... people deciding something that affects millions of people - the corporations just tend to pay people to get stuff they want deleted or history manipulated, there's a cottage industry of "freelancing" corrupt people on wikipedia, even admins
[09:28] <+Kradorex> Looks like Comcast may be interfering with Internet Archive (archive.org) traffic.
[09:28] oh?
[09:28] <+Kradorex> Someone from IA showed up to NANOG-l (an Internet operations professional list) and posted this in their first four paragraphs:
[09:28] <+Kradorex> "at the internet archive we have a strange problem at the moment.
[09:28] <+Kradorex> a slightly upstream device looks like it's returning icmp administratively unreachable for our main load balancer's ip address (which serves archive.org).
[09:28] <+Kradorex> comcast has interpreted this to remove or (maybe blackhole) connections to archive.org somewhere pretty close to the edge.
[09:28] <+Kradorex> but it is routing and completing connections perfectly to other ip addresses on the same /24."
[09:28] <+Kradorex> Reportedly, blog.archive.org, which resolves to an IP in the same IP block as archive.org, is fine.
[09:28] <+Kradorex> But archive.org proper is inaccessible.
[09:28] <+Kradorex> IA's twitter is using the "some paths appear to be blocked" language:
[09:28] <+Kradorex> https://twitter.com/internetarchive/status/870514798064058370
[09:28] <+Sanqui> Kradorex: Somebody from Archive Team was reporting having issues with accessing archive.org too
[09:28] <+Kradorex> Probably related.
[09:28] <+Sanqui> may I share this?
[09:28] <+Kradorex> Please do.
[09:28] <+Kradorex> Incident began ~3-4 hours ago.
[09:34] Huh, I'm seeing the same issue on Centurylink DSL
[09:34] I can reach blog.archive.org but archive.org itself is inaccessible
[09:39] FWIW, it works for me from Central Europe.
[09:39] <+Kradorex> I'd be interested to see traceroutes
[09:39] MrRadar, godane, ranma: can you please traceroute archive.org?
[09:40] Sure. I'll post my traceroute to blog.archive.org as well
[09:42] Sanqui: https://pastebin.com/JPY77Wfm
[09:43] Huh, looks like Centurylink is routing through Comcast's network to reach archive.org
[09:46] https://pbs.twimg.com/media/DBS5mAvU0AExyP6.jpg
[09:46] Sample traceroute from someone else.
[09:54] Oh and the other thing I forgot to mention about wikipedia is that a lot of the founders are heavily involved with Wikia which runs commercial wikis which basically have all the stuff that wikipedia deletes off an infinite encyclopedia, so that pushes a lot of the policy to drive it to wikia where they put autoplaying videos and banner ads and tracking pixels all over it
[09:56] fuck wikia
[10:09] *** BartoCH has quit IRC (Quit: WeeChat 1.8)
[10:12] *** BartoCH has joined #archiveteam-bs
[10:22] *** Jonison2 has joined #archiveteam-bs
[10:25] *** Jonison has quit IRC (Ping timeout: 260 seconds)
[10:46] *** Jonison2 has quit IRC (Quit: Leaving)
[10:52] MrRadar: https://pastebin.com/ajVSSKdU
[10:54] Sanqui: ^
[10:55] cheers
[10:56] so i just ripped the original airing of Alien Autopsy
[10:57] i had it on vhs on the back of my Forrest Gump tape
[11:04] *** Metruptio has joined #archiveteam-bs
[11:13] so i got an E! True Hollywood Story about Gary Busey
[11:14] the episode is not even on TV-Vault
[11:56] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:05] *** icedice has joined #archiveteam-bs
[13:55] I'm so glad I have a US-based ISP (Charter) that has high speeds, no caps, and doesn't fuck around
[14:34] *** REiN^ has quit IRC (Max SendQ exceeded)
[14:34] *** DFJustin has quit IRC (Remote host closed the connection)
[14:34] *** DFJustin has joined #archiveteam-bs
[14:34] *** swebb sets mode: +o DFJustin
[14:38] *** REiN^ has joined #archiveteam-bs
[14:44] *** Aranje has joined #archiveteam-bs
[14:55] *** TheLovina has joined #archiveteam-bs
[15:17] *** icedice has quit IRC (Quit: Leaving)
[15:27] *** j08nY has joined #archiveteam-bs
[15:43] *** kyounko has joined #archiveteam-bs
[16:56] *** Stilett0 is now known as Stiletto
[17:11] *** antomatic has quit IRC (Ping timeout: 268 seconds)
[17:21] *** REiN^ has quit IRC (Max SendQ exceeded)
[17:21] *** REiN^ has joined #archiveteam-bs
[17:27] *** Ravenloft has quit IRC (Read error: Operation timed out)
[17:54] *** SmileyG has joined #archiveteam-bs
[17:54] *** Smiley has quit IRC (Read error: Connection reset by peer)
[17:56] *** DFJustin has quit IRC (Remote host closed the connection)
[17:56] *** DFJustin has joined #archiveteam-bs
[17:56] *** swebb sets mode: +o DFJustin
[18:03] *** antomatic has joined #archiveteam-bs
[18:03] *** swebb sets mode: +o antomatic
[18:05] *** SHODAN_UI has quit IRC (Read error: Connection reset by peer)
[18:09] *** SHODAN_UI has joined #archiveteam-bs
[18:29] *** icedice has joined #archiveteam-bs
[18:35] I have a pet theory that all ISPs are evil, just some more subtly than others...
[18:37] Yeah, these days it's def. true
[18:37] I miss the days of dialup when there was real competition in the ISP market :(
[18:37] (Though I don't miss the speeds of dial up)
[18:40] I hear there was a pretty good cow-themed ISP at one point ;)
[18:41] I like my information superhighway like I like my coffee mugs?
[18:56] downloading PSX ISOs on dialup D:
[18:56] lordy
[18:58] I remember downloading the BeOS "Personal Edition" installer over dialup
[18:58] It took *hours*
[19:00] PSX ISOs took days
[19:00] Wow, just looked up the size and it was "only" 48 megabytes
[19:01] Yeah, I can't imagine trying to download full ISOs at that speed
[19:16] Q: should we ignore pages with individual posts when archiving fora?
[19:16] *** dashcloud has quit IRC (Ping timeout: 268 seconds)
[19:16] those can easily increase the number of pages twenty-fold
[19:17] (assuming twenty posts per forum page)
[19:17] s/forum/thread/
[19:17] JAA: thoughts?
[19:19] *** dashcloud has joined #archiveteam-bs
[19:20] I generally ignore them, especially through Archivebot
[19:21] OK, then should they be added to the forums ignore set?
[19:21] I'm archiving four forums right now and have been adding to it
[19:21] The problem is that there's so many ways for forums to structure their URLs so it's hard to write generic rules
[19:22] that's fine, that's why we add to the rules
[19:22] sure you can have custom forums but phpbb, smf etc. are gonna be roughly the same
[19:22] another question is the print/archive pages, like https://forums.arcade-museum.com/archive/index.php/t-298176.html
[19:22] i think those can be pretty useful for anybody scraping, but they're technically dupe info.
[19:23] *** Xibalba has quit IRC (ZNC 1.7.x-git-737-29d4f20-frankenznc - http://znc.in)
[19:23] *** Xibalba has joined #archiveteam-bs
[19:24] if we agree that they should be ignored, i shall improve the list
[19:25] Hmm.. for those I wouldn't ban them by default
[19:25] Since some forums only show the "last 30 days" or w/e in their main indexes
[19:25] But do list all threads in their "archive" indexes
[19:26] okay. (archives often strip stuff like quotes, so they're worse for preservation though)
[19:26] i wish json had fucking comments
[19:26] or can we switch to json
[19:26] forums are terrible miserable repositories of culture
[19:27] yes
[19:27] they hold so much valuable content - especially for the people who took part in their existence
[19:27] but they're a dying breed
[19:28] I'm archiving forums even tangentially pertaining to my interests, not waiting for announcements. would love to get something larger-scale going
[19:29] 10 archived years and 1 final, low-activity year before shutdown missing is a desirable outcome
[19:29] s/missing//
[19:29] nvm, that word was fine
[19:30] yeah
[19:30] i'm slowly puttering away at my forum scraper
[19:30] slowly
[19:30] ugh
[19:30] i wish i had more time and more drive
[19:30] i got one for zetaboards if you're interested
[19:30] works best if you give it an admin account though
[19:31] i'm scraping into usenet format, like gmane used to be
[19:31] i'd like to make one that just registers and logs in and scrapes away member-only forums (rare) and user pages (common)
[19:31] xmc: is your scraper 'pluggable'?
[19:32] eh, kinda
[19:32] as in, is it easy to write a module for a new forum sw
[19:32] i think so but i haven't tried
[19:32] *** Ravenloft has joined #archiveteam-bs
[19:32] is it on github? :)
[19:32] it's not public yet :P
[19:33] xmc: kind of a wikiteam tool for forums?
[19:34] like those wiki dumps being uploaded to IA
[19:34] kinda
[19:34] nice
[19:34] so i am adding "/viewtopic\\.php\\?p=\\d+"
[19:34] with attachments and images and stuff too, as mime attachments
[19:34] like adults
[19:34] it's slow going because
[19:35] it's slow going because i'm being very particular about how i want its output to look
[19:35] well, it's within my (quite wide) area of interest
[19:35] and there's a ton of sysadminning yet to do
[19:36] like i have a newsspool full of malformed messages from one forum that i sucked in before it went away
[19:36] so i have to read those in and re-format them
[19:36] (i save all the original as-received data in the mime body, no worries there)
[19:36] [thumbs up]
[19:36] always derive
[19:37] (never integrate unless left with no other choice)
[19:37] ...
[19:37] * xmc points to the door
[19:37] nerds stay outside
[19:37] I've been working on archiving one of the web's bigger (but still dying) forums. It's complete up through the end of last year and I'm working on a script to check for incremental updates to it
[19:38] (I don't want to name them explicitly since there's a good chance they'd ban me for scraping their paid "archives")
[19:38] nice
[19:38] ohhhh
[19:38] hm
[19:38] they were on my target list but they don't work with my scraper so,
[19:39] if they're the site with paid archives that i think they are
[19:39] Probably ;) I don't know how many others do
[19:40] good
[19:41] i'll continue going after the more obscure, yet fascinating places myself :)
[19:41] is there an archiveteam subcommittee on webforums yet
[19:41] or does it still need a snappy name
[19:41] Message Bored
[19:41] you're welcome
[19:41] #msgbored ? great
[19:42] and done, everyone join there if you want
[19:42] the joke is that people got bored of message boards and that's why they're dying and need saving
[19:42] (sorry, had to drive that home)
[19:42] :)
[20:42] scii western
[20:42] whoops, typo
[20:42] for another channel, my bad
[22:06] Sanqui: Regarding your earlier question, I'd also ignore individual posts but keep archives. I'm not sure about printable versions; I guess I'm leaning towards ignoring them.
[22:07] !
[22:07] never grab individual posts, it makes the grab much more intense and is just redundant
[22:13] *** schbirid has quit IRC (Quit: Leaving)
[22:17] I feel like at best it's a lazy way of getting the contents of whatever inbound links to posts might exist, rather than inferring them from the threads.
[22:27] *** ZexaronS has joined #archiveteam-bs
[22:28] *** ZexaronS has quit IRC (Client Quit)
[22:40] *** SHODAN_UI has quit IRC (Remote host closed the connection)