#archiveteam-bs 2017-06-02,Fri


Time Nickname Message
00:13 🔗 BlueMaxim has joined #archiveteam-bs
00:33 🔗 j08nY has quit IRC (Quit: Leaving)
01:07 🔗 Nazca has quit IRC (Read error: Connection reset by peer)
01:08 🔗 Nazca has joined #archiveteam-bs
01:12 🔗 Stilett0 has quit IRC (Ping timeout: 250 seconds)
01:27 🔗 godane looks like IA is down
01:46 🔗 Stilett0 has joined #archiveteam-bs
02:23 🔗 joepie91 oh god
02:23 🔗 joepie91 http://www.html5zombo.com/
02:30 🔗 jrwr opps
02:39 🔗 jrwr Question, I have a Knowledge Base from a Point of Sale Product that is still in heavy use today
02:39 🔗 jrwr but the company that owns it has completely wiped the damn thing and no longer provides ANY copies of it
02:39 🔗 jrwr I have a copy in Text Files currently, What do
02:39 🔗 jrwr (As Text files)
02:40 🔗 Yurume I have a good news for Daum Tvpot http://www.archiveteam.org/index.php?title=Daum_Tvpot --- the company has decided to follow the proper archiving process in lieu of public feedbacks
02:40 🔗 Yurume personally I was busy enough and unable to track that, but I'm quite glad that the company does have a good sense of being correct
02:42 🔗 Yurume (I did make developers aware of the public concerns, at the least)
02:56 🔗 icedice has quit IRC (Ping timeout: 245 seconds)
02:56 🔗 joepie91 jrwr: upload whatever you have to an item as files, I guess, and make sure it's well-tagged
02:58 🔗 jrwr Since it's copyrighted all to hell
02:58 🔗 jrwr I sent a copy to Mr Scott for safe keeping
03:34 🔗 SketchCow We're all gonna die
04:02 🔗 ndiddy has quit IRC ()
04:15 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:21 🔗 Sk1d has joined #archiveteam-bs
04:22 🔗 ranma RIP archive.org
04:22 🔗 ranma wat
04:22 🔗 ranma i can't get to it
04:22 🔗 ranma RIP Net Neutrality
04:23 🔗 ranma down for everyone or just me says it's up
04:28 🔗 jrwr SketchCow: Now now, Its not THAT bad of a KB
04:28 🔗 jrwr :3
04:31 🔗 ranma do you have an arcade machine, SketchCow?
04:44 🔗 Odd0002 archive.org is up for me (Chicagoland US)
04:53 🔗 godane whats odd is it doesn't work ip address
04:54 🔗 godane so there blocking or connecting is failing for some reason
04:58 🔗 superkuh has quit IRC (Remote host closed the connection)
05:00 🔗 superkuh has joined #archiveteam-bs
05:08 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
05:14 🔗 Odd0002 207.241.224.2 is what I get, godane
05:15 🔗 Odd0002 but it instantly redirects me to archive.org
05:18 🔗 godane that was the ip i was using
05:22 🔗 j08nY has joined #archiveteam-bs
05:26 🔗 Odd0002 ah ok, guess it just redirects to "archive.org" instead of serving up the site contents or something
05:33 🔗 hook54321 Wikipedia is permanently deleting some "covfefe" pages on some wikis, including the history, so people can't go back and view the content that was on the pages.
05:34 🔗 hook54321 Example: https://simple.wikipedia.org/wiki/Covfefe
05:34 🔗 hook54321 Google Cache of page: http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fsimple.wikipedia.org%2Fwiki%2FCovfefe
05:48 🔗 ranma http://deletionpedia.org/w/index.php?title=Special:RecentChanges&hidebots=0
06:04 🔗 j08nY has quit IRC (Read error: Operation timed out)
06:34 🔗 DopefishJ is now known as DFJustin
06:42 🔗 schbirid has joined #archiveteam-bs
07:03 🔗 Stilett0 has quit IRC (ny.us.hub irc.colosolutions.net)
07:03 🔗 trs80 has quit IRC (ny.us.hub irc.colosolutions.net)
07:03 🔗 RedType has quit IRC (ny.us.hub irc.colosolutions.net)
07:03 🔗 SadDM has quit IRC (ny.us.hub irc.colosolutions.net)
07:03 🔗 timmc has quit IRC (ny.us.hub irc.colosolutions.net)
07:03 🔗 jspiros has quit IRC (ny.us.hub irc.colosolutions.net)
07:04 🔗 Stilett0 has joined #archiveteam-bs
07:04 🔗 trs80 has joined #archiveteam-bs
07:04 🔗 RedType has joined #archiveteam-bs
07:04 🔗 SadDM has joined #archiveteam-bs
07:04 🔗 timmc has joined #archiveteam-bs
07:04 🔗 jspiros has joined #archiveteam-bs
07:04 🔗 irc.colosolutions.net sets mode: +o SadDM
07:04 🔗 swebb sets mode: +o SadDM
07:08 🔗 Jonison has joined #archiveteam-bs
07:10 🔗 SHODAN_UI has joined #archiveteam-bs
07:48 🔗 j08nY has joined #archiveteam-bs
08:04 🔗 j08nY has quit IRC (Quit: Leaving)
08:44 🔗 kittymeow consensus is code for deletionist
08:49 🔗 kittymeow Just do what the corporations do and make 10 different alternate personalities, never edit at the same hours (graph mapping for edit times is a thing), use IP addresses in different places, edit on a wide variety of topics and rarely come together unless it can be seen as a valid coincidence, boom, you have your "consensus" which due to the way "votes" are advertised is rarely more than
08:49 🔗 kittymeow 10 or so ...
08:49 🔗 kittymeow ... people deciding something that affects millions of people - the corporations just tend to pay people to get stuff they want deleted or history manipulated, there's a cottage industry of "freelancing" corrupt people on wikipedia, even admins
09:28 🔗 Sanqui <+Kradorex> Looks like Comcast may be interfering with Internet Archive (archive.org) traffic.
09:28 🔗 Sanqui <Xkeeper> oh?
09:28 🔗 Sanqui <+Kradorex> Someone from IA showed up to NANOG-l (an Internet operations professional list) and posted this in their first four paragraphs:
09:28 🔗 Sanqui <+Kradorex> "at the internet archive we have a strange problem at the moment.
09:28 🔗 Sanqui <+Kradorex> a slightly upstream device looks like it's returning icmp administratively unreachable for our main load balancer's ip address (which serves archive.org).
09:28 🔗 Sanqui <+Kradorex> comcast has interpreted this to remove or (maybe blackhole) connections to archive.org somewhere pretty close to the edge.
09:28 🔗 Sanqui <+Kradorex> but it is routing and completing connections perfectly to other ip addresses on the same /24."
09:28 🔗 Sanqui <+Kradorex> Which reportedly blog.archive.org which resolves to an IP in the same IP block that archive.org resolves is fine.
09:28 🔗 Sanqui <+Kradorex> But archive.org proper is inaccessible.
09:28 🔗 Sanqui <+Kradorex> IA's twitter is using the "some paths appear to be blocked" language:
09:28 🔗 Sanqui <+Kradorex> https://twitter.com/internetarchive/status/870514798064058370
09:28 🔗 Sanqui <+Sanqui> Kradorex: Somebody from Archive Team was reporting having issues with accessing archive.org too
09:28 🔗 Sanqui <+Kradorex> Probably related.
09:28 🔗 Sanqui <+Sanqui> may I share this?
09:28 🔗 Sanqui <+Kradorex> Please do.
09:28 🔗 Sanqui <+Kradorex> Incident begun ~3-4 hours ago.
09:34 🔗 MrRadar Huh, I'm seeing the same issue on Centurylink DSL
09:34 🔗 MrRadar I can reach blog.archive.org but archive.org itself is inaccessible
09:39 🔗 JAA FWIW, it works for me from Central Europe.
09:39 🔗 Sanqui <+Kradorex> I'd be interested to see traceroutes
09:39 🔗 Sanqui MrRadar, godane, ranma: can you please traceroute archive.org?
09:40 🔗 MrRadar Sure. I'll post my traceroute to blog.archive.org as well
09:42 🔗 MrRadar Sanqui: https://pastebin.com/JPY77Wfm
09:43 🔗 MrRadar Huh, looks like Centurylink is routing through Comcast's network to reach archive.org
09:46 🔗 Sanqui <Kradorex> https://pbs.twimg.com/media/DBS5mAvU0AExyP6.jpg
09:46 🔗 Sanqui <Kradorex> Sample traceroute from someone else.
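The NANOG post relayed above says connections to other addresses in archive.org's /24 complete normally while the load balancer's single address does not. The sketch below shows what "the same /24" means: all addresses sharing the first three octets. 207.241.224.2 is the archive.org address Odd0002 reported at 05:14; the neighbour address is hypothetical and only stands in for blog.archive.org, whose real IP is not given in the log. Since the whole block is evidently routed, an outage hitting only one address in it suggests per-address filtering rather than a missing route for the block.

```python
import ipaddress

# archive.org's load-balancer address as reported earlier in the log.
ARCHIVE_ORG = ipaddress.ip_address("207.241.224.2")
# Hypothetical neighbour address (assumption), standing in for blog.archive.org.
NEIGHBOUR = ipaddress.ip_address("207.241.224.20")

# The /24 containing archive.org: first three octets fixed, last octet free.
# strict=False lets us pass a host address and get its enclosing network.
block = ipaddress.ip_network(f"{ARCHIVE_ORG}/24", strict=False)

print(block)                # 207.241.224.0/24
print(NEIGHBOUR in block)   # True
print(block.num_addresses)  # 256
```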
09:54 🔗 kittymeow Oh and the other thing I forgot to mention about wikipedia is that a lot of the founders are heavily involved with Wikia which runs commercial wikis which basically have all the stuff that wikipedia deletes off an infinite encyclopedia, so that pushes a lot of the policy to drive it to wikia where they put autoplaying videos and banner ads and tracking pixels all over it
09:56 🔗 Sanqui fuck wikia
10:09 🔗 BartoCH has quit IRC (Quit: WeeChat 1.8)
10:12 🔗 BartoCH has joined #archiveteam-bs
10:22 🔗 Jonison2 has joined #archiveteam-bs
10:25 🔗 Jonison has quit IRC (Ping timeout: 260 seconds)
10:46 🔗 Jonison2 has quit IRC (Quit: Leaving)
10:52 🔗 godane MrRadar: https://pastebin.com/ajVSSKdU
10:54 🔗 JAA Sanqui: ^
10:55 🔗 Sanqui cheers
10:56 🔗 godane so i just ripped the original airing Alien Autopsy
10:57 🔗 godane i had it on vhs on the back of my Forrest Gump tape
11:04 🔗 Metruptio has joined #archiveteam-bs
11:13 🔗 godane so i got a E! True Hollywood Story about Gary Busey
11:14 🔗 godane the episode is not even on TV-Vault
11:56 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:05 🔗 icedice has joined #archiveteam-bs
13:55 🔗 jrwr Im so glad I have a US Based ISP (Charter) that has high speeds, no caps, and doesn't fuck around
14:34 🔗 REiN^ has quit IRC (Max SendQ exceeded)
14:34 🔗 DFJustin has quit IRC (Remote host closed the connection)
14:34 🔗 DFJustin has joined #archiveteam-bs
14:34 🔗 swebb sets mode: +o DFJustin
14:38 🔗 REiN^ has joined #archiveteam-bs
14:44 🔗 Aranje has joined #archiveteam-bs
14:55 🔗 TheLovina has joined #archiveteam-bs
15:17 🔗 icedice has quit IRC (Quit: Leaving)
15:27 🔗 j08nY has joined #archiveteam-bs
15:43 🔗 kyounko has joined #archiveteam-bs
16:56 🔗 Stilett0 is now known as Stiletto
17:11 🔗 antomatic has quit IRC (Ping timeout: 268 seconds)
17:21 🔗 REiN^ has quit IRC (Max SendQ exceeded)
17:21 🔗 REiN^ has joined #archiveteam-bs
17:27 🔗 Ravenloft has quit IRC (Read error: Operation timed out)
17:54 🔗 SmileyG has joined #archiveteam-bs
17:54 🔗 Smiley has quit IRC (Read error: Connection reset by peer)
17:56 🔗 DFJustin has quit IRC (Remote host closed the connection)
17:56 🔗 DFJustin has joined #archiveteam-bs
17:56 🔗 swebb sets mode: +o DFJustin
18:03 🔗 antomatic has joined #archiveteam-bs
18:03 🔗 swebb sets mode: +o antomatic
18:05 🔗 SHODAN_UI has quit IRC (Read error: Connection reset by peer)
18:09 🔗 SHODAN_UI has joined #archiveteam-bs
18:29 🔗 icedice has joined #archiveteam-bs
18:35 🔗 alembic I have a pet theory that all ISPs are evil, just some more subtly than others...
18:37 🔗 MrRadar Yeah, these days it's def. true
18:37 🔗 MrRadar I miss the days of dialup when there was real competition in the ISP market :(
18:37 🔗 MrRadar (Though I don't miss the speeds of dial up)
18:40 🔗 alembic I hear there was a pretty good cow-themed ISP at one point ;)
18:41 🔗 timmc I like my information superhighway like I like my coffee mugs?
18:56 🔗 ranma downloading PSX ISOs on dialup D:
18:56 🔗 ranma lordy
18:58 🔗 MrRadar I remember downloading the BeOS "Personal Edition" installer over dialup
18:58 🔗 MrRadar It took *hours*
19:00 🔗 ranma PSX ISOs took days
19:00 🔗 MrRadar Wow, just looked up the size and it was "only" 48 megabytes
19:01 🔗 MrRadar Yeah, I can't imagine trying to download full ISOs at that speed
19:16 🔗 Sanqui Q: should we ignore pages with individual posts when archiving fora?
19:16 🔗 dashcloud has quit IRC (Ping timeout: 268 seconds)
19:16 🔗 Sanqui those can easily increase the number of pages twenty-fold
19:17 🔗 Sanqui (assuming twenty posts per forum page)
19:17 🔗 Sanqui s/forum/thread/
19:17 🔗 Sanqui JAA: thoughts?
19:19 🔗 dashcloud has joined #archiveteam-bs
19:20 🔗 MrRadar I generally ignore them, especially through Archivebot
19:21 🔗 Sanqui OK, then should they be added to the forums ignore set?
19:21 🔗 Sanqui I'm archiving four forums right now and have been adding to it
19:21 🔗 MrRadar The problem is that there's so many ways for forums to structure their URLs so it's hard to write generic rules
19:22 🔗 Sanqui that's fine, that's why we add to the rules
19:22 🔗 Sanqui sure you can have custom forums but phpbb, smf etc. are gonna be roughly the same
19:22 🔗 Sanqui another question is the print/archive pages, like https://forums.arcade-museum.com/archive/index.php/t-298176.html
19:22 🔗 Sanqui i think those can be pretty useful for anybody scraping, but they're technically dupe info.
19:23 🔗 Xibalba has quit IRC (ZNC 1.7.x-git-737-29d4f20-frankenznc - http://znc.in)
19:23 🔗 Xibalba has joined #archiveteam-bs
19:24 🔗 Sanqui if we agree that they should be ignored, i shall improve the list
19:25 🔗 MrRadar Hmm.. for those I wouldn't ban them by default
19:25 🔗 MrRadar Since some forums only show the "last 30 days" or w/e in their main indexes
19:25 🔗 MrRadar But do list all threads in their "archive" indexes
19:26 🔗 Sanqui okay. (archives often strip stuff like quotes, so they're worse for preservation though)
19:26 🔗 Sanqui i wish json had fucking comments
19:26 🔗 Sanqui or can we switch to json
19:26 🔗 xmc forums are terrible miserable repositories of culture
19:27 🔗 Sanqui yes
19:27 🔗 Sanqui they hold so much valuable content - especially for the people who took part in their existence
19:27 🔗 Sanqui but they're a dying breed
19:28 🔗 Sanqui I'm archiving forums even tangentially pertaining to my interests, not waiting for announcements. would love to get something larger-scale going
19:29 🔗 Sanqui 10 archived years and 1 final, low-activity year before shutdown missing is a desirable outcome
19:29 🔗 Sanqui s/missing//
19:29 🔗 Sanqui nvm, that word was fine
19:30 🔗 xmc yeah
19:30 🔗 xmc i'm slowly puttering away at my forum scraper
19:30 🔗 xmc slowly
19:30 🔗 xmc ugh
19:30 🔗 xmc i wish i had more time and more drive
19:30 🔗 Sanqui i got one for zetaboards if you're interested
19:30 🔗 Sanqui works best if you give it an admin account though
19:31 🔗 xmc i'm scraping into usenet format, like gmane used to be
19:31 🔗 Sanqui i'd like to make one that just registers and logs in and scrapes away member-only forums (rare) and user pages (common)
19:31 🔗 Sanqui xmc: is your scraper 'pluggable'?
19:32 🔗 xmc eh, kinda
19:32 🔗 Sanqui as in, is it easy to write a module for a new forum sw
19:32 🔗 xmc i think so but i haven't tried
19:32 🔗 Ravenloft has joined #archiveteam-bs
19:32 🔗 Sanqui is it on github? :)
19:32 🔗 xmc it's not public yet :P
19:33 🔗 arkiver xmc: kind of a wikiteam tool for forums?
19:34 🔗 arkiver like those wiki dumps being uploaded to IA
19:34 🔗 xmc kinda
19:34 🔗 arkiver nice
19:34 🔗 Sanqui so i am adding "/viewtopic\\.php\\?p=\\d+"
19:34 🔗 xmc with attachments and images and stuff too, as mime attachments
19:34 🔗 xmc like adults
19:34 🔗 xmc it's slow going because
19:35 🔗 xmc it's slow going because i'm being very particular about how i want its output to look
19:35 🔗 Sanqui well, it's within my (quite wide) area of interest
19:35 🔗 xmc and there's a ton of sysadminning yet to do
19:36 🔗 xmc like i have a newsspool full of malformed messages from one forum that i sucked in before it went away
19:36 🔗 xmc so i have to read those in and re-format them
19:36 🔗 xmc (i save all the original as-received data in the mime body, no worries there)
19:36 🔗 Sanqui [thumbs up]
19:36 🔗 Sanqui always derive
19:37 🔗 Sanqui (never integrate unless left with no other choice)
19:37 🔗 xmc ...
19:37 🔗 * xmc points to the door
19:37 🔗 xmc nerds stay outside
19:37 🔗 MrRadar I've been working on archiving one of the web's bigger (but still dying) forums. It's complete up through the end of last year and I'm working on a script to check for incremental updates to it
19:38 🔗 MrRadar (I don't want to name them explicitly since there's a good chance they'd ban me for scraping their paid "archives")
19:38 🔗 Sanqui nice
19:38 🔗 xmc ohhhh
19:38 🔗 xmc hm
19:38 🔗 xmc they were on my target list but they don't work with my scraper so,
19:39 🔗 xmc if they're the site with paid archives that i think they are
19:39 🔗 MrRadar Probably ;) I don't know how many others do
19:40 🔗 Sanqui good
19:41 🔗 Sanqui i'll continue going after the more obscure, yet fascinating places myself :)
19:41 🔗 xmc is there an archiveteam subcommittee on webforums yet
19:41 🔗 xmc or does it still need a snappy name
19:41 🔗 Sanqui Message Bored
19:41 🔗 Sanqui you're welcome
19:41 🔗 xmc #msgbored ? great
19:42 🔗 xmc and done, everyone join there if want
19:42 🔗 Sanqui the joke is that people got bored of message boards and that's why they're dying and need saving
19:42 🔗 Sanqui (sorry, had to drive that home)
19:42 🔗 xmc :)
20:42 🔗 BartoCH scii western
20:42 🔗 BartoCH whops, typo
20:42 🔗 BartoCH for another channel, my bad
22:06 🔗 JAA Sanqui: Regarding your earlier question, I'd also ignore individual posts but keep archives. I'm not sure about printable versions; I guess I'm leaning towards ignoring them.
22:07 🔗 schbirid !
22:07 🔗 schbirid never grab individual posts, it makes the grab much more intense and is just redundant
22:13 🔗 schbirid has quit IRC (Quit: Leaving)
22:17 🔗 timmc I feel like at best it's a lazy way of getting the contents of whatever inbound links to posts might exist, rather than inferring them from the threads.
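A minimal sketch of what the ignore pattern Sanqui adds at 19:34 does, assuming it is entered as a JSON string in the forums ignore set (hence the doubled backslashes; see the JSON complaint at 19:26). Decoding the literal yields the regex /viewtopic\.php\?p=\d+, which skips phpBB single-post links while leaving thread pages to be fetched, in line with the "ignore individual posts" consensus above. The example.com URLs are hypothetical.

```python
import json
import re

# The pattern exactly as quoted in the log: a JSON-escaped string literal.
RAW = r'"/viewtopic\\.php\\?p=\\d+"'
# Decoding the JSON string gives the actual regular expression.
PATTERN = re.compile(json.loads(RAW))

# Hypothetical phpBB-style URLs on an example host (not from the log).
urls = [
    "http://forum.example.com/viewtopic.php?p=12345",         # single post
    "http://forum.example.com/viewtopic.php?t=678",           # whole thread
    "http://forum.example.com/viewtopic.php?t=678&start=20",  # thread page 2
]

for url in urls:
    # re.search, so the pattern may match anywhere in the URL.
    action = "ignore" if PATTERN.search(url) else "fetch"
    print(f"{action}: {url}")
```

The pattern only matches when p= directly follows the ?, so a hypothetical post link such as viewtopic.php?f=2&p=123 would still be fetched; this is the kind of URL variation MrRadar raises at 19:21.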
22:27 🔗 ZexaronS has joined #archiveteam-bs
22:28 🔗 ZexaronS has quit IRC (Client Quit)
22:40 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
