#archiveteam-bs 2017-06-02,Fri

Who What When
***BlueMaxim has joined #archiveteam-bs [00:13]
..... (idle for 20mn)
j08nY has quit IRC (Quit: Leaving) [00:33]
....... (idle for 34mn)
Nazca has quit IRC (Read error: Connection reset by peer)
Nazca has joined #archiveteam-bs
Stilett0 has quit IRC (Ping timeout: 250 seconds)
[01:07]
.... (idle for 15mn)
godanelooks like IA is down [01:27]
.... (idle for 19mn)
***Stilett0 has joined #archiveteam-bs [01:46]
........ (idle for 37mn)
joepie91oh god
http://www.html5zombo.com/
[02:23]
jrwroops [02:30]
Question, I have a Knowledge Base from a Point of Sale Product that is still in heavy use today
but the company that owns it has completely wiped the damn thing and no longer provides ANY copies of it
I have a copy in Text Files currently, What do
(As Text files)
[02:39]
YurumeI have good news for Daum Tvpot http://www.archiveteam.org/index.php?title=Daum_Tvpot --- the company has decided to follow the proper archiving process in light of public feedback
personally I was busy enough and unable to track that, but I'm quite glad that the company does have a good sense of being correct
(I did make developers aware of the public concerns, at the least)
[02:40]
***icedice has quit IRC (Ping timeout: 245 seconds) [02:56]
joepie91jrwr: upload whatever you have to an item as files, I guess, and make sure it's well-tagged [02:56]
jrwrSince it's copyrighted all to hell
I sent a copy to Mr Scott for safe keeping
[02:58]
........ (idle for 36mn)
SketchCowWe're all gonna die [03:34]
...... (idle for 28mn)
***ndiddy has quit IRC () [04:02]
Sk1d has quit IRC (Ping timeout: 194 seconds) [04:15]
Sk1d has joined #archiveteam-bs [04:21]
ranmaRIP archive.org
wat
i can't get to it
RIP Net Neutrality
down for everyone or just me says it's up
[04:22]
jrwrSketchCow: Now now, it's not THAT bad of a KB
:3
[04:28]
ranmado you have an arcade machine, SketchCow? [04:31]
Odd0002archive.org is up for me (Chicagoland US) [04:44]
godanewhat's odd is it doesn't work by ip address either
so they're blocking, or connecting is failing for some reason
[04:53]
***superkuh has quit IRC (Remote host closed the connection)
superkuh has joined #archiveteam-bs
[04:58]
Aranje has quit IRC (Quit: Three sheets to the wind) [05:08]
Odd0002207.241.224.2 is what I get, godane
but it instantly redirects me to archive.org
[05:14]
godanethat was the ip i was using [05:18]
***j08nY has joined #archiveteam-bs [05:22]
Odd0002ah ok, guess it just redirects to "archive.org" instead of serving up the site contents or something [05:26]
hook54321Wikipedia is permanently deleting some "covfefe" pages on some wikis, including the history, so people can't go back and view the content that was on the pages.
Example: https://simple.wikipedia.org/wiki/Covfefe
Google Cache of page: http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fsimple.wikipedia.org%2Fwiki%2FCovfefe
[05:33]
ranmahttp://deletionpedia.org/w/index.php?title=Special:RecentChanges&hidebots=0 [05:48]
.... (idle for 16mn)
***j08nY has quit IRC (Read error: Operation timed out) [06:04]
....... (idle for 30mn)
DopefishJ is now known as DFJustin [06:34]
schbirid has joined #archiveteam-bs [06:42]
..... (idle for 21mn)
Stilett0 has quit IRC (ny.us.hub irc.colosolutions.net)
trs80 has quit IRC (ny.us.hub irc.colosolutions.net)
RedType has quit IRC (ny.us.hub irc.colosolutions.net)
SadDM has quit IRC (ny.us.hub irc.colosolutions.net)
timmc has quit IRC (ny.us.hub irc.colosolutions.net)
jspiros has quit IRC (ny.us.hub irc.colosolutions.net)
Stilett0 has joined #archiveteam-bs
trs80 has joined #archiveteam-bs
RedType has joined #archiveteam-bs
SadDM has joined #archiveteam-bs
timmc has joined #archiveteam-bs
jspiros has joined #archiveteam-bs
irc.colosolutions.net sets mode: +o SadDM
swebb sets mode: +o SadDM
Jonison has joined #archiveteam-bs
SHODAN_UI has joined #archiveteam-bs
[07:03]
........ (idle for 38mn)
j08nY has joined #archiveteam-bs [07:48]
.... (idle for 16mn)
j08nY has quit IRC (Quit: Leaving) [08:04]
......... (idle for 40mn)
kittymeowconsensus is code for deletionist [08:44]
Just do what the corporations do and make 10 different alternate personalities, never edit at the same hours (graph mapping for edit times is a thing), use IP addresses in different places, edit on a wide variety of topics and rarely come together unless it can be seen as a valid coincidence, boom, you have your "consensus" which due to the way "votes" are advertised is rarely more than
10 or so ...
... people deciding something that affects millions of people - the corporations just tend to pay people to get stuff they want deleted or history manipulated, there's a cottage industry of "freelancing" corrupt people on wikipedia, even admins
[08:49]
........ (idle for 39mn)
Sanqui<+Kradorex> Looks like Comcast may be interfering with Internet Archive (archive.org) traffic.
<Xkeeper> oh?
<+Kradorex> Someone from IA showed up to NANOG-l (an Internet operations professional list) and posted this in their first four paragraphs:
<+Kradorex> "at the internet archive we have a strange problem at the moment.
<+Kradorex> a slightly upstream device looks like it's returning icmp administratively unreachable for our main load balancer's ip address (which serves archive.org).
<+Kradorex> comcast has interpreted this to remove or (maybe blackhole) connections to archive.org somewhere pretty close to the edge.
<+Kradorex> but it is routing and completing connections perfectly to other ip addresses on the same /24."
<+Kradorex> Reportedly blog.archive.org, which resolves to an IP in the same IP block that archive.org resolves to, is fine.
<+Kradorex> But archive.org proper is inaccessible.
<+Kradorex> IA's twitter is using the "some paths appear to be blocked" language:
<+Kradorex> https://twitter.com/internetarchive/status/870514798064058370
<+Sanqui> Kradorex: Somebody from Archive Team was reporting having issues with accessing archive.org too
<+Kradorex> Probably related.
<+Sanqui> may I share this?
<+Kradorex> Please do.
<+Kradorex> Incident began ~3-4 hours ago.
[09:28]
MrRadarHuh, I'm seeing the same issue on Centurylink DSL
I can reach blog.archive.org but archive.org itself is inaccessible
[09:34]
JAAFWIW, it works for me from Central Europe. [09:39]
Sanqui<+Kradorex> I'd be interested to see traceroutes
MrRadar, godane, ranma: can you please traceroute archive.org?
[09:39]
MrRadarSure. I'll post my traceroute to blog.archive.org as well
Sanqui: https://pastebin.com/JPY77Wfm
Huh, looks like Centurylink is routing through Comcast's network to reach archive.org
[09:40]
Sanqui<Kradorex> https://pbs.twimg.com/media/DBS5mAvU0AExyP6.jpg
<Kradorex> Sample traceroute from someone else.
[09:46]
kittymeowOh and the other thing I forgot to mention about wikipedia is that a lot of the founders are heavily involved with Wikia which runs commercial wikis which basically have all the stuff that wikipedia deletes off an infinite encyclopedia, so that pushes a lot of the policy to drive it to wikia where they put autoplaying videos and banner ads and tracking pixels all over it [09:54]
Sanquifuck wikia [09:56]
***BartoCH has quit IRC (Quit: WeeChat 1.8)
BartoCH has joined #archiveteam-bs
[10:09]
Jonison2 has joined #archiveteam-bs
Jonison has quit IRC (Ping timeout: 260 seconds)
[10:22]
..... (idle for 21mn)
Jonison2 has quit IRC (Quit: Leaving) [10:46]
godaneMrRadar: https://pastebin.com/ajVSSKdU [10:52]
JAASanqui: ^ [10:54]
Sanquicheers [10:55]
godaneso i just ripped the original airing Alien Autopsy
i had it on vhs on the back of my Forrest Gump tape
[10:56]
***Metruptio has joined #archiveteam-bs [11:04]
godaneso i got a E! True Hollywood Story about Gary Busey
the episode is not even on TV-Vault
[11:13]
......... (idle for 42mn)
***BlueMaxim has quit IRC (Quit: Leaving) [11:56]
icedice has joined #archiveteam-bs [12:05]
....................... (idle for 1h50mn)
jrwrI'm so glad I have a US-based ISP (Charter) that has high speeds, no caps, and doesn't fuck around [13:55]
........ (idle for 39mn)
***REiN^ has quit IRC (Max SendQ exceeded)
DFJustin has quit IRC (Remote host closed the connection)
DFJustin has joined #archiveteam-bs
swebb sets mode: +o DFJustin
REiN^ has joined #archiveteam-bs
[14:34]
Aranje has joined #archiveteam-bs [14:44]
TheLovina has joined #archiveteam-bs [14:55]
..... (idle for 22mn)
icedice has quit IRC (Quit: Leaving) [15:17]
j08nY has joined #archiveteam-bs [15:27]
.... (idle for 16mn)
kyounko has joined #archiveteam-bs [15:43]
............... (idle for 1h13mn)
Stilett0 is now known as Stiletto [16:56]
.... (idle for 15mn)
antomatic has quit IRC (Ping timeout: 268 seconds) [17:11]
REiN^ has quit IRC (Max SendQ exceeded)
REiN^ has joined #archiveteam-bs
[17:21]
Ravenloft has quit IRC (Read error: Operation timed out) [17:27]
...... (idle for 27mn)
SmileyG has joined #archiveteam-bs
Smiley has quit IRC (Read error: Connection reset by peer)
DFJustin has quit IRC (Remote host closed the connection)
DFJustin has joined #archiveteam-bs
swebb sets mode: +o DFJustin
[17:54]
antomatic has joined #archiveteam-bs
swebb sets mode: +o antomatic
SHODAN_UI has quit IRC (Read error: Connection reset by peer)
SHODAN_UI has joined #archiveteam-bs
[18:03]
..... (idle for 20mn)
icedice has joined #archiveteam-bs [18:29]
alembicI have a pet theory that all ISPs are evil, just some more subtly than others... [18:35]
MrRadarYeah, these days it's def. true
I miss the days of dialup when there was real competition in the ISP market :(
(Though I don't miss the speeds of dial up)
[18:37]
alembicI hear there was a pretty good cow-themed ISP at one point ;) [18:40]
timmcI like my information superhighway like I like my coffee mugs? [18:41]
.... (idle for 15mn)
ranmadownloading PSX ISOs on dialup D:
lordy
[18:56]
MrRadarI remember downloading the BeOS "Personal Edition" installer over dialup
It took *hours*
[18:58]
ranmaPSX ISOs took days [19:00]
MrRadarWow, just looked up the size and it was "only" 48 megabytes
Yeah, I can't imagine trying to download full ISOs at that speed
[19:00]
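For a rough sense of the numbers in this exchange, the download times can be estimated with simple arithmetic. This is only a sketch: the 56 kbit/s link rate, the 650 MB disc image size, and the use of decimal megabytes are assumptions, not figures from the log, and real dialup throughput was usually lower than the nominal rate.

```python
def download_hours(size_megabytes, link_kbit_per_s):
    """Estimate transfer time in hours at a constant link rate.

    Assumes decimal units (1 MB = 1,000,000 bytes, 1 kbit = 1,000 bits)
    and ignores protocol overhead, retries, and dropped connections.
    """
    bits = size_megabytes * 1_000_000 * 8
    seconds = bits / (link_kbit_per_s * 1000)
    return seconds / 3600


# 48 MB is the installer size quoted in the log; 56 kbit/s is an assumed modem rate.
installer_hours = download_hours(48, 56)

# 650 MB is an assumed size for a full CD image, not a figure from the log.
iso_hours = download_hours(650, 56)

print(f"48 MB installer: {installer_hours:.1f} h")
print(f"650 MB image:    {iso_hours:.1f} h")
```

Under these assumptions the installer comes out at roughly two hours and a full disc image at a bit over a day, which is at least consistent with "hours" and "days" once real-world speeds and interruptions are factored in.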
.... (idle for 15mn)
SanquiQ: should we ignore pages with individual posts when archiving fora? [19:16]
***dashcloud has quit IRC (Ping timeout: 268 seconds) [19:16]
Sanquithose can easily increase the number of pages twenty-fold
(assuming twenty posts per forum page)
s/forum/thread/
JAA: thoughts?
[19:16]
***dashcloud has joined #archiveteam-bs [19:19]
MrRadarI generally ignore them, especially through Archivebot [19:20]
SanquiOK, then should they be added to the forums ignore set?
I'm archiving four forums right now and have been adding to it
[19:21]
MrRadarThe problem is that there's so many ways for forums to structure their URLs so it's hard to write generic rules [19:21]
Sanquithat's fine, that's why we add to the rules
sure you can have custom forums but phpbb, smf etc. are gonna be roughly the same
another question is the print/archive pages, like https://forums.arcade-museum.com/archive/index.php/t-298176.html
i think those can be pretty useful for anybody scraping, but they're technically dupe info.
[19:22]
***Xibalba has quit IRC (ZNC 1.7.x-git-737-29d4f20-frankenznc - http://znc.in)
Xibalba has joined #archiveteam-bs
[19:23]
Sanquiif we agree that they should be ignored, i shall improve the list [19:24]
MrRadarHmm.. for those I wouldn't ban them by default
Since some forums only show the "last 30 days" or w/e in their main indexes
But do list all threads in their "archive" indexes
[19:25]
Sanquiokay. (archives often strip stuff like quotes, so they're worse for preservation though)
i wish json had fucking comments
or can we switch to json
[19:26]
xmcforums are terrible miserable repositories of culture [19:26]
Sanquiyes
they hold so much valuable content - especially for the people who took part in their existence
but they're a dying breed
I'm archiving forums even tangentially pertaining to my interests, not waiting for announcements. would love to get something larger-scale going
10 archived years and 1 final, low-activity year before shutdown missing is a desirable outcome
s/missing//
nvm, that word was fine
[19:27]
xmcyeah
i'm slowly puttering away at my forum scraper
slowly
ugh
i wish i had more time and more drive
[19:30]
Sanquii got one for zetaboards if you're interested
works best if you give it an admin account though
[19:30]
xmci'm scraping into usenet format, like gmane used to be [19:31]
Sanquii'd like to make one that just registers and logs in and scrapes away member-only forums (rare) and user pages (common)
xmc: is your scraper 'pluggable'?
[19:31]
xmceh, kinda [19:32]
Sanquias in, is it easy to write a module for a new forum sw [19:32]
xmci think so but i haven't tried [19:32]
***Ravenloft has joined #archiveteam-bs [19:32]
Sanquiis it on github? :) [19:32]
xmcit's not public yet :P [19:32]
arkiverxmc: kind of a wikiteam tool for forums?
like those wiki dumps being uploaded to IA
[19:33]
xmckinda [19:34]
arkivernice [19:34]
Sanquiso i am adding "/viewtopic\\.php\\?p=\\d+" [19:34]
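A minimal sketch of how a pattern like the one above could be checked against URLs. It assumes the ignore set is stored as a JSON list of regular expressions (which would explain the doubled backslashes) and that patterns are matched with a substring search; the `is_ignored` helper and the example URLs are hypothetical, not ArchiveBot's actual code.

```python
import json
import re

# The pattern quoted in the log, as it would appear inside a JSON file.
# JSON decoding turns each doubled backslash into a single one.
IGNORE_SET_JSON = r'["/viewtopic\\.php\\?p=\\d+"]'

patterns = [re.compile(p) for p in json.loads(IGNORE_SET_JSON)]


def is_ignored(url):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)


# Hypothetical phpBB-style URLs for illustration.
examples = [
    "https://forum.example/viewtopic.php?p=12345",         # single post
    "https://forum.example/viewtopic.php?t=678",           # whole thread
    "https://forum.example/viewtopic.php?f=2&t=678&p=99",  # post id not first
]

for url in examples:
    print(is_ignored(url), url)
```

Note that the pattern only matches when `p=` directly follows the `?`, so a post link with other query parameters in front (the third example) would still be crawled under this rule.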
xmcwith attachments and images and stuff too, as mime attachments
like adults
it's slow going because
it's slow going because i'm being very particular about how i want its output to look
[19:34]
Sanquiwell, it's within my (quite wide) area of interest [19:35]
xmcand there's a ton of sysadminning yet to do
like i have a newsspool full of malformed messages from one forum that i sucked in before it went away
so i have to read those in and re-format them
(i save all the original as-received data in the mime body, no worries there)
[19:35]
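The idea of storing a forum post as a news-style message while keeping the as-received data could look roughly like this. It is a sketch using Python's standard `email` package, not xmc's scraper (which is not public); the `post_to_message` helper, the header values, the newsgroup name, and the `original.html` filename are all hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def post_to_message(author, subject, newsgroup, body_text, raw_bytes):
    """Build a MIME message from one forum post.

    The cleaned-up text becomes the readable body; the untouched bytes
    fetched from the forum ride along as an attachment, so the message
    can be re-derived later if the formatting rules change.
    """
    msg = EmailMessage()
    msg["From"] = author
    msg["Subject"] = subject
    msg["Newsgroups"] = newsgroup
    msg.set_content(body_text)
    msg.add_attachment(
        raw_bytes,
        maintype="application",
        subtype="octet-stream",
        filename="original.html",
    )
    return msg


raw = b"<div class='post'><p>Hello, <b>forum</b>!</p></div>"
msg = post_to_message(
    author="alice <alice@forum.example>",
    subject="Re: first thread",
    newsgroup="example.forum.general",
    body_text="Hello, forum!",
    raw_bytes=raw,
)

# Round-trip through bytes to confirm the original data survives intact.
parsed = message_from_bytes(bytes(msg), policy=policy.default)
recovered = next(parsed.iter_attachments()).get_content()
print(recovered == raw)
```

Keeping the raw bytes next to the derived text is what makes it safe to re-read and re-format malformed messages later, as described above.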
Sanqui[thumbs up]
always derive
(never integrate unless left with no other choice)
[19:36]
xmc...
xmc points to the door
nerds stay outside
[19:37]
MrRadarI've been working on archiving one of the web's bigger (but still dying) forums. It's complete up through the end of last year and I'm working on a script to check for incremental updates to it
(I don't want to name them explicitly since there's a good chance they'd ban me for scraping their paid "archives")
[19:37]
Sanquinice [19:38]
xmcohhhh
hm
they were on my target list but they don't work with my scraper so,
if they're the site with paid archives that i think they are
[19:38]
MrRadarProbably ;) I don't know how many others do [19:39]
Sanquigood
i'll continue going after the more obscure, yet fascinating places myself :)
[19:40]
xmcis there an archiveteam subcommittee on webforums yet
or does it still need a snappy name
[19:41]
SanquiMessage Bored
you're welcome
[19:41]
xmc#msgbored ? great
and done, everyone join there if you want
[19:41]
Sanquithe joke is that people got bored of message boards and that's why they're dying and need saving
(sorry, had to drive that home)
[19:42]
xmc:) [19:42]
............. (idle for 1h0mn)
BartoCHscii western
whops, typo
for another channel, my bad
[20:42]
................. (idle for 1h24mn)
JAASanqui: Regarding your earlier question, I'd also ignore individual posts but keep archives. I'm not sure about printable versions; I guess I'm leaning towards ignoring them. [22:06]
schbirid!
never grab individual posts, it makes the grab much more intense and is just redundant
[22:07]
***schbirid has quit IRC (Quit: Leaving) [22:13]
timmcI feel like at best it's a lazy way of getting the contents of whatever inbound links to posts might exist, rather than inferring them from the threads. [22:17]
***ZexaronS has joined #archiveteam-bs
ZexaronS has quit IRC (Client Quit)
[22:27]
SHODAN_UI has quit IRC (Remote host closed the connection) [22:40]