Time |
Nickname |
Message |
00:13
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
00:33
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
01:07
🔗
|
|
Nazca has quit IRC (Read error: Connection reset by peer) |
01:08
🔗
|
|
Nazca has joined #archiveteam-bs |
01:12
🔗
|
|
Stilett0 has quit IRC (Ping timeout: 250 seconds) |
01:27
🔗
|
godane |
looks like IA is down |
01:46
🔗
|
|
Stilett0 has joined #archiveteam-bs |
02:23
🔗
|
joepie91 |
oh god |
02:23
🔗
|
joepie91 |
http://www.html5zombo.com/ |
02:30
🔗
|
jrwr |
oops |
02:39
🔗
|
jrwr |
Question, I have a Knowledge Base from a Point of Sale Product that is still in heavy use today |
02:39
🔗
|
jrwr |
but the company that owns it has completely wiped the damn thing and no longer provides ANY copies of it |
02:39
🔗
|
jrwr |
I have a copy in Text Files currently, What do |
02:39
🔗
|
jrwr |
(As Text files) |
02:40
🔗
|
Yurume |
I have good news for Daum Tvpot http://www.archiveteam.org/index.php?title=Daum_Tvpot --- the company has decided to follow the proper archiving process in response to public feedback |
02:40
🔗
|
Yurume |
personally I was too busy to track that, but I'm quite glad the company has a good sense of doing the right thing |
02:42
🔗
|
Yurume |
(I did make developers aware of the public concerns, at the least) |
02:56
🔗
|
|
icedice has quit IRC (Ping timeout: 245 seconds) |
02:56
🔗
|
joepie91 |
jrwr: upload whatever you have to an item as files, I guess, and make sure it's well-tagged |
02:58
🔗
|
jrwr |
Since it's copyrighted all to hell |
02:58
🔗
|
jrwr |
I sent a copy to Mr Scott for safekeeping |
03:34
🔗
|
SketchCow |
We're all gonna die |
04:02
🔗
|
|
ndiddy has quit IRC () |
04:15
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:21
🔗
|
|
Sk1d has joined #archiveteam-bs |
04:22
🔗
|
ranma |
RIP archive.org |
04:22
🔗
|
ranma |
wat |
04:22
🔗
|
ranma |
i can't get to it |
04:22
🔗
|
ranma |
RIP Net Neutrality |
04:23
🔗
|
ranma |
down for everyone or just me says it's up |
04:28
🔗
|
jrwr |
SketchCow: Now now, it's not THAT bad of a KB |
04:28
🔗
|
jrwr |
:3 |
04:31
🔗
|
ranma |
do you have an arcade machine, SketchCow? |
04:44
🔗
|
Odd0002 |
archive.org is up for me (Chicagoland US) |
04:53
🔗
|
godane |
what's odd is it doesn't work by IP address |
04:54
🔗
|
godane |
so they're blocking it, or the connection is failing for some reason |
04:58
🔗
|
|
superkuh has quit IRC (Remote host closed the connection) |
05:00
🔗
|
|
superkuh has joined #archiveteam-bs |
05:08
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
05:14
🔗
|
Odd0002 |
207.241.224.2 is what I get, godane |
05:15
🔗
|
Odd0002 |
but it instantly redirects me to archive.org |
05:18
🔗
|
godane |
that was the ip i was using |
05:22
🔗
|
|
j08nY has joined #archiveteam-bs |
05:26
🔗
|
Odd0002 |
ah ok, guess it just redirects to "archive.org" instead of serving up the site contents or something |
05:33
🔗
|
hook54321 |
Wikipedia is permanently deleting some "covfefe" pages on some wikis, including the history, so people can't go back and view the content that was on the pages. |
05:34
🔗
|
hook54321 |
Example: https://simple.wikipedia.org/wiki/Covfefe |
05:34
🔗
|
hook54321 |
Google Cache of page: http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fsimple.wikipedia.org%2Fwiki%2FCovfefe |
05:48
🔗
|
ranma |
http://deletionpedia.org/w/index.php?title=Special:RecentChanges&hidebots=0 |
06:04
🔗
|
|
j08nY has quit IRC (Read error: Operation timed out) |
06:34
🔗
|
|
DopefishJ is now known as DFJustin |
06:42
🔗
|
|
schbirid has joined #archiveteam-bs |
07:03
🔗
|
|
Stilett0 has quit IRC (ny.us.hub irc.colosolutions.net) |
07:03
🔗
|
|
trs80 has quit IRC (ny.us.hub irc.colosolutions.net) |
07:03
🔗
|
|
RedType has quit IRC (ny.us.hub irc.colosolutions.net) |
07:03
🔗
|
|
SadDM has quit IRC (ny.us.hub irc.colosolutions.net) |
07:03
🔗
|
|
timmc has quit IRC (ny.us.hub irc.colosolutions.net) |
07:03
🔗
|
|
jspiros has quit IRC (ny.us.hub irc.colosolutions.net) |
07:04
🔗
|
|
Stilett0 has joined #archiveteam-bs |
07:04
🔗
|
|
trs80 has joined #archiveteam-bs |
07:04
🔗
|
|
RedType has joined #archiveteam-bs |
07:04
🔗
|
|
SadDM has joined #archiveteam-bs |
07:04
🔗
|
|
timmc has joined #archiveteam-bs |
07:04
🔗
|
|
jspiros has joined #archiveteam-bs |
07:04
🔗
|
|
irc.colosolutions.net sets mode: +o SadDM |
07:04
🔗
|
|
swebb sets mode: +o SadDM |
07:08
🔗
|
|
Jonison has joined #archiveteam-bs |
07:10
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
07:48
🔗
|
|
j08nY has joined #archiveteam-bs |
08:04
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
08:44
🔗
|
kittymeow |
consensus is code for deletionist |
08:49
🔗
|
kittymeow |
Just do what the corporations do and make 10 different alternate personalities, never edit at the same hours (graph mapping for edit times is a thing), use IP addresses in different places, edit on a wide variety of topics and rarely come together unless it can be seen as a valid coincidence, boom, you have your "consensus" which due to the way "votes" are advertised is rarely more than |
08:49
🔗
|
kittymeow |
10 or so ... |
08:49
🔗
|
kittymeow |
... people deciding something that affects millions of people - the corporations just tend to pay people to get stuff they want deleted or history manipulated, there's a cottage industry of "freelancing" corrupt people on wikipedia, even admins |
09:28
🔗
|
Sanqui |
<+Kradorex> Looks like Comcast may be interfering with Internet Archive (archive.org) traffic. |
09:28
🔗
|
Sanqui |
<Xkeeper> oh? |
09:28
🔗
|
Sanqui |
<+Kradorex> Someone from IA showed up to NANOG-l (an Internet operations professional list) and posted this in their first four paragraphs: |
09:28
🔗
|
Sanqui |
<+Kradorex> "at the internet archive we have a strange problem at the moment. |
09:28
🔗
|
Sanqui |
<+Kradorex> a slightly upstream device looks like it's returning icmp administratively unreachable for our main load balancer's ip address (which serves archive.org). |
09:28
🔗
|
Sanqui |
<+Kradorex> comcast has interpreted this to remove or (maybe blackhole) connections to archive.org somewhere pretty close to the edge. |
09:28
🔗
|
Sanqui |
<+Kradorex> but it is routing and completing connections perfectly to other ip addresses on the same /24." |
09:28
🔗
|
Sanqui |
<+Kradorex> Reportedly, blog.archive.org, which resolves to an IP in the same block as archive.org, is fine. |
09:28
🔗
|
Sanqui |
<+Kradorex> But archive.org proper is inaccessible. |
09:28
🔗
|
Sanqui |
<+Kradorex> IA's twitter is using the "some paths appear to be blocked" language: |
09:28
🔗
|
Sanqui |
<+Kradorex> https://twitter.com/internetarchive/status/870514798064058370 |
09:28
🔗
|
Sanqui |
<+Sanqui> Kradorex: Somebody from Archive Team was reporting having issues with accessing archive.org too |
09:28
🔗
|
Sanqui |
<+Kradorex> Probably related. |
09:28
🔗
|
Sanqui |
<+Sanqui> may I share this? |
09:28
🔗
|
Sanqui |
<+Kradorex> Please do. |
09:28
🔗
|
Sanqui |
<+Kradorex> The incident began ~3-4 hours ago. |
09:34
🔗
|
MrRadar |
Huh, I'm seeing the same issue on Centurylink DSL |
09:34
🔗
|
MrRadar |
I can reach blog.archive.org but archive.org itself is inaccessible |
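The symptom being reported here (archive.org unreachable while blog.archive.org, on the same /24, still answers) can be checked with traceroute, or with a small script that just tries a TCP handshake to each name. The following is a minimal Python sketch, not a tool anyone in the channel used: `same_slash24` and `tcp_reachable` are hypothetical helper names, and port 443 and the 5-second timeout are assumptions. It only shows whether a handshake completes from your connection; it cannot tell which upstream device is rejecting or dropping the traffic.

```python
import ipaddress
import socket


def same_slash24(ip_a: str, ip_b: str) -> bool:
    """True if two IPv4 addresses fall inside the same /24 network."""
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Attempt a plain TCP handshake with host:port (assumed defaults).

    A path that rejects the connection (for example with an ICMP unreachable
    reply) or silently drops it typically surfaces as an OSError or a timeout;
    both are reported as False. DNS lookup failures also return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From an affected connection, a block aimed only at the load balancer's address would be expected to make `tcp_reachable("archive.org")` return False while `tcp_reachable("blog.archive.org")` returns True, even though `same_slash24` on the two resolved addresses (e.g. 207.241.224.2 for archive.org) is True.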
09:39
🔗
|
JAA |
FWIW, it works for me from Central Europe. |
09:39
🔗
|
Sanqui |
<+Kradorex> I'd be interested to see traceroutes |
09:39
🔗
|
Sanqui |
MrRadar, godane, ranma: can you please traceroute archive.org? |
09:40
🔗
|
MrRadar |
Sure. I'll post my traceroute to blog.archive.org as well |
09:42
🔗
|
MrRadar |
Sanqui: https://pastebin.com/JPY77Wfm |
09:43
🔗
|
MrRadar |
Huh, looks like Centurylink is routing through Comcast's network to reach archive.org |
09:46
🔗
|
Sanqui |
<Kradorex> https://pbs.twimg.com/media/DBS5mAvU0AExyP6.jpg |
09:46
🔗
|
Sanqui |
<Kradorex> Sample traceroute from someone else. |
09:54
🔗
|
kittymeow |
Oh, and the other thing I forgot to mention about wikipedia is that a lot of the founders are heavily involved with Wikia, which runs commercial wikis that basically host all the stuff wikipedia deletes from an infinite encyclopedia, so a lot of the policy pushes that content over to wikia, where they put autoplaying videos and banner ads and tracking pixels all over it |
09:56
🔗
|
Sanqui |
fuck wikia |
10:09
🔗
|
|
BartoCH has quit IRC (Quit: WeeChat 1.8) |
10:12
🔗
|
|
BartoCH has joined #archiveteam-bs |
10:22
🔗
|
|
Jonison2 has joined #archiveteam-bs |
10:25
🔗
|
|
Jonison has quit IRC (Ping timeout: 260 seconds) |
10:46
🔗
|
|
Jonison2 has quit IRC (Quit: Leaving) |
10:52
🔗
|
godane |
MrRadar: https://pastebin.com/ajVSSKdU |
10:54
🔗
|
JAA |
Sanqui: ^ |
10:55
🔗
|
Sanqui |
cheers |
10:56
🔗
|
godane |
so i just ripped the original airing of Alien Autopsy |
10:57
🔗
|
godane |
i had it on vhs on the back of my Forrest Gump tape |
11:04
🔗
|
|
Metruptio has joined #archiveteam-bs |
11:13
🔗
|
godane |
so i got an E! True Hollywood Story about Gary Busey |
11:14
🔗
|
godane |
the episode is not even on TV-Vault |
11:56
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:05
🔗
|
|
icedice has joined #archiveteam-bs |
13:55
🔗
|
jrwr |
I'm so glad I have a US-based ISP (Charter) that has high speeds, no caps, and doesn't fuck around |
14:34
🔗
|
|
REiN^ has quit IRC (Max SendQ exceeded) |
14:34
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
14:34
🔗
|
|
DFJustin has joined #archiveteam-bs |
14:34
🔗
|
|
swebb sets mode: +o DFJustin |
14:38
🔗
|
|
REiN^ has joined #archiveteam-bs |
14:44
🔗
|
|
Aranje has joined #archiveteam-bs |
14:55
🔗
|
|
TheLovina has joined #archiveteam-bs |
15:17
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
15:27
🔗
|
|
j08nY has joined #archiveteam-bs |
15:43
🔗
|
|
kyounko has joined #archiveteam-bs |
16:56
🔗
|
|
Stilett0 is now known as Stiletto |
17:11
🔗
|
|
antomatic has quit IRC (Ping timeout: 268 seconds) |
17:21
🔗
|
|
REiN^ has quit IRC (Max SendQ exceeded) |
17:21
🔗
|
|
REiN^ has joined #archiveteam-bs |
17:27
🔗
|
|
Ravenloft has quit IRC (Read error: Operation timed out) |
17:54
🔗
|
|
SmileyG has joined #archiveteam-bs |
17:54
🔗
|
|
Smiley has quit IRC (Read error: Connection reset by peer) |
17:56
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
17:56
🔗
|
|
DFJustin has joined #archiveteam-bs |
17:56
🔗
|
|
swebb sets mode: +o DFJustin |
18:03
🔗
|
|
antomatic has joined #archiveteam-bs |
18:03
🔗
|
|
swebb sets mode: +o antomatic |
18:05
🔗
|
|
SHODAN_UI has quit IRC (Read error: Connection reset by peer) |
18:09
🔗
|
|
SHODAN_UI has joined #archiveteam-bs |
18:29
🔗
|
|
icedice has joined #archiveteam-bs |
18:35
🔗
|
alembic |
I have a pet theory that all ISPs are evil, just some more subtly than others... |
18:37
🔗
|
MrRadar |
Yeah, these days it's def. true |
18:37
🔗
|
MrRadar |
I miss the days of dialup when there was real competition in the ISP market :( |
18:37
🔗
|
MrRadar |
(Though I don't miss the speeds of dial up) |
18:40
🔗
|
alembic |
I hear there was a pretty good cow-themed ISP at one point ;) |
18:41
🔗
|
timmc |
I like my information superhighway like I like my coffee mugs? |
18:56
🔗
|
ranma |
downloading PSX ISOs on dialup D: |
18:56
🔗
|
ranma |
lordy |
18:58
🔗
|
MrRadar |
I remember downloading the BeOS "Personal Edition" installer over dialup |
18:58
🔗
|
MrRadar |
It took *hours* |
19:00
🔗
|
ranma |
PSX ISOs took days |
19:00
🔗
|
MrRadar |
Wow, just looked up the size and it was "only" 48 megabytes |
19:01
🔗
|
MrRadar |
Yeah, I can't imagine trying to download full ISOs at that speed |
19:16
🔗
|
Sanqui |
Q: should we ignore pages with individual posts when archiving fora? |
19:16
🔗
|
|
dashcloud has quit IRC (Ping timeout: 268 seconds) |
19:16
🔗
|
Sanqui |
those can easily increase the number of pages twenty-fold |
19:17
🔗
|
Sanqui |
(assuming twenty posts per forum page) |
19:17
🔗
|
Sanqui |
s/forum/thread/ |
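The twenty-fold figure follows from simple arithmetic: if every post also has its own single-post page, each full thread page of 20 posts brings 20 extra fetches, so a crawl that follows post links needs roughly 21 times as many page requests. A minimal Python sketch of that estimate, assuming exactly one permalink page per post and ignoring forum indexes, profiles, attachments and other URLs (`estimate_pages` is a hypothetical helper):

```python
import math


def estimate_pages(posts_per_thread: list[int], posts_per_page: int = 20) -> tuple[int, int]:
    """Return (thread pages only, thread pages plus one permalink page per post).

    Assumes every post has its own single-post page; other forum URLs are ignored.
    """
    # Each thread is paginated on its own, so partial last pages count per thread.
    thread_pages = sum(math.ceil(n / posts_per_page) for n in posts_per_thread)
    total_posts = sum(posts_per_thread)
    return thread_pages, thread_pages + total_posts
```

For 500 threads of exactly 20 posts each, `estimate_pages([20] * 500)` gives `(500, 10500)`, a 21x increase. Short threads inflate less: a one-post thread only doubles, since `estimate_pages([1])` gives `(1, 2)`.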
19:17
🔗
|
Sanqui |
JAA: thoughts? |
19:19
🔗
|
|
dashcloud has joined #archiveteam-bs |
19:20
🔗
|
MrRadar |
I generally ignore them, especially through Archivebot |
19:21
🔗
|
Sanqui |
OK, then should they be added to the forums ignore set? |
19:21
🔗
|
Sanqui |
I'm archiving four forums right now and have been adding to it |
19:21
🔗
|
MrRadar |
The problem is that there's so many ways for forums to structure their URLs so it's hard to write generic rules |
19:22
🔗
|
Sanqui |
that's fine, that's why we add to the rules |
19:22
🔗
|
Sanqui |
sure you can have custom forums but phpbb, smf etc. are gonna be roughly the same |
19:22
🔗
|
Sanqui |
another question is the print/archive pages, like https://forums.arcade-museum.com/archive/index.php/t-298176.html |
19:22
🔗
|
Sanqui |
i think those can be pretty useful for anybody scraping, but they're technically dupe info. |
19:23
🔗
|
|
Xibalba has quit IRC (ZNC 1.7.x-git-737-29d4f20-frankenznc - http://znc.in) |
19:23
🔗
|
|
Xibalba has joined #archiveteam-bs |
19:24
🔗
|
Sanqui |
if we agree that they should be ignored, i shall improve the list |
19:25
🔗
|
MrRadar |
Hmm.. for those I wouldn't ban them by default |
19:25
🔗
|
MrRadar |
Since some forums only show the "last 30 days" or w/e in their main indexes |
19:25
🔗
|
MrRadar |
But do list all threads in their "archive" indexes |
19:26
🔗
|
Sanqui |
okay. (archives often strip stuff like quotes, so they're worse for preservation though) |
19:26
🔗
|
Sanqui |
i wish json had fucking comments |
19:26
🔗
|
Sanqui |
or can we switch to json |
19:26
🔗
|
xmc |
forums are terrible miserable repositories of culture |
19:27
🔗
|
Sanqui |
yes |
19:27
🔗
|
Sanqui |
they hold so much valuable content - especially for the people who took part in their existence |
19:27
🔗
|
Sanqui |
but they're a dying breed |
19:28
🔗
|
Sanqui |
I'm archiving forums even tangentially pertaining to my interests, not waiting for announcements. would love to get something larger-scale going |
19:29
🔗
|
Sanqui |
10 archived years and 1 final, low-activity year before shutdown missing is a desirable outcome |
19:29
🔗
|
Sanqui |
s/missing// |
19:29
🔗
|
Sanqui |
nvm, that word was fine |
19:30
🔗
|
xmc |
yeah |
19:30
🔗
|
xmc |
i'm slowly puttering away at my forum scraper |
19:30
🔗
|
xmc |
slowly |
19:30
🔗
|
xmc |
ugh |
19:30
🔗
|
xmc |
i wish i had more time and more drive |
19:30
🔗
|
Sanqui |
i got one for zetaboards if you're interested |
19:30
🔗
|
Sanqui |
works best if you give it an admin account though |
19:31
🔗
|
xmc |
i'm scraping into usenet format, like gmane used to be |
19:31
🔗
|
Sanqui |
i'd like to make one that just registers and logs in and scrapes away member-only forums (rare) and user pages (common) |
19:31
🔗
|
Sanqui |
xmc: is your scraper 'pluggable'? |
19:32
🔗
|
xmc |
eh, kinda |
19:32
🔗
|
Sanqui |
as in, is it easy to write a module for a new forum sw |
19:32
🔗
|
xmc |
i think so but i haven't tried |
19:32
🔗
|
|
Ravenloft has joined #archiveteam-bs |
19:32
🔗
|
Sanqui |
is it on github? :) |
19:32
🔗
|
xmc |
it's not public yet :P |
19:33
🔗
|
arkiver |
xmc: kind of a wikiteam tool for forums? |
19:34
🔗
|
arkiver |
like those wiki dumps being uploaded to IA |
19:34
🔗
|
xmc |
kinda |
19:34
🔗
|
arkiver |
nice |
19:34
🔗
|
Sanqui |
so i am adding "/viewtopic\\.php\\?p=\\d+" |
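The rule above is written as it would appear inside a JSON ignore-set file, so every regex backslash is doubled by JSON string escaping. A minimal Python sketch of what it matches, assuming the ignore set is applied as a search anywhere in the URL (as Python's `re.search` does); the example.com URLs and the `is_ignored` helper are hypothetical:

```python
import json
import re

# Assumed JSON ignore-set form of the rule: JSON escaping doubles each
# backslash, so the file holds "\\." where the regex means "\.".
RULE_AS_JSON = r'"/viewtopic\\.php\\?p=\\d+"'

# json.loads undoes the escaping and yields the regular expression itself:
#   /viewtopic\.php\?p=\d+
IGNORE_RE = re.compile(json.loads(RULE_AS_JSON))


def is_ignored(url: str) -> bool:
    """Return True if the rule matches anywhere in the URL (re.search semantics)."""
    return IGNORE_RE.search(url) is not None


# Hypothetical URLs: a single-post permalink versus a normal thread page.
for url in (
    "https://example.com/forum/viewtopic.php?p=12345",
    "https://example.com/forum/viewtopic.php?t=678&start=20",
):
    print(url, "->", "ignored" if is_ignored(url) else "crawled")
```

The rule only catches permalinks where `p=` comes directly after `viewtopic.php?`; a URL that carries `p=` after other parameters, such as `viewtopic.php?f=2&p=12345`, is not matched, so boards that link posts that way would need an additional rule.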
19:34
🔗
|
xmc |
with attachments and images and stuff too, as mime attachments |
19:34
🔗
|
xmc |
like adults |
19:34
🔗
|
xmc |
it's slow going because |
19:35
🔗
|
xmc |
it's slow going because i'm being very particular about how i want its output to look |
19:35
🔗
|
Sanqui |
well, it's within my (quite wide) area of interest |
19:35
🔗
|
xmc |
and there's a ton of sysadminning yet to do |
19:36
🔗
|
xmc |
like i have a newsspool full of malformed messages from one forum that i sucked in before it went away |
19:36
🔗
|
xmc |
so i have to read those in and re-format them |
19:36
🔗
|
xmc |
(i save all the original as-received data in the mime body, no worries there) |
19:36
🔗
|
Sanqui |
[thumbs up] |
19:36
🔗
|
Sanqui |
always derive |
19:37
🔗
|
Sanqui |
(never integrate unless left with no other choice) |
19:37
🔗
|
xmc |
... |
19:37
🔗
|
* |
xmc points to the door |
19:37
🔗
|
xmc |
nerds stay outside |
19:37
🔗
|
MrRadar |
I've been working on archiving one of the web's bigger (but still dying) forums. It's complete up through the end of last year and I'm working on a script to check for incremental updates to it |
19:38
🔗
|
MrRadar |
(I don't want to name them explicitly since there's a good chance they'd ban me for scraping their paid "archives") |
19:38
🔗
|
Sanqui |
nice |
19:38
🔗
|
xmc |
ohhhh |
19:38
🔗
|
xmc |
hm |
19:38
🔗
|
xmc |
they were on my target list but they don't work with my scraper so, |
19:39
🔗
|
xmc |
if they're the site with paid archives that i think they are |
19:39
🔗
|
MrRadar |
Probably ;) I don't know how many others do |
19:40
🔗
|
Sanqui |
good |
19:41
🔗
|
Sanqui |
i'll continue going after the more obscure, yet fascinating places myself :) |
19:41
🔗
|
xmc |
is there an archiveteam subcommittee on webforums yet |
19:41
🔗
|
xmc |
or does it still need a snappy name |
19:41
🔗
|
Sanqui |
Message Bored |
19:41
🔗
|
Sanqui |
you're welcome |
19:41
🔗
|
xmc |
#msgbored ? great |
19:42
🔗
|
xmc |
and done, everyone join there if you want |
19:42
🔗
|
Sanqui |
the joke is that people got bored of message boards and that's why they're dying and need saving |
19:42
🔗
|
Sanqui |
(sorry, had to drive that home) |
19:42
🔗
|
xmc |
:) |
20:42
🔗
|
BartoCH |
scii western |
20:42
🔗
|
BartoCH |
whoops, typo |
20:42
🔗
|
BartoCH |
for another channel, my bad |
22:06
🔗
|
JAA |
Sanqui: Regarding your earlier question, I'd also ignore individual posts but keep archives. I'm not sure about printable versions; I guess I'm leaning towards ignoring them. |
22:07
🔗
|
schbirid |
! |
22:07
🔗
|
schbirid |
never grab individual posts, it makes the grab much more intense and is just redundant |
22:13
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:17
🔗
|
timmc |
I feel like at best it's a lazy way of getting the contents of whatever inbound links to posts might exist, rather than inferring them from the threads. |
22:27
🔗
|
|
ZexaronS has joined #archiveteam-bs |
22:28
🔗
|
|
ZexaronS has quit IRC (Client Quit) |
22:40
🔗
|
|
SHODAN_UI has quit IRC (Remote host closed the connection) |