Time |
Nickname |
Message |
00:13
🔗
|
godane |
ok then |
00:13
🔗
|
godane |
sorry i screwed up the name |
00:15
🔗
|
godane |
anyways that part of the tape had very bad tracking |
01:14
🔗
|
|
fie has quit IRC (Ping timeout: 252 seconds) |
01:28
🔗
|
|
zhongfu has quit IRC (Ping timeout: 260 seconds) |
01:52
🔗
|
|
ta9le has quit IRC (Quit: Connection closed for inactivity) |
01:59
🔗
|
|
robink has quit IRC (Read error: Connection reset by peer) |
02:06
🔗
|
|
robink has joined #archiveteam-bs |
02:10
🔗
|
|
Odd0002 has quit IRC (Quit: ZNC - http://znc.in) |
02:12
🔗
|
|
Odd0002 has joined #archiveteam-bs |
02:19
🔗
|
|
Odd0002_ has joined #archiveteam-bs |
02:20
🔗
|
|
Odd0002 has quit IRC (Read error: Operation timed out) |
02:20
🔗
|
|
Odd0002_ is now known as Odd0002 |
02:45
🔗
|
|
Odd0002 has quit IRC (Quit: ZNC - http://znc.in) |
02:47
🔗
|
|
Odd0002 has joined #archiveteam-bs |
03:30
🔗
|
|
alembic has joined #archiveteam-bs |
03:33
🔗
|
|
qw3rty113 has joined #archiveteam-bs |
03:38
🔗
|
|
qw3rty112 has quit IRC (Ping timeout: 600 seconds) |
03:53
🔗
|
|
zino has quit IRC (Read error: Operation timed out) |
04:43
🔗
|
godane |
so i'm getting my box of tapes i have bought tomorrow |
04:46
🔗
|
|
robink has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
05:03
🔗
|
|
robink has joined #archiveteam-bs |
05:50
🔗
|
|
alembic has quit IRC (Quit: Connection closed for inactivity) |
06:01
🔗
|
|
zhongfu has joined #archiveteam-bs |
06:02
🔗
|
|
DragonMon has joined #archiveteam-bs |
06:18
🔗
|
|
bsmith093 has quit IRC (Read error: Operation timed out) |
06:23
🔗
|
|
bsmith093 has joined #archiveteam-bs |
07:25
🔗
|
|
DragonMon has quit IRC (Quit: Leaving) |
07:30
🔗
|
|
schbirid has joined #archiveteam-bs |
07:34
🔗
|
|
DragonMon has joined #archiveteam-bs |
07:41
🔗
|
|
DragonMon has quit IRC (Quit: Leaving) |
07:41
🔗
|
|
DragonMon has joined #archiveteam-bs |
07:45
🔗
|
|
ta9le has joined #archiveteam-bs |
07:50
🔗
|
|
DragonMon has quit IRC (Quit: Leaving) |
08:23
🔗
|
|
DragonMon has joined #archiveteam-bs |
08:56
🔗
|
|
davidar has joined #archiveteam-bs |
09:57
🔗
|
|
REiN^ has joined #archiveteam-bs |
10:12
🔗
|
SmileyG_ |
https://dnshistory.org/historical-dns-records/a/screencast-o-matic.com. -- check out the message to us, hahaha |
10:26
🔗
|
BlueMax |
>this as abuse |
10:26
🔗
|
BlueMax |
the only thing getting abused here is the English language |
10:32
🔗
|
eientei95 |
>They are quite open about their attitude of ignoring robots.txt files |
10:32
🔗
|
eientei95 |
>make no attempt to block any archiver in their robots.txt |
10:34
🔗
|
JAA |
As far as I know, there was an AT project to archive DNSHistory in its entirety a while ago (2016?). They did block us then. |
10:34
🔗
|
JAA |
They also activated CloudFlare's "I'm under attack" mode, I believe. |
10:35
🔗
|
JAA |
I wasn't part of that effort though, so I might be wrong. There was definitely a project though, and it failed hard because the site operator did everything they could to stop it. |
10:39
🔗
|
JAA |
SmileyG_, BlueMax, eientei95: https://archiveteam.org/index.php?title=DNS_History#Archiving |
10:42
🔗
|
SmileyG_ |
ew |
10:42
🔗
|
SmileyG_ |
I wonder if we can do some kinda manual effort D: |
10:42
🔗
|
BlueMax |
they had a billion pages on that site? jesus |
10:42
🔗
|
JAA |
They still do, I think. |
10:43
🔗
|
JAA |
They announced their shutdown two years ago, but they're still online, so... |
10:43
🔗
|
SmileyG_ |
they a pita. |
10:43
🔗
|
|
SmileyG_ is now known as SmileyG |
10:45
🔗
|
SmileyG |
How often does cloudflare make you re-auth a browser? |
10:45
🔗
|
|
svchfoo1 sets mode: +o SmileyG |
10:45
🔗
|
SmileyG |
i.e. could we run something 'in browser' that does what archive.org does, do the 'auth' then let it run for a bit, reauth, etc. |
10:46
🔗
|
JAA |
Depends on how it's configured. I've grabbed sites where I just had to pass the challenge once and could then fire away. |
10:46
🔗
|
Despatche |
people taking it upon themselves to erase history is exactly why archive team exists |
10:47
🔗
|
JAA |
On other sites, I had to resolve the challenge every few requests to every few hours. |
10:47
🔗
|
JAA |
Fortunately, it's fairly easy to bypass the "just checking your browser" thing. |
10:47
🔗
|
JAA |
When they force a reCAPTCHA, it becomes essentially impossible. |
10:50
🔗
|
SmileyG |
yah |
10:50
🔗
|
SmileyG |
few hours isn't too bad. |
10:50
🔗
|
SmileyG |
I can leave it running and check on it occasionally. |
10:51
🔗
|
|
odemg has joined #archiveteam-bs |
10:51
🔗
|
SmileyG |
hell, we could prob even make a warrior project which just displays the captcha page when it happens? :/ |
10:51
🔗
|
JAA |
The "just checking your browser" page can be circumvented automatically. |
10:52
🔗
|
SmileyG |
well, even better. |
10:52
🔗
|
|
BlueMax has quit IRC (Leaving) |
10:52
🔗
|
JAA |
The standard approach uses NodeJS to evaluate arbitrary JS (WCGW?). joepie91 implemented a parser instead which evaluates the challenge directly, and I ported that to Python a while ago. |
10:53
🔗
|
JAA |
"evaluates the challenge directly" meaning that it doesn't execute anything; it just calculates the correct response directly. The format of the challenge code hasn't changed in years. |
10:54
🔗
|
JAA |
But if the site operator wants to stop us, they absolutely can do that. They just need to enable the captcha. |
10:54
🔗
|
SmileyG |
yah |
10:55
🔗
|
SmileyG |
but i'm saying, crawl until we get the captcha, display the captcha, solve it, carry on... |
10:55
🔗
|
JAA |
Yeah, that could be done I guess. |
10:56
🔗
|
JAA |
We could centralise that as well, i.e. send the captcha image to somewhere, and any user can solve it. |
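A minimal sketch of the "crawl until we get the captcha, solve it, carry on" idea from the messages above. Everything here is hypothetical: `fetch`, `is_captcha`, and `solve_captcha` are placeholder callables the caller would supply (for example, `solve_captcha` could forward the challenge to a volunteer), and nothing in it is an existing Archive Team or warrior API.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_with_captcha_pause(
    start_urls: Iterable[str],
    fetch: Callable[[str], str],            # hypothetical: returns the page body
    is_captcha: Callable[[str], bool],      # hypothetical: detects a challenge page
    solve_captcha: Callable[[str], None],   # hypothetical: blocks until a human solves it
    max_attempts: int = 3,
) -> dict[str, str]:
    """Fetch each URL; on a captcha page, hand off to a solver and retry."""
    queue = deque(start_urls)
    pages: dict[str, str] = {}
    attempts: dict[str, int] = {}
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if is_captcha(body):
            attempts[url] = attempts.get(url, 0) + 1
            if attempts[url] >= max_attempts:
                continue  # give up on this URL rather than loop forever
            solve_captcha(body)
            queue.appendleft(url)  # retry the same URL once the challenge is solved
            continue
        pages[url] = body
    return pages
```

Passing the solver in as a callable keeps the loop the same whether the captcha is shown locally or sent somewhere central for any user to solve.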
10:57
🔗
|
JAA |
Just like those services where you pay some money per solved captcha. |
10:57
🔗
|
JAA |
(Aka slavery) |
11:08
🔗
|
eientei95 |
Nah, slaves never got paid |
11:08
🔗
|
eientei95 |
This is more like... I got nothing. Slaves it is |
11:11
🔗
|
eientei95 |
>If I eventually decide to move from this forum, my biggest concern is how to exactly preserve this old one. Tapatalk has an inactivity policy where if there has been no posts in 90 days, the forum can get purged which is a massive problem. I've also heard Tapatalk staff has also deleted posts that they don't agree with or believe violate their Code of Conduct which is another issue. And the cherry on the cake... I can't download a backup |
11:11
🔗
|
eientei95 |
database of this forum at the moment despite it being requested for years. |
11:18
🔗
|
ta9le |
Yeah, the guy whom you're quoting is my fellow admin of those forums |
11:19
🔗
|
ta9le |
but yeah, #zetatheplank, cuz it exists |
11:59
🔗
|
|
Stilett0- has joined #archiveteam-bs |
12:19
🔗
|
|
Stilett0- has quit IRC () |
12:29
🔗
|
JAA |
I just saw that the FCC net neutrality comments are in the news again, and I wonder if there is any archive of these yet and, if not, if it's worthwhile grabbing them. |
12:29
🔗
|
JAA |
I couldn't find anything quickly on IA. |
12:34
🔗
|
JAA |
A few filings which got a lot of attention were archived in the Wayback Machine of course (e.g. "Obama"'s comment at https://www.fcc.gov/ecfs/filing/1051157755251 ), but I'm talking about the entire dataset. |
12:39
🔗
|
eientei95 |
https://www.fcc.gov/ecfs/search/filings?q=%22The%20unprecedented%20regulatory%20power%20the%20Obama%20Administration%20imposed%22&sort=date_disseminated,DESC |
12:40
🔗
|
eientei95 |
JAA: |
12:40
🔗
|
eientei95 |
The filings are provided in JSON files, in batches of 10,000, compressed into three archive files. The public can access the data here: |
12:40
🔗
|
eientei95 |
https://data.fcc.gov/comments/17-108/ECFS_17-108_1.zip |
12:40
🔗
|
eientei95 |
https://data.fcc.gov/comments/17-108/ECFS_17-108_2.zip |
12:40
🔗
|
eientei95 |
https://data.fcc.gov/comments/17-108/ECFS_17-108_3.zip |
12:43
🔗
|
JAA |
eientei95: Those zips only contain the data up to half a year ago according to the notice on the website. And yeah, grabbing them should be easy enough. |
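Grabbing the three zips could look roughly like the sketch below. The base URL and file names are the ones quoted above; the helper names and the skip-if-present behaviour are assumptions made for illustration, and the download is slow, so this is untested against the live server.

```python
import urllib.request
from pathlib import Path

# Taken from the URLs posted in the channel.
BASE_URL = "https://data.fcc.gov/comments/17-108/"


def archive_names(count: int = 3) -> list[str]:
    """File names of the bulk archives (three were listed)."""
    return [f"ECFS_17-108_{i}.zip" for i in range(1, count + 1)]


def download_archives(dest_dir: str = ".") -> list[Path]:
    """Download each archive into dest_dir, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in archive_names():
        target = dest / name
        if not target.exists():
            urllib.request.urlretrieve(BASE_URL + name, str(target))
        saved.append(target)
    return saved
```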
12:44
🔗
|
eientei95 |
Right |
12:46
🔗
|
JAA |
Also, filings can have attached documents. I'm not sure if those are included in the zips. (The download's pretty slow, so I can't check right now.) |
12:47
🔗
|
JAA |
Although most filings will probably not have attachments. |
12:51
🔗
|
godane |
so one of the tapes i got has some hbo promos |
12:51
🔗
|
godane |
there is HBO News segment about space jam |
12:52
🔗
|
eientei95 |
... |
12:52
🔗
|
eientei95 |
JAA: These include the person's contact email |
12:53
🔗
|
eientei95 |
and physical address |
12:54
🔗
|
JAA |
eientei95: Yeah, that's true. I'm not sure if this is a problem though since the data is already publicly available (and people agree to have it published when submitting a comment, I believe). |
12:56
🔗
|
eientei95 |
Their email address isn't normally accessible via the web interface |
13:09
🔗
|
eientei95 |
JAA: |
13:09
🔗
|
eientei95 |
'documents': [{'description': 'Opposition comment', 'filename': 'Internet Service Providers comment.docx', 'src': 'https://ecfsapi.fcc.gov/file/DOC-56f1752599000000-A.docx'}], |
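Assuming every filing record carries a `documents` list shaped like the fragment just quoted (only that one fragment is shown, so the rest of the record layout is a guess), the attachment URLs could be collected with something like this. `attachment_urls` is a hypothetical helper name.

```python
def attachment_urls(filing: dict) -> list[tuple[str, str]]:
    """Return (filename, src) pairs for each attached document in a filing."""
    return [
        (doc.get("filename", ""), doc["src"])
        for doc in filing.get("documents", [])
        if doc.get("src")  # skip entries without a download URL
    ]


# The fragment posted in the channel, wrapped in a dict.
sample = {
    "documents": [
        {
            "description": "Opposition comment",
            "filename": "Internet Service Providers comment.docx",
            "src": "https://ecfsapi.fcc.gov/file/DOC-56f1752599000000-A.docx",
        }
    ]
}
print(attachment_urls(sample))
```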
13:11
🔗
|
eientei95 |
Site checks the referrer |
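If the file server really does check the referrer, a request with an explicit `Referer` header might get through. This is only a sketch: which referrer value the site accepts was not stated, so `ASSUMED_REFERER` below is a guess, and `request_with_referer` is a hypothetical helper.

```python
import urllib.request

# Assumption: a referrer on the ECFS site itself is accepted. Not confirmed.
ASSUMED_REFERER = "https://www.fcc.gov/ecfs/"


def request_with_referer(url: str, referer: str = ASSUMED_REFERER) -> urllib.request.Request:
    """Build a GET request that carries a Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


req = request_with_referer("https://ecfsapi.fcc.gov/file/DOC-56f1752599000000-A.docx")
# urllib.request.urlopen(req) would perform the actual download.
```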
13:22
🔗
|
JAA |
eientei95: Yeah, but the actual documents aren't in the zip, right? |
13:22
🔗
|
eientei95 |
Nope |
13:32
🔗
|
|
Jens has quit IRC (Remote host closed the connection) |
13:32
🔗
|
|
Jens has joined #archiveteam-bs |
13:35
🔗
|
godane |
so i got a tape with 6 hours worth of content |
13:35
🔗
|
godane |
this AMC recording on it |
13:36
🔗
|
godane |
maybe of amc airing of Warlock |
13:36
🔗
|
godane |
it was recorded from 1996-12 or 1997-12 |
13:49
🔗
|
|
wp494 has quit IRC (Ping timeout: 492 seconds) |
13:50
🔗
|
|
wp494 has joined #archiveteam-bs |
13:50
🔗
|
|
svchfoo3 sets mode: +o wp494 |
14:36
🔗
|
|
robink has quit IRC (Ping timeout: 506 seconds) |
15:01
🔗
|
|
antomati_ is now known as antomatic |
15:25
🔗
|
ta9le |
Hot damn |
16:06
🔗
|
|
zino has joined #archiveteam-bs |
16:08
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
16:09
🔗
|
|
Mateon1 has joined #archiveteam-bs |
17:12
🔗
|
|
Sk2d has joined #archiveteam-bs |
17:13
🔗
|
|
robink has joined #archiveteam-bs |
17:13
🔗
|
|
Sk1d has quit IRC (Read error: Operation timed out) |
17:13
🔗
|
|
Sk2d is now known as Sk1d |
18:02
🔗
|
|
jschwart has joined #archiveteam-bs |
18:38
🔗
|
|
verifiedj has joined #archiveteam-bs |
19:03
🔗
|
|
verifiedj has quit IRC (Quit: http://www.mibbit.com ajax IRC Client) |
19:51
🔗
|
|
Stilett0- has joined #archiveteam-bs |
20:45
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
20:45
🔗
|
|
wacky has joined #archiveteam-bs |
21:27
🔗
|
|
CoolCanuk has joined #archiveteam-bs |
21:56
🔗
|
|
Mateon1 has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
dxrt_ has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
Gfy has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
Aoede has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
espes___ has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
Rai-chan has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
svchfoo1 has quit IRC (se.hub efnet.portlane.se) |
21:56
🔗
|
|
omglolbah has quit IRC (se.hub efnet.portlane.se) |
21:58
🔗
|
|
REiN^ has quit IRC (Read error: Operation timed out) |
22:12
🔗
|
|
Mateon1 has joined #archiveteam-bs |
22:12
🔗
|
|
dxrt_ has joined #archiveteam-bs |
22:12
🔗
|
|
Gfy has joined #archiveteam-bs |
22:12
🔗
|
|
Aoede has joined #archiveteam-bs |
22:12
🔗
|
|
espes___ has joined #archiveteam-bs |
22:12
🔗
|
|
Rai-chan has joined #archiveteam-bs |
22:12
🔗
|
|
svchfoo1 has joined #archiveteam-bs |
22:12
🔗
|
|
omglolbah has joined #archiveteam-bs |
22:12
🔗
|
|
efnet.portlane.se sets mode: +oo dxrt_ svchfoo1 |
22:12
🔗
|
|
dxrt sets mode: +o dxrt_ |
22:12
🔗
|
godane |
so i'm capturing a SP tape of Michal juior |
22:14
🔗
|
godane |
*Michael junior |
22:15
🔗
|
godane |
i'm doing it at 6000k for video cause at 10000k the signal gives capture bitrate at about 4700k |
23:03
🔗
|
|
Asparagir has joined #archiveteam-bs |
23:03
🔗
|
|
svchfoo3 sets mode: +o Asparagir |
23:19
🔗
|
|
Stilett0- is now known as Stiletto |
23:29
🔗
|
|
Stiletto has quit IRC () |
23:30
🔗
|
|
Stilett0- has joined #archiveteam-bs |
23:39
🔗
|
|
megaminxw has joined #archiveteam-bs |
23:39
🔗
|
megaminxw |
hmm, efnets having a tantrum or something |
23:39
🔗
|
megaminxw |
anyway |
23:40
🔗
|
megaminxw |
im currently attempting to archive a website (kungfoomanchu.com) using wget, but even though i explicitly add --mirror, it only ever downloads the front page |
23:40
🔗
|
megaminxw |
there are a bunch of pdfs on there that it doesnt even touch |
23:40
🔗
|
JAA |
"efnets having a tantrum" What else is new? ;-) |
23:41
🔗
|
JAA |
Look into --recursive and --page-requisites. |
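For reference, a sketch of a wget invocation with those options, built in Python so it can be inspected before running. `--mirror` already turns on recursion; `--page-requisites` adds the images and stylesheets each page needs. The helper name is hypothetical, and wget only follows links present in the HTML it downloads, so it will not discover content a page loads through JavaScript.

```python
import shlex


def build_wget_command(url: str) -> list[str]:
    """Argument list for a recursive wget grab of one site."""
    return [
        "wget",
        "--mirror",            # recursion, timestamping, infinite depth
        "--page-requisites",   # also fetch images, CSS, etc. needed to render pages
        "--no-parent",         # do not ascend above the starting directory
        url,
    ]


cmd = build_wget_command("http://www.kungfoomanchu.com/")
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```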
23:41
🔗
|
megaminxw |
i think it might be something to do with the site being all fancy and javascripty-single page-stupidness |
23:41
🔗
|
JAA |
Ah yeah, JS is hell. |
23:41
🔗
|
megaminxw |
yeah, ive put those in, but it just downloads the front page |
23:41
🔗
|
megaminxw |
there are even links on the front page to a few pdfs, with no javascript there at all, and it just ignores them |
23:42
🔗
|
megaminxw |
im mildly annoyed to put it politely |
23:42
🔗
|
JAA |
Does it grab the images? |
23:42
🔗
|
JAA |
E.g. http://www.kungfoomanchu.com/images/333.png |
23:42
🔗
|
megaminxw |
it grabs some of them |
23:43
🔗
|
JAA |
Yeah, that doesn't surprise me. Most of the content is loaded through JS. |
23:43
🔗
|
JAA |
The site isn't usable at all with JS disabled. |
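One way to see what a non-JS crawler such as wget can actually find is to list the links present in the raw HTML, before any script runs. A minimal sketch using only the standard library; `LinkCollector` and `static_links` are hypothetical names, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def static_links(html: str) -> list[str]:
    """Links visible without executing any JavaScript."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up sample: one plain link, one image, one URL only reachable via script.
sample_html = (
    '<a href="a.pdf">x</a><img src="images/333.png">'
    '<script>load("b.pdf")</script>'
)
print(static_links(sample_html))
```

Anything that shows up only inside script text, like `b.pdf` in the sample, is invisible to a crawler that does not execute JS.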
23:44
🔗
|
megaminxw |
so how would one go about archiving this? tbh its just for my own collecting habits, i dont think its really in danger |
23:44
🔗
|
JAA |
You could try warcprox and either manual recursion in a browser or brozzler. (I haven't used brozzler myself, so I can't tell you how well that works.) |
23:44
🔗
|
megaminxw |
hmm okay |
23:44
🔗
|
JAA |
"manual recursion" meaning "click through everything in tabs", of course. |
23:44
🔗
|
JAA |
There is no good option for heavily scripted sites which load content from the server asynchronously, unfortunately. |
23:45
🔗
|
|
robink has quit IRC (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.) |
23:45
🔗
|
megaminxw |
thanks javascript~ really appreciate it |
23:45
🔗
|
megaminxw |
bergh |
23:57
🔗
|
|
robink has joined #archiveteam-bs |