Time |
Nickname |
Message |
00:14
🔗
|
|
Dj-Wawa has quit IRC (Quit: Connection closed for inactivity) |
00:21
🔗
|
|
ats has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
Stiletto has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
argus has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
noirscape has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
Fusl has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
VoynichCr has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
N4Y has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
MrRadar2 has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
Tenebrae has quit IRC (se.hub irc.efnet.nl) |
00:21
🔗
|
|
BnAboyZ has quit IRC (se.hub irc.efnet.nl) |
00:23
🔗
|
|
ats has joined #archiveteam-ot |
00:23
🔗
|
|
Stiletto has joined #archiveteam-ot |
00:23
🔗
|
|
argus has joined #archiveteam-ot |
00:23
🔗
|
|
noirscape has joined #archiveteam-ot |
00:23
🔗
|
|
Fusl has joined #archiveteam-ot |
00:23
🔗
|
|
VoynichCr has joined #archiveteam-ot |
00:23
🔗
|
|
N4Y has joined #archiveteam-ot |
00:23
🔗
|
|
MrRadar2 has joined #archiveteam-ot |
00:23
🔗
|
|
Tenebrae has joined #archiveteam-ot |
00:23
🔗
|
|
BnAboyZ has joined #archiveteam-ot |
00:24
🔗
|
|
Despatche has quit IRC (Read error: Connection reset by peer) |
00:24
🔗
|
|
Despatche has joined #archiveteam-ot |
00:24
🔗
|
|
Despatche has quit IRC (Read error: Connection reset by peer) |
00:24
🔗
|
|
Despatche has joined #archiveteam-ot |
00:24
🔗
|
|
Despatche has quit IRC (Read error: Connection reset by peer) |
00:24
🔗
|
|
Despatche has joined #archiveteam-ot |
00:27
🔗
|
yano |
https://www.gnu.org/software/librejs/free-your-javascript.html 3.2.2.1 |
00:27
🔗
|
yano |
magnets of valid licenses |
00:27
🔗
|
yano |
such as CC0, GPLv2.0, GPLv3.0, etc. |
00:28
🔗
|
|
Despatche has quit IRC (Remote host closed the connection) |
00:28
🔗
|
|
Despatche has joined #archiveteam-ot |
00:30
🔗
|
JAA |
Better yet, get rid of JS entirely. :-) |
00:31
🔗
|
JAA |
(If possible) |
00:31
🔗
|
yano |
i don't care about the .js part, i'm just excited about the magnets of the licenses |
00:32
🔗
|
JAA |
Ah yeah |
00:34
🔗
|
yano |
i figured anyone who is into data hoarding or hosting data for others might be interested :) |
01:36
🔗
|
|
VerfiedJ has quit IRC (Quit: Leaving) |
01:43
🔗
|
JAA |
ivan_: For your YouTube archive: https://old.reddit.com/r/videos/comments/aio6jx/indian_youtuber_who_exposes_the_safety_measures/ |
02:42
🔗
|
ivan_ |
JAA: grabbing |
02:42
🔗
|
ivan_ |
YouTube no longer serves annotations, right? |
02:46
🔗
|
ivan_ |
youtube-dl still grabs something but the annotation data is blanked |
02:46
🔗
|
ivan_ |
it doesn't even have the end cards either but rather some channel metadata |
03:05
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
03:05
🔗
|
|
Mateon1 has joined #archiveteam-ot |
03:32
🔗
|
t3 |
Does IA deduplicate the WARC websites when they are integrated into the Wayback Machine? |
03:44
🔗
|
t3 |
Am I likely to get IP banned when grabbing a site whose DNS points to Cloudflare, Inc. (AS13335)? |
04:04
🔗
|
wp494 |
JAA: finally /r/shutdown has real moderation now |
04:04
🔗
|
wp494 |
good riddance to the ronald mcdonald trump posts |
04:11
🔗
|
|
Hani has quit IRC (Read error: Operation timed out) |
04:49
🔗
|
|
odemg has quit IRC (Ping timeout: 265 seconds) |
05:01
🔗
|
|
odemg has joined #archiveteam-ot |
05:02
🔗
|
ivan_ |
t3: no deduplication |
05:02
🔗
|
ivan_ |
t3: should be fine if you get through the initial captcha unless they have attack detection on? |
05:35
🔗
|
|
Despatche has quit IRC (Quit: Connection reset by deer) |
05:40
🔗
|
|
m007a83 has quit IRC (Read error: Connection reset by peer) |
05:49
🔗
|
|
m007a83 has joined #archiveteam-ot |
05:53
🔗
|
t3 |
I don't like Cloudflare. It's everywhere. |
05:58
🔗
|
t3 |
I also don't like websites that load contents by scrolling using JavaScript. |
06:06
🔗
|
|
Hani has joined #archiveteam-ot |
06:12
🔗
|
|
nataraj has joined #archiveteam-ot |
06:32
🔗
|
|
Ryz has joined #archiveteam-ot |
06:32
🔗
|
Ryz |
Google to try and block uBlock Origin? S: |
06:32
🔗
|
Ryz |
https://www.ghacks.net/2019/01/22/chrome-extension-manifest-v3-could-end-ublock-origin-for-chrome/ |
06:33
🔗
|
|
icedice has quit IRC (Quit: Leaving) |
07:14
🔗
|
t3 |
Ryz: That's really bad. Google shouldn't be evil. |
07:14
🔗
|
t3 |
That's also why I stopped using Google Chrome. |
07:15
🔗
|
Ryz |
To others, they are already evil in a way, not in this instance I pointed out, but the other countless ones in the past~ |
08:33
🔗
|
|
wp494_ has joined #archiveteam-ot |
08:35
🔗
|
|
wp494 has quit IRC (Ping timeout: 255 seconds) |
09:14
🔗
|
|
nataraj has quit IRC (Read error: Operation timed out) |
09:27
🔗
|
|
nataraj has joined #archiveteam-ot |
09:37
🔗
|
|
schbirid has joined #archiveteam-ot |
10:33
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
10:33
🔗
|
JAA |
Google is an altruistic company that totally isn't interested in making money through ads. I'm sure they'll sort this API issue out with Gorhill... |
11:30
🔗
|
|
chimyatta has joined #archiveteam-ot |
11:53
🔗
|
|
Ryz has quit IRC (Remote host closed the connection) |
12:37
🔗
|
VADemon_ |
https://mashable.com/2018/05/19/google-removes-dont-be-evil-motto-from-code-of-conduct/?europe=true |
13:38
🔗
|
|
nataraj has quit IRC (Read error: Operation timed out) |
14:33
🔗
|
JAA |
https://twitter.com/3lbios/status/1087848040583626753 |
14:55
🔗
|
|
LFlare has quit IRC (Quit: Ping timeout (120 seconds)) |
14:59
🔗
|
|
VerfiedJ has joined #archiveteam-ot |
15:03
🔗
|
|
LFlare has joined #archiveteam-ot |
15:21
🔗
|
yano |
https://yanovich.net/2018/08/help-archive-the-web/ |
16:16
🔗
|
moufu |
the addon isn't open source though |
16:24
🔗
|
systwi |
JAA: thanks for the forum info. i might just make a local copy with grab-site (i use that already), but i did want to get a copy on wayback |
16:27
🔗
|
|
nataraj has joined #archiveteam-ot |
17:33
🔗
|
|
wp494 has joined #archiveteam-ot |
17:36
🔗
|
|
wp494_ has quit IRC (Ping timeout: 265 seconds) |
18:13
🔗
|
t3 |
yano: That's interesting. Just curious, because your nick is similar to the blog name, is that your blog? I've never used that "Page Cache Archiver" browser extension before. I generally use VerfiedJ's "Save to the Wayback Machine" browser extension. |
18:20
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
18:42
🔗
|
|
m007a83 has quit IRC (Read error: Operation timed out) |
18:44
🔗
|
yano |
t3: yea, that's my blog |
18:44
🔗
|
yano |
t3: ah, PCA allows one to save to more than just IA |
18:48
🔗
|
|
picklefac has joined #archiveteam-ot |
18:54
🔗
|
yano |
moufu: the source is viewable but yeah, it's not FOSS-y in that you can do whatever you want with it :-\ |
19:14
🔗
|
|
Hani has quit IRC (Quit: Going offline, see ya! (www.adiirc.com)) |
19:16
🔗
|
|
Hani has joined #archiveteam-ot |
19:20
🔗
|
|
Despatche has joined #archiveteam-ot |
19:46
🔗
|
t3 |
yano: Well your blog's robots.txt blocks ia_archiver. |
19:47
🔗
|
yano |
it shouldn't for the whole site |
19:47
🔗
|
* |
yano checks |
19:48
🔗
|
yano |
oh, i thought that was part of the image blocking thing I copy/pasted that included the Google-Images |
19:48
🔗
|
t3 |
yano: You have `User-agent: ia_archiver` with `Disallow: /`. That's the entire site. |
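A minimal sketch of how the rule t3 quotes can be checked offline with Python's standard `urllib.robotparser`. The two robots.txt bodies below are hypothetical reconstructions, not the actual contents of yanovich.net's file: `BLOCKING` contains only the rule quoted above, and `FIXED` assumes an image-only rule (the `Googlebot-Image` agent and `/images/` path are made up for illustration). This shows how one standard parser reads the rule; it does not claim to reproduce how IA itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical file containing only the rule quoted in the log.
BLOCKING = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical file after the fix: only an image crawler is restricted.
FIXED = """\
User-agent: Googlebot-Image
Disallow: /images/
"""

PAGE = "https://yanovich.net/pages/contact-me.html"

if __name__ == "__main__":
    print("before fix:", is_blocked(BLOCKING, "ia_archiver", PAGE))
    print("after fix: ", is_blocked(FIXED, "ia_archiver", PAGE))
```

Because `Disallow: /` matches every path, any URL on the site is reported as blocked for `ia_archiver` under the first file, which is the point t3 makes.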
19:48
🔗
|
yano |
fixed |
19:48
🔗
|
yano |
it's now removed |
19:48
🔗
|
yano |
it should only block the stuff mentioned at the bottom |
19:49
🔗
|
t3 |
Yay! Now it can be archived. |
19:51
🔗
|
t3 |
Actually... |
19:51
🔗
|
t3 |
There still might be an issue. |
19:51
🔗
|
t3 |
https://web.archive.org/save/https://yanovich.net/pages/contact-me.html |
19:52
🔗
|
JAA |
IA might have to pick up the new robots.txt first. Not sure how often that's checked. |
19:53
🔗
|
t3 |
The robots.txt has been updated: https://web.archive.org/web/20190123194840/https://yanovich.net/robots.txt |
19:54
🔗
|
yano |
my web logs aren't showing any UA's from IA for robots.txt |
19:54
🔗
|
yano |
ah, there it is |
19:55
🔗
|
yano |
[23/Jan/2019:19:45:40 +0000] "GET /robots.txt HTTP/1.1" 200 447 "-" "Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; +http://archive.org/details/archive.org_bot)" "-" yanovich.net |
19:55
🔗
|
yano |
[23/Jan/2019:19:45:51 +0000] "GET /robots.txt HTTP/2.0" 200 435 "https://web.archive.org/save/https://yanovich.net/2018/08/help-archive-the-web/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0" "-" yanovich.net |
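A rough sketch of how log lines like the ones yano pasted could be filtered for robots.txt fetches by IA's crawler. The regular expression is an assumption fitted to the fields visible in the paste (timestamp, request, status, size, referer, user agent); the real log format on yano's server may include fields that were trimmed before pasting. `SAMPLE` is the first pasted line, copied verbatim.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the pasted lines only:
# [time] "METHOD path PROTOCOL" status size "referer" "user agent" ...
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = int(fields["size"])
    return fields


def is_ia_robots_fetch(fields) -> bool:
    """True for a robots.txt request whose user agent names archive.org_bot."""
    return (
        fields is not None
        and fields["path"] == "/robots.txt"
        and "archive.org_bot" in fields["user_agent"]
    )


SAMPLE = (
    '[23/Jan/2019:19:45:40 +0000] "GET /robots.txt HTTP/1.1" 200 447 "-" '
    '"Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; '
    '+http://archive.org/details/archive.org_bot)" "-" yanovich.net'
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry["time"].isoformat(), entry["status"], is_ia_robots_fetch(entry))
```

Matching on the user agent string is only a heuristic, since any client can send an arbitrary `User-Agent` header.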
19:55
🔗
|
yano |
ah, bay-hoooey |
19:55
🔗
|
yano |
*ba-hoooey |
20:05
🔗
|
|
m007a83 has joined #archiveteam-ot |
20:23
🔗
|
|
jrwr has quit IRC (Read error: Connection reset by peer) |
20:24
🔗
|
|
jrwr has joined #archiveteam-ot |
20:42
🔗
|
|
BlueMax has joined #archiveteam-ot |
20:58
🔗
|
eientei95 |
https://xkcd.com/2102/ IA in XKCD |
21:03
🔗
|
eientei95 |
Hm. Anyone else getting a 400 Bad Request error when trying to do a https://web.archive.org/save/<link>? |
21:20
🔗
|
ivan_ |
yes |
21:21
🔗
|
ivan_ |
try again and it might work, or not |
21:23
🔗
|
eientei95 |
ivan_: Nope, still 400 |
21:32
🔗
|
arkiver |
eientei95: nice |
21:33
🔗
|
* |
arkiver is waiting for the day Archive Team is in XKCD |
22:06
🔗
|
|
nataraj has quit IRC (Read error: Operation timed out) |
22:45
🔗
|
yano |
yikes, https://web.archive.org/web/20190123224207/http://clerk.house.gov/evs/2019/roll043.xml |
22:45
🔗
|
yano |
#FormattingFail |
22:57
🔗
|
|
kiska1 has quit IRC (Ping timeout (120 seconds)) |
22:58
🔗
|
|
wmvhater has joined #archiveteam-ot |
22:58
🔗
|
|
kiska1 has joined #archiveteam-ot |