[00:14] *** Dj-Wawa has quit IRC (Quit: Connection closed for inactivity)
[00:21] *** ats has quit IRC (se.hub irc.efnet.nl)
[00:21] *** Stiletto has quit IRC (se.hub irc.efnet.nl)
[00:21] *** argus has quit IRC (se.hub irc.efnet.nl)
[00:21] *** noirscape has quit IRC (se.hub irc.efnet.nl)
[00:21] *** Fusl has quit IRC (se.hub irc.efnet.nl)
[00:21] *** VoynichCr has quit IRC (se.hub irc.efnet.nl)
[00:21] *** N4Y has quit IRC (se.hub irc.efnet.nl)
[00:21] *** MrRadar2 has quit IRC (se.hub irc.efnet.nl)
[00:21] *** Tenebrae has quit IRC (se.hub irc.efnet.nl)
[00:21] *** BnAboyZ has quit IRC (se.hub irc.efnet.nl)
[00:23] *** ats has joined #archiveteam-ot
[00:23] *** Stiletto has joined #archiveteam-ot
[00:23] *** argus has joined #archiveteam-ot
[00:23] *** noirscape has joined #archiveteam-ot
[00:23] *** Fusl has joined #archiveteam-ot
[00:23] *** VoynichCr has joined #archiveteam-ot
[00:23] *** N4Y has joined #archiveteam-ot
[00:23] *** MrRadar2 has joined #archiveteam-ot
[00:23] *** Tenebrae has joined #archiveteam-ot
[00:23] *** BnAboyZ has joined #archiveteam-ot
[00:24] *** Despatche has quit IRC (Read error: Connection reset by peer)
[00:24] *** Despatche has joined #archiveteam-ot
[00:24] *** Despatche has quit IRC (Read error: Connection reset by peer)
[00:24] *** Despatche has joined #archiveteam-ot
[00:24] *** Despatche has quit IRC (Read error: Connection reset by peer)
[00:24] *** Despatche has joined #archiveteam-ot
[00:27] https://www.gnu.org/software/librejs/free-your-javascript.html 3.2.2.1
[00:27] magnets of valid licenses
[00:27] such as CC0, GPLv2.0, GPLv3.0, etc.
[00:28] *** Despatche has quit IRC (Remote host closed the connection)
[00:28] *** Despatche has joined #archiveteam-ot
[00:30] Better yet, get rid of JS entirely. :-)
[00:31] (If possible)
[00:31] i don't care about the .js part, i'm just excited about the magnets of the licenses
[00:32] Ah yeah
[00:34] i figured anyone who is into data hoarder or hosting data for others might be interested :)
[01:36] *** VerfiedJ has quit IRC (Quit: Leaving)
[01:43] ivan_: For your YouTube archive: https://old.reddit.com/r/videos/comments/aio6jx/indian_youtuber_who_exposes_the_safety_measures/
[02:42] JAA: grabbing
[02:42] YouTube no longer serves annotations, right?
[02:46] youtube-dl still grabs something but the annotation data is blanked
[02:46] it doesn't even have the end cards either but rather some channel metadata
[03:05] *** Mateon1 has quit IRC (Read error: Operation timed out)
[03:05] *** Mateon1 has joined #archiveteam-ot
[03:32] Does IA deduplicate the WARC websites when they are integrated into the Wayback Machine?
[03:44] Am I likely to get IP banned when grabbing a site that has a DNS that points CloudFlare, Inc. (AS13335)?
[04:04] JAA: finally /r/shutdown has real moderation now
[04:04] good riddance to the ronald mcdonald trump posts
[04:11] *** Hani has quit IRC (Read error: Operation timed out)
[04:49] *** odemg has quit IRC (Ping timeout: 265 seconds)
[05:01] *** odemg has joined #archiveteam-ot
[05:02] t3: no deduplication
[05:02] t3: should be fine if you get through the initial captcha unless they have attack detection on?
[05:35] *** Despatche has quit IRC (Quit: Connection reset by deer)
[05:40] *** m007a83 has quit IRC (Read error: Connection reset by peer)
[05:49] *** m007a83 has joined #archiveteam-ot
[05:53] I don't like Cloudflare. It's everywhere.
[05:58] I also don't like websites that load contents by scrolling using JavaScript.
[06:06] *** Hani has joined #archiveteam-ot
[06:12] *** nataraj has joined #archiveteam-ot
[06:32] *** Ryz has joined #archiveteam-ot
[06:32] Google to try and block uBlock Origin? S:
[06:32] https://www.ghacks.net/2019/01/22/chrome-extension-manifest-v3-could-end-ublock-origin-for-chrome/
[06:33] *** icedice has quit IRC (Quit: Leaving)
[07:14] Ryz: That's really bad. Google shouldn't be evil.
[07:14] That's also why I stopped using Google Chrome.
[07:15] To others, they are already evil in a way, not in this instance I pointed out, but the other countless ones in the past~
[08:33] *** wp494_ has joined #archiveteam-ot
[08:35] *** wp494 has quit IRC (Ping timeout: 255 seconds)
[09:14] *** nataraj has quit IRC (Read error: Operation timed out)
[09:27] *** nataraj has joined #archiveteam-ot
[09:37] *** schbirid has joined #archiveteam-ot
[10:33] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[10:33] Google is an altruistic company that totally isn't interested in making money through ads. I'm sure they'll sort this API issue out with Gorhill...
[11:30] *** chimyatta has joined #archiveteam-ot
[11:53] *** Ryz has quit IRC (Remote host closed the connection)
[12:37] https://mashable.com/2018/05/19/google-removes-dont-be-evil-motto-from-code-of-conduct/?europe=true
[13:38] *** nataraj has quit IRC (Read error: Operation timed out)
[14:33] https://twitter.com/3lbios/status/1087848040583626753
[14:55] *** LFlare has quit IRC (Quit: Ping timeout (120 seconds))
[14:59] *** VerfiedJ has joined #archiveteam-ot
[15:03] *** LFlare has joined #archiveteam-ot
[15:21] https://yanovich.net/2018/08/help-archive-the-web/
[16:16] the addon isn't open source though
[16:24] JAA: thanks for the forum info. i might just make a local copy with grab-site (i use that already), but i did want to get a copy on wayback
[16:27] *** nataraj has joined #archiveteam-ot
[17:33] *** wp494 has joined #archiveteam-ot
[17:36] *** wp494_ has quit IRC (Ping timeout: 265 seconds)
[18:13] yano: That's interesting. Just curious, because your nick is similar to the blog name, is that your blog? I've never used that "Page Cache Archiver" browser extension before. I generally use VerfiedJ's "Save to the Wayback Machine" browser extension.
[18:20] *** schbirid has quit IRC (Remote host closed the connection)
[18:42] *** m007a83 has quit IRC (Read error: Operation timed out)
[18:44] t3: yea, that's my blog
[18:44] t3: ah, PCA allows one to save to more than just IA
[18:48] *** picklefac has joined #archiveteam-ot
[18:54] moufu: the source is viewable but yeah, it's not FOSS-y in that you can do whatever you want with it :-\
[19:14] *** Hani has quit IRC (Quit: Going offline, see ya! (www.adiirc.com))
[19:16] *** Hani has joined #archiveteam-ot
[19:20] *** Despatche has joined #archiveteam-ot
[19:46] yano: Well your blog's robots.txt blocks ia_archiver.
[19:47] it shouldn't for the whole site
[19:47] * yano checks
[19:48] oh, i thought that was part of the image blocking thing I copy/pasted that included the Google-Images
[19:48] yano: You have `User-agent: ia_archiver` with `Disallow: /`. That's the entire site.
[19:48] fixed
[19:48] it's now removed
[19:48] it should only block the stuff mentioned at the bottom
[19:49] Yay! Now it can be archived.
[19:51] Actually...
[19:51] There still might be an issue.
[19:51] https://web.archive.org/save/https://yanovich.net/pages/contact-me.html
[19:52] IA might have to pick up the new robots.txt first. Not sure how often that's checked.
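The robots.txt exchange above (a `User-agent: ia_archiver` group with `Disallow: /` blocking the whole site) can be reproduced offline. This is a minimal sketch using Python's standard `urllib.robotparser`. The `BLOCKING` text mirrors the two directives quoted in the chat. The `FIXED` text and its `/private/` path are hypothetical stand-ins, since the log never shows the actual contents of the corrected file. The helper name `is_allowed` is also made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two directives quoted in the chat: this blocks ia_archiver everywhere.
BLOCKING = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical corrected file: the ia_archiver group is removed and only
# one assumed path is blocked for all crawlers.
FIXED = """\
User-agent: *
Disallow: /private/
"""

PAGE = "https://yanovich.net/pages/contact-me.html"

if __name__ == "__main__":
    print("before fix:", is_allowed(BLOCKING, "ia_archiver", PAGE))
    print("after fix: ", is_allowed(FIXED, "ia_archiver", PAGE))
```

This only evaluates the rules as written in the file. It says nothing about when the Wayback Machine re-fetches robots.txt, which is the open question in the chat.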
[19:53] The robots.txt has been updated: https://web.archive.org/web/20190123194840/https://yanovich.net/robots.txt
[19:54] my web logs aren't showing any UA's from IA for robots.txt
[19:54] ah, there it is
[19:55] [23/Jan/2019:19:45:40 +0000] "GET /robots.txt HTTP/1.1" 200 447 "-" "Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; +http://archive.org/details/archive.org_bot)" "-" yanovich.net
[19:55] [23/Jan/2019:19:45:51 +0000] "GET /robots.txt HTTP/2.0" 200 435 "https://web.archive.org/save/https://yanovich.net/2018/08/help-archive-the-web/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:
[19:55] 60.0) Gecko/20100101 Firefox/60.0" "-" yanovich.net
[19:55] ah, bay-hoooey
[19:55] *ba-hoooey
[20:05] *** m007a83 has joined #archiveteam-ot
[20:23] *** jrwr has quit IRC (Read error: Connection reset by peer)
[20:24] *** jrwr has joined #archiveteam-ot
[20:42] *** BlueMax has joined #archiveteam-ot
[20:58] https://xkcd.com/2102/ IA in XKCD
[21:03] Hm. ANyone else getting a 400 Bad Request error when trying to do a https://web.archive.org/save/ ?
[21:20] yes
[21:21] try again and it might work, or not
[21:23] ivan_: Nope, still 400
[21:32] eientei95: nice
[21:33] * arkiver is waiting for the day Archive Team is in XKCD
[22:06] *** nataraj has quit IRC (Read error: Operation timed out)
[22:45] yikes, https://web.archive.org/web/20190123224207/http://clerk.house.gov/evs/2019/roll043.xml
[22:45] #FormattingFail
[22:57] *** kiska1 has quit IRC (Ping timeout (120 seconds))
[22:58] *** wmvhater has joined #archiveteam-ot
[22:58] *** kiska1 has joined #archiveteam-ot