[00:33] *** Somebody has joined #urlteam
[05:41] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:44] *** Start has quit IRC (Quit: Disconnected.)
[05:47] *** Sk1d has joined #urlteam
[05:47] *** Start has joined #urlteam
[06:46] *** svchfoo1 has quit IRC (Quit: Closing)
[06:47] *** svchfoo1 has joined #urlteam
[06:48] *** svchfoo3 sets mode: +o svchfoo1
[06:56] *** svchfoo1 has quit IRC (Quit: Closing)
[06:57] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[06:57] *** svchfoo1 has joined #urlteam
[06:58] *** svchfoo3 sets mode: +o svchfoo1
[10:57] *** swebb has quit IRC (Ping timeout: 246 seconds)
[10:57] *** swebb has joined #urlteam
[10:58] *** svchfoo1 sets mode: +o swebb
[11:21] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:33] *** dashcloud has joined #urlteam
[14:25] *** WinterFox has quit IRC (Read error: Operation timed out)
[15:41] *** Start has quit IRC (Quit: Disconnected.)
[17:08] *** VADemon has joined #urlteam
[17:37] *** Somebody has joined #urlteam
[18:26] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[20:08] *** VADemon has quit IRC (Quit: left4dead)
[20:54] *** HCross has quit IRC (Quit: Leaving)
[20:58] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:59] *** pizzaiolo has joined #urlteam
[20:59] folks
[20:59] *** dashcloud has joined #urlteam
[20:59] I'm talking to the maintainer of a link shortener service
[21:00] he's asking whether submitting the database of shortened links to the internet archive is enough for backing up
[21:00] what are your thoughts?
[21:15] *** Smiley has quit IRC (Ping timeout: 250 seconds)
[21:20] *** Smiley has joined #urlteam
[21:23] *** hawc145 has joined #urlteam
[21:24] *** hawc145 has quit IRC (Read error: Connection reset by peer)
[21:30] pizzaiolo: sounds good to me!
[21:31] bwn: does he need to do anything else?
I'm not very sure how to help him as I'm also a beginner to the world of internet archiving
[21:37] a database of the shortlinks to urls sounds like a good start to me
[21:37] right now the urlteam results are packaged up into beacon format: https://gbv.github.io/beaconspec/beacon.html
[21:43] if the maintainer is willing, there's a python tool that can be used to pretty easily automate an upload to ia: https://internetarchive.readthedocs.io/en/latest/
[22:22] thanks bwn
[22:29] *** JW_work has joined #urlteam
[22:32] pizzaiolo: thanks for checking in! What's the URL of the shortening service you are in contact with? Is it still making new URLs? How many does it have currently, and how many new ones are generated per month?
[22:32] JW_work: http://ĝi.ga
[22:33] yes, still active
[22:33] thanks for the URL. It looks like the short codes are 4 characters
[22:33] it's not a large service, let me check how many with the maintainer
[22:33] At that scale, it's probably easiest for us to just grab them ourselves.
[22:34] (although an upload by the maintainer would certainly be welcome as well!)
[22:35] he seemed enthusiastic about the idea
[22:35] he says he'll surely do it
[22:36] great!
[22:36] And I'll make a grab of it as well — it looks easy enough to do so
[22:37] hmm says he doesn't know how many
[22:37] http://www.xn--i-8ia.ga/aaf9
[22:37] neat, thanks JW_work
[22:37] that's an example
[22:37] just a 302 response
[22:37] he says maybe less than 100 links per month
[22:37] cool, that's certainly easy for us to grab
[22:38] and 200 for non-existing ones
[22:38] you could warn him he will get somewhat higher traffic for a day or so (probably less) starting this evening
[22:39] neat, will do!
[22:40] thanks very much for reaching out!
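For readers unfamiliar with the BEACON format bwn links above: a BEACON dump is a plain-text file of `#`-prefixed meta fields followed by one `source|target` link per line. A minimal sketch of serializing a shortcode-to-URL mapping that way might look like this (the prefix, shortcode, and target URL here are invented examples, not the real service's data):

```python
# Sketch: render a {shortcode: target_url} dict as a BEACON link dump.
# Field names follow the gbv.github.io BEACON spec; all concrete values
# below are made-up placeholders.

def to_beacon(links, prefix):
    """Return BEACON text for a dict of {shortcode: target_url}."""
    header = [
        "#FORMAT: BEACON",
        f"#PREFIX: {prefix}",  # expands each shortcode into a full short URL
    ]
    body = [f"{code}|{target}" for code, target in sorted(links.items())]
    return "\n".join(header + body) + "\n"

if __name__ == "__main__":
    dump = to_beacon({"aaf9": "https://example.org/some/page"},
                     prefix="http://example-shortener.invalid/")
    print(dump)
```

The `internetarchive` Python tool mentioned in the log could then upload the resulting file to an IA item.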
If you'd like to help more, we have literally hundreds of URLs rumored to contain (or have previously contained) shorteners listed on the URLTeam wiki page that we'd love help investigating…
[22:41] JW_work: heh, I've added a couple of link shorteners to the list on the wiki
[22:41] I'm still very new to internet archiving but I love the idea
[22:41] much appreciated!
[22:42] for now I've mostly relied on https://addons.mozilla.org/en-US/firefox/addon/archive-webextension/?src=search
[22:43] you can also do that just with a bookmarklet javascript:(function(){location.href='http://web.archive.org/save/'+(location.href);})();
[22:44] (although that fails on, say, github due to some security feature I don't remember the name of right now)
[22:44] and if you want to grab a whole domain, #archivebot is your friend
[22:44] oh
[22:45] (also, if you want to grab a single page and have it downloadable outside the Wayback Machine (and as such, not vulnerable to robots.txt changes), archivebot also works for that)
[22:45] ArchiveBot is an IRC bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, records it in a WARC, and then uploads that WARC to ArchiveTeam servers for eventual injection into the Internet Archive (or other archive sites).
[22:45] please add https://addons.mozilla.org/en-US/firefox/addon/archive-webextension to the Internet Archive wiki page on the archiveteam wiki.
[22:45] my god this is brilliant
[22:46] okie
[22:46] glad to hear it!
[22:46] where do you think it could go? on the wiki
[22:46] just at the bottom of the page, under See Also/External Links
[22:46] ok
[22:47] it's just nice to have links to all the Internet Archive-related tools that we know of
[22:47] of course
[22:50] added
[22:55] thanks!
[23:16] *** dashcloud has quit IRC (Remote host closed the connection)
[23:23] *** Start has joined #urlteam
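The "grab them ourselves" approach discussed in the log, walking a small shortener's 4-character code space and treating a 302 as a live link and a 200 as an empty slot, could be sketched as below. This is a hypothetical illustration under the behavior described in the channel; the domain, alphabet, and helper names are placeholders, not the real service's, and a real crawl would need rate limiting and error handling.

```python
# Sketch: brute-force a small shortener's code space, assuming the
# behaviour reported in the channel (302 + Location header for an
# existing code, 200 for a non-existing one). Placeholder values only.
import itertools
import string
import urllib.error
import urllib.request

def classify(status):
    """Map an HTTP status to what it means on this hypothetical service."""
    if status == 302:
        return "link"
    if status == 200:
        return "missing"
    return "unknown"

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None  # don't follow the 302; we only want its Location header

def probe(base, code):
    """Return (classification, redirect_target_or_None) for one shortcode."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(base + code, timeout=10)
        return classify(resp.getcode()), None
    except urllib.error.HTTPError as err:  # with NoRedirect, a 302 lands here
        return classify(err.code), err.headers.get("Location")

if __name__ == "__main__":
    # Enumerate the first few 4-character codes (no network in this sketch).
    alphabet = string.ascii_lowercase + string.digits
    first = ["".join(c) for c in
             itertools.islice(itertools.product(alphabet, repeat=4), 3)]
    print(first)
```

At the volumes mentioned in the log (well under 100 new links a month, a 36-character alphabet giving about 1.7 million 4-character codes), a polite single-threaded pass like this is feasible, which is presumably why JW_work calls the service "easy enough" to grab.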