[00:01] If anyone has a bit of time, I'd appreciate help with verifying that my archives of AMO are complete. Come to #outofammo if interested. [00:04] *** nertzy has joined #archiveteam [00:05] *** Sk1d has quit IRC (Read error: Operation timed out) [00:08] *** Sk1d has joined #archiveteam [00:13] *** m007a83 has joined #archiveteam [00:21] *** Sk1d has quit IRC (Read error: Operation timed out) [00:23] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [00:23] *** Sk1d has joined #archiveteam [00:45] *** VerifiedJ has quit IRC (Quit: Leaving) [00:51] *** Mateon1 has quit IRC (Ping timeout: 265 seconds) [00:52] *** Mateon1 has joined #archiveteam [00:58] *** Sk1d has quit IRC (Read error: Operation timed out) [01:01] *** Sk1d has joined #archiveteam [01:05] *** ats has quit IRC (Ping timeout: 252 seconds) [01:15] Arkiver FTP needs serious work [01:15] yes [01:15] *** Sk1d has quit IRC (Read error: Operation timed out) [01:15] YES [01:16] let’s move this to #effteepee [01:19] *** Sk1d has joined #archiveteam [01:31] *** twoTBHetz has quit IRC (Ping timeout: 260 seconds) [01:32] *** Kitaru_ has quit IRC (Quit: This computer has gone to sleep) [01:35] *** Sk1d has quit IRC (Read error: Operation timed out) [01:39] *** Sk1d has joined #archiveteam [01:56] *** Sk1d has quit IRC (Read error: Operation timed out) [01:59] *** Sk1d has joined #archiveteam [02:05] *** Stilett0 has joined #archiveteam [02:10] *** Stiletto has quit IRC (Ping timeout: 633 seconds) [02:15] *** Sk1d has quit IRC (Read error: Operation timed out) [02:20] *** Sk1d has joined #archiveteam [02:30] *** dtm has quit IRC (Read error: Operation timed out) [02:32] *** pizzaiolo has quit IRC (west.us.hub irc.Prison.NET) [02:32] *** Ryz has quit IRC (west.us.hub irc.Prison.NET) [02:32] *** achip has quit IRC (west.us.hub irc.Prison.NET) [02:33] *** Sk1d has quit IRC (Read error: Operation timed out) [02:37] *** Sk1d has joined #archiveteam [02:48] *** pizzaiolo has joined #archiveteam [02:50] 
*** Sk1d has quit IRC (Read error: Operation timed out) [02:54] *** Sk1d has joined #archiveteam [02:59] *** Kitaru_ has joined #archiveteam [03:05] *** dtm has joined #archiveteam [03:05] *** achip has joined #archiveteam [03:07] *** Ryz has joined #archiveteam [03:32] *** Sk1d has quit IRC (Read error: Operation timed out) [03:36] *** Sk1d has joined #archiveteam [03:40] *** bakJAA_ has joined #archiveteam [03:40] *** swebb sets mode: +o bakJAA_ [03:40] *** JAA sets mode: +o bakJAA_ [03:40] *** bakJAA has quit IRC (Read error: Connection reset by peer) [03:41] *** kyounko has quit IRC (Ping timeout: 492 seconds) [03:41] *** Mikal_i2p has quit IRC (Ping timeout: 492 seconds) [03:43] *** mgrytbak_ has quit IRC (Ping timeout: 492 seconds) [03:44] *** Mikal_i2p has joined #archiveteam [03:47] *** mgrytbak_ has joined #archiveteam [03:50] *** Sk1d has quit IRC (Read error: Operation timed out) [03:54] *** Sk1d has joined #archiveteam [04:07] *** Sk1d has quit IRC (Read error: Operation timed out) [04:12] *** Sk1d has joined #archiveteam [04:26] *** matthusb_ has joined #archiveteam [04:28] *** Martle__ has quit IRC (Ping timeout: 252 seconds) [04:28] *** matthusb_ has quit IRC (Remote host closed the connection) [04:28] *** matthusby has quit IRC (Read error: Operation timed out) [04:28] *** matthusby has joined #archiveteam [04:43] *** qw3rty114 has joined #archiveteam [04:50] *** qw3rty113 has quit IRC (Read error: Operation timed out) [04:50] *** Sk1d has quit IRC (Read error: Operation timed out) [04:53] *** Kitaru_ has quit IRC (Quit: This computer has gone to sleep) [04:54] *** Sk1d has joined #archiveteam [04:57] *** odemg has quit IRC (Read error: Operation timed out) [05:06] *** Sk1d has quit IRC (Read error: Operation timed out) [05:10] *** Sk1d has joined #archiveteam [05:11] *** odemg has joined #archiveteam [05:23] *** Sk1d has quit IRC (Read error: Operation timed out) [05:27] *** Sk1d has joined #archiveteam [05:39] *** Kitaru_ has joined #archiveteam 
[05:46] is there a project channel for the free music archive thing? [05:47] Dunno [05:49] *** pino_p has joined #archiveteam [05:51] How many of us have already heard about Free Music Archive going dark? https://www.theverge.com/2018/11/7/18073346/free-music-archive-closing-wfmu-creative-commons-cheyenne-hohman [05:54] (checks log) Lord_Nigh, see #musicateam [05:55] and https://www.archiveteam.org/index.php?title=Free_Music_Archive [06:01] *** Sk1d has quit IRC (Read error: Operation timed out) [06:04] *** nertzy has joined #archiveteam [06:05] *** Sk1d has joined #archiveteam [06:16] *** pino_p has quit IRC (Quit: Leaving) [06:17] *** Sk1d has quit IRC (Read error: Operation timed out) [06:22] *** Sk1d has joined #archiveteam [06:24] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [06:36] *** Sk1d has quit IRC (Read error: Operation timed out) [06:41] *** Sk1d has joined #archiveteam [06:53] *** Sk1d has quit IRC (Read error: Operation timed out) [06:58] *** Sk1d has joined #archiveteam [07:06] Is there a way I can download the Geocities (US) data from a few years ago? The torrent has no speed, not too surprisingly [07:09] hiroi: there were seeders as recently as a few months ago [07:09] Asking here may be a good way to get someone to reseed. [07:10] Meanwhile, I remembered to upload the first of the two Twitter "integrity" datasets yesterday https://archive.org/details/twitter-integrity-ira [07:13] Okay I have a torrenting client sitting there waiting for someone to feed her. If anyone would be so kind...
[07:43] *** Sk1d has quit IRC (Read error: Operation timed out) [07:48] *** Sk1d has joined #archiveteam [08:00] *** Sk1d has quit IRC (Read error: Operation timed out) [08:03] I decided to grab from IA’s Geocities Valhalla instead (this is the same dataset, right?); no need for a seeder now, thanks to all though [08:04] *** Sk1d has joined #archiveteam [08:12] *** adinbied has quit IRC (Remote host closed the connection) [08:12] *** adinbied has joined #archiveteam [08:15] *** Sk1d has quit IRC (Read error: Operation timed out) [08:18] *** adinbied_ has joined #archiveteam [08:19] *** adinbied has quit IRC (Ping timeout: 252 seconds) [08:20] *** Sk1d has joined #archiveteam [08:32] *** Sk1d has quit IRC (Read error: Operation timed out) [08:36] *** Sk1d has joined #archiveteam [08:37] *** twiggy36 has joined #archiveteam [08:37] JOIN [08:38] join what? [08:39] *** twiggy36 has quit IRC (Client Quit) [08:41] *** Kitaru_ has quit IRC (Quit: This computer has gone to sleep) [08:51] *** Sk1d has quit IRC (Read error: Operation timed out) [08:55] Flashfire: just /JOIN mistyped :) [08:56] *** Sk1d has joined #archiveteam [09:12] *** Sk1d has quit IRC (Read error: Operation timed out) [09:15] *** Sk1d has joined #archiveteam [09:18] *** BlueMax has quit IRC (Quit: Leaving) [09:28] *** Sk1d has quit IRC (Read error: Operation timed out) [09:32] *** Sk1d has joined #archiveteam [09:39] *** threeTBHe has joined #archiveteam [09:40] *** ats has joined #archiveteam [09:45] *** Sk1d has quit IRC (Read error: Operation timed out) [09:45] *** threeTBHe has quit IRC (Ping timeout: 260 seconds) [09:48] *** Sk1d has joined #archiveteam [10:02] *** Sk1d has quit IRC (Read error: Operation timed out) [10:07] *** Sk1d has joined #archiveteam [10:08] *** godane has quit IRC (Ping timeout: 265 seconds) [10:18] *** Sk1d has quit IRC (Read error: Operation timed out) [10:24] *** Sk1d has joined #archiveteam [10:54] *** Sk1d has quit IRC (Read error: Operation timed out) [10:59] *** Sk1d has 
joined #archiveteam [11:11] *** Sk1d has quit IRC (Read error: Operation timed out) [11:14] *** Sk1d has joined #archiveteam [11:27] *** Sk1d has quit IRC (Read error: Operation timed out) [11:31] *** Sk1d has joined #archiveteam [11:44] *** Sk1d has quit IRC (Read error: Operation timed out) [11:49] *** Sk1d has joined #archiveteam [12:07] *** nertzy has joined #archiveteam [12:08] *** Ryz has quit IRC (Quit: ChatZilla 0.9.92-rdmsoft [XULRunner 35.0.1/20150122214805]) [12:22] *** nertzy has quit IRC (Quit: This computer has gone to sleep) [12:59] *** godane has joined #archiveteam [13:08] *** adinbied_ is now known as adinbied [13:26] *** Ctrl has quit IRC (Ping timeout: 268 seconds) [13:38] *** LFlare has joined #archiveteam [13:44] *** Mateon1 has quit IRC (Remote host closed the connection) [13:44] *** Mateon1 has joined #archiveteam [13:46] *** LFlare has left The Lounge - https://thelounge.chat [13:46] *** LFlare has joined #archiveteam [13:48] So which channel is it for FMA? #fmaction or #musicateam? There are more people in the former, and the latter is mentioned on the wiki. [13:50] *** matthusby has quit IRC (Remote host closed the connection) [13:51] *** matthusby has joined #archiveteam [14:31] *** Sk1d has quit IRC (Read error: Operation timed out) [14:34] *** Sk1d has joined #archiveteam [14:48] *** Sk1d has quit IRC (Read error: Operation timed out) [14:49] *** VerifiedJ has joined #archiveteam [14:50] *** Sk1d has joined #archiveteam [15:06] *** Sk1d has quit IRC (Read error: Operation timed out) [15:11] *** Sk1d has joined #archiveteam [15:16] *** SketchCo1 is now known as SketchCow [15:16] Free Music Archive all set. [15:16] *** LFlare has quit IRC (Read error: Operation timed out) [15:17] - We are doing an Archivebot grab [15:17] - Free Music Archive mailed us a hard drive [15:17] - Archive-It grabbed a copy [15:17] nice [15:17] Sweet [15:21] Yeah, so direct people away from that one, it's handled.
[15:22] Not surprisingly, my run through all our projects shows some getting good action and others getting none. [15:23] *** Sk1d has quit IRC (Read error: Operation timed out) [15:26] *** LFlare has joined #archiveteam [15:28] *** Sk1d has joined #archiveteam [15:42] *** thesame has joined #archiveteam [15:43] *** Martle has joined #archiveteam [15:43] Hello archiveteam. Can anyone help me with rescuing a project from gitorious? It's giving a 404 [16:15] *** pizzaiolo has quit IRC (Quit: pizzaiolo) [16:19] *** Sk1d has quit IRC (Read error: Operation timed out) [16:23] *** Sk1d has joined #archiveteam [16:24] *** Somebody2 has quit IRC (Read error: Operation timed out) [16:25] *** DFJustin has quit IRC (Read error: Connection reset by peer) [16:26] *** DFJustin has joined #archiveteam [16:26] *** swebb sets mode: +o DFJustin [16:27] *** _Verified has joined #archiveteam [16:29] *** _Verified has quit IRC (Client Quit) [16:30] *** VerifiedJ has quit IRC (Quit: Leaving) [16:30] hi [16:30] gitorious admin here [16:31] there was a hardware failure over the weekend. it's held together with shoestring and scotch tape, i suspect some daemon failed to start up. i'll get on it in a few days. sorry! [16:34] *** VerifiedJ has joined #archiveteam [16:36] data's safe though: nothing was lost from that system, and i have already sent a copy to another organization.
[16:36] someday i will get my shit together enough to put it on IA [16:38] *** Sk1d has quit IRC (Read error: Operation timed out) [16:41] *** Sk1d has joined #archiveteam [16:57] *** schbirid has joined #archiveteam [16:59] thank you astrid, hope it happens sooner rather than later [16:59] i'm afraid gitorious is the only place where my very old project remains for now [17:07] *** LFlare has quit IRC (Ping timeout: 252 seconds) [17:12] *** wp494 has quit IRC (Ping timeout: 260 seconds) [17:12] *** wp494 has joined #archiveteam [17:14] *** Somebody2 has joined #archiveteam [17:16] * Kaz wonders if we have my infra that isn't 'held together with shoestring and scotch tape' [17:16] archivebot seems to work fine I guess [17:16] s/my/any [17:29] *** LFlare has joined #archiveteam [17:57] *** Sk1d has quit IRC (Read error: Operation timed out) [18:02] *** Sk1d has joined #archiveteam [18:06] *** Ryz has joined #archiveteam [18:07] *** Martle_ has joined #archiveteam [18:08] *** Martle_ has quit IRC (Client Quit) [18:09] *** Martle has quit IRC (Ping timeout: 252 seconds) [18:53] *** thesame has quit IRC (Remote host closed the connection) [19:00] Kaz: the warrior does for the time being [19:00] What do you need?
[19:03] *** Sk1d has quit IRC (Read error: Operation timed out) [19:06] *** Sk1d has joined #archiveteam [19:22] *** SimpBrain has quit IRC (Read error: Operation timed out) [19:38] *** m007a83_ has joined #archiveteam [19:41] *** stratum has joined #archiveteam [19:41] *** m007a83 has quit IRC (Read error: Operation timed out) [19:43] *** Sk1d has quit IRC (Read error: Operation timed out) [19:46] *** Sk1d has joined #archiveteam [19:48] *** m007a83_ is now known as m007a83 [19:59] *** Sk1d has quit IRC (Read error: Operation timed out) [20:03] *** Sk1d has joined #archiveteam [20:16] *** Sk1d has quit IRC (Read error: Operation timed out) [20:20] *** Sk1d has joined #archiveteam [20:26] *** icedice has joined #archiveteam [20:32] *** Sk1d has quit IRC (Read error: Operation timed out) [20:34] *** Sk1d has joined #archiveteam [20:42] *** dashcloud has quit IRC (Read error: Operation timed out) [20:48] *** Sk1d has quit IRC (Read error: Operation timed out) [20:49] *** twoTBHetz has joined #archiveteam [20:52] Is there any plan to back up transfer.sh? [20:52] *** Sk1d has joined #archiveteam [20:53] i would consider that pointless and not in the spirit of the site, it is designed around "Files stored for 14 days" [20:53] I see. [20:53] *** BlueMax has joined #archiveteam [20:54] I am currently looking into hostinger's free hosting which will be closed in But since i do not have any archiving stuff (warc or whatever) set up, i can only give you a list of entry points [20:57] Put them into the archivebot [20:57] In blocks of subdomains [21:00] Igloo how do i do that? [21:03] why grouped by subdomains?
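(Editor's sketch, not part of the log.) One way to read "blocks of subdomains" is to bucket the discovered hostnames by the free-hosting parent domain they sit under, so that each bucket can be handed over as one batch. The Python below is a minimal sketch under that assumption; the function name, the `PARENT_DOMAINS` tuple, and the "(unmatched)" bucket are hypothetical, and the parent domains listed are only placeholders to be replaced with the ones actually found.

```python
from collections import defaultdict

# Hypothetical placeholder list of parent domains; replace with the ones
# you have actually discovered.
PARENT_DOMAINS = ("esy.es", "zz.vc", "16mb.com")


def group_by_parent(hostnames, parents=PARENT_DOMAINS):
    """Group hostnames into blocks keyed by the parent domain they end with."""
    blocks = defaultdict(list)
    for host in hostnames:
        # Normalise case, surrounding whitespace and a trailing root dot.
        host = host.strip().lower().rstrip(".")
        if not host:
            continue
        for parent in parents:
            # Require a dot before the parent so "x.ahol.es" is not
            # mistaken for a subdomain of "hol.es".
            if host.endswith("." + parent):
                blocks[parent].append(host)
                break
        else:
            # No known parent matched; keep the host so nothing is lost.
            blocks["(unmatched)"].append(host)
    # De-duplicate and sort each block for stable output.
    return {parent: sorted(set(hosts)) for parent, hosts in blocks.items()}
```

Each resulting block can then be written to its own text file of entry points, one URL per line.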
[21:09] I only know about https://web.archive.org/save/$individual_link [21:11] *** Kitaru has joined #archiveteam [21:13] twoTBHetz: wait, is hostinger shutting down their free hosting? [21:14] I can't find any info on this [21:14] just visit any site: www.cinemahd.esy.es [21:15] that link did not work because it's a redirect, but see this one http://aea.zz.vc/ [21:17] I am currently running Sublist3r but it always takes forever and i only have so many IPs [21:18] betamax, do you see? [21:19] so far i got roughly 7705 subdomain names from 16mb.com ahol.es azz.vc esy.es hol.es zz.vc but i still need to check whether they are in use [21:21] ah, yes: [21:21] This website is hosted on a free hosting platform provided by Hostinger. The platform is deprecated, and it will be turned off in two weeks. If you are a website owner, please log into your control panel here. [21:22] I am currently collecting subdomains from 96.lt. I have not scanned pe.hu and 890m.com yet, and will do a little more digging for more top-levels [21:22] *** BlueMaxim has joined #archiveteam [21:24] great work. Any idea when that notice first appeared? It's a pain not knowing the actual shutdown date [21:25] *** BlueMax has quit IRC (Ping timeout: 260 seconds) [21:25] *** Sk1d has quit IRC (Read error: Operation timed out) [21:26] no, somebody pasted it yesterday in here [21:26] *** Kitaru has quit IRC (Quit: This computer has gone to sleep) [21:27] his name was something along the lines of Arctic [21:27] if i recall correctly [21:27] Ah, missed that [21:27] fyi: hostinger is *very closely* tied to 000webhost [21:28] now, 000webhost are unlikely to stop being free - it's in their name [21:28] but I'm suspicious [21:28] yeah hostinger's free hosting side does not advertise itself but 000webhost instead. i do not think that one will go down [21:29] are there faster/better tools than Sublist3r?
The online services it uses are starting to rate limit me hard and i have already burned through two IPs [21:30] *** Sk1d has joined #archiveteam [21:31] --------------------------- [21:31] IMPORTANT NOTE [21:31] If you upload WARC files into the general open collections on archive.org [21:31] ...they're gonna end up in the WARC Zone (https://archive.org/details/warczone) [21:31] Unless you've arranged them to go to an archive team collection [21:32] If you're smashing endless WARCs into the open collection, then somewhere you [21:32] didn't do a thing you probably should have done. Contact me or others. [21:32] --------------------------- [21:33] but there are also sites like http://profin.by/blog/ which feature the banner but are not easy to discover i think (but i know little) [21:33] SketchCow: but I'm guessing any non-ArchiveTeam grabs in WARC format won't be added to Wayback? [21:33] They will not [21:40] SketchCow: i'd love to do the right thing [21:40] SketchCow: i'm not sure what i'm missing [21:40] SketchCow: i uploaded those so far https://archive.org/details/@anarcat [21:47] *** Kitaru has joined #archiveteam [21:57] *** w0rmybak has quit IRC (Quit: Ping timeout (120 seconds)) [21:57] *** kiskabak has quit IRC (Quit: Ping timeout (120 seconds)) [21:57] *** Flashback has quit IRC (Quit: Ping timeout (120 seconds)) [21:58] *** w0rmybak has joined #archiveteam [22:00] *** kiskabak has joined #archiveteam [22:00] *** w0rmybak has quit IRC (Client Quit) [22:00] *** w0rmybak has joined #archiveteam [22:03] *** Flashback has joined #archiveteam [22:10] *** schbirid has quit IRC (Remote host closed the connection) [22:11] *** Sk1d has quit IRC (Read error: Operation timed out) [22:11] *** bakJAA_ is now known as bakJAA [22:13] betamax how would i best submit my to-be-archived URLs to the wayback machine? [22:15] *** Sk1d has joined #archiveteam [22:22] it's a bit tricky [22:22] what I would do is a two-step process [22:23] 1.) 
get a list of all the sites you discover, and archive their home pages only using archivebot '!ao' [22:24] 2.) download all the sites yourself, ideally using something that can generate WARCs like wget, then extract all the urls from that [22:24] so only one page per domain? [22:24] all those urls can then be put into archivebot [22:25] My question is: only one entry point per domain or spider the complete thing [22:25] how would i go about doing warc files in wget [22:28] well, first just get the home pages into archivebot, so something is saved [22:28] then spider / download the complete thing yourself, so there is a copy, even if not in wayback [22:29] then try and get it into wayback by putting a list of all the urls found into archivebot [22:29] see https://www.archiveteam.org/index.php/Wget_with_WARC_output for warc with wget [22:30] *** Sk1d has quit IRC (Read error: Operation timed out) [22:33] *** Sk1d has joined #archiveteam [22:35] mhh how does warc work with the spider option? [22:39] I am currently doing something like that wget "-r" "-l4" "--spider" "--tries=0" "-o" file "--no-verbose" "-D" "startreknews.esy.es" "startreknews.esy.es" to attempt to get all the links on the url in question [22:43] ah, bad choice of wording from me above: I'm not sure "spider" is the argument you want [22:43] according to the manual it means pages won't be downloaded [22:43] I don't know if that will affect the warc [22:44] and warc is just: --warc-file=fileName [22:44] which creates fileName.warc.gz [22:45] if somebody has a tool chain that works for him i would like to give him my domains [22:45] *** Sk1d has quit IRC (Read error: Operation timed out) [22:46] is the warc spider-like too (in the sense that it fetches a complete website and the pages on that domain it links to, and not just the link i point it to)?
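(Editor's sketch, not part of the log.) On the open question above: `--warc-file` only records what wget actually fetches, so whether a whole site is captured depends on the recursion options (`-r`, `-l`, `-D`), not on the WARC option itself. The Python below is a minimal sketch that assembles such a command without `--spider`; the function name, the log file naming, and the default depth of 4 (taken from the `-l4` in the log) are assumptions, and the flags should be checked against your installed wget's manual before a real run.

```python
import shlex


def wget_warc_command(host, warc_name, depth=4):
    """Build a wget argv list that crawls one host and writes <warc_name>.warc.gz."""
    return [
        "wget",
        "--recursive",                      # follow links (same as -r)
        f"--level={depth}",                 # recursion depth (same as -l)
        "--page-requisites",                # also fetch images, CSS, scripts
        f"--domains={host}",                # stay on this host (same as -D)
        f"--warc-file={warc_name}",         # wget appends .warc.gz itself
        "--no-verbose",
        f"--output-file={warc_name}.log",   # wget's own log (same as -o)
        f"http://{host}/",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for one host from the log.
    print(shlex.join(wget_warc_command("startreknews.esy.es", "startreknews.esy.es")))
```

Building the argument list separately from running it makes it easy to review the command for each host before starting a long crawl.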
[22:48] twoTBHetz: afraid I have to go (to bed) now [22:48] i see [22:48] but fyi, apparently hostinger emailed people on 25th September saying they had 2 months [22:50] so two weeks is roughly right [22:51] *** Sk1d has joined #archiveteam [22:55] *** Mateon1 has quit IRC (Read error: Operation timed out) [22:56] *** dashcloud has joined #archiveteam [22:56] *** Mateon1 has joined #archiveteam [23:12] *** matthusb_ has joined #archiveteam [23:14] *** matthusby has quit IRC (Read error: Operation timed out) [23:18] *** BlueMaxim has quit IRC (Quit: Leaving) [23:19] *** adinbied has quit IRC (Left Channel.) [23:20] *** matthusb_ has quit IRC (Read error: Operation timed out) [23:39] *** adinbied has joined #archiveteam
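(Editor's sketch, not part of the log.) The "extract all the urls" step suggested earlier in this log can be approximated by scanning a `.warc.gz` file for `WARC-Target-URI` record headers. The Python below is a minimal sketch using only the standard library; the function name is hypothetical, and the line scan is a shortcut that would also match a payload line that happens to start with the same text, so a dedicated WARC parser is the safer choice for real work.

```python
import gzip


def warc_target_uris(path):
    """Return the unique WARC-Target-URI values from a .warc.gz, in file order."""
    uris = []
    seen = set()
    # gzip.open reads through concatenated gzip members, which is how
    # compressed WARC files are usually laid out (one member per record).
    with gzip.open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"WARC-Target-URI:"):
                value = line.split(b":", 1)[1].strip()
                # Some writers wrap the URI in angle brackets; drop them.
                uri = value.decode("utf-8", "replace").strip("<>")
                if uri not in seen:
                    seen.add(uri)
                    uris.append(uri)
    return uris
```

The returned list can be written out one URL per line to produce the list of URLs that the log suggests feeding to archivebot.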