[00:03] yeah I looked there also
[00:03] wpull 2 is *vastly* more complex to work with
[00:08] https://github.com/ArchiveTeam/ftp-gov-grab/blob/master/ftp-gov.py FalconK
[00:08] oh really! ok, I'll take a look. thanks.
[00:10] This also may be an example, but I can't verify if it is version 2 or not: https://github.com/woodenphone/cyoc_grab/blob/master/cyoc_wpull_hooks.py
[00:23] *** ndiddy has joined #archiveteam-bs
[00:27] *** Honno has quit IRC (Read error: Operation timed out)
[00:28] *** VeganMars has quit IRC (Quit: ZNC - http://znc.in)
[00:46] *** Bamboozle has joined #archiveteam-bs
[00:49] Hello, I'm trying to run Archive Team on ESXi 5.5 and I get an error saying "INIT ID 2 respawning too fast. disabled for 5 minutes". Has anyone else run into this issue?
[00:50] that feel when you THOUGHT you had backed it up but nope
[01:10] *** Asparagir has quit IRC (Asparagir)
[01:34] Sometimes checking your grab log produces gold: "302 Found http://maps.google.com/maps?q=The+ass+end+of+a+decaying+webserver"
[01:45] *** terg has quit IRC (My Mac has gone to sleep. ZZZzzz…)
[02:02] *** Asparagir has joined #archiveteam-bs
[02:18] *** pizzaiolo has left
[02:28] FalconK: https://github.com/chfoo/wpull/blob/master/wpull/testing/integration/sample_user_scripts/extensive.plugin.py
[02:37] *** Bamboozle has quit IRC (Quit: Page closed)
[02:49] *** Start_ has quit IRC (Remote host closed the connection)
[03:19] *** vitzli has joined #archiveteam-bs
[03:25] *** VADemon has quit IRC (Quit: left4dead)
[05:10] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:15] *** Sk1d has joined #archiveteam-bs
[05:59] arkiver: woo. thanks!
[06:21] so i'm close to having all of the cbsnews.com video clips from 2007 uploaded
[06:21] it's uploading 2007-12-29 right now
[06:28] godane, you rock. we should all say that to you more often.
[06:28] Thank you for always being so diligent.
[06:52] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[07:28] *** Asparagir has quit IRC (Asparagir)
[08:15] *** Honno has joined #archiveteam-bs
[08:20] *** Lord_Nigh has quit IRC (Read error: Operation timed out)
[08:21] *** LordNigh2 has joined #archiveteam-bs
[08:21] *** LordNigh2 is now known as Lord_Nigh
[08:21] *** brayden_ has joined #archiveteam-bs
[08:24] *** antonizoo has quit IRC (Read error: Operation timed out)
[08:26] *** brayden has quit IRC (Read error: Operation timed out)
[08:35] *** Honno has quit IRC (Read error: Operation timed out)
[08:36] *** Honno has joined #archiveteam-bs
[08:37] *** antonizoo has joined #archiveteam-bs
[08:49] so i'm starting to download 2008-01 flash videos now
[08:49] for cbsnews.com
[08:50] *** GE has joined #archiveteam-bs
[09:06] i'm also grabbing the 5xxxxxx ids from cbsnews.com
[09:07] there is a lot of crap in when it url redirects
[09:08] from this: http://www.cbsnews.com/video/watch/?id=5000003n
[09:08] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[09:10] *** BlueMaxim has joined #archiveteam-bs
[09:40] *** tpw_rules has quit IRC (Read error: Operation timed out)
[09:45] *** vitzli has quit IRC (Quit: Leaving)
[09:50] *** tpw_rules has joined #archiveteam-bs
[09:50] Can someone with access take a look at the tracker please? Jobs aren't being distributed for gcode, limit is set at 500 in the tracker
[10:20] *** GE has quit IRC (Remote host closed the connection)
[10:23] *** Honno has quit IRC (Read error: Operation timed out)
[10:58] *** Honno has joined #archiveteam-bs
[11:18] *** GinhijiQu has joined #archiveteam-bs
[11:38] *** VADemon has joined #archiveteam-bs
[11:54] *** GE has joined #archiveteam-bs
[12:03] *** GinhijiQu is now known as VeganMars
[12:35] *** pizzaiolo has joined #archiveteam-bs
[12:35] *** BlueMaxim has quit IRC (Leaving)
[13:11] *** terg has joined #archiveteam-bs
[14:31] *** terg has quit IRC (My Mac has gone to sleep. ZZZzzz…)
[14:33] *** terg has joined #archiveteam-bs
[14:33] *** ItsYoda has quit IRC (Quit: rippppp to the yoda you used to know!)
[14:34] *** whydomain has quit IRC (Quit: No Ping reply in 180 seconds.)
[14:34] *** whydomain has joined #archiveteam-bs
[14:36] *** wm_ has quit IRC (Ping timeout: 260 seconds)
[14:38] *** ItsYoda has joined #archiveteam-bs
[14:39] *** wm_ has joined #archiveteam-bs
[15:25] yeah, yahoo groups is unhappy too
[16:01] tracker seems to be working ok. you need to release the stale claims and adjust the user limits
[16:09] *** vitzli has joined #archiveteam-bs
[16:15] heh, there was me thinking 0 = unlimited on maximum claims..
[16:15] thanks
[17:15] *** xmc sets mode: +o swebb
[17:15] *** swebb sets mode: +o antomatic
[17:15] *** swebb sets mode: +o balrog
[17:15] *** swebb sets mode: +o brayden_
[18:11] *** vitzli has quit IRC (Quit: Leaving)
[18:18] *** icedice has joined #archiveteam-bs
[18:19] icedice, the CLI is pretty easy, saves a lot of workarounds and trouble dealing with GUI tools.
[18:19] is there any "wget for idiots" guide?
[18:19] What operating system are you archiving to?
[18:19] and do I need regex?
[18:20] No.
[18:20] I use Windows 7
[18:20] Hrm.
[18:20] You will either need a Linux VM or a VPS somewhere.
[18:20] Windows 10 has WSL, but Windows 7 does not.
[18:20] shared hosting won't do I assume?
[18:20] Correct. You need shell access.
[18:21] I have Windows 10 in my city apartment
[18:21] I'll go there on Monday
[18:21] In the meantime, do you have like an hour to spare?
[18:21] What's WSL?
[18:21] Yeah, I guess
[18:21] Windows Subsystem for Linux
[18:22] Or, in layman's terms, Ubuntu Userspace Bash for Windows 10, an inverse WINE solution.
[18:22] ok
[18:22] Put simply, it lets you run Unix-like CLI tools on Windows without the overhead of a VM.
[18:23] I also have access to Mac OS X via TeamViewer
[18:23] Mac will also work, as it has a Unix-like environment.
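On the "wget for idiots" question: once a shell is available (Linux VM, VPS, macOS, or WSL on Windows 10), a basic WARC-producing grab needs no regex. The sketch below only assembles a typical GNU Wget command line in Python so each flag can carry a comment. `--mirror`, `--page-requisites`, `--no-parent`, `--wait` and `--warc-file` are standard Wget options (WARC output needs Wget 1.14 or newer); the URL, the WARC name, the one-second wait and the helper names are placeholder assumptions.

```python
import shlex
import subprocess
from typing import List


def build_wget_args(url: str, warc_name: str, wait_seconds: int = 1) -> List[str]:
    """Return an argv list for a polite recursive grab that also writes a WARC.

    Hypothetical helper for illustration; every flag is a standard GNU Wget option.
    """
    return [
        "wget",
        "--mirror",                  # recursive, infinite depth, timestamping
        "--page-requisites",         # also fetch images/CSS/JS needed to render pages
        "--no-parent",               # never ascend above the starting directory
        f"--wait={wait_seconds}",    # pause between requests to go easy on the server
        f"--warc-file={warc_name}",  # write <warc_name>.warc.gz next to the files
        url,
    ]


def run_grab(url: str, warc_name: str) -> int:
    """Run wget (must be installed and on PATH) and return its exit code."""
    args = build_wget_args(url, warc_name)
    print("running:", shlex.join(args))
    return subprocess.run(args).returncode


if __name__ == "__main__":
    # Placeholder target; replace with the site you actually want to grab.
    print(shlex.join(build_wget_args("https://example.com/", "example-com")))
```

The argv list goes to `subprocess.run` without a shell, so URLs containing `&` or `?` need no extra quoting.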
[18:23] Nice
[18:26] Do you think that this should work if I get the right folder structure and everything or might the references to the images be incorrect when importing it like this?
[18:27] So, to make sure I understand this correctly, you are attempting to grab a hosted wordpress.com blog with a custom domain?
[18:28] Yup
[18:28] And import the images onto
[18:28] the same WordPress installation on a proper web host
[18:29] Ah, so you want to transition from wordpress.com to a standalone WordPress installation?
[18:29] Retaining your previous posts and images?
[18:29] Yeah
[18:30] Then ignore literally everything I just said, because you are approaching this problem from the wrong angle.
[18:30] It's not my site though, I'm fixing this for an acquaintance
[18:31] Ah, do you have access to the wordpress.com site's admin?
[18:32] Yes
[18:32] Then all you need to do is use the export function, which includes media such as images, and the import function on the new website. No scraping required.
[18:33] This link is a fairly good guide: http://www.wpbeginner.com/wp-tutorials/how-to-properly-move-your-blog-from-wordpress-com-to-wordpress-org/
[18:33] the XML file is always corrupt
[18:33] might be because it's 9 GB of images
[18:34] and a fuckton of posts
[18:35] And the web host did the import for us since they had to do it via some wordpress.org backend and apparently that's as good as it gets since there's no FTP or hosting panel access
[18:35] on wordpress.com
[18:39] Damn, wordpress.com does not make this easy.
[18:40] It's almost as if they want to keep their paying customers there
[18:41] Or make you pay $130 for their guided export experience. :)
[18:41] How much are you paying for this shared hosting?
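On "the XML file is always corrupt": before handing a multi-gigabyte export to an importer, it may be worth checking whether the file is even well-formed XML. A minimal sketch using Python's streaming `xml.etree.ElementTree.iterparse`, so the file never has to fit in memory. It assumes the export is a WordPress WXR (RSS-based) file whose entries are `<item>` elements, and it only detects truncation or malformed markup, not missing media; `check_export` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional, Tuple, Union


def check_export(source: Union[str, BinaryIO]) -> Tuple[bool, Optional[str], int]:
    """Stream-parse an XML export; return (well_formed, error_message, item_count).

    `source` is a file path or a binary file object. Posts, pages and
    attachments are assumed to be RSS <item> elements, so the count shows
    roughly how far a broken file got before the damage.
    """
    items = 0
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == "item":
                items += 1
                elem.clear()  # drop the item's children to keep memory use flat
    except ET.ParseError as exc:
        # str(exc) includes the line and column of the first problem
        return False, str(exc), items
    return True, None, items


if __name__ == "__main__":
    import sys

    ok, error, count = check_export(sys.argv[1])
    print("well-formed" if ok else f"broken: {error}", f"({count} items read)")
```

A truncated download typically fails only at the very end ("no element found" or "unclosed token"), which points at the transfer rather than at WordPress itself.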
[18:43] One solution to this would be to use a VPS to export, and then import via wp-cli
[18:43] The new hosting is $95.90 for three years
[18:43] https://www.fastcomet.com/compare-shared-package
[18:45] WordPress.com is $8.25/month ($99/year): https://en.wordpress.com/#plans
[18:47] Do you have any guide for that export?
[18:47] I would recommend getting a Scaleway C1, which will cost $112 for three years and give you the benefit of a fully functioning Linux (ARM) machine, so you can import directly into it without going through a 9 GB download over a consumer network.
[18:49] As for the export, you would just be downloading to your server instead of your personal machine, which may solve the corruption issue.
[18:50] If that is not possible, my only recommendation is to bite the bullet and pay WordPress for a transfer, because as far as I can tell there is no easy way of converting a grabbed site back into WordPress.
[18:50] Unless you want to do a wonky solution like an archive.blog.com type thing, where archive is a reverse proxy of your old WordPress site.
[18:52] *** Honno has quit IRC (Quit: Leaving)
[18:53] Why would I need a three-year hosting plan for something that would take less than a month?
[18:53] Ah, you mean that the new site should be hosted on a VPS?
[18:54] My acquaintance is not too bright, technologically speaking, and we chose shared hosting and FastComet because of that
[18:54] https://hostadvice.com/hosting-company/fastcomet-reviews/
[18:54] Yeah, unless you can convince your shared hosting company to allow you to import via shell. Most shared hosting has a capped upload limit, and uploading a 9 GB file split into 2 MB subfiles would be... fun.
[18:57] As long as I can get the site exported I can just import it via FTP
[18:57] Then rent a VPS and use it to download the export.
[18:57] That I can do
[18:57] Scaleway charges by the hour.
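To put a number on "a 9 GB file split into 2 MB subfiles": in binary units, 9 GiB / 2 MiB = 9 × 1024 / 2 = 4608 parts. A minimal sketch of such a splitter, similar in spirit to `split -b 2M`, assuming the upload cap is exactly 2 MiB per file; `part_count` and `split_file` are hypothetical helpers.

```python
from typing import List

MIB = 1024 * 1024


def part_count(total_bytes: int, chunk_bytes: int) -> int:
    """Number of chunk_bytes-sized pieces needed to hold total_bytes (integer ceiling)."""
    return (total_bytes + chunk_bytes - 1) // chunk_bytes


def split_file(path: str, chunk_bytes: int = 2 * MIB) -> List[str]:
    """Write path.part0000, path.part0001, ... each at most chunk_bytes long."""
    parts: List[str] = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            part_path = f"{path}.part{index:04d}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts


if __name__ == "__main__":
    print(part_count(9 * 1024 * MIB, 2 * MIB))  # 9 GiB in 2 MiB pieces
```

Thousands of separate uploads, each of which can fail on its own, is why a VPS with shell access is the easier route here.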
[18:57] DigitalOcean is just like $5 per month
[18:58] https://www.reddit.com/r/freewebhosting/comments/3vehc6/get_60_free_at_digitalocean/
[18:58] If you compare what you get with both, you will see why Scaleway is used by a lot of people in this channel. :)
[18:58] Hmm, apparently the deal is only for new users :/
[18:58] Can't beat a $3.13 a month machine that is hardware, not a VPS.
[18:58] Ah
[18:59] Looks nice
[19:00] If I get a month's worth of Scaleway hosting, can you help me get the export to work?
[19:01] I make no promises, but you can try. You also don't have to pay for an entire month. Just kill it when you are done with it and you will only be charged for the hours it was running.
[19:02] I am perfectly willing to assist. :)
[19:06] Ah, ok
[19:07] Are you a regular on this channel btw (asking in case this isn't done by tonight)
[19:08] Yep, I will be here.
[19:09] Just private message me so we don't spam the channel. :)
[19:11] Ok, will do
[19:18] Scaleway isn't amazing either
[19:20] https://hostadvice.com/hosting-company/scaleway-reviews/
[19:22] HostAdvice recommended Host1Plus: https://hostadvice.com/hosting-company/host1plus-reviews/
[19:22] https://www.host1plus.com/vps-hosting/
[19:22] I'll look into it and grab something that looks good
[19:31] speaking personally, one reason I use DO is that a lot of the stuff that gets used in here ends up having implicit assumptions that it's running on x86-64
[19:31] and I have no patience to tweak stuff to run on anything ARM because I already do too much of that
[19:33] maybe things have gotten better
[19:33] scw has x86 machines
[19:34] Yep, which is why I use their x86 offerings. 4-core x86, 8 GB RAM, unlimited transfer, $12 a month
[19:34] what happens if you actually use what you pay for
[19:35] they have an official rtorrent image
[19:35] Uh, nothing? It isn't a shared environment. I have over 50 servers with them, some that are maxing out the pipe 24/7.
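The prices quoted above can be lined up over the same three-year span. A small worked calculation using only the figures from the log (FastComet at $95.90 flat for three years, WordPress.com at $8.25/month, DigitalOcean at $5/month, the Scaleway machine at $3.13/month). The hourly figure assumes, purely for illustration, that an hourly rate equals the monthly price spread over about 730 hours; that is not necessarily how Scaleway bills.

```python
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12, used only for the illustration

MONTHLY_PRICES = {
    "WordPress.com": 8.25,
    "DigitalOcean": 5.00,
    "Scaleway C1": 3.13,
}
FASTCOMET_THREE_YEARS = 95.90  # quoted as a flat price for 36 months


def cost_over(months: int, monthly_price: float) -> float:
    """Total cost of keeping a plan for the given number of months, in dollars."""
    return round(months * monthly_price, 2)


def prorated_cost(hours: float, monthly_price: float) -> float:
    """Cost of a short-lived machine billed by the hour (hypothetical rate model)."""
    return hours * monthly_price / HOURS_PER_MONTH


if __name__ == "__main__":
    for name, price in MONTHLY_PRICES.items():
        print(f"{name}: ${cost_over(36, price):.2f} over 36 months")
    print(f"FastComet: ${FASTCOMET_THREE_YEARS:.2f} over 36 months")
    # A 48-hour export job on the $3.13/month machine:
    print(f"48 h: ${prorated_cost(48, 3.13):.2f}")
```

Over three years WordPress.com comes to $297.00 against $95.90 for FastComet, and under the stated assumption a 48-hour throwaway export machine costs about $0.21.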
[19:35] interesting
[19:35] I expected worse
[19:36] The only issue I run into is that they go out of stock on IPv4 addresses or servers at inopportune moments.
[19:37] I started doing NAT on tinc-vpn for servers that don't need the full bandwidth or host public services
[19:40] Scaleway support is dreadful. Unable to string a coherent English sentence together
[19:41] they also ignore timezones, and love to wake their customers with 6 am phone calls
[19:41] Wait, you get phone calls?
[19:42] I have, for their main dedi brand
[19:42] Ah. Never dealt with their support, can't comment.
[19:50] Scaleway seems like a good deal. Too bad they don't have more storage
[19:51] I bet they would be better if you spoke French ;)
[19:54] https://hostadvice.com/hosting-companies/vps/
[19:57] damn French, can't they just speak English like the rest of us :P
[19:59] damn English, why can't they speak French like the rest of us
[20:05] I am a hoarder. I hoard web fiction. Lots of benefits: very few images, all text, which makes storage problems not that big of a deal. Until today, when I discovered my grab-site of spacebattles.com is approaching 700 GB.
[20:05] Decided to look into it, and somehow I got into a tvtropes trap with an entire archive of mp3s.
[20:06] rofl
[21:17] *** icedice has quit IRC (Quit: Leaving)
[21:32] *** terg has quit IRC (My Mac has gone to sleep. ZZZzzz…)
[21:35] *** BlueMaxim has joined #archiveteam-bs
[21:36] *** will has quit IRC (Goodbye)
[21:39] *** will has joined #archiveteam-bs
[21:45] *** godane has quit IRC (Leaving.)
[21:46] *** godane has joined #archiveteam-bs
[21:51] I'm not sure what kind of installation instructions we want to have in the Wget article. Currently there is a section about compiling it on Debian/Ubuntu and one about installing it on Windows. But the Windows section is largely outdated with that new Ubuntu on Windows thing, and I wonder whether we need to maintain install instructions in the first place.
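The spacebattles.com crawl wandering into an mp3 archive is the kind of problem grab-site's ignore patterns address: it accepts regular expressions that can be changed while a crawl is running, and matching URLs are skipped. The sketch below only mimics that idea in plain Python; it is not grab-site's code, and the single pattern (an `.mp3` suffix, optionally followed by a query string) is an illustrative assumption.

```python
import re
from typing import Iterable, List

# Illustrative pattern only; tune it to whatever is actually bloating the crawl.
IGNORE_PATTERNS = [
    r"\.mp3($|\?)",  # audio files, with or without a query string
]


def compile_ignores(patterns: Iterable[str]) -> List[re.Pattern]:
    """Compile ignore patterns case-insensitively so .MP3 is caught too."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def should_ignore(url: str, ignores: List[re.Pattern]) -> bool:
    """True if any ignore pattern matches anywhere in the URL (re.search semantics)."""
    return any(rx.search(url) for rx in ignores)


if __name__ == "__main__":
    ignores = compile_ignores(IGNORE_PATTERNS)
    for url in [
        "https://forums.spacebattles.com/threads/example.123/",
        "https://example.org/audio/track01.mp3",
    ]:
        print(should_ignore(url, ignores), url)
```

Using `search` rather than `match` means a pattern does not have to describe the whole URL, only the part that identifies the unwanted files.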
[21:53] I think the Wget with WARC article could be merged into Wget; especially if the installation instructions were removed from the Wget article, I think the result would still be manageable. If install instructions are a must, it would probably be better to have a separate large article with sections for multiple environments.
[21:53] downloading a wget binary compiled for win32 is probably less rigmarole than installing an unstable Windows subsystem just to use wget
[21:53] this might change in the future
[21:54] "unstable" read as "beta"
[21:57] Ok. It probably would make sense to keep it anyway, since that environment is only available on Windows 10 AFAIK.
[21:59] yes
[22:01] *** ndiddy has joined #archiveteam-bs
[22:17] *** will has quit IRC (Quit: Goodbye)
[22:18] *** will has joined #archiveteam-bs
[22:38] Is the wiki content public domain?
[22:46] Most Archive Team writings are considered to be private works, subject to a $5k fine if used outside of the Archive Team efforts
[22:46] Did you sign the click-through agreement?
[22:49] I don't remember signing the agreement... :o
[22:50] It just says "Please note that all contributions to Archiveteam may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here." when saving an edit, so I am not sure if people will be OK with paragraphs being copy-pasted between articles without proper attribution.
[22:51] if it's between different articles in the same wiki i don't know how that could possibly be an issue
[22:52] hey now, Wikipedia has a whole team of admins dedicated to fixing edit histories on copy-paste edits ;)
[22:52] technically, the wiki has no license defined, so to pedants it's all rights reserved
[22:53] i'm all for public domain
[22:53] I've probably just spent too much time on Wikipedia to not worry about licenses...
[22:53] i only license my writing under AGPLv3
[22:53] fuk
[22:55] all my AT work is licensed under the DWTFUW license
[22:55] All my rights are reserved, but I only make edits using other people's accounts
[22:56] and which rights I automatically have differ according to which country I wrote it from
[22:56] every edit is annotated with a different license in a notebook i keep on my desk
[22:56] they are not stored in any particular order
[22:59] each contribution of mine gets a new license generated using Markov chains trained on the 50 most popular open-source licenses
[22:59] the resulting licenses are unlikely to be OSI-approved
[23:00] i get new licenses in the mail once a week from a subscription service
[23:01] they're handwritten by monks, who live deep in the mountainous forests
[23:01] this is how they pay the rent on their monastery
[23:01] *** GE has quit IRC (Quit: zzz)
[23:04] each of my licenses has strange restrictions involving real-life locations and together they form an elaborate ARG
[23:10] Sanqui: not to pedants, to lawyers and judges, sadly
[23:13] Circle jerk aside (RNNs are better than Markov chains btw), I wonder why it cannot just state that contributions are licensed under DWTFUW, the Unlicense or CC0? It would cause no harm and could prevent misunderstandings...
[23:14] because they aren't
[23:15] eh, we could at least do "all changes since 2017-01-05 are public domain" or something
[23:15] and let people draw their own contributions for the rest
[23:15] i don't see the issue
[23:15] if you didn't write it, maybe don't reuse it off-wiki
[23:16] maybe.
[23:16] synthesizing licenses together into nonsense sounded like a good idea and I'm getting tired of looking at perf output, so
[23:16] oh dear
[23:16] here's one GPLv3/Apache 2.0/MS-PL mashup
[23:16] https://gist.github.com/yipdw/2eb67ec9778c09a2af66864b615c5593
[23:16] i consider the kind of content that's on the AT wiki a "public good"; it's only good for it to spread
[23:16] it's unfortunately sensible
[23:17] inasmuch as lawyerese is sensible
[23:17] it's sensible in the same way that Twitter arguments are sensible
[23:18] I guess I should have put WTFPL in the corpus too
[23:18] one moment
[23:19] haha, the debug output is great already
[23:19] Reading plaintext corpus from wtfpl.license
[23:19] Ranking keywords
[23:19] Top keywords: FUCK WHAT CONDITIONS
[23:19] > "Additional permissions" are terms that obligate you
[23:20] Someone should write a license which defines everything in a very weird way but is actually equivalent to a popular license.
[23:21] Or have a logical contradiction in it
[23:22] lovely yipdw
[23:22] I think my favorite line is "We, the Work and Derivative Works thereof"
[23:22] Anthem Public License by Ayn Rand
[23:25] :|
[23:25] there's probably a Zamyatin joke in there somewhere too
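The "licenses generated using Markov chains" joke, and yipdw's actual mashup, rest on a simple idea: record which word follows each short run of words in a corpus, then walk those transitions at random. A minimal word-level, order-2 sketch in Python. It is not yipdw's tool (its code is not in the log), the tiny corpus is made up for the example, and the keyword-ranking step visible in his debug output is not reproduced.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

State = Tuple[str, ...]


def train(text: str, order: int = 2) -> Dict[State, List[str]]:
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain: Dict[State, List[str]] = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain: Dict[State, List[str]], length: int, seed: int = 0) -> str:
    """Random walk over the chain; stops early at a state with no recorded successor."""
    rng = random.Random(seed)  # seeded, so the same seed gives the same "license"
    state = rng.choice(sorted(chain))
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    corpus = (
        "you may copy and distribute the work provided that you retain this notice "
        "and you may modify the work provided that you state your changes"
    )
    print(generate(train(corpus), length=20, seed=42))
```

Because followers are stored as a list with repeats, frequent continuations are picked proportionally more often, which is what keeps the output sounding like lawyerese.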