#archiveteam-bs 2017-01-04,Wed


Time Nickname Message
00:03 🔗 FalconK yeah I looked there also
00:03 🔗 FalconK wpull 2 is *vastly* more complex to work with
00:08 🔗 rocode https://github.com/ArchiveTeam/ftp-gov-grab/blob/master/ftp-gov.py FalconK
00:08 🔗 FalconK oh really! ok, I'll take a look. thanks.
00:10 🔗 rocode This also may be an example, but I can't verify if it is version 2 or not: https://github.com/woodenphone/cyoc_grab/blob/master/cyoc_wpull_hooks.py
00:23 🔗 ndiddy has joined #archiveteam-bs
00:27 🔗 Honno has quit IRC (Read error: Operation timed out)
00:28 🔗 VeganMars has quit IRC (Quit: ZNC - http://znc.in)
00:46 🔗 Bamboozle has joined #archiveteam-bs
00:49 🔗 Bamboozle Hello, I'm trying to run archive team on ESXI 5.5 and I get an error saying "INIT ID 2 respawning too fast. disabled for 5 minutes". Has anyone else run into this issue?
00:50 🔗 pizzaiolo that feel when you THOUGHT you had backed it up but nope
01:10 🔗 Asparagir has quit IRC (Asparagir)
01:34 🔗 rocode Sometimes checking your grab log produces gold: "302 Found http://maps.google.com/maps?q=The+ass+end+of+a+decaying+webserver"
01:45 🔗 terg has quit IRC (My Mac has gone to sleep. ZZZzzz…)
02:02 🔗 Asparagir has joined #archiveteam-bs
02:18 🔗 pizzaiolo has left
02:28 🔗 arkiver FalconK: https://github.com/chfoo/wpull/blob/master/wpull/testing/integration/sample_user_scripts/extensive.plugin.py
02:37 🔗 Bamboozle has quit IRC (Quit: Page closed)
02:49 🔗 Start_ has quit IRC (Remote host closed the connection)
03:19 🔗 vitzli has joined #archiveteam-bs
03:25 🔗 VADemon has quit IRC (Quit: left4dead)
05:10 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:15 🔗 Sk1d has joined #archiveteam-bs
05:59 🔗 FalconK arkiver: woo. thanks!
06:21 🔗 godane so i'm close to having all of the cbsnews.com video clips from 2007 uploaded
06:21 🔗 godane its uploading 2007-12-29 right now
06:28 🔗 Asparagir godane, you rock. we should all say that to you more often.
06:28 🔗 Asparagir Thank you for always being so diligent.
06:52 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
07:28 🔗 Asparagir has quit IRC (Asparagir)
08:15 🔗 Honno has joined #archiveteam-bs
08:20 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
08:21 🔗 LordNigh2 has joined #archiveteam-bs
08:21 🔗 LordNigh2 is now known as Lord_Nigh
08:21 🔗 brayden_ has joined #archiveteam-bs
08:24 🔗 antonizoo has quit IRC (Read error: Operation timed out)
08:26 🔗 brayden has quit IRC (Read error: Operation timed out)
08:35 🔗 Honno has quit IRC (Read error: Operation timed out)
08:36 🔗 Honno has joined #archiveteam-bs
08:37 🔗 antonizoo has joined #archiveteam-bs
08:49 🔗 godane so i'm starting to download 2008-01 flash videos now
08:49 🔗 godane for cbsnews.com
08:50 🔗 GE has joined #archiveteam-bs
09:06 🔗 godane i'm also grabbing the 5xxxxxx ids from cbsnews.com
09:07 🔗 godane there is a lot of crap in there when the url redirects
09:08 🔗 godane from this: http://www.cbsnews.com/video/watch/?id=5000003n
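A minimal sketch of how a run of those watch-page URLs could be enumerated, assuming the IDs are consecutive seven-digit numbers followed by the literal suffix "n", as in the example URL above. The function name and the range bounds are hypothetical, and nothing here fetches anything.

```python
def cbsnews_video_urls(start, stop):
    """Yield watch-page URLs for numeric IDs in [start, stop).

    Assumption: every ID takes the form <number>n, matching the single
    example URL quoted in the log.
    """
    template = "http://www.cbsnews.com/video/watch/?id={}n"
    for video_id in range(start, stop):
        yield template.format(video_id)


# Hypothetical range: three IDs starting at the one quoted in the log.
urls = list(cbsnews_video_urls(5000003, 5000006))
for url in urls:
    print(url)
```

A list like this could be handed to a downloader as an input file; since many IDs apparently redirect elsewhere, the results would still need filtering.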
09:08 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
09:10 🔗 BlueMaxim has joined #archiveteam-bs
09:40 🔗 tpw_rules has quit IRC (Read error: Operation timed out)
09:45 🔗 vitzli has quit IRC (Quit: Leaving)
09:50 🔗 tpw_rules has joined #archiveteam-bs
09:50 🔗 kurt Can someone with access take a look at the tracker please? Jobs aren't being distributed for gcode, limit is set at 500 in the tracker
10:20 🔗 GE has quit IRC (Remote host closed the connection)
10:23 🔗 Honno has quit IRC (Read error: Operation timed out)
10:58 🔗 Honno has joined #archiveteam-bs
11:18 🔗 GinhijiQu has joined #archiveteam-bs
11:38 🔗 VADemon has joined #archiveteam-bs
11:54 🔗 GE has joined #archiveteam-bs
12:03 🔗 GinhijiQu is now known as VeganMars
12:35 🔗 pizzaiolo has joined #archiveteam-bs
12:35 🔗 BlueMaxim has quit IRC (Leaving)
13:11 🔗 terg has joined #archiveteam-bs
14:31 🔗 terg has quit IRC (My Mac has gone to sleep. ZZZzzz…)
14:33 🔗 terg has joined #archiveteam-bs
14:33 🔗 ItsYoda has quit IRC (Quit: rippppp to the yoda you used to know!)
14:34 🔗 whydomain has quit IRC (Quit: No Ping reply in 180 seconds.)
14:34 🔗 whydomain has joined #archiveteam-bs
14:36 🔗 wm_ has quit IRC (Ping timeout: 260 seconds)
14:38 🔗 ItsYoda has joined #archiveteam-bs
14:39 🔗 wm_ has joined #archiveteam-bs
15:25 🔗 jbroome yeah, yahoo groups is unhappy too
16:01 🔗 chfoo tracker seems to be working ok. you need to release the stale claims and adjust the user limits
16:09 🔗 vitzli has joined #archiveteam-bs
16:15 🔗 Kaz heh, there was me thinking 0 = unlimited on maximum claims..
16:15 🔗 Kaz thanks
17:15 🔗 xmc sets mode: +o swebb
17:15 🔗 swebb sets mode: +o antomatic
17:15 🔗 swebb sets mode: +o balrog
17:15 🔗 swebb sets mode: +o brayden_
18:11 🔗 vitzli has quit IRC (Quit: Leaving)
18:18 🔗 icedice has joined #archiveteam-bs
18:19 🔗 rocode icedice, the cli is pretty easy, saves a lot of workarounds and trouble dealing with GUI tools.
18:19 🔗 icedice is there any "wget for idiots" guide?
18:19 🔗 rocode What operating system are you archiving to?
18:19 🔗 icedice and do I need regex?
18:20 🔗 rocode No.
18:20 🔗 icedice I use Windows 7
18:20 🔗 rocode Hrm.
18:20 🔗 rocode You will either need a linux VM or a VPS somewhere.
18:20 🔗 rocode Windows 10 has WSL, but Windows 7 does not.
18:20 🔗 icedice shared hosting won't do I assume?
18:20 🔗 rocode Correct. You need shell access.
18:21 🔗 icedice I have Windows 10 in my city apartment
18:21 🔗 icedice I'll go there on Monday
18:21 🔗 rocode In the meantime, do you have like an hour to spare?
18:21 🔗 icedice What's WSL?
18:21 🔗 icedice Yeah, I guess
18:21 🔗 rocode Windows Subsystem for Linux
18:22 🔗 rocode Or, in layman's terms, Ubuntu Userspace Bash for Windows 10, an inverse WINE solution.
18:22 🔗 icedice ok
18:22 🔗 rocode Put simply, it lets you run Unix-like CLI tools on Windows without the overhead of a VM.
18:23 🔗 icedice I also have access to Mac OS X via TeamViewer
18:23 🔗 rocode Mac will also work, as it has a unix like environment.
18:23 🔗 icedice Nice
18:26 🔗 icedice Do you think that this should work if I get the right folder structure and everything or might the references to the images be incorrect when importing it like this?
18:27 🔗 rocode So, to make sure I understand this correctly, you are attempting to grab a hosted wordpress.com blog with a custom domain?
18:28 🔗 icedice Yup
18:28 🔗 icedice And import the images onto
18:28 🔗 icedice the same WordPress installation on a proper web host
18:29 🔗 rocode Ah, so you want to transition from wordpress.com to a standalone wordpress installation?
18:29 🔗 rocode Retaining your previous posts and images?
18:29 🔗 icedice Yeah
18:30 🔗 rocode Then ignore literally everything I just said, because you are approaching this problem from the wrong angle.
18:30 🔗 icedice It's not my site though, I'm fixing this for an acquaintance
18:31 🔗 rocode Ah, do you have access to the wordpress.com site's admin?
18:32 🔗 icedice Yes
18:32 🔗 rocode Then all you need to do is use the export function, which includes media such as images, and the import function on the new website. No scraping required.
18:33 🔗 rocode This link is a fairly good guide: http://www.wpbeginner.com/wp-tutorials/how-to-properly-move-your-blog-from-wordpress-com-to-wordpress-org/
18:33 🔗 icedice <icedice> the XML file is always corrupt
18:33 🔗 icedice <icedice> might be because it's 9 GB of images
18:34 🔗 icedice and a fuckton of posts
18:35 🔗 icedice And the web host did the import for us since they had to do it via some wordpress.org backend and apparently that's as good as it gets since there's no FTP or hosting panel access
18:35 🔗 icedice on wordpress.com
18:39 🔗 rocode Damn, wordpress.com does not make this easy.
18:40 🔗 icedice It's almost as if they want to keep their paying customers there
18:41 🔗 rocode Or make you pay $130 for their guided export experience. :)
18:41 🔗 rocode How much are you paying for this shared hosting?
18:43 🔗 rocode One solution to this would be to use a VPS to export, and then importing via wp-cli
18:43 🔗 icedice The new hosting is $95.90 for three years
18:43 🔗 icedice https://www.fastcomet.com/compare-shared-package
18:45 🔗 icedice WordPress.com is $8.25/month ($99/year): https://en.wordpress.com/#plans
18:47 🔗 icedice Do you have any guide for that export?
18:47 🔗 rocode I would recommend getting a Scaleway C1, which will cost $112 for three years and give you the benefit of a fully functioning Linux (ARM) machine, so you can import directly into it without going through a 9 GB download over a consumer network.
18:49 🔗 rocode As for the export, you would just be downloading to your server instead of your personal machine, which may solve the corruption issue.
18:50 🔗 rocode If that is not possible, my only recommendation is to bite the bullet and pay wordpress for a transfer, because as far as I can tell there is no easy way of converting a grabbed site back into wordpress.
18:50 🔗 rocode Unless you want to do a wonky solution like an archive.blog.com type thing, where archive is a reverse proxy of your old wordpress site.
18:52 🔗 Honno has quit IRC (Quit: Leaving)
18:53 🔗 icedice Why would I need a three year hosting plan for something that would take less than a month?
18:53 🔗 icedice Ah, you mean that the new site should be hosted on a VPS?
18:54 🔗 icedice My acquaintance is not too bright technologically speaking and we chose shared hosting and FastComet because of that
18:54 🔗 icedice https://hostadvice.com/hosting-company/fastcomet-reviews/
18:54 🔗 rocode Yeah, unless you can convince your shared hosting company to allow you to import via shell. Most shared hosting has a capped upload limit, and uploading a 9gb file split into 2mb subfiles would be... fun.
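To put a number on that, a rough sketch of the arithmetic, assuming binary units (1 GB = 1024 MB) and exactly the 9 GB export and 2 MB upload cap mentioned above. The helper name is made up for illustration.

```python
def chunk_count(total_bytes, chunk_bytes):
    """Number of pieces needed to cover total_bytes, rounding up."""
    # Ceiling division using integers only, so no float rounding issues.
    return -(-total_bytes // chunk_bytes)


MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed sizes taken from the conversation: 9 GB export, 2 MB per upload.
parts = chunk_count(9 * GIB, 2 * MIB)
print(parts)  # 4608
```

With decimal units the figure would be 4500 instead; either way it is thousands of separate uploads.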
18:57 🔗 icedice As long as I can get the site exported I can just import it via FTP
18:57 🔗 rocode Then rent a VPS and use it to download the export.
18:57 🔗 icedice That I can do
18:57 🔗 rocode Scaleway charges by the hour.
18:57 🔗 icedice DigitalOcean is just like $5 per month
18:58 🔗 icedice https://www.reddit.com/r/freewebhosting/comments/3vehc6/get_60_free_at_digitalocean/
18:58 🔗 rocode If you compare what you get with both, you will see why scaleway is used by a lot of people in this channel. :)
18:58 🔗 icedice Hmm, apparently the deal is only for new users :/
18:58 🔗 rocode Can't beat a $3.13 a month machine that is hardware, not VPS.
18:58 🔗 icedice Ah
18:59 🔗 icedice Looks nice
19:00 🔗 icedice If I get a month's worth of Scaleway hosting, can you help me get the export to work?
19:01 🔗 rocode I give no promises, but you can try. You also don't have to pay for an entire month. Just kill it when you are done with it and you will be only charged for the hours it was running.
19:02 🔗 rocode I am perfectly willing to assist. :)
19:06 🔗 icedice Ah, ok
19:07 🔗 icedice Are you a regular on this channel btw (asking in case this isn't done by tonight)
19:08 🔗 rocode Yep, I will be here.
19:09 🔗 rocode Just private message me so we don't spam the channel. :)
19:11 🔗 icedice Ok, will do
19:18 🔗 HCross Scaleway isn't amazing either
19:20 🔗 icedice https://hostadvice.com/hosting-company/scaleway-reviews/
19:22 🔗 icedice HostAdvice recommended Host1Plus: https://hostadvice.com/hosting-company/host1plus-reviews/
19:22 🔗 icedice https://www.host1plus.com/vps-hosting/
19:22 🔗 icedice I'll look into it and grab something that looks good
19:31 🔗 yipdw speaking personally, one reason I use DO is that a lot of the stuff that gets used in here ends up having implicit assumptions it's running on x86-64
19:31 🔗 yipdw and I have no patience to tweak stuff to run on anything ARM because I already do too much of that
19:33 🔗 yipdw maybe things have gotten better
19:33 🔗 Meroje scw has x86 machines
19:34 🔗 rocode Yep, which is why I use their x86 offerings. 4core x86, 8gb RAM, unlimited transfer, $12 a month
19:34 🔗 yipdw what happens if you actually use what you pay for
19:35 🔗 Meroje they have an official rtorrent image
19:35 🔗 rocode Uh, nothing? It isn't a shared environment. I have over 50 servers with them, some that are maxing out the pipe 24/7.
19:35 🔗 yipdw interesting
19:35 🔗 yipdw I expected worse
19:36 🔗 rocode Only issues I run into is they go out of stock on IPv4 addresses or servers at inopportune moments.
19:37 🔗 Meroje I started doing nat on tinc-vpn for servers that don't need the full bandwidth of host public services
19:40 🔗 HCross Scaleway support is dreadful. Unable to string a coherent english sentence together
19:41 🔗 HCross they also ignore timezones, and love to wake their customers with 6am phone calls
19:41 🔗 rocode Wait, you get phonecalls?
19:42 🔗 HCross I have, for their main dedi brand
19:42 🔗 rocode Ah. Never dealt with their support, can't comment.
19:50 🔗 Aoede Scaleway seems like a good deal. Too bad they don't have more storage
19:51 🔗 FalconK I bet they would be better if you spoke french ;)
19:54 🔗 icedice https://hostadvice.com/hosting-companies/vps/
19:57 🔗 Aoede damn french, can't they just speak english like the rest of us :P
19:59 🔗 pizzaiolo damn english, why can't they speak french like the rest of us
20:05 🔗 rocode I am a hoarder. I hoard web fiction. Lots of benefits, very few images, all text, makes storage problems not that big of a deal. Until today, when I discovered my grab-site of spacebattles.com is approaching 700gb.
20:05 🔗 rocode Decide to check into it, and somehow I got into a tvtropes trap with an entire archive of mp3s.
20:06 🔗 pizzaiolo rofl
21:17 🔗 icedice has quit IRC (Quit: Leaving)
21:32 🔗 terg has quit IRC (My Mac has gone to sleep. ZZZzzz…)
21:35 🔗 BlueMaxim has joined #archiveteam-bs
21:36 🔗 will has quit IRC (Goodbye)
21:39 🔗 will has joined #archiveteam-bs
21:45 🔗 godane has quit IRC (Leaving.)
21:46 🔗 godane has joined #archiveteam-bs
21:51 🔗 VeganMars I'm not sure what kind of installation instructions we want to have in the Wget article? Currently there is a section about compiling it on Debian/Ubuntu and how to install it on Windows. But the Windows section is largely outdated with that new Ubuntu on Windows thing and I wonder if it's needed to maintain install instructions in the first place?
21:53 🔗 VeganMars I think the Wget with WARC article could be merged into Wget, especially if the installation instructions were removed from the Wget article, I think this would still be overseeable. If install instructions are a must it would probably be better to have a large external article with sections for multiple environments.
21:53 🔗 yipdw downloading a wget binary compiled for win32 is probably less rigmarole than installing an unstable Windows subsystem just to use wget
21:53 🔗 yipdw this might change in future
21:54 🔗 yipdw "unstable" read as "beta"
21:57 🔗 VeganMars Ok. It probably would make sense to keep it anyway since that environment is only available on Windows 10 AFAIK.
21:59 🔗 Frogging yes
22:01 🔗 ndiddy has joined #archiveteam-bs
22:17 🔗 will has quit IRC (Quit: Goodbye)
22:18 🔗 will has joined #archiveteam-bs
22:38 🔗 VeganMars Is the wiki content public domain?
22:46 🔗 SketchCow Most Archive Team writings are considered to be private works, subject to a $5k fine if used outside of the Archive Team efforts
22:46 🔗 SketchCow Did you sign the click-through agreement
22:49 🔗 VeganMars I don't remember signing the agreement... :o
22:50 🔗 VeganMars It just says "Please note that all contributions to Archiveteam may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here." when saving an edit so I am not sure if people will be ok with paragraphs being copy pasted between articles without proper attribution?
22:51 🔗 xmc if it's between different articles in the same wiki i don't know how that could possibly be an issue
22:52 🔗 FalconK hey now, Wikipedia has a whole team of admins dedicated to fixing edit histories on copy-paste edits ;)
22:52 🔗 Sanqui technically, the wiki has no license defined, so to pedants it's all rights reserved
22:53 🔗 Sanqui i'm all for public domain
22:53 🔗 VeganMars I probably just spent too much time on Wikipedia to not worry about licenses...
22:53 🔗 xmc i only license my writing under agplv3
22:53 🔗 Sanqui fuk
22:55 🔗 HCross all my AT work is licensed under the DWTFUW license
22:55 🔗 FalconK All my rights are reserved, but I only make edits using other people's accounts
22:56 🔗 FalconK and which rights I automatically have differ according to which country I wrote it from
22:56 🔗 xmc every edit is annotated with a different license in a notebook i keep on my desk
22:56 🔗 xmc they are not stored in any particular order
22:59 🔗 Sanqui each contribution of mine gets a new license generated using markov chains taught from the 50 most popular open-source licenses
22:59 🔗 Sanqui the resulting licenses are unlikely to be OSI approved
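For the curious, a minimal sketch of the kind of word-level Markov chain generator being joked about here. This is not anyone's actual tool: the function names are hypothetical, and the tiny corpus is invented placeholder text rather than wording from any real license.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Invented placeholder corpus, standing in for real license texts.
corpus = (
    "you may copy the work provided that you keep this notice "
    "you may modify the work provided that you share the source "
    "you may share the work provided that you keep the source"
)
chain = build_chain(corpus)
print(generate(chain, length=15, seed=1))
```

Every generated word follows a pair of words that really occurred together in the corpus, which is why the output tends to look locally plausible and globally nonsensical.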
23:00 🔗 xmc i get new licenses in the mail once a week from a subscription service
23:01 🔗 xmc they're handwritten by monks, who live deep in the mountainous forests
23:01 🔗 xmc this is how they pay the rent on their monastery
23:01 🔗 GE has quit IRC (Quit: zzz)
23:04 🔗 Sanqui each of my licenses has strange restrictions involving real-life locations and together they form an elaborate ARG
23:10 🔗 pizzaiolo Sanqui: not to pedants, to lawyers and judges, sadly
23:13 🔗 VeganMars Circle jerk aside (RNNs are better than markov chains btw) I wonder why it cannot just mention that contributions are licensed with DWTFUW, unlicense or CC0? It would cause no harm and potentially prevent misunderstandings...
23:14 🔗 xmc because they aren't
23:15 🔗 Sanqui eh, we could at least do "all changes since 2017-01-05 are public domain" or something
23:15 🔗 Sanqui and let people draw their own contributions for the rest
23:15 🔗 xmc i don't see the issue
23:15 🔗 xmc if you didn't write it, maybe don't reuse it off-wiki
23:16 🔗 xmc maybe.
23:16 🔗 yipdw synthesizing licenses together into nonsense sounded like a good idea and I'm getting tired of looking at perf output, so
23:16 🔗 xmc oh dear
23:16 🔗 yipdw here's one GPLv3/Apache 2.0/MS-PL mashup
23:16 🔗 yipdw https://gist.github.com/yipdw/2eb67ec9778c09a2af66864b615c5593
23:16 🔗 Sanqui i consider the kind of content that's on the AT wiki being "public good", it's only good for it to spread
23:16 🔗 yipdw it's unfortunately sensible
23:17 🔗 xmc inasmuch as lawyerese is sensible
23:17 🔗 yipdw it's sensible in the same way that Twitter arguments are sensible
23:18 🔗 yipdw I guess I should have put WTFPL in the corpus too
23:18 🔗 yipdw one moment
23:19 🔗 yipdw haha, the debug output is great already
23:19 🔗 yipdw Reading plaintext corpus from wtfpl.license
23:19 🔗 yipdw Ranking keywords
23:19 🔗 yipdw Top keywords: FUCK WHAT CONDITIONS
23:19 🔗 VeganMars > "Additional permissions" are terms that obligate you
23:20 🔗 VeganMars Someone should write a license which defines everything in a very weird way but is actually equivalent to a popular license.
23:21 🔗 VeganMars Or have a logical contradiction in it
23:22 🔗 Sanqui lovely yipdw
23:22 🔗 yipdw I think my favorite line is "We, the Work and Derivative Works thereof"
23:22 🔗 yipdw Anthem Public License by Ayn Rand
23:25 🔗 xmc :|
23:25 🔗 yipdw there's probably a Zamyatin joke in there somewhere too
